Back in October, we released Tplyr 1.0.0, which brought to life some of the original vision of the package with the completion of the traceability framework. Now, as we welcome 2023, our 1.1.0 release adds a handful of enhancements and new capabilities, primarily focused on Tplyr’s formatting.

Parenthesis Hugging Format

One of the highlights and important parts of Tplyr’s design has always been control around string formatting. Using our format strings, we’ve tried to make it as simple as possible for users to specify how they’d like a string to appear on the page. For example, consider this simple table:

tplyr_table(adsl, TRT01P) %>% 
  add_layer(
    group_count(RACE) %>% 
      set_format_strings(
        f_str("xx (xx.x%)", n, pct)
      )
  ) %>% 
  build() %>% 
  select(1:3)
#> # A tibble: 3 × 3
#>   row_label1                       var1_Placebo `var1_Xanomeline High Dose`
#>   <chr>                            <chr>        <chr>                      
#> 1 AMERICAN INDIAN OR ALASKA NATIVE " 0 ( 0.0%)" " 1 ( 1.2%)"               
#> 2 BLACK OR AFRICAN AMERICAN        " 8 ( 9.3%)" " 9 (10.7%)"               
#> 3 WHITE                            "78 (90.7%)" "74 (88.1%)"

While the formatting keeps everything aligned, the fact that the opening parenthesis of each result stays in a fixed position may not suit every preference. From the table above, you may want the string " 1 ( 1.2%)" to instead appear as " 1  (1.2%)", with the padding moved outside the parenthesis so that it stays bound to the number in the percent field. Up until this release, resolving this natively in Tplyr wasn’t possible – but now we’ve added a new capability we’re calling “parenthesis hugging”.

Triggering parenthesis hugging is still controlled directly within format strings. To trigger this, you use a capital X instead of a lowercase x in the portion of the string where you’d like the parenthesis to hug. To fix the example above:

tplyr_table(adsl, TRT01P) %>% 
  add_layer(
    group_count(RACE) %>% 
      set_format_strings(
        f_str("xx (XX.x%)", n, pct)
      )
  ) %>% 
  build() %>% 
  select(1:3)
#> # A tibble: 3 × 3
#>   row_label1                       var1_Placebo `var1_Xanomeline High Dose`
#>   <chr>                            <chr>        <chr>                      
#> 1 AMERICAN INDIAN OR ALASKA NATIVE " 0  (0.0%)" " 1  (1.2%)"               
#> 2 BLACK OR AFRICAN AMERICAN        " 8  (9.3%)" " 9 (10.7%)"               
#> 3 WHITE                            "78 (90.7%)" "74 (88.1%)"
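
To make the difference between the two layouts concrete, here is a rough base-R sketch that reproduces the strings from the High Dose column above. It only illustrates where the padding goes; it is not how Tplyr implements format strings, and the variable names are made up for the example.

```r
# Counts and percents taken from the Xanomeline High Dose column above
n   <- c(1L, 9L, 74L)
pct <- c(1.2, 10.7, 88.1)

# Fixed layout, like "xx (xx.x%)": the padding sits inside the parenthesis
fixed <- sprintf("%2d (%4.1f%%)", n, pct)

# Hugging layout, like "xx (XX.x%)": build "(1.2%)" first, then left-pad the
# whole group to the same total width so the padding lands outside it
group  <- paste0("(", sprintf("%.1f", pct), "%)")
hugged <- paste0(sprintf("%2d", n), " ", formatC(group, width = 7))

fixed
#> [1] " 1 ( 1.2%)" " 9 (10.7%)" "74 (88.1%)"
hugged
#> [1] " 1  (1.2%)" " 9 (10.7%)" "74 (88.1%)"
```

Both versions keep the overall string width at 10 characters, which is why the columns still line up after hugging.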

Another benefit of format strings within Tplyr is auto-precision: instead of x in the format string you use a, and Tplyr automatically determines the necessary integer width or decimal precision from the data. Parenthesis hugging works with auto-precision as well, using A instead of a.

tplyr_table(adae, TRTA) %>% 
  set_pop_data(adsl) %>% 
  set_pop_treat_var(TRT01A) %>% 
  add_layer(
    group_count(AEDECOD) %>% 
      set_format_strings(f_str("a (XX.x%) [A]", distinct_n, distinct_pct, n)) %>% 
      set_distinct_by(USUBJID)
  ) %>% 
  build() %>% 
  select(1:3)
#> # A tibble: 21 × 3
#>    row_label1         var1_Placebo      `var1_Xanomeline High Dose`
#>    <chr>              <chr>             <chr>                      
#>  1 ACTINIC KERATOSIS  " 0  (0.0%)  [0]" " 1  (1.2%)  [1]"          
#>  2 ALOPECIA           " 1  (1.2%)  [1]" " 0  (0.0%)  [0]"          
#>  3 BLISTER            " 0  (0.0%)  [0]" " 1  (1.2%)  [2]"          
#>  4 COLD SWEAT         " 1  (1.2%)  [3]" " 0  (0.0%)  [0]"          
#>  5 DERMATITIS ATOPIC  " 1  (1.2%)  [1]" " 0  (0.0%)  [0]"          
#>  6 DERMATITIS CONTACT " 0  (0.0%)  [0]" " 0  (0.0%)  [0]"          
#>  7 DRUG ERUPTION      " 1  (1.2%)  [1]" " 0  (0.0%)  [0]"          
#>  8 ERYTHEMA           " 9 (10.5%) [13]" "14 (16.7%) [22]"          
#>  9 HYPERHIDROSIS      " 2  (2.3%)  [2]" " 8  (9.5%) [10]"          
#> 10 PRURITUS           " 8  (9.3%) [11]" "26 (31.0%) [38]"          
#> # … with 11 more rows
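
The idea behind combining auto-width with hugging can be sketched in a few lines of base R. This is an illustration of the concept only, under the assumption that the width is taken from the widest count in the data; it is not Tplyr's implementation, and the sample counts are invented.

```r
# Invented event counts for illustration
n <- c(1L, 2L, 13L, 10L)

# Derive the field width from the data rather than hard-coding it
width <- max(nchar(n))

# Like "a": pad the bare number to the derived width
auto <- formatC(n, width = width)

# Like "[A]": attach the bracket first, then pad the whole group
auto_hug <- formatC(paste0("[", n, "]"), width = width + 2)

auto
#> [1] " 1" " 2" "13" "10"
auto_hug
#> [1] " [1]" " [2]" "[13]" "[10]"
```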

Given the amount of control and detail that’s packed into format strings and Tplyr’s general formatting capabilities, we’ve also beefed up our documentation. To read more about format strings and general formatting in Tplyr, check out our new General String Formatting vignette.

Conditional Formats

Another new capability we’ve added in 1.1.0 is the ability to specify conditional formats. Examples of conditional formats include not reporting a percentage field if the n count of an event is 0, or, if the incidence of an adverse event is <5%, displaying the percent as (<5%) instead of the actual value. The new function apply_conditional_format() adds this capability by letting you update results conditional on values within a “format group”.

Within a result string, multiple numbers may be present, and the portions of the string representing these numbers are referred to as “format groups”. So in the result 8 (9.3%), there are two format groups. The value of the first is 8. The value of the second is (9.3%). In apply_conditional_format(), you can update the value of the result based on the value within those individual format groups. For example:

string <- c(" 0  (0.0%)", " 8  (9.3%)", "78 (90.7%)")

apply_conditional_format(string, 2, x == 0, " 0        ", full_string=TRUE)
#> [1] " 0        " " 8  (9.3%)" "78 (90.7%)"

apply_conditional_format(string, 2, x < 1, "(<1%)")
#> [1] " 0   (<1%)" " 8  (9.3%)" "78 (90.7%)"

Note that apply_conditional_format() works as a post-processing function on the built data frame. Read more about it in the Post-processing vignette.
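
As a minimal sketch of that post-processing step, the snippet below applies the function inside dplyr::mutate(). The small tibble is a hand-built stand-in for build() output, with values copied from the Placebo column shown earlier; it is an assumption for illustration, not a real built table.

```r
library(dplyr)
library(Tplyr)

# Hand-built stand-in for a built Tplyr table (illustrative only)
built <- tibble(
  row_label1   = c("AMERICAN INDIAN OR ALASKA NATIVE",
                   "BLACK OR AFRICAN AMERICAN",
                   "WHITE"),
  var1_Placebo = c(" 0  (0.0%)", " 8  (9.3%)", "78 (90.7%)")
)

# Replace the full string wherever the second format group equals 0
cleaned <- built %>%
  mutate(
    var1_Placebo = apply_conditional_format(
      var1_Placebo, 2, x == 0, " 0        ", full_string = TRUE
    )
  )

cleaned$var1_Placebo
#> [1] " 0        " " 8  (9.3%)" "78 (90.7%)"
```

Because the replacement string is padded to the same width as the original results, the column stays aligned after the update.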

Extracting Format Groups and Numbers

In addition to conditional formats, we’ve added two more functions that work with format groups. str_extract_fmt_group() lets you pull out the string of an individual format group, and str_extract_num() lets you pull out its numeric value.

string <- c(" 0  (0.0%)", " 8  (9.3%)", "78 (90.7%)")

str_extract_fmt_group(string, 2)
#> [1] "(0.0%)"  "(9.3%)"  "(90.7%)"

str_extract_num(string, 2)
#> [1]  0.0  9.3 90.7

These functions have some interesting applications, such as splitting cells so an n and percent value can be represented in separate columns, or pulling out result values to make highly customized sorting sequences.
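
Here is a hedged sketch of both applications at once: splitting a result cell into separate columns and sorting on the extracted count. The tibble is again a hand-built stand-in for build() output, and the new column names (n_Placebo, pct_Placebo) are made up for the example.

```r
library(dplyr)
library(Tplyr)

# Hand-built stand-in for a built Tplyr table (illustrative only)
built <- tibble(
  row_label1   = c("AMERICAN INDIAN OR ALASKA NATIVE",
                   "BLACK OR AFRICAN AMERICAN",
                   "WHITE"),
  var1_Placebo = c(" 0  (0.0%)", " 8  (9.3%)", "78 (90.7%)")
)

split_cells <- built %>%
  mutate(
    # Numeric value of the first format group (the count)
    n_Placebo   = str_extract_num(var1_Placebo, 1),
    # String of the second format group (the percent, with parentheses)
    pct_Placebo = str_extract_fmt_group(var1_Placebo, 2)
  ) %>%
  # Custom sort: most frequent category first
  arrange(desc(n_Placebo))

split_cells %>%
  select(row_label1, n_Placebo, pct_Placebo)
```

Keeping the count numeric makes the sort behave as expected, whereas sorting on the original character column would order the rows alphabetically.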

Other Notes

Within this release we’ve also improved documentation and reorganized portions of the package website, focusing on the reference page and our vignettes.