I’m absolutely thrilled to announce that Tplyr v1.0.1 has officially made it to CRAN. Since we first released Tplyr in August 2020, this is the single biggest release we have had, both in terms of lines of code edited and new features. We decided to mark this as the version 1 release because Tplyr is now officially what we intended it to be: a traceability-minded grammar of data format and summary.
The biggest addition in version 1 is traceability metadata. The design of Tplyr has always made a conscious effort to maintain metadata about the data being summarized. For every Tplyr layer, we know the subset of data being summarized, what summaries are performed, the grouping variables, and more.
In version 1, the goal was to make this metadata more accessible and useful by tying each result directly back to its source. Essentially, any result produced by Tplyr should be able to return the originating data from which it was derived or, backing up a step, the instructions by which that data can be obtained.
A bit confusing? To put it into perspective, take a look at the minimal Shiny application right here. Click one of the result cells. The table that pops out below will contain the original input data to Tplyr that was used to derive the result.
An important part of developing the metadata functionality was that deriving the metadata should take no additional effort from the user. To build the metadata, we added a single parameter, metadata, to the function Tplyr::build():
library(Tplyr)

t <- tplyr_table(mtcars, gear, where = qsec > 16) %>%
  add_layer(
    group_count(cyl)
  ) %>%
  add_layer(
    group_desc(wt, where = mpg >= 15)
  )

# Build the table and the traceability metadata alongside it
dat <- t %>%
  build(metadata = TRUE)
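To make "tying a result back to its source" a bit more concrete, here is a minimal sketch of querying the metadata for a single cell. It rebuilds the same table so that it runs on its own. The choice of the first row and of the column var1_3 (gear == 3) is only for illustration, and mpg is requested through add_cols because mtcars has no USUBJID column, which is assumed here to be the default.

```r
library(Tplyr)

# Same table as in the post, built with metadata
t <- tplyr_table(mtcars, gear, where = qsec > 16) %>%
  add_layer(
    group_count(cyl)
  ) %>%
  add_layer(
    group_desc(wt, where = mpg >= 15)
  )

dat <- t %>%
  build(metadata = TRUE)

# With metadata = TRUE, each output row carries a row_id
row <- dat$row_id[1]

# The "instructions" behind one cell: the variables and filters that define it
meta <- get_meta_result(t, row, "var1_3")
meta

# The source rows behind the same cell
# (add_cols is set explicitly because mtcars has no USUBJID column)
src <- get_meta_subset(t, row, "var1_3", add_cols = dplyr::vars(mpg))
src
```

The first call returns the filters and variable names rather than the data, which is handy when you want to hand the derivation to someone else instead of the rows themselves.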
Read about the traceability metadata functionality in the vignette right here.
Read about how the traceability metadata can be expanded to results that are not produced by Tplyr, or results used outside of Tplyr itself in the vignette right here.
Another big addition in the version 1 release is layer templates. From the inception of Tplyr, we have tried to eliminate the need for redundant code. Examples of this include the ability to set layer formats across all layer types for a Tplyr table, or setting layer defaults using Tplyr options. But admittedly, this did not solve the whole problem. String formatting is only one component of a layer. When you have several configuration options, the code required for a Tplyr table can still become quite long.
To address this, we created the concept of Tplyr layer templates. A layer template allows you to create and reuse a Tplyr layer of any type, with any number of configuration options set.
Layer templates are created using Tplyr::new_layer_template(), like so:
new_layer_template(
  "example_template",
  group_count(...) %>%
    set_format_strings(f_str("xx (xx%)", n, pct))
)
Once you create the layer template, you can call it when creating a Tplyr table using Tplyr::use_template():
tplyr_table(mtcars, gear) %>%
  add_layer(
    use_template("example_template", cyl, by = carb)
  ) %>%
  build()
After calling a Tplyr layer template, the template is expandable using Tplyr modifier functions. Templates can also have additional variables passed into Tplyr modifier functions to make them more flexible.
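As a rough sketch of both ideas, the example below defines a template with one extra parameter and then extends the resulting layer with a modifier function. The template name "param_template" and the parameter name fmt are hypothetical, and the curly-brace placeholder filled through add_params reflects my understanding of the template syntax, so treat the details as an assumption and check the vignette.

```r
library(Tplyr)

# A template with one extra parameter, marked with curly braces.
# "param_template" and "fmt" are hypothetical names for this sketch.
new_layer_template(
  "param_template",
  group_count(...) %>%
    set_format_strings({fmt})
)

dat <- tplyr_table(mtcars, gear) %>%
  add_layer(
    # Fill the parameter via add_params, then extend the layer as usual
    use_template(
      "param_template", cyl,
      add_params = list(fmt = f_str("xx (xx.x%)", n, pct))
    ) %>%
      add_total_row()
  ) %>%
  build()

dat
```

Keeping the format string as a parameter means one template can serve tables that differ only in presentation, while add_total_row() shows that a templated layer still accepts the usual modifiers.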
You can read more about Tplyr templates in the vignette right here.
With Tplyr traceability metadata and layer templates as the largest additions in the new release, there are a number of new functions that we’re quite excited to introduce as well.
Tplyr descriptive statistics layers now allow you to present descriptive statistics as columns using
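A minimal sketch of a transposed descriptive statistics layer. It assumes the function in question is Tplyr::set_stats_as_columns(), which is not named in the text above, and uses mtcars with one statistic per format string purely for illustration.

```r
library(Tplyr)

dat <- tplyr_table(mtcars, gear) %>%
  add_layer(
    group_desc(wt, by = vs) %>%
      set_format_strings(
        "n"    = f_str("xx", n),
        "Mean" = f_str("xx.x", mean),
        "SD"   = f_str("xx.xx", sd)
      ) %>%
      # Present the statistics as columns instead of rows
      set_stats_as_columns()
  ) %>%
  build()

dat
```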
We externalized Tplyr’s string formatting capabilities so you can use the benefits provided by Tplyr::f_str() objects more generally with the function
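A small sketch of using a Tplyr format string outside of a Tplyr table. It assumes the function referred to is Tplyr::apply_formats(), which is not named in the text above. The column name fmt_example is arbitrary.

```r
library(Tplyr)
library(dplyr)

fmt_dat <- mtcars %>%
  head() %>%
  mutate(
    # Combine hp and wt into one formatted string using f_str()-style syntax
    fmt_example = apply_formats("xxx (xx.x)", hp, wt)
  )

fmt_dat$fmt_example
```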
As a post-processing function, we added hyphen-enabled, break-at-word string wrapping in the new function Tplyr::str_indent_wrap() (I swear I’ll finish this blog post eventually, but this function is admittedly the result of it).
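Here is a short sketch of the wrapping behavior on an indented label. The example strings and the width of 8 are chosen only for illustration.

```r
library(Tplyr)

ex_text <- c("RENAL AND URINARY DISORDERS", "   NEPHROLITHIASIS")

# Wrap to a narrow width. Indentation on the second element is preserved,
# and single words longer than the width are broken with a hyphen.
wrapped <- str_indent_wrap(ex_text, width = 8)

cat(paste(wrapped, collapse = "\n\n"), "\n")
```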
We introduced a new experimental function, Tplyr::set_numeric_threshold(), which allows you to filter the results presented in count layers based on their values.
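A minimal sketch of thresholding a count layer. The cutoff of 5 and the choice of the n statistic are arbitrary, and since the function is experimental, its behavior may change.

```r
library(Tplyr)

dat <- tplyr_table(mtcars, gear) %>%
  add_layer(
    group_count(cyl) %>%
      # Only present results whose count reaches the cutoff
      set_numeric_threshold(5, "n")
  ) %>%
  build()

dat
```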
And lastly, there are a few more noteworthy updates to share as well.
- Tplyr is now fully compatible with the native pipe (|>). Previously, this would break inside the context of
- Tplyr now automatically loads the magrittr pipe. This is admittedly far overdue…
- Denominators are now formattable values within count layers.
- Descriptive statistics now allow you to provide external precision data, which works within the same context as auto-precision. You can read about that right here.
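As a quick illustration of the native pipe item in the list above, here is a minimal sketch of a table written entirely with |>. It requires R 4.1 or later, and the format string is just an example.

```r
library(Tplyr)

# The native pipe is used both between table steps and inside add_layer()
dat <- tplyr_table(mtcars, gear) |>
  add_layer(
    group_count(cyl) |>
      set_format_strings(f_str("xx (xx.x%)", n, pct))
  ) |>
  build()

dat
```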