It’s been over a year now since we released Tplyr 1.1.0. In version 1.0.0 we marked the package as stable, having realized some of the core features we intended to build into Tplyr. Version 1.1.0 was also a large addition, bringing new string formatting capabilities like parenthesis hugging and conditional formats. For version 1.2.1, our core focus was to clean up the code base, resolve some long-persisting issues, and add a couple of new features along the way. In the process, this turned out to be a relatively large release that truly improves Tplyr as a package.

New Features

We added a handful of new functions that fill in core functionality Tplyr was missing and help with post-processing to make presenting a table easier.

  • add_missing_subjects_row() allows you to create a row that counts the number of subjects who are missing data within a particular group. This is different from set_missing_count(), which looks for values present in the data that are specified as missing in some way.
    • This new feature also impacted how Tplyr handles the result metadata. add_missing_subjects_row() is unique in that the results presented actually refer to subjects in the population data and not the target dataset. As such, the metadata need to be able to provide instructions on how to retrieve that information. This is done using a new field called anti_join. The anti_join field holds the metadata describing what to extract from the population data, along with a merge variable to use when conducting the anti-join to grab the correct records from the population data. You can read about this within the Metadata vignette.
    • Additionally, to support independent creation of similar metadata, we have created the add_anti_join() function.
  • set_limit_data_by() allows you to specify by variables that should be limited to results present within the input data, rather than all potential combinations of by variables. By default, Tplyr presents the Cartesian product of the values present within the by variables. Using this function, you can limit those combinations by your own specified grouping.
  • collapse_row_labels() is a new post-processing function that consolidates chosen row labels into a single column. This is similar to the behavior of set_nest_count(), but inserts blank result rows for the specified row labels. This is a convenient helper to convert Tplyr tables into a common format used to save space in wide tables.
  • replace_leading_whitespace() is another new post-processing function that’s helpful when working with HTML tables. HTML likes to strip leading whitespace from text, but in many cases that indentation is very necessary. This function replaces leading whitespace (including tabs, converted to a pre-specified width) with the HTML &nbsp; entity, allowing for proper presentation within HTML outputs.
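
To make the two post-processing helpers a little more concrete, here is a minimal sketch. The small data frame is a hand-made stand-in for built Tplyr output (the labels and values are made up for illustration), and the argument choices are just one reasonable setup rather than a recommended recipe.

```r
library(Tplyr)

# Hand-made stand-in for a built Tplyr table; all values are made up
dat <- tibble::tribble(
  ~row_label1,         ~row_label2,   ~var1_Placebo,
  "Cardiac disorders", "Angina",      " 2 ( 2.3%)",
  "Cardiac disorders", "Bradycardia", " 1 ( 1.2%)",
  "Skin disorders",    "Erythema",    " 9 (10.5%)",
  "Skin disorders",    "Pruritus",    " 8 ( 9.3%)"
)

# Fold the two label columns into a single `row_label` column, with a
# result row inserted for each outer label and the inner labels indented
collapsed <- collapse_row_labels(dat, row_label1, row_label2, indent = "  ")

# Swap leading spaces/tabs for &nbsp; so HTML renderers keep the indentation
collapsed$row_label <- replace_leading_whitespace(collapsed$row_label)

# On a bare string: two leading spaces become two &nbsp; entities
replace_leading_whitespace("  Angina")
#> [1] "&nbsp;&nbsp;Angina"
```

Running the whitespace replacement last keeps the table readable while you work with it, and only introduces the HTML entities right before rendering.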

Another helpful update that we made for this release was the addition of mock data into Tplyr itself. With this included, all vignette examples are directly executable within your own personal installation, without referring to any external sources.
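
With the mock data bundled, the new layer-level functions can be tried directly. The sketch below assumes the bundled tplyr_adae, tplyr_adsl, and tplyr_adpe datasets and their CDISC-style column names (TRTA, TRT01A, USUBJID, AEDECOD, PECAT, PARAM, AVISIT, AVAL). The format string and row label are illustrative choices, not defaults.

```r
library(Tplyr)
library(dplyr)  # loaded here for the %>% pipe

# Adverse event counts by treatment arm, plus a row counting subjects in the
# population data (tplyr_adsl) who have no records in the target (tplyr_adae)
ae_tab <- tplyr_table(tplyr_adae, TRTA) %>%
  set_pop_data(tplyr_adsl) %>%
  set_pop_treat_var(TRT01A) %>%
  add_layer(
    group_count(AEDECOD) %>%
      set_distinct_by(USUBJID) %>%  # subjects are identified by USUBJID
      add_missing_subjects_row(f_str("xx (xx.x%)", distinct_n, distinct_pct)) %>%
      set_missing_subjects_row_label("Missing Subjects")
  ) %>%
  build()

# Descriptive statistics by category, parameter, and visit, keeping only the
# PARAM/AVISIT pairs that actually occur in the data instead of every
# possible combination of the two
pe_tab <- tplyr_table(tplyr_adpe, TRT01A) %>%
  add_layer(
    group_desc(AVAL, by = vars(PECAT, PARAM, AVISIT)) %>%
      set_limit_data_by(PARAM, AVISIT)
  ) %>%
  build()
```

Dropping the set_limit_data_by() call from the second table and rebuilding is a quick way to see the extra, empty combinations that the default behavior produces.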

Bug fixes

Just as significant as the features added for this release are the bugs we fixed. We tackled some long-persisting issues.

  • Within nested count layers, Tplyr will now properly handle cases where an inner layer value exists in multiple outer layer values. This additionally eliminated a restriction that was in place to not allow the outer layer to have more levels than the inner layer.
  • Inf and -Inf results produced by min and max will now automatically be handled as if they were NAs.
  • Sorting for nested count layers has been corrected for when by variables are in use.
  • The handling of namespace scoping has been fixed, which helps in very specific situations where Tplyr is executing within a non-global environment and user-specified functions are being passed in. Quite a specific scenario, but this is a very helpful improvement (not to mention I gave up on it two or three times before actually figuring it out).
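
For context on where Inf and -Inf results can come from in the first place, here is a small base R illustration. It does not use Tplyr at all, and the last step is only one possible way of treating infinite values as missing, not a description of Tplyr's internals.

```r
# A group where every value is missing
x <- c(NA_real_, NA_real_)

# With NAs removed there is nothing left to summarize, so base R returns
# Inf for min() and -Inf for max() (along with a warning)
lo <- suppressWarnings(min(x, na.rm = TRUE))
hi <- suppressWarnings(max(x, na.rm = TRUE))

res <- c(min = lo, max = hi)
is.infinite(res)
#>  min  max
#> TRUE TRUE

# One possible way to treat those results like missing values
res[is.infinite(res)] <- NA
```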

Lastly, we did some code cleanup in this release, tossing out dead code and other things deemed unnecessary. This helps improve the overall code coverage of the package.
