As R gains wider acceptance in clinical programming, many companies are tackling the task of R package validation. While there’s plenty of guidance on the initial validation of R packages, far less attention has been paid to the change management process required for long-term maintenance. Here, we’ll explore the successes, challenges, and lessons learned from maintaining validated packages in the ever-evolving open-source landscape. 

INTRODUCTION  

In this blog, we explore lessons learned from our OpenVal™ product development in the hopes that it will assist you and your organization in your own validation journey.  

First, we need to quickly level set for readers who are unfamiliar with the idea of validation. 

WHAT IS R PACKAGE VALIDATION?  

Historically, proprietary programming languages were commonly used in clinical submissions and other related biostatistics areas. However, in recent years there has been increased interest in the adoption of open-source technologies for these tasks.  

In proprietary language environments, there is a single source of truth — the vendor — for the language’s “correctness.” With the increasing adoption of open-source tooling, organizations are relying on multiple “products” (both the language and the add-on packages) from multiple sources (open-source developers) — all potentially using slightly different approaches and standards.  

Open-source tooling, by nature, varies greatly in quality of code, amount of testing (performed by the authors), and the level of rigor applied to the development process. Despite these drawbacks, we can at least see the code, which allows us to inspect it for ourselves and judge its suitability for a particular task. This is a crucial advantage over proprietary systems: Too often, vendors must be taken at their word with precious little supporting evidence. 

Validation is all about confidence. You need confidence that your tools will work the same way again and again — whether running on different computers, several years in the future, or in another research environment where results must be replicated. 

Scientific computational work demands accurate results, and the processes used to obtain them must be repeatable. Rigor matters in science, and when working on tasks like a clinical submission, we must approach our work with due diligence. 

Programming can often feel like a solo pursuit: just you, your programming knowledge, and your wits versus your data and the problem you’re trying to solve — or the insights you’re trying to extract. But this is a somewhat romantic and naïve vision. 

In reality, even when you’re programming alone, you’re involved in a careful balancing act; it’s a team activity whether you realize it or not. Instead of trusted team members you’re already familiar with, your teammates are strangers. Unfortunately, you have no real idea whether they’re fit for the task at hand. 

When you use any programming language or package — open source or commercial — you’re inviting unknown developers into your work and trusting that they’re up to the task. 

The great thing about open source is that we can generally find out who those other programmers are, so we’re not flying totally blind. We can see their work, make judgements about its efficacy and fitness for purpose, and decide how to proceed from there. 

Why do we validate? To gain confidence in these shadow contributions to our work. We document that validation so that a) it can be shared with QA and used in audits, and b) it demonstrates that we took the necessary steps to ensure these contributions met our expectations in terms of methodological correctness. 

INITIAL VALIDATION  

So, what does our initial validation look like?  

This is a challenge that has been discussed elsewhere at length. But in short, it means: 

  • Setting up a test framework and acceptance criteria for the packages you wish to validate 
  • Running them through that process 
  • Calling what comes out of the other end “validated” (assuming things pass) 

You must develop and adopt a reliable and repeatable means for accepting or rejecting a package into your validated set. This is generally done in conjunction with your organization’s Quality team to ensure that your validation approach meets the requirements of the wider business. It’s fundamentally about bringing rigor to the process of package selection. 

In essence, you choose a methodology, define your validation criteria, test whether the packages meet the criteria, and document. There are obviously many nuanced considerations, but these are the broad strokes of the process. 
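
For a programmatic first pass at that evaluation, one common starting point is the R Validation Hub’s riskmetric package. The snippet below is a minimal sketch only: the package names are illustrative, and the resulting scores still need human interpretation against your own documented acceptance criteria.

    # Illustrative first-pass risk assessment using riskmetric.
    # Scores summarize evidence (documentation, testing, download history, etc.);
    # they inform, but do not replace, your documented acceptance criteria.
    library(riskmetric)

    c("dplyr", "haven", "survival") |>
      pkg_ref() |>      # resolve package references (installed or CRAN)
      pkg_assess() |>   # gather the individual assessments
      pkg_score()       # convert assessments into scores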

LONG-TERM MAINTENANCE  

Once your initial validation is complete, you’re into the long-term maintenance phase, where many validation projects fail. 

Now that the march of package validation has begun, it will never stop:  

Package updates will keep coming. R will evolve. Operating systems will update or change. New packages will emerge. Old ones will be archived. Users will demand new tools. The list goes on.  

It’s a never-ending stream of change, and we’ve found that many organizations are unprepared for the ongoing work of maintaining a validated package set. 

Organizations often take on the challenge of providing a validated set of packages without realizing that it’s a continually evolving landscape, and that they must continually adapt along with it. This is not a project you do for a few months; rather, it’s a sustained effort that will continue for as long as you’re using open-source tools for GxP work. 

While you can hopefully use your validated packages indefinitely, life in the R world never stands still. You’ll have to contend with updates and bug fixes, new packages, changes in maintainers, and so on. To handle them appropriately and consistently within the context of your validated set, you need a robust change management process. 

You need to be prepared to respond to this evolving environment in each subsequent release. At this point, you’ve only just begun a process with no end. The field of data science and the tools we use do not stand still, and neither will you. Once you’ve completed your first validation, you must move your attention to the second. 

However, it’s worth noting that the process of creating a second release can be quite different from the first. 

CHANGE MANAGEMENT FOR VALIDATED R PACKAGES  

Now that your first release is complete, we turn our attention to all subsequent releases.  

We need robust protocols in place to manage the full package life cycle within our validated set. But how can we maintain the rigorous approach we developed in the initial validation throughout ongoing maintenance? It’s a challenge.  

For example, when a package’s version is updated, its risk level and testing needs must be reevaluated to confirm that the additional validation you perform remains sufficient. Several methods exist to assess the scope of a change: 

  • Package documentation 
  • Naming conventions that indicate major/minor/patch updates (see the sketch after this list) 
  • Tools like diffify  
  • And so on 
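
As a rough illustration of the naming-convention approach, the base-R sketch below classifies the gap between a validated snapshot version and the current CRAN version as major, minor, or patch. It assumes packages roughly follow major.minor.patch numbering, and the snapshot versions shown are hypothetical.

    # Heuristic classification of a version bump, assuming major.minor.patch numbering
    classify_update <- function(old, new) {
      old_v <- as.integer(strsplit(old, "[.-]")[[1]])
      new_v <- as.integer(strsplit(new, "[.-]")[[1]])
      len <- max(length(old_v), length(new_v))
      old_v <- c(old_v, rep(0L, len - length(old_v)))   # pad shorter version with zeros
      new_v <- c(new_v, rep(0L, len - length(new_v)))
      first_diff <- which(old_v != new_v)[1]
      if (is.na(first_diff)) return("unchanged")
      c("major", "minor", "patch")[min(first_diff, 3)]
    }

    # Compare hypothetical validated snapshot versions against the configured CRAN repo
    snapshot <- c(dplyr = "1.1.2", haven = "2.5.3")
    cran <- available.packages()
    vapply(names(snapshot),
           function(p) classify_update(snapshot[[p]], cran[p, "Version"]),
           character(1))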

Effective testing, however, demands a comprehensive reevaluation of the package with each update. A package might also be removed from CRAN, which potentially increases its risk, depending on the reason for its removal.  

Interestingly, packages with higher download scores (more downloads within a year) could potentially be less risky.  
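
If you want to fold that signal into a risk assessment, yearly download counts can be pulled with the cranlogs package, which queries the public download logs of the Posit CRAN mirror. A quick sketch; the packages and date range are illustrative, and mirror logs are only a proxy for real-world usage.

    # Total downloads per package over one year, as one input to a risk score
    library(cranlogs)

    dl <- cran_downloads(packages = c("dplyr", "haven", "admiral"),
                         from = "2024-01-01", to = "2024-12-31")
    aggregate(count ~ package, data = dl, FUN = sum)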

When R is updated, you’ll need to ensure the packages work with the version of R you’re using and continually reevaluate as you adopt new versions. Some packages rely on certain R versions, and this has broken our testing in the past.  
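
A lightweight sanity check, before full revalidation, is to confirm that no package in your library declares a minimum R version above the one you’re running. Here is a base-R sketch; the package names are placeholders, packages are assumed to be installed, and a passing check is no substitute for rerunning your test suite.

    # Check declared "Depends: R (>= x.y.z)" requirements against the running R version
    min_r_ok <- function(pkg) {
      dep <- packageDescription(pkg)$Depends   # assumes pkg is installed
      if (is.null(dep) || is.na(dep)) return(TRUE)           # no declared R requirement
      m <- regmatches(dep, regexpr("R \\(>=? ?[0-9.]+\\)", dep))
      if (length(m) == 0) return(TRUE)
      required <- gsub("[^0-9.]", "", m)                     # e.g., "3.4.0"
      getRversion() >= package_version(required)
    }

    vapply(c("dplyr", "survival"), min_r_ok, logical(1))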

If your organization adopts a new operating system, you’ll need to verify that your validation will work equally well on that platform. You may find yourself supporting multiple operating systems with your validation suite.  

Table 1. Package updates/changes to OpenVal™ releases over time. 

Release        Description      Released     Packages                      Snapshot Date  R
                                             Total  New  Updated  Removed
2023.09        Sept 2023 Major  2023-10-11   263    N/A  N/A      0        2023-07-31     4.2.1
2023.09.01     Dec 2023 Minor   2023-12-29   316    53   0        0        2023-07-31     4.2.1
2023.09.02     Jan 2024 Minor   2024-01-31   316    0    0        0        2023-07-31     4.2.1
2024.03        Mar 2024 Major   2024-03-29   355    39   117      0        2023-11-20     4.3.2
2024.03.00.01  Mar 2024 Patch   2024-05-06   355    0    0        0        2023-11-20     4.3.2
2024.03.01     June 2024 Minor  2024-07-01   391    36   1        0        2023-11-20     4.3.2
2024.09        Sept 2024 Major  2024-10-02   407    17   217      1        2024-06-22     4.4.1
2024.09.01     Dec 2024 Minor   Under active development                   2024-06-22     4.4.1

The above table details the level of change that has occurred within the OpenVal™ validated R project at Atorus Research since its inception. As you can see, many new packages have been added, others updated, and most recently, one has been removed — all within a single year.  

You must also consider how end users might be impacted by changes to your validated set. For example, there is potential to introduce regressions in company-specific displays, so care must be taken to ensure that the validation effort does not unexpectedly impact the business, and that tasks that should be reproducible remain that way.  

So, how should you communicate changes and updates to the wider business? What do programmers and statisticians need to know about function-specific changes? What should they investigate on their own? For OpenVal™, it’s simple:  

We provide information on which packages are new, which have had version changes, which have been removed, and so on. This is all part of the release notes.
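
Generating those lists is straightforward if each release ships with a simple manifest of package names and versions. A minimal sketch, assuming hypothetical two-column CSV manifests (package, version) for two releases:

    # Derive the new / updated / removed lists for release notes from two manifests
    prev <- read.csv("release_2024.03_manifest.csv")   # columns: package, version
    curr <- read.csv("release_2024.09_manifest.csv")

    new_pkgs     <- setdiff(curr$package, prev$package)
    removed_pkgs <- setdiff(prev$package, curr$package)

    common  <- intersect(curr$package, prev$package)
    updated <- common[curr$version[match(common, curr$package)] !=
                      prev$version[match(common, prev$package)]]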

Update frequency requires careful consideration. Open source evolves quickly, so updating less than once a year might mean missing out on significantly improved package versions. On the other hand, updating too frequently can be hard on project teams. In the case of OpenVal™, we update the snapshot twice a year; customers who find that cadence too frequent can simply skip a release. 

CONCLUSIONS  

Maintenance of package validation is a huge time commitment. Those attempting to implement such a process must comprehend the full magnitude of the task they’re undertaking. But with good planning and sufficient resourcing, success is achievable.  

The foundation of this process is both programmatic and human. You can write programs to evaluate risk and use tools to compare package release notes and even package code. But the most important element remains human expertise: someone who opens the package code, examines it and its unit tests, and decides whether the changed functionality needs a different testing approach for validation (more tests, different tests, etc.). 

Programmatic approaches can tell you the scope; humans are the ones who must verify the scope AND do the work to meet that scope.  

Multilayered validation is also crucial. For example, you must ensure you’re not relying on one individual to make these testing decisions in isolation. No one person should determine if package changes require new testing, as validation is rarely clear-cut. Consider implementing a process that includes: 

  • At least one tester 
  • At least one independent reviewer 
  • Potentially a senior reviewer  

Let’s say a release includes a total of 50 packages. One effective approach might be: 

  • Assign testers to evaluate a subset of packages (e.g., testing 10) 
  • Have reviewers examine a different subset (e.g., reviewing 10 they didn’t test) 
  • Have a senior reviewer conduct a high-level review of all 50 packages to ensure consistency 

This approach introduces multiple perspectives — beneficial for validation! — while the senior reviewer adds cohesion to what could otherwise be a highly variable process. 
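
As a small illustration of that structure, the sketch below (team names, package names, and counts are all hypothetical) assigns each package a tester and an independent reviewer, ensuring nobody reviews a package they also tested; the senior review then covers the full set.

    # Assign testers and independent reviewers so no one reviews their own testing work
    set.seed(2024)                                   # reproducible assignment
    pkgs <- sprintf("pkg%02d", 1:50)
    team <- c("tester_a", "tester_b", "tester_c", "tester_d", "tester_e")

    assignments <- data.frame(
      package = pkgs,
      tester  = sample(team, length(pkgs), replace = TRUE)
    )
    assignments$reviewer <- vapply(
      assignments$tester,
      function(t) sample(setdiff(team, t), 1),       # reviewer is never the tester
      character(1)
    )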

Only through a proper and full understanding of the challenges of producing and maintaining a validated package set can an organization achieve long-term success. 

Or you can use OpenVal™. The choice is yours.  

RECOMMENDED READING 

R Validation Hub: https://www.pharmar.org   
OpenVal™ product information: https://www.atorusresearch.com/openval/  
Diffify: https://diffify.com  
