{workr}: Simple Open-Source Analytics Pipeline for R

By Zelos Zhu, Data Solutions Engineer

May 5, 2026

At PHUSE US Connect 2026, I had the opportunity to present alongside Jeremy Wildfire of Gilead Sciences, and our talk earned Best in Stream honors. It focused on something the open-source analytics community has needed for a while: a workflow engine simple enough to actually use, and scalable enough to matter. The package is called {workr}, and I want to walk you through why we built it and what it can do for clinical R teams.

The Problem With Custom Scripts in Clinical R Programming

As more clinical programming teams adopt R and pharmaverse tools, a familiar problem appears: scripts that work but don’t scale. A team managing 30 active studies, running monthly snapshots, with 15 metrics and 5 pipeline steps, is looking at roughly 27,000 metric step executions per year (30 studies × 12 monthly snapshots × 15 metrics × 5 steps). Custom study scripts (each slightly different and manually coordinated) can’t sustain that.

What we needed was a pipeline architecture that holds together at scale without needing a software engineering team to maintain it.

What {workr} Is

{workr} is a YAML-driven R workflow engine. Workflows are defined in plain YAML files with three sections:

Meta — descriptive metadata and configuration
Spec — required data inputs and their structure
Steps — the execution sequence, where each step is a named function call with its parameters

At runtime, RunWorkflow() parses those step definitions and executes them sequentially. This reproduces what a pipe-based R script would do, but in a declarative, human-readable format that’s easy to audit, debug, and reuse.

The mental model is intentionally minimal. Each step is a function call. Each output from one step is available to the next. The entire workflow is a YAML file that can be read, version-controlled, and reviewed like any other document.

Minimal mental model, easy to read and debug, and surprisingly scalable.
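To make the "each step is a function call" idea concrete, here is a toy sketch in base R. It is not the {workr} implementation and does not use the package's API. The function run_workflow_sketch(), the helper keep_serious(), the example data, and the field names output, name, and params are all assumptions made for illustration. The workflow is written as an R list standing in for a parsed YAML file.

```r
# Toy illustration only: NOT the {workr} implementation or its API.
# All names below (run_workflow_sketch, keep_serious, output/name/params)
# are hypothetical.

# A hypothetical helper that a step can call by name
keep_serious <- function(df) df[df$aeser == "Y", , drop = FALSE]

# An R list standing in for a parsed YAML workflow with three sections
workflow <- list(
  meta  = list(ID = "demo", Description = "Count serious adverse events"),
  spec  = list(raw = c("subjid", "aeser")),  # required input and its columns
  steps = list(
    list(output = "serious",   name = "keep_serious", params = list(df = "raw")),
    list(output = "n_serious", name = "nrow",         params = list(x = "serious"))
  )
)

run_workflow_sketch <- function(workflow, data) {
  # Check that every input named in spec exists and has the required columns
  for (input in names(workflow$spec)) {
    missing_cols <- setdiff(workflow$spec[[input]], names(data[[input]]))
    if (length(missing_cols) > 0) {
      stop("Input '", input, "' is missing columns: ",
           paste(missing_cols, collapse = ", "))
    }
  }
  # Run steps in order; each output is stored so later steps can use it
  for (step in workflow$steps) {
    args <- lapply(step$params, function(p) {
      # A string matching a stored object name is replaced by that object
      if (is.character(p) && length(p) == 1 && p %in% names(data)) data[[p]] else p
    })
    data[[step$output]] <- do.call(step$name, args)
  }
  data
}

raw <- data.frame(
  subjid = c("01", "01", "02", "03"),
  aeser  = c("Y", "N", "Y", "N"),
  stringsAsFactors = FALSE
)

result <- run_workflow_sketch(workflow, list(raw = raw))
result$n_serious
```

In this sketch the second step receives the output of the first by name, which mirrors the pipe-like behavior described above. Every intermediate object stays in the returned list, so it can be inspected after the run.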

Connecting the Full Open-Source Analytics Pipeline: {workr} and the pharmaverse

In the PHUSE presentation, we demonstrated {workr} as an orchestration layer across the full clinical reporting pipeline (raw eCRF data to final deliverables) using standard pharmaverse packages at each stage:

Raw to SDTM: {workr} drives {sdtm.oak} workflows for SDTM transformations, with each YAML step equivalent to a pipe-based function call in standard R programming
SDTM to ADaM: SDTM domains feed directly into {admiral} derivations through the same workflow engine; functions like derive_vars_merged() and derive_param_map() run as named steps, keeping derivation logic transparent and reproducible
ADaM to TFLs: ADaM datasets feed into {gtsummary} for tabular summaries and standard visualization libraries like {ggplot2} for graphical outputs, with rendered R Markdown documents assembled using a modular child-document approach (a “LEGO® set” style that lets teams pick and combine tables and figures for different deliverable formats)
ADaM to ARS: {cards} extends the pipeline to CDISC Analysis Results Data (ARD), enabling early adoption of the emerging ARS standard within existing, reproducible workflows

Each package preserved its modular design throughout. {workr} provided the connective tissue, centralizing execution, audit trails, and reproducibility at every stage.

GxP Readiness and Audit Trails

We designed {workr} with regulated environments in mind. Qualification is managed via {qcthat}, which generates qualification reports from the test suite. A {workr}-driven action log allows subject matter experts to assess and address detected risks. The framework is currently in use on 20+ active studies at Gilead. For teams working within a validated statistical computing environment like Ageirein, {workr} integrates directly into the GxP-compliant pipeline already in place.

What This Means for Clinical R Teams

The promise of open-source analytics in clinical programming has always been capability combined with reproducibility. What has held teams back is the lack of infrastructure to connect the two reliably.

{workr} addresses that gap without requiring a complex build system or deep engineering expertise. It also positions teams well for the shift to agentic AI, where breaking complex tasks into modular, executable steps is exactly what AI-driven workflows require.

The {workr} package is open-source and available at gilead-biostats.github.io/workr/. For clinical analytics teams exploring how to build this kind of pipeline within a GxP-validated environment, Ageirein provides the infrastructure that makes it deployable, and it’s where I’d point you to get started.

Frequently Asked Questions

What is {workr}?

{workr} is an open-source R package that provides a simple YAML-driven workflow engine for building reproducible, modular data pipelines. Workflows are defined as YAML files with metadata, input specifications, and sequential function-call steps, then executed via RunWorkflow().

How does {workr} work with the pharmaverse?

{workr} serves as an orchestration layer connecting pharmaverse packages across the clinical reporting pipeline. It can sequence {sdtm.oak} for SDTM, {admiral} for ADaM, {gtsummary} and your team’s preferred visualization libraries for TFLs, and {cards} for ARS-aligned outputs, all within a single, traceable workflow.

Is {workr} suitable for GxP environments?

Yes. {workr} was designed with GxP use in mind and is currently deployed on 20+ clinical studies. Qualification is supported via {qcthat}, and a {workr}-driven action log enables risk assessment and documentation. It integrates with validated statistical computing environments.

How does {workr} differ from other workflow tools?

{workr} prioritizes simplicity and clinical usability over maximum capability. Its YAML-based design requires minimal programming knowledge to read and debug, making it accessible to the statisticians and programmers who need to maintain and audit pipelines rather than build them from scratch.

What is ARS, and how does {workr} support it?

The Analysis Results Standard (ARS) is an evolving CDISC standard for representing clinical analysis results in a structured, machine-readable format. The {cards} R package supports ARS-aligned Analysis Results Data (ARD) creation, and the {workr} framework can integrate {cards} workflows into the same reproducible pipeline that produces SDTM, ADaM, and TFL outputs.

About the Author

Zelos Zhu is an R package developer focused on modernizing statistical programming for clinical trials. He is a core contributor to open-source pharmaverse tools, including admiral, with additional experience across other popular packages such as gtsummary, cards, and cardx.

His work centers on reproducible clinical trial workflows, CDISC SDTM/ADaM standards, R package development, and scalable reporting infrastructure. Zelos has also worked extensively with the gsm ecosystem, a suite of packages that provide the analytical foundation for a standardized Risk-Based Quality Management (RBQM) framework for clinical trials.

References

¹ Zhu, Z. and Wildfire, J. (2026). Harmonizing gsm and the Pharmaverse: Orchestrating SDTM, ADaM, TFLs, and ARS in a Unified R Workflow. Paper OS17, PHUSE US Connect 2026. Best in Stream Award recipient. https://gilead-biostats.github.io/workr/dev/slides/#/title-slide

² Zhu, Z. and Wildfire, J. (2026). {workr}: a very simple R data pipeline. PHUSE Connect 2026. https://gilead-biostats.github.io/workr/
