At PHUSE US Connect 2026, I had the opportunity to present alongside Jeremy Wildfire of Gilead Sciences, and our presentation earned Best in Stream honors. It focused on something the open-source analytics community has needed for a while: a workflow engine simple enough to actually use, and scalable enough to matter. The package is called {workr}, and I want to walk you through why we built it and what it can do for clinical R teams.
The Problem With Custom Scripts in Clinical R Programming
As more clinical programming teams adopt R and pharmaverse tools, a familiar problem appears: scripts that work but don’t scale. A team managing 30 active studies, running monthly snapshots, with 15 metrics and 5 pipeline steps, is looking at roughly 27,000 metric step runs per year. Custom study scripts (each slightly different and manually coordinated) can’t sustain that.
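The 27,000 figure follows from multiplying the four quantities in that example. A quick check in R, using only the numbers stated above (the variable names are mine):

```r
# Workload implied by the example: all figures come from the paragraph above.
studies            <- 30
snapshots_per_year <- 12  # monthly snapshots
metrics            <- 15
pipeline_steps     <- 5

# Metric computations per year, before counting individual pipeline steps.
metric_runs_per_year <- studies * snapshots_per_year * metrics

# Each metric computation passes through every pipeline step.
step_runs_per_year <- metric_runs_per_year * pipeline_steps

metric_runs_per_year  # 5400
step_runs_per_year    # 27000
```

In other words, the total counts 5,400 metric computations, each passing through 5 steps.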
What we needed was a pipeline architecture that holds together at scale without needing a software engineering team to maintain it.
What {workr} Is
{workr} is a YAML-driven R workflow engine. Workflows are defined in plain YAML files with three sections:
- Meta — descriptive metadata and configuration
- Spec — required data inputs and their structure
- Steps — the execution sequence, where each step is a named function call with its parameters
At runtime, RunWorkflow() parses those step definitions and executes them sequentially. This reproduces what a pipe-based R script would do, but in a declarative, human-readable format that’s easy to audit, debug, and reuse.
The mental model is intentionally minimal. Each step is a function call. Each output from one step is available to the next. The entire workflow is a YAML file that can be read, version-controlled, and reviewed like any other document.
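To make that mental model concrete, here is a minimal sketch of a YAML-driven step runner in R. It is not {workr}'s implementation of RunWorkflow(): the function run_workflow(), the helpers filter_min() and count_rows(), and the YAML field names (output, name, params, required_columns) are all assumptions made for illustration. The only dependency is the {yaml} package, used to parse the workflow.

```r
library(yaml)  # yaml.load() parses a YAML string into nested R lists

# Hypothetical workflow with the three sections described above.
# Field names under spec and steps are assumed, not taken from {workr}.
workflow_yaml <- "
meta:
  id: demo
  description: Filter a data frame and count its rows
spec:
  raw:
    required_columns: [id, value]
steps:
  - output: filtered
    name: filter_min
    params:
      df: raw
      min_value: 3
  - output: n_rows
    name: count_rows
    params:
      df: filtered
"

# Hypothetical step functions; any R function could be used as a step.
filter_min <- function(df, min_value) df[df$value >= min_value, , drop = FALSE]
count_rows <- function(df) nrow(df)

# Illustrative runner: validates inputs against spec, then runs steps in order.
run_workflow <- function(workflow, data) {
  for (input in names(workflow$spec)) {
    if (!input %in% names(data)) stop("Missing input: ", input)
    missing_cols <- setdiff(workflow$spec[[input]]$required_columns,
                            names(data[[input]]))
    if (length(missing_cols) > 0) {
      stop("Input '", input, "' is missing columns: ",
           paste(missing_cols, collapse = ", "))
    }
  }
  for (step in workflow$steps) {
    # A string parameter that names an existing object is replaced by that
    # object; everything else is passed to the function literally.
    args <- lapply(step$params, function(p) {
      if (is.character(p) && length(p) == 1 && p %in% names(data)) data[[p]] else p
    })
    fn <- get(step$name, mode = "function")
    # Store the result under the step's output name so later steps can use it.
    data[[step$output]] <- do.call(fn, args)
  }
  data
}

workflow <- yaml.load(workflow_yaml)
raw <- data.frame(id = 1:4, value = c(1, 5, 3, 8))
result <- run_workflow(workflow, list(raw = raw))
result$n_rows  # 3
```

The same logic written as a pipe would be `raw |> filter_min(min_value = 3) |> count_rows()`. The YAML version names each call and its output, which is what makes the sequence easy to review and diff.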
Minimal mental model, easy to read and debug, and surprisingly scalable.
Connecting the Full Open-Source Analytics Pipeline: {workr} and the pharmaverse
In the PHUSE presentation, we demonstrated {workr} as an orchestration layer across the full clinical reporting pipeline (raw eCRF data to final deliverables) using standard pharmaverse packages at each stage:
- Raw to SDTM: {workr} drives {sdtm.oak} workflows for SDTM transformations, with each YAML step equivalent to a pipe-based function call in standard R programming
- SDTM to ADaM: SDTM domains feed directly into {admiral} derivations through the same workflow engine; functions like derive_vars_merged() and derive_param_map() run as named steps, keeping derivation logic transparent and reproducible
- ADaM to TFLs: ADaM datasets feed into {gtsummary} for tabular summaries and standard visualization libraries like {ggplot2} for graphical outputs, with rendered R Markdown documents assembled using a modular child-document approach (a “LEGO® set” style that lets teams pick and combine tables and figures for different deliverable formats)
- ADaM to ARS: {cards} extends the pipeline to CDISC Analysis Results Data (ARD), enabling early adoption of the emerging Analysis Results Standard (ARS) within existing, reproducible workflows
Each package preserved its modular design throughout. {workr} provided the connective tissue, centralizing execution, audit trails, and reproducibility at every stage.
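As a rough illustration of what centralized execution with an audit trail can look like, the base R sketch below runs three placeholder stages in order and records one log row per stage. Everything here is hypothetical: run_pipeline(), the stage functions, and the log columns are stand-ins of my own, and the stages only mimic the shape of the real {sdtm.oak}, {admiral}, and {gtsummary} work rather than calling those packages.

```r
# Placeholder stages standing in for the real pharmaverse calls.
raw_to_sdtm <- function(raw) {
  data.frame(USUBJID = raw$subject, AGE = raw$age, stringsAsFactors = FALSE)
}
sdtm_to_adam <- function(dm) {
  dm$AGEGR1 <- ifelse(dm$AGE >= 65, ">=65", "<65")
  dm
}
adam_to_summary <- function(adsl) {
  as.data.frame(table(AGEGR1 = adsl$AGEGR1), responseName = "n")
}

# Illustrative runner: feeds each stage's output to the next and logs
# the stage name, start time, status, and output row count.
run_pipeline <- function(input, stages) {
  log <- vector("list", length(stages))
  current <- input
  for (i in seq_along(stages)) {
    started <- Sys.time()
    outcome <- tryCatch(
      list(value = stages[[i]](current), status = "ok"),
      error = function(e) {
        list(value = NULL, status = paste("error:", conditionMessage(e)))
      }
    )
    log[[i]] <- data.frame(
      stage    = names(stages)[i],
      started  = format(started, "%Y-%m-%d %H:%M:%S"),
      status   = outcome$status,
      rows_out = if (is.null(outcome$value)) NA_integer_ else nrow(outcome$value),
      stringsAsFactors = FALSE
    )
    if (outcome$status != "ok") {
      current <- NULL  # no result is returned once a stage has failed
      break
    }
    current <- outcome$value
  }
  # Stages after a failure never ran, so their log slots are still NULL.
  list(result = current, log = do.call(rbind, Filter(Negate(is.null), log)))
}

raw <- data.frame(subject = c("01", "02", "03"), age = c(70, 54, 66))
out <- run_pipeline(raw, list(
  raw_to_sdtm     = raw_to_sdtm,
  sdtm_to_adam    = sdtm_to_adam,
  adam_to_summary = adam_to_summary
))
out$log[, c("stage", "status", "rows_out")]
```

Because every stage goes through the same runner, the log is produced the same way for every study, which is the property a reviewer cares about.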
GxP Readiness and Audit Trails
We designed {workr} with regulated environments in mind. Qualification is managed via {qcthat}, which generates qualification reports from the test suite. A {workr}-driven action log allows subject matter experts to assess and address detected risks. The framework is currently in use on 20+ active studies at Gilead. For teams working within a validated statistical computing environment like Ageirein, {workr} integrates directly into the GxP-compliant pipeline already in place.
What This Means for Clinical R Teams
The promise of open-source analytics in clinical programming has always been capability combined with reproducibility. What has held teams back is the lack of infrastructure to connect the two reliably.
{workr} addresses that gap without requiring a complex build system or deep engineering expertise. It also positions teams well for the shift to agentic AI, where breaking complex tasks into modular, executable steps is exactly what AI-driven workflows require.
The {workr} package is open-source and available at gilead-biostats.github.io/workr/. For clinical analytics teams exploring how to build this kind of pipeline within a GxP-validated environment, Ageirein provides the infrastructure that makes it deployable. And it’s where I’d point you to get started.
About the Author

Zelos Zhu is an R package developer focused on modernizing statistical programming for clinical trials. He is a core contributor to open-source pharmaverse tools, including admiral, with additional experience across other popular packages such as gtsummary, cards, and cardx.
His work centers on reproducible clinical trial workflows, CDISC SDTM/ADaM standards, R package development, and scalable reporting infrastructure. Zelos has also worked extensively with the gsm ecosystem, a suite of packages that provide the analytical foundation for a standardized Risk-Based Quality Monitoring (RBQM) framework for clinical trials.
References
1 Zhu, Z. and Wildfire, J. (2026). Harmonizing gsm and the Pharmaverse: Orchestrating SDTM, ADaM, TFLs, and ARS in a Unified R Workflow. Paper OS17, PHUSE US Connect 2026. Best in Stream Award recipient. https://gilead-biostats.github.io/workr/dev/slides/#/title-slide
2 Zhu, Z. and Wildfire, J. (2026). {workr}: a very simple R data pipeline. PHUSE Connect 2026. gilead-biostats.github.io/workr/