PHUSE Automation SDE

Enabling end-to-end Automation in Analysis and Reporting

PHUSE US Single Day Event 21-MAY-2021


The PHUSE US Spring 2021 SDE was an interesting cross-section of perspectives on automation of Analysis and Reporting. My takeaway messages were:

  • End-to-end machine readable standards are still a work-in-progress, and there is not a clear path to standardisation around e-Protocol and e-SAP
  • CDISC aim to make it simpler to implement software that automates standards (rather than publishing standards as ‘600-page pdf documents’); this is based on the proprietary CDISC Library
  • Key implementation challenges include managing change, budgets (it takes longer than you think) and siloed business processes
  • PHUSE have a mature set of safety reporting deliverables, including SAP definitions, statistical methods and visualizations. This can provide a solid basis for automation projects.

DISCLAIMER: I missed one session by Farha Feroze (Symbiance) on Automated TFL Mock Shell Generation using AI Techniques (ML/NL), so it is not included in this report!

Future State

The day was (excellently!) chaired by Bhavin Busa and Gurubaran Veeravel, who kicked off with reference to the CDISC 360 ‘Future State – Analysis and Reporting’, noting that “we are not there yet!”

Future state analysis and reporting

This is a future-state vision that is based on automating the current reporting pipeline. Currently ADaM and TFL programs are written manually.

CDISC Standards

Anthony Chow presented an overview of the CDISC Library, and current/future projects related to Analysis & Reporting

CDISC aim to make it simpler to build software that automates standards-based processes.

  • The CDISC Library provides an application programming interface (API) to ‘normative metadata’ on all CDISC standards
  • CDISC is working with open source projects on tools to access CDISC Library
  • CDISC have Analysis & Results projects that are ongoing or about to start recruiting members: the MACE+ project, the Safety User Guide, and Analysis Results (ARM) Standard development.
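
To give a feel for what ‘an API to normative metadata’ means in practice, here is a minimal sketch of preparing a request to the CDISC Library. The base URL, the `/mdr/products` path and the `api-key` header are my assumptions and should be checked against the current CDISC Library API documentation; `build_request` and `fetch_json` are hypothetical helper names, not part of any CDISC tool.

```python
import json
import urllib.request

# Assumed values; confirm against the current CDISC Library API documentation.
BASE_URL = "https://library.cdisc.org/api"
PRODUCTS_PATH = "/mdr/products"


def build_request(path: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for a CDISC Library resource."""
    return urllib.request.Request(
        BASE_URL + path,
        headers={"api-key": api_key, "Accept": "application/json"},
        method="GET",
    )


def fetch_json(path: str, api_key: str) -> dict:
    """Send the request and decode the JSON body (needs network and a valid key)."""
    with urllib.request.urlopen(build_request(path, api_key)) as response:
        return json.load(response)
```

Calling `fetch_json(PRODUCTS_PATH, my_key)` with a valid key would then return the metadata as a Python dictionary, ready to drive downstream tooling.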

Path to Automation

Andre Couturier & Dante Di Tommaso (Sanofi) gave an insightful presentation on the challenges faced when implementing end-to-end automation (for a start... don’t call it automation!)

  • It is key to communicate a clear vision and ‘sell upwards’
  • Key challenges include change management, Silo business processes, and budget (it takes longer than you think!)
  • Decide whether existing processes are ‘Deterministic’ or ‘Intelligent’ – Deterministic processes are candidates for automation, and Intelligent processes can be ‘facilitated’
  • Sanofi use the acronym MAP (Metadata Assisted Programming) to describe the change in approach
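
The presentation did not go into Sanofi's implementation, so the following is only my own rough sketch of what a ‘deterministic’, metadata-assisted step could look like: the derivations live in metadata and a small generic engine applies them. The `DERIVATIONS` table, the variable names and `apply_derivations` are all hypothetical.

```python
from typing import Dict, List

Record = Dict[str, object]

# Hypothetical metadata: each derived variable names its inputs and a rule.
DERIVATIONS: List[dict] = [
    {"target": "BMI", "inputs": ["WEIGHT", "HEIGHT"],
     "rule": lambda w, h: round(w / (h / 100) ** 2, 1)},
    {"target": "AGEGR1", "inputs": ["AGE"],
     "rule": lambda age: "<65" if age < 65 else ">=65"},
]


def apply_derivations(record: Record, derivations: List[dict]) -> Record:
    """Return a copy of the record with every metadata-defined variable added."""
    out = dict(record)
    for spec in derivations:
        args = [out[name] for name in spec["inputs"]]
        out[spec["target"]] = spec["rule"](*args)
    return out


subject = {"USUBJID": "001", "AGE": 70, "WEIGHT": 80.0, "HEIGHT": 180.0}
derived = apply_derivations(subject, DERIVATIONS)
```

The point of the design is that adding a new derived variable means adding a metadata entry rather than writing another program.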

Safety Reporting

Mary Nilsson (Eli Lilly) provided a comprehensive overview of the work that PHUSE has done on Safety Reporting. A comprehensive set of deliverables is available:

  • The two working groups are Standard Analyses and Code Sharing (pre-2020) and Safety Analytics (post-2020)
  • In addition, there are training videos covering pooling data, safety analytics and integrated reporting
  • The deliverables include SAP text, statistical methods, and visualisations
  • This is a body of knowledge that can provide a solid set of ‘source documentation’ for automation projects

Traceability and dependency

Gurubaran Veeravel walked through how Merck perform impact analysis when standards change, specifically to determine program dependencies and the variables used.
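
Merck's actual tooling was not shown in enough detail to reproduce, so the sketch below is only an assumption about the general idea: scan each program for the variables affected by a standards change and report which programs are impacted. The program sources, file names and function names are hypothetical, and a naive token scan like this would need a proper parser in real use.

```python
import re
from typing import Dict, Set

# Hypothetical program sources, keyed by file name.
PROGRAMS: Dict[str, str] = {
    "adsl.sas": "data adsl; set dm; AGEGR1 = put(AGE, agegr.); run;",
    "t_demog.sas": "proc freq data=adsl; tables AGEGR1 * TRT01P; run;",
    "t_ae.sas": "proc freq data=adae; tables AEDECOD * TRT01P; run;",
}


def variables_used(source: str, known_variables: Set[str]) -> Set[str]:
    """Return the known variables that appear as whole words in the source."""
    tokens = {t.upper() for t in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source)}
    return tokens & known_variables


def impacted_programs(changed: Set[str], programs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each program to the changed variables it references, if any."""
    hits = {name: variables_used(src, changed) for name, src in programs.items()}
    return {name: found for name, found in hits.items() if found}


impact = impacted_programs({"AGEGR1", "AEDECOD"}, PROGRAMS)
```

Here `impact` lists each affected program alongside the changed variables it touches, which is the starting point for deciding what needs re-validation.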

R demo – TFL Generation

Jem Chang & Vikram Karasala (AstraZeneca) presented how R programs can create RTF outputs with the same layout as existing (SAS) outputs. Alternative/existing R packages are available, and the pros/cons of each were discussed.


What is SDMX and why use it for clinical trials?

SDMX is a comprehensive, domain-neutral, ISO standard for Statistical Data and Metadata exchange, first released in 2004.

Let’s clear up one misconception straight away: although SDMX stands for Statistical Data and Metadata eXchange, you should not let the “eXchange” part fool you into thinking that this is simply a file format. It is so much more!

The SDMX standard provides:

  • Technical standards (including the Information Model)
  • Statistical guidelines
  • an IT architecture and tools

Taken together, the technical standards, the statistical guidelines and the IT architecture and tools can support improved business processes for any statistical organisation as well as the harmonisation and standardisation of statistical metadata.

Domain neutral

Although SDMX was established by international banking and government organisations, the information model is domain neutral, and because it is based on W3C Semantic Web standards, it aligns with the CDISC vision of linked data and biomedical concepts.

SDMX defines a vocabulary for describing statistical data using the W3C Data Cube, and so all domain-specific metadata is described using OWL ontologies. If this is new to you, then have a look at – The world’s most comprehensive repository of biomedical ontologies!

For this reason alone, SDMX provides a pathway to the CDISC vision of clinical trials analyses based on linked-data and biomedical concepts.


In addition to the information model, SDMX contains statistical guidelines which cover the collection, processing, analysis and reporting of statistical data across organisations and are based on the Generic Statistical Business Process Model (GSBPM).

The statistical guidelines aim at providing general statistical governance as well as common (“cross-domain”) concepts and code lists, a common classification of statistical domains and a common terminology.

Clinical trials typically involve data exchange across a network of Sponsors, Regulators, Vendors, Labs, CROs, etc., each with different roles as data producers and consumers and different agreements on who can access what data and when. These are all scenarios covered by the SDMX Statistical Guidelines and GSBPM.

Metadata repository

SDMX provides the specification for the logical registry interfaces, including subscription/notification, registration of data and metadata, submission of structural metadata, and querying, which are accessed using either REST or SOAP interfaces.
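
As a rough illustration of the REST side, the sketch below builds query URLs following the path patterns I understand the SDMX 2.1-era REST API to use (`/{resource}/{agencyID}/{resourceID}/{version}` for structural metadata and `/data/{flowRef}/{key}` for data). The base URL, agency and identifiers are hypothetical, and the patterns should be checked against the version of the specification your registry implements.

```python
from urllib.parse import urlencode

# Hypothetical entry point; a real registry publishes its own base URL.
BASE_URL = "https://registry.example.org/sdmx/rest"


def structure_url(resource: str, agency: str = "all",
                  resource_id: str = "all", version: str = "latest") -> str:
    """URL for a structural metadata query, e.g. a codelist or data structure."""
    return f"{BASE_URL}/{resource}/{agency}/{resource_id}/{version}"


def data_url(flow: str, key: str = "all", **params: str) -> str:
    """URL for a data query against a dataflow, with optional query parameters."""
    url = f"{BASE_URL}/data/{flow}/{key}"
    return f"{url}?{urlencode(params)}" if params else url
```

For example, `structure_url("codelist", "SPONSOR", "CL_SEX")` would address the latest version of a (made-up) sponsor codelist, whichever product happens to sit behind the interface.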

Metadata Repositories (MDR) are at early-stage adoption within Clinical Trials, so a key benefit of a standard interface is that it allows a period of experimentation/evolution on how the MDR is implemented, with limited impact on the rest of your analytics platform.

File interchange format

Well, yes! SDMX does also include file format standards for the exchange of data – XML, CSV, JSON are probably the key ones for use in clinical trials, allowing data transfer between languages and systems with minimal change.
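
To make the ‘minimal change’ point concrete, here is a small sketch that writes the same observations as CSV and as JSON and reads them back unchanged. The layout is deliberately simplified and is not the official SDMX-CSV or SDMX-JSON message structure; the column names and values are made up for illustration.

```python
import csv
import io
import json

# Hypothetical observations: dimension columns plus one measure (OBS_VALUE).
observations = [
    {"TRT": "Placebo", "PARAM": "SYSBP", "VISIT": "WEEK 4", "OBS_VALUE": "128.4"},
    {"TRT": "Active", "PARAM": "SYSBP", "VISIT": "WEEK 4", "OBS_VALUE": "121.9"},
]


def to_csv(rows):
    """Serialise a list of dictionaries to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def from_csv(text):
    """Parse CSV text back into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


csv_text = to_csv(observations)
json_text = json.dumps(observations)
```

Both `csv_text` and `json_text` carry the same dimensions and measure, so a consumer in R, SAS or Python can pick whichever format suits it.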

Validation and Transformation Language

SDMX also includes a fully specified Validation and Transformation Language (VTL), which allows statisticians and data managers to express logical validation rules and transformations on data. These can then be converted into specific programming languages for execution (SAS, R, Java, SQL, etc.).
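
The idea of expressing a rule once and converting it for execution can be sketched as follows. This is not VTL syntax and not a VTL engine; `RangeRule` is a hypothetical stand-in that holds one language-neutral range check and renders it as SQL or R text, or evaluates it directly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RangeRule:
    """A language-neutral validation rule: variable must lie in [low, high]."""
    dataset: str
    variable: str
    low: float
    high: float

    def to_sql(self) -> str:
        """Render as a SQL query returning the rows that fail the rule."""
        return (f"SELECT * FROM {self.dataset} "
                f"WHERE {self.variable} NOT BETWEEN {self.low} AND {self.high}")

    def to_r(self) -> str:
        """Render as an R expression selecting the rows that fail the rule."""
        v = f"{self.dataset}${self.variable}"
        return f"{self.dataset}[{v} < {self.low} | {v} > {self.high}, ]"

    def check(self, rows: list) -> list:
        """Evaluate directly in Python, returning the failing rows."""
        return [r for r in rows if not self.low <= r[self.variable] <= self.high]


rule = RangeRule("vs", "SYSBP", 60, 250)
failures = rule.check([{"SYSBP": 120}, {"SYSBP": 300}])
```

The rule is defined in one place, and each target language gets a generated rendering rather than a hand-written copy.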

Although VTL originated under the governance of SDMX, it was recognised that other communities could benefit, and so VTL was designed to be usable with SDMX, DDI and GSIM.


Why consider SDMX for use in clinical trials?

In short, because it’s a comprehensive, established standard which can be applied to any Statistical domain, and it ‘plays nicely’ with many other standards.

But what problem will it help solve? Typical use-cases might include:

  • Creating visualisations for Blind Review that take place before ADaM and TFL programming is completed,
  • Implementation of CDISC standards using linked-data and biomedical concepts,
  • Improved governance of data transfers between Sponsors, Regulators, Vendors and CROs,
  • Validation of open-source technologies and new languages such as R, Python or Julia,
  • A standards-based Metadata Repository (MDR) interface that can remain constant from pilot through deployment regardless of implementation technology or vendor.

Even if you do not adopt SDMX as a whole, there are certainly parts that are worth considering.

Useful Links