Clinical Trial Automation Roadmap

Why is it so hard to automate the analysis of a regulated clinical trial?

Sure, you might be able to set up an EDC in the blink of an eye, but let’s be honest: it still takes a team of people years of effort to analyse the results and produce the tables, figures and listings (TFLs) that go into the Clinical Study Report.

You’ve done a proof-of-concept or two, maybe piloted automation of SDTM datasets, and have bought into the end-to-end automation vision, but how do you get there from here? And why does it seem so hard?

Automation is the tip of the iceberg

The first thing to recognize is that automation is not standalone. Achieving end-to-end automation in a production environment requires that all the necessary metadata is available at each step of the process, that the key people have the tools and skills to curate and manage that metadata, and that they have a robust, validated software platform that meets regulatory submission and GCP quality requirements.

Automation requires a foundation of people, skills and tools

OK, so let’s look at each of the areas that are needed to support successful automation of analysis results.

Don’t build a horse-less carriage

Before Henry Ford started building the Model T on a production line, cars were built in the same way as horse-drawn carriages had been: by a small team of highly skilled people who worked on the product from beginning to end.

Sound familiar?

Change and disruption caused by the introduction of automation is well understood in manufacturing; however, it is a new phenomenon in knowledge-based industries.

Don’t re-invent the wheel either

The software industry pioneered Model-Driven Development (MDD) twenty years ago, and there now exist ISO Standards for Model-Driven Architecture, methodologies and tools that have been used successfully in regulated, safety-critical industries such as avionics, space, energy, etc.

Adoption of Model-Driven Development can be mapped out using the Capability Maturity Model (see: MDD Maturity Model).

  1. Ad-Hoc: Analyses are not model-driven.
  2. Basic: Basic use of models in the organization.
  3. Initial: The organization starts developing systems in a more model-driven way.
  4. Integrated: Models at different abstraction levels are built and fully integrated within a comprehensive modelling framework.
  5. Ultimate: The role of coding will disappear and the transformations between models are done automatically.

So, while most organizations will be starting at level 1 (i.e. analyses are created manually), the question is: what level of adoption do you aspire to, and over what timescale?

Then you can plan how to transition to level 2, i.e. basic metadata-driven automation running end-to-end through the analysis.
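To make "metadata-driven" a little more concrete, here is a minimal Python sketch, assuming a hypothetical table specification held as metadata and a toy subject-level dataset. The names TABLE_META, STATS and build_table are illustrative only, not part of any standard or product. The point is that the program is generic, and the study-specific content lives entirely in the metadata.

```python
from statistics import mean, stdev

# Hypothetical table metadata. In a metadata-driven process this would come
# from a machine-readable SAP or metadata repository, not be hard-coded.
TABLE_META = {
    "id": "T14.1.1",
    "title": "Summary of Age by Treatment Arm",
    "group_by": "ARM",
    "variable": "AGE",
    "statistics": ["n", "mean", "sd"],
}

# Generic statistic functions, keyed by the names used in the metadata.
STATS = {
    "n": len,
    "mean": lambda xs: round(float(mean(xs)), 1),
    "sd": lambda xs: round(stdev(xs), 2) if len(xs) > 1 else None,
}

def build_table(meta, records):
    """Produce one summary row per group, driven entirely by `meta`."""
    groups = {}
    for rec in records:
        value = rec.get(meta["variable"])
        if value is None:  # skip missing values rather than failing
            continue
        groups.setdefault(rec[meta["group_by"]], []).append(value)
    return {
        group: {stat: STATS[stat](values) for stat in meta["statistics"]}
        for group, values in sorted(groups.items())
    }

if __name__ == "__main__":
    adsl = [  # toy subject-level records (illustrative, not real data)
        {"USUBJID": "001", "ARM": "Placebo", "AGE": 54},
        {"USUBJID": "002", "ARM": "Placebo", "AGE": 61},
        {"USUBJID": "003", "ARM": "Active", "AGE": 47},
        {"USUBJID": "004", "ARM": "Active", "AGE": 59},
        {"USUBJID": "005", "ARM": "Active", "AGE": None},
    ]
    print(TABLE_META["title"])
    for arm, row in build_table(TABLE_META, adsl).items():
        print(arm, row)
```

With this design, adding another summary table means adding metadata rather than writing a new program.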

Model-driven architecture

Possibly the clearest way to describe the architecture required to support a fully automated analysis is with a four-layer metamodel architecture:

  • M3 (Meta-metamodel): metadata standards, e.g. W3C Semantic Web, XML, UML, ISO/IEC 11179, etc.
  • M2 (Metamodel): clinical standards, e.g. CDISC, MedDRA, SNOMED, etc.
  • M1 (Model): trial-specific metadata
  • M0 (Data): clinical, randomisation, lab data, etc.

Each level of the Metadata Architecture requires its own set of tools and processes, and people with the skillset to be able to work at the relevant level of abstraction.
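A minimal Python sketch of how the lower layers relate to each other. VariableDef, DM_MODEL and validate are hypothetical names. The M2 fields are loosely inspired by CDISC-style variable metadata rather than taken from any CDISC specification, and Python’s own type system stands in, very loosely, for M3.

```python
from dataclasses import dataclass

# M2 (metamodel): what every variable definition must look like.
# Illustrative fields only, loosely inspired by CDISC-style variable metadata.
@dataclass(frozen=True)
class VariableDef:
    name: str
    label: str
    data_type: type          # e.g. str, int, float
    required: bool = True
    codelist: tuple = ()     # allowed values; empty means unrestricted

# M1 (model): trial-specific metadata, i.e. instances of the M2 metamodel.
DM_MODEL = [
    VariableDef("USUBJID", "Unique Subject Identifier", str),
    VariableDef("SEX", "Sex", str, codelist=("M", "F", "U")),
    VariableDef("AGE", "Age", int, required=False),
]

# M0 (data): individual records, checked against the M1 model.
def validate(record, model):
    """Return a list of human-readable problems; empty means the record conforms."""
    problems = []
    for var in model:
        value = record.get(var.name)
        if value is None:
            if var.required:
                problems.append(f"{var.name}: missing required value")
            continue
        if not isinstance(value, var.data_type):
            problems.append(f"{var.name}: expected {var.data_type.__name__}")
        elif var.codelist and value not in var.codelist:
            problems.append(f"{var.name}: {value!r} not in codelist")
    return problems

if __name__ == "__main__":
    print(validate({"USUBJID": "001", "SEX": "F", "AGE": 54}, DM_MODEL))
    print(validate({"USUBJID": "002", "SEX": "X"}, DM_MODEL))
```

Each layer is edited by different people: standards teams maintain M2, study teams curate M1, and M0 arrives from data collection. That is why each level needs its own tools and skills.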

Taking the first step

While a full-blown end-to-end level-5 CMM Ultimate Meta-Model Architecture may remain a PowerPoint vision, there is more than enough work involved in getting from level 1 to level 2.

A full list is clearly out of scope for this blog post(!), but some things to consider include:

  • Statisticians will need to be able to create a SAP in a machine-readable format. What tools will they use? Who will train and support them?
  • Programmers will need to be able to work with technologies such as XML, RDF/OWL, and relational and graph databases, and will probably use languages other than SAS
  • How is study-specific data handled? How do you work around partial and dirty data?
  • Software infrastructure will be needed for version control, continuous integration/deployment and metadata repositories
  • How will validation be done without ‘mainline’ and ‘qc’ programs?
  • Business processes will be needed to support software development and DevOps
  • How will Industry Standards (e.g. CDISC) be managed? What about Corporate and Study/Drug/Disease-Specific standards? What happens when a new version of a standard is issued?
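As an illustration of the points on machine-readable SAPs and validation, here is a minimal Python sketch, assuming a hypothetical JSON layout for a SAP fragment (not an existing standard). check_sap, AVAILABLE_VARIABLES and SUPPORTED_METHODS are illustrative names. It shows one possible, partial answer to the validation question: automated consistency checks that run on every change, for example in a continuous integration pipeline, rather than a separate QC program.

```python
import json

# A hypothetical machine-readable SAP fragment; the keys are purely illustrative.
SAP_JSON = """
{
  "analyses": [
    {"id": "A1", "population": "SAFFL", "variable": "AGE", "method": "descriptive"},
    {"id": "A2", "population": "ITTFL", "variable": "CHG", "method": "ancova",
     "covariates": ["BASE", "SITEID"]}
  ]
}
"""

# Variables available in the analysis datasets (would come from a metadata repository).
AVAILABLE_VARIABLES = {"USUBJID", "AGE", "SAFFL", "ITTFL", "CHG", "BASE"}
SUPPORTED_METHODS = {"descriptive", "ancova"}

def check_sap(sap, variables, methods):
    """Report analyses that reference unknown variables or unsupported methods."""
    issues = []
    for analysis in sap["analyses"]:
        referenced = [analysis["population"], analysis["variable"],
                      *analysis.get("covariates", [])]
        for var in referenced:
            if var not in variables:
                issues.append(f"{analysis['id']}: unknown variable {var}")
        if analysis["method"] not in methods:
            issues.append(f"{analysis['id']}: unsupported method {analysis['method']}")
    return issues

if __name__ == "__main__":
    sap = json.loads(SAP_JSON)
    for issue in check_sap(sap, AVAILABLE_VARIABLES, SUPPORTED_METHODS):
        print(issue)
```

Checks like this catch specification errors early, but they complement, rather than replace, testing of the generic programs that execute the SAP.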

None of these issues are insurmountable, but equally they add up to more than a quick fix.


The promise of end-to-end automated analysis of clinical trials is not just that it will be faster and easier, but also that it will unlock new and innovative applications (CMM Level 3 and beyond).

Key points to consider in planning an automation strategy are:

  • Automation requires metadata, and that means new tools, skills and business processes.
  • This is mostly a solved problem, but solving it will require learning lessons from other industries
  • While implementing end-to-end automation requires work, it means that you ultimately spend less time on grunt-work and more time on activities that add value and help patients!

Intelligent clinical trials

Transforming through AI-enabled engagement

This Deloitte Insights report, published in February 2020, examines AI technologies in clinical trials. It is the third in a series: the first is an overview of AI in biopharma and the second covers AI in drug discovery.

If you haven’t read the Intelligent Clinical Trials report, don’t worry! This article summarises the key points.

AI has the potential to transform key steps of clinical trial design, from study preparation to execution, improving trial success rates and thus lowering the pharma R&D burden.

Artificial Intelligence for Clinical Trial Design

Main application areas

  1. Protocol design
  2. Patient selection
  3. Recruitment and retention

In these areas, Real World Evidence (RWE) is used to enrich trial-specific data to optimise patient search, recruitment and retention.
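To illustrate the patient-search step, here is a minimal rule-based screening sketch in Python. It is a plain filter, not one of the AI techniques the report discusses. The criteria, field names (age, diagnosis, hba1c) and records are all hypothetical. Eligibility criteria are expressed as data, so the same screen can be re-run as new real-world records arrive.

```python
from dataclasses import dataclass

@dataclass
class Criterion:
    field: str
    op: str        # one of "ge", "le", "eq", "in"
    value: object

OPS = {
    "ge": lambda a, b: a >= b,
    "le": lambda a, b: a <= b,
    "eq": lambda a, b: a == b,
    "in": lambda a, b: a in b,
}

# Hypothetical eligibility criteria, held as data rather than code.
CRITERIA = [
    Criterion("age", "ge", 18),
    Criterion("age", "le", 75),
    Criterion("diagnosis", "in", {"T2D"}),
    Criterion("hba1c", "ge", 7.0),
]

def is_eligible(patient, criteria):
    """A patient passes only if every field is present and every criterion holds."""
    return all(
        c.field in patient and OPS[c.op](patient[c.field], c.value)
        for c in criteria
    )

def screen(patients, criteria):
    """Return the ids of patients that meet all criteria."""
    return [p["id"] for p in patients if is_eligible(p, criteria)]

if __name__ == "__main__":
    rwd = [  # toy real-world records (illustrative only)
        {"id": "P1", "age": 52, "diagnosis": "T2D", "hba1c": 8.1},
        {"id": "P2", "age": 80, "diagnosis": "T2D", "hba1c": 7.5},
        {"id": "P3", "age": 45, "diagnosis": "T1D", "hba1c": 9.0},
        {"id": "P4", "age": 60, "diagnosis": "T2D"},  # missing lab value
    ]
    print(screen(rwd, CRITERIA))
```

Records with missing fields are simply excluded here, which is exactly where data quality and interoperability start to matter.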

The use of real-world data brings challenges, including data interoperability, the adoption of open and secure platforms, and consumer-driven care.

FDA guidance to industry

The FDA have published guidance for industry entitled “Enrichment Strategies for Clinical Trials to Support Demonstration of Effectiveness of Human Drugs and Biological Products.” The purpose of this guidance is to assist industry in developing enrichment strategies that can be used in clinical trials.

Clinical trials of the future

FDA is already planning for a future in which more than half of all clinical trial data will come from computer simulations.

This is a future where phase I trials are done in silico, i.e. using a simulation of the human body, and phase II/III trials become remote decentralised clinical trials (RDCTs) that use AI-enabled technologies to allow bigger, more diverse and remote populations to participate, as envisioned by the European Innovative Medicines Initiative Trials@Home project, launched in December 2019.

Real-world data (RWD)

In April 2020, Apple and Google announced a partnership to enable interoperability between Android and iOS devices using apps from public health authorities.

In the coming months, Apple and Google will work to enable a broader Bluetooth-based contact tracing platform by building this functionality into the underlying platforms.

This is an indication of the scale of real-time, real-world data that will be available to enrich regulated clinical trials in the future.


  • The benefits of AI for Clinical Trials centre around the optimisation of protocols, patient recruitment and retention
  • This change will involve enriching clinical data with new sources of real-world data (RWD)
  • Phase I trials will be run as in-silico computer simulations, and phase II/III will become virtual, decentralised clinical trials
  • Regulatory authorities have already started publishing guidance to industry.
  • For the next few years, randomised controlled trials (RCTs) are likely to remain the gold standard for validating the efficacy and safety of new compounds in large populations.