2017-02-17

SDTM Data in RDF: Activities in Clinical Trials

PhUSE has approved a new project to evaluate and demonstrate the potential value of using RDF for SDTM data. It's called SDTM Data as RDF. The project is headed by Tim Williams and me, and it will kick off at the upcoming PhUSE Computational Science Symposium just outside Washington DC. As many of you know, I'm an advocate for using RDF for study data. One of the goals of this new project is to develop a simple Study Ontology that, when combined with study data in RDF, can be used to generate high-quality, highly standardized, and valid SDTM datasets. If successful, it will address a major ongoing problem: high variability in SDTM implementation across studies and applications.

To achieve that goal, we will develop a simple study ontology in OWL that supports SDTM dataset creation using standard SPARQL queries. It will leverage existing BRIDG classes as needed. We are starting with two domains, DM and VS. If that works, the ontology will be extended to support other SDTM domains as well as non-standard data that currently wind up in SUPPQUAL or custom SDTM domains. A successful outcome would provide a compelling reason to use RDF for study data today, because it would solve a major SDTM implementation challenge that sponsors currently face. As the project progresses, I plan to discuss modeling challenges and how RDF/OWL can address them. Today I discuss Activities in clinical trials.

At its most fundamental, a clinical trial is a collection of activities and the rules that describe when those activities are performed. There are also rules that describe how those activities are grouped (e.g. into arms, visits, epochs, etc.) to facilitate study conduct and analysis.

Our mini Study Ontology divides study Activities into these subclasses:
  1. Observations -- symptoms, signs, tests, etc. that measure the physical, mental, or physiological state of a subject
  2. Analyses -- activities that take one or more Observations as input and generate analysis results
  3. Interventions -- activities that are performed on a subject with the usual intent of modifying or identifying a medical condition (e.g. drug administration, device implantation, surgery)
  4. Administrative Activities -- e.g. informed consent, randomization, etc.
The Analysis class has a couple of interesting subclasses: 
  1. Assessment - this activity analyzes Observations and their results to identify (e.g. diagnose) and/or characterize (e.g. severity assessment) a Medical Condition.
  2. Rule - this activity analyzes Observations and their results to determine the start of another Activity. This includes eligibility criteria, which take screening data and determine whether the subject advances to the next Activity, usually randomization. It also includes more generic start rules such as "take study medication within two hours of headache onset."
All Activities have outcomes, so there is a class ActivityOutcome. For Observations, the outcome is the result. For Assessments, the outcome is the identification and characterization of a Medical Condition. Many tests in medicine combine Observation outcomes with Assessment outcomes. For example, consider a CT scan of the head. The Observation might be a 2 cm lesion in the right frontal lobe that enhances with contrast and has mass effect (obliteration of the sulci and a shift of midline structures). The Assessment, performed by a trained neuroradiologist, establishes the presence of a cerebral tumor. Additional observations and assessments are then needed to confirm the diagnosis and further characterize the tumor (e.g. Grade 3 Astrocytoma).
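
To make the Rule idea a little more concrete, here is a minimal sketch in Python of the headache start rule quoted above. It is an illustration only, not the project's ontology or its implementation: the names `Observation` and `may_start_dosing` are hypothetical, and treating the two-hour boundary as inclusive is my assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Observation:
    """Hypothetical stand-in for an Observation and the time it was made."""
    name: str
    observed_at: datetime


def may_start_dosing(onset: Observation, now: datetime,
                     window: timedelta = timedelta(hours=2)) -> bool:
    """Rule: the dosing Activity may start only within `window` of onset.

    Assumption: the boundary is inclusive, and a time before the onset
    never satisfies the rule.
    """
    elapsed = now - onset.observed_at
    return timedelta(0) <= elapsed <= window


onset = Observation("headache onset", datetime(2017, 2, 17, 9, 0))
print(may_start_dosing(onset, datetime(2017, 2, 17, 10, 30)))
print(may_start_dosing(onset, datetime(2017, 2, 17, 11, 30)))
```

The point of the sketch is that a Rule consumes an Observation outcome and produces a decision about whether another Activity may begin. Eligibility criteria would follow the same pattern, with screening data as the input.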

The MedicalCondition class contains all the medical conditions that afflict the Subject (including past conditions). By medical condition I mean a disease (e.g. Epilepsy), a disorder (e.g. Seizure), or a transient physiologic state that benefits from medical Interventions (e.g. Pregnancy). There are two main subclasses: Indication (the reason an Intervention, such as a Drug Administration, is performed) and AdverseEvent (a medical condition that begins or worsens after an Intervention is performed).

So the "core" study ontology looks like this (class view only):

--Entity
     --Person
          --HumanStudySubject
--Activity
     --AdministrativeActivity
     --Analysis
          --Assessment
          --Rule
     --Observation
     --Intervention
--ActivityOutcome
--MedicalCondition
     --Indication
     --AdverseEvent
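
As a rough illustration of how this class view behaves, the tree can be written down as a child-to-parent map and queried for subclass relationships. This is a plain-Python sketch, not OWL; the names `PARENT` and `is_subclass_of` are hypothetical, and the check is reflexive and transitive in the same way `rdfs:subClassOf` is.

```python
# Parent of each class in the tree above; top-level classes map to None.
PARENT = {
    "Entity": None,
    "Person": "Entity",
    "HumanStudySubject": "Person",
    "Activity": None,
    "AdministrativeActivity": "Activity",
    "Analysis": "Activity",
    "Assessment": "Analysis",
    "Rule": "Analysis",
    "Observation": "Activity",
    "Intervention": "Activity",
    "ActivityOutcome": None,
    "MedicalCondition": None,
    "Indication": "MedicalCondition",
    "AdverseEvent": "MedicalCondition",
}


def is_subclass_of(cls: str, ancestor: str) -> bool:
    """Walk up the tree from `cls` until `ancestor` or a root is reached."""
    current = cls
    while current is not None:
        if current == ancestor:
            return True
        current = PARENT[current]
    return False


print(is_subclass_of("Assessment", "Activity"))
print(is_subclass_of("AdverseEvent", "Activity"))
```

Because Assessment sits under Analysis, anything said about every Activity (for example, that it has an outcome) applies to Assessments as well, while MedicalCondition and its subclasses stay outside the Activity branch.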

You can imagine a host of properties/predicates that link these together. "HumanStudySubject participatesIn Activity" and "Activity hasOutcome ActivityOutcome" are just two.
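
Here is a small sketch of how those two predicates might link instance data, using plain Python tuples as subject-predicate-object triples. All of the `ex:` identifiers are made up for illustration, and `match` and `outcomes_for` are hypothetical helpers that only mimic what a SPARQL query would do against a real triple store.

```python
# Each triple is (subject, predicate, object). The ex: identifiers are invented.
TRIPLES = {
    ("ex:Subject01", "rdf:type", "HumanStudySubject"),
    ("ex:Subject01", "participatesIn", "ex:SysBP01"),
    ("ex:SysBP01", "rdf:type", "Observation"),
    ("ex:SysBP01", "hasOutcome", "ex:SysBP01Result"),
    ("ex:SysBP01Result", "rdf:type", "ActivityOutcome"),
}


def match(s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in TRIPLES
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def outcomes_for(subject):
    """Follow participatesIn, then hasOutcome, starting from a subject."""
    return sorted(
        outcome
        for (_, _, activity) in match(s=subject, p="participatesIn")
        for (_, _, outcome) in match(s=activity, p="hasOutcome")
    )


print(outcomes_for("ex:Subject01"))
```

The two-step lookup in `outcomes_for` is the kind of join a SPARQL query expresses declaratively: find the Activities a subject participates in, then find the outcomes of those Activities.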

As the PhUSE project advances, we will be testing whether all study activities fit in this model. In the meantime, I welcome your comments here, but please consider getting involved in the project. We can guarantee all of us will learn a lot and maybe find a better way of implementing the SDTM. And finally, come to the PhUSE CSS if you can! I hope to see you there.