2020-01-17

Copernicus and Elegant Models

Nicolaus Copernicus (1473-1543)
In 1543, the year of his death, Nicolaus Copernicus published De Revolutionibus Orbium Coelestium. In English, the title roughly translates to On the Revolutions of the Celestial Spheres. Now considered his greatest scientific achievement, it established the Sun as the center of our solar system, with all the planets, including Earth, revolving around it. We now call this the heliocentric model of our solar system, or simply heliocentrism.

The prevailing model at the time placed the Earth at the center of the universe. It was also accepted that the Earth was stationary and that other spheres, including the Sun, moved around it. This geocentric model had prevailed for well over a millennium, since the 2nd century A.D., and is most often attributed to Ptolemy, a Greek philosopher, astronomer, and mathematician from Alexandria, in what is now Egypt.

The Ptolemaic Model of the Solar System
The transition to the new model didn't occur quickly or easily. Those opposed to Copernicanism often did so for religious or philosophical reasons, but eventually it became clear that the Copernican model more accurately, and more simply, predicted the movements of the heavenly bodies across the sky. It had become clear some time before that the Ptolemaic model could not explain certain observations, such as the fact that planets often seemed to stop moving completely or sometimes travel backwards across the sky. For this reason, fudge factors were added to the Ptolemaic model such as epicycles, deferents, and equants so that the model more closely aligned with reality.

An epicycle is a circle whose center travels along the circumference of a larger circle, known as a deferent. Together they helped explain why planets sometimes move backwards across the sky. An equant is the point from which the center of the epicycle appears to sweep out equal angles along the deferent in equal times. The center of the deferent lies midway between the equant and Earth.
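To see how an epicycle riding on a deferent can produce backward motion, here is a minimal Python sketch. The radii and angular speeds are made-up illustrative values, not Ptolemy's parameters, and the equant is ignored: the deferent is assumed to be centered on Earth and traversed at uniform speed.

```python
import math

# Illustrative values only (assumptions, not Ptolemy's actual parameters).
DEFERENT_RADIUS = 1.0   # large circle centered on Earth
EPICYCLE_RADIUS = 0.5   # small circle that carries the planet
DEFERENT_SPEED = 1.0    # angular speed of the epicycle's center, rad per unit time
EPICYCLE_SPEED = 5.0    # angular speed of the planet on the epicycle, rad per unit time


def planet_position(t):
    """Position of the planet with a stationary Earth at the origin."""
    x = (DEFERENT_RADIUS * math.cos(DEFERENT_SPEED * t)
         + EPICYCLE_RADIUS * math.cos(EPICYCLE_SPEED * t))
    y = (DEFERENT_RADIUS * math.sin(DEFERENT_SPEED * t)
         + EPICYCLE_RADIUS * math.sin(EPICYCLE_SPEED * t))
    return x, y


def planet_velocity(t):
    """Time derivative of planet_position."""
    vx = (-DEFERENT_RADIUS * DEFERENT_SPEED * math.sin(DEFERENT_SPEED * t)
          - EPICYCLE_RADIUS * EPICYCLE_SPEED * math.sin(EPICYCLE_SPEED * t))
    vy = (DEFERENT_RADIUS * DEFERENT_SPEED * math.cos(DEFERENT_SPEED * t)
          + EPICYCLE_RADIUS * EPICYCLE_SPEED * math.cos(EPICYCLE_SPEED * t))
    return vx, vy


def apparent_angular_rate(t):
    """Rate of change of the planet's direction as seen from Earth.

    Positive means forward (prograde) motion, negative means backward
    (retrograde) motion.
    """
    x, y = planet_position(t)
    vx, vy = planet_velocity(t)
    return (x * vy - y * vx) / (x * x + y * y)


for t in (0.0, math.pi / 4):
    direction = "forward" if apparent_angular_rate(t) > 0 else "backward"
    print(f"t = {t:.2f}: planet appears to move {direction}")
```

With these values the planet appears to move forward at t = 0, when the two circular motions reinforce each other, and backward at t = π/4, when the faster epicycle motion opposes and outweighs the deferent motion.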

These adjustments, or corrections, were needed to get the most use out of the Ptolemaic model. In addition, church opposition to Copernicanism was often fierce, and its acceptance was initially slow. The argument went something like this:

How could God the Almighty not place Earth at the center of the universe, when Earth is the most important celestial body of them all?

But eventually astronomers retired the Ptolemaic model and Copernicanism prevailed. This ushered in a golden age of astronomy, during which scientific discoveries accelerated and our understanding of the solar system greatly increased.

Fundamentals of the Ptolemaic Model
The take-home lesson from this 16th-century story is clear. Sometimes you have to retire older, inaccurate models in favor of newer ones that are both simpler to implement and more accurate. Modelers call these elegant models. So let's fast forward to the 21st century. Could we be experiencing a similar moment with regard to clinical trials data models? I think we are.

Today, we regard the experimental subject as the most important entity in a clinical trial. In fact, he or she is the most important entity, and all the effort to collect and understand the effects of a new therapeutic agent exists with this entity in mind. The Subject is, and has been for decades, the center of the clinical trials universe, with all other concepts divided, misunderstood, and sometimes neglected in the model's periphery. But there is a problem with subjects at the center. Subjects are unpredictable. Odd, unexpected things happen to them, and new, unexpected data are often collected that have no room in the subject-centric model; they simply have nowhere to go. New constructs are developed to contain and explain these data, things like Supplemental Qualifiers, Related Records, and Findings About domains (see Study Data Tabulation Model). These are the epicycles, deferents, and equants of the subject-centric clinical trials universe. In addition, we must recognize that the current model for observations (made up of findings, interventions, and events) is fundamentally flawed. We need to redo it, just as the fundamentally flawed Ptolemaic model needed to be redone (see my previous post on this topic). When we do so, the need for these fudge factors disappears, as you will see below.

"Copernican" View of Study Data
So what is the answer? We remove Subjects from the center of the clinical trial universe and replace them with Study Activities (an activity-centric model). Study Activities are, after all, critical for a study's success, and each is easily linked to the subject to whom the activity pertains. They become the unit of record for any study. Study Activities may be observations, assessments, or administrative activities (informed consent, randomization). Study Activities are more predictable, the metadata needed to fully describe them are generally more stable, and, by placing them at the center of the data model, we make them easier to represent, understand, and analyze. We also correct mistakes in how observations are modeled. The end result is a model that is not only simpler to implement, but more capable of representing study data and their true meaning. Yes, we have a more elegant model that is closer to reality.

But clinical trial data aren't spheres that orbit the sun. How do we make this more practical and implementable? The answer is we describe these new definitions and relationships as an ontology that is understandable to both subject matter experts and information systems. It goes something like this.

At the center of our universe is the Study Activity, defined as any activity associated with the planning, conduct, analysis, and interpretation of a study. We recognize different sub-types, such as study design activities (Arms, Epochs, Visits, etc.), administrative activities (informed consent, randomization), intervention activities (clinical observations, therapeutic interventions), and analysis activities. One highly important type of study activity is the Assessment, which needs to be added to the model. More on this later. Activities can be defined, protocol specified, planned, scheduled, or performed. Every activity has an outcome, which in the case of an observation is the result. Activities also have start rules that link them to other activities. For example, the High Dose Drug Administration activity begins on the same day that the Randomization activity is completed with an outcome equal to High Dose group, or the Blood Pressure activity begins 3 minutes after completion of the Change in Body Position activity from supine to sitting. Activities may also have sub-activities. For example, a simple Blood Pressure is made up of a Systolic BP measurement and a Diastolic BP measurement.
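As a rough illustration of these ideas, the Python sketch below models an activity with an outcome, start rules, and sub-activities. It is not taken from the ontology itself: the class and method names are my own assumptions, and timing constraints (same day, 3 minutes after) and activity states (planned, scheduled, performed) are left out to keep it short.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class StartRule:
    """Hypothetical start rule: a prerequisite activity must be complete,
    optionally with a specific outcome."""
    prerequisite: "StudyActivity"
    required_outcome: Optional[Any] = None  # None means any outcome is acceptable

    def is_satisfied(self) -> bool:
        if not self.prerequisite.is_complete():
            return False
        return (self.required_outcome is None
                or self.prerequisite.outcome == self.required_outcome)


@dataclass
class StudyActivity:
    """Hypothetical activity record; field names are illustrative."""
    name: str
    outcome: Optional[Any] = None
    start_rules: List[StartRule] = field(default_factory=list)
    sub_activities: List["StudyActivity"] = field(default_factory=list)

    def is_complete(self) -> bool:
        # A composite activity is complete when all of its sub-activities are;
        # a simple activity is complete once it has an outcome.
        if self.sub_activities:
            return all(sub.is_complete() for sub in self.sub_activities)
        return self.outcome is not None

    def can_start(self) -> bool:
        # An activity with no start rules can start at any time.
        return all(rule.is_satisfied() for rule in self.start_rules)


# The two examples from the text above.
randomization = StudyActivity("Randomization")
high_dose = StudyActivity(
    "High Dose Drug Administration",
    start_rules=[StartRule(randomization, required_outcome="High Dose group")],
)
blood_pressure = StudyActivity(
    "Blood Pressure",
    sub_activities=[StudyActivity("Systolic BP"), StudyActivity("Diastolic BP")],
)
```

In this sketch, `high_dose.can_start()` is false until `randomization.outcome` is set to "High Dose group", and `blood_pressure.is_complete()` is false until both the systolic and diastolic sub-activities have outcomes.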

So our model, viewed as a mind map, starts looking like this. The core classes are shown along with the important relationships.
Core Study Ontology



The next step is to correct the current modeling of observations, which in the Study Data Tabulation Model includes findings, interventions, and events. The following new definitions are proposed, which I believe align more closely with the way these terms are used in clinical medicine.  The definitions are written as Aristotelian Definitions whenever possible to facilitate development of machine readable/executable software.

Intervention:  A Study Activity that involves an interruption to the study subject's normal daily routine for the purpose of observing the subject's physical, psychological, or physiological state, and/or mitigating the effects of a medical condition.

There are three main sub-types of Interventions; two are defined here:

(Clinical) Observation:   An Intervention whose intent is to measure the physical, physiological, or psychological state of a Study Subject.

Please see my previous post for more thoughts about Observations.

Therapeutic Intervention:   An Intervention whose intent is to have an effect (e.g. treat, cure, mitigate, prevent) on a Medical Condition. (A drug administration is an example of a therapeutic intervention.)

There are other types of Interventions, but for the sake of brevity they are not covered here. Then we have a markedly updated definition of Events:

Event: An occurrence; something that happens. An event persists in time.

Followed by important types of Events:

Medical Condition:  An event that is a disease, injury, disorder, or transient physiologic state that interferes or may interfere with well-being. A medical condition persists in time.

Adverse Event:  A Medical Condition that emerges or worsens following a Medical Intervention, including the use of a drug. Note: there is no presumption of causality. Please refer to a previous post for more information about modeling Adverse Events.

There are other types of events, such as Diagnoses, Indications, Underlying Conditions, but for the sake of brevity we can stop here.

As previously stated, Assessments are an extremely important Study Activity that is notably missing from today's models, and which is sorely needed:

Assessment:  An Examination or Analysis of one or more clinical Observations to identify and/or characterize a Medical Condition.  (When formalized, this is also called an Adjudication.) 
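Because these definitions are Aristotelian (each names a parent concept and what distinguishes the child from it), they map naturally onto a subclass hierarchy. The Python sketch below is only an illustration of that mapping, not the OWL ontology itself. The class names follow the definitions above, the docstrings paraphrase them, and keeping Event outside the Study Activity branch is my assumption.

```python
class StudyActivity:
    """Any activity associated with the planning, conduct, analysis,
    and interpretation of a study."""


class Intervention(StudyActivity):
    """A Study Activity that interrupts the subject's normal daily routine."""


class Observation(Intervention):
    """An Intervention whose intent is to measure the subject's state."""


class TherapeuticIntervention(Intervention):
    """An Intervention whose intent is to have an effect on a Medical Condition."""


class Assessment(StudyActivity):
    """An examination or analysis of one or more Observations to identify
    and/or characterize a Medical Condition."""


class Event:
    """An occurrence; something that happens."""


class MedicalCondition(Event):
    """An Event that is a disease, injury, disorder, or transient
    physiologic state."""


class AdverseEvent(MedicalCondition):
    """A Medical Condition that emerges or worsens following an Intervention."""


def genus_chain(cls):
    """Return the names of a class and its ancestors, most specific first."""
    return [c.__name__ for c in cls.__mro__ if c is not object]


print(genus_chain(AdverseEvent))  # ['AdverseEvent', 'MedicalCondition', 'Event']
print(issubclass(AdverseEvent, Observation))  # False
```

The last line reflects the key modeling choice: an Adverse Event sits under Medical Condition and Event, and is never a kind of Observation.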

One can easily spend a great deal more time discussing assessments and how to distinguish them from observations, but instead I refer you to a previous post on this topic.

These critically important concepts are modeled as follows in our core ontology. We must recognize that Adverse Events are not Observations. We collect observations and then perform an assessment to determine whether an observation is related to, or helps explain the presence of, an Adverse Event (a Medical Condition). Often this assessment is not captured in writing; it is done by the subject via self-assessment, or by the Investigator in the clinic, but an assessment must occur. That Assessments have no place in the current model is a notable omission, in my opinion. So the new high-level model looks something like this:

Modeling Observations and Assessments

Consider a Subject who has a Blood Pressure of 160/110 measured in the clinic (a clinical observation). Does this mean that the Subject has a Medical Condition (Essential Hypertension) that requires treatment? No. Additional observations need to be recorded and analyzed (Assessed) to make that link. Is the person obese, and was the wrong BP cuff size used? Is the person nervous, with transiently elevated BP due to anxiety? Very often, serial blood pressures in various settings can help sort this out. If the BP was recorded shortly after taking an experimental treatment, is this an Adverse Event of the treatment? No, for the same reasons. An Assessment needs to be done. Read more about Assessment in a previous post.
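The separation between an Observation and an Assessment can be sketched in Python as follows. A single reading is only a recorded result; a Medical Condition appears only as the outcome of an assessment over several readings. The function name, the minimum number of readings, and the thresholds are placeholders I chose to make the example concrete. They are assumptions for illustration, not clinical guidance and not part of the ontology.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass(frozen=True)
class BloodPressureObservation:
    systolic: int
    diastolic: int
    setting: str  # for example "clinic" or "home"


@dataclass(frozen=True)
class MedicalCondition:
    name: str


# Hypothetical rule parameters, for illustration only.
MIN_READINGS = 3
SYSTOLIC_THRESHOLD = 140
DIASTOLIC_THRESHOLD = 90


def assess_hypertension(
    observations: List[BloodPressureObservation],
) -> Optional[MedicalCondition]:
    """Assessment sketch: analyze several observations together.

    Returns a MedicalCondition when the averaged readings are elevated.
    Returns None when there are too few readings to decide or when the
    averages are below both thresholds.
    """
    if len(observations) < MIN_READINGS:
        return None
    avg_systolic = mean(o.systolic for o in observations)
    avg_diastolic = mean(o.diastolic for o in observations)
    if avg_systolic >= SYSTOLIC_THRESHOLD or avg_diastolic >= DIASTOLIC_THRESHOLD:
        return MedicalCondition("Hypertension")
    return None


clinic_reading = BloodPressureObservation(160, 110, "clinic")
home_readings = [
    BloodPressureObservation(125, 80, "home"),
    BloodPressureObservation(120, 78, "home"),
]

print(assess_hypertension([clinic_reading]))                  # None
print(assess_hypertension([clinic_reading] + home_readings))  # None
```

Under these placeholder rules, the lone clinic reading yields no condition because there is too little data, and the same reading combined with normal home readings still yields none, which mirrors the anxious-patient scenario described above.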

Once we move to an activity-centric model, expressing a protocol in a machine-readable/executable format becomes fairly simple. The protocol essentially becomes a list of all protocol-specified activities, each of which is richly linked to entities and other activities. For example, consider a simple trial consisting of the following screening activities: informed consent, serum RPR (test for syphilis), and fasting blood sugar (FBS). The protocol states that informed consent comes first and that, once it is obtained, the RPR and FBS are obtained after a delay of 3 days, not to exceed 7 days. The informed consent activity is linked to a Default Start Rule saying that it can start anytime. The RPR and FBS each have a start rule that is linked to the successful completion of the informed consent activity. The appropriate delay (3 days) and maximum delay (7 days) are captured in the start rule. Epochs and other study design activities are defined by the start and stop rules of the sub-activities within each study design concept. The Screening Epoch, for instance, begins when Informed Consent begins and ends when yet another activity, Eligibility Determination, is conducted. Those who are eligible (Eligibility Determination outcome equals true) can begin the Randomization activity. The Eligibility Determination activity is, in essence, the Start Rule for the Randomization activity, and so forth. Using this approach, complex protocol design features such as adaptive designs, interim analyses, and early termination can be defined using precise start rules, which can lead to automated assessments of study conduct and automated identification of protocol violations. For more details on this approach, see a previous post about StudyActivity Start Rules.
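The screening example above could be sketched in Python like this. The protocol is a plain list of activities, each with a start rule, and a small checker flags activities performed outside their allowed window. All names here are hypothetical, and I am assuming that the 3-day delay is the earliest allowed start and the 7-day maximum is the latest, both counted in whole study days from completion of the prerequisite.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class StartRule:
    prerequisite: Optional[str]  # None models the Default Start Rule (start anytime)
    min_delay_days: int = 0
    max_delay_days: Optional[int] = None  # None means no upper limit


@dataclass(frozen=True)
class ProtocolActivity:
    name: str
    start_rule: StartRule


PROTOCOL = [
    ProtocolActivity("Informed Consent", StartRule(prerequisite=None)),
    ProtocolActivity("Serum RPR", StartRule("Informed Consent", 3, 7)),
    ProtocolActivity("Fasting Blood Sugar", StartRule("Informed Consent", 3, 7)),
]


def find_violations(protocol: List[ProtocolActivity],
                    performed_on: Dict[str, int]) -> List[str]:
    """Compare performed activities against their start rules.

    performed_on maps an activity name to the study day it was completed.
    Activities that have not been performed yet are skipped.
    """
    violations = []
    for activity in protocol:
        day = performed_on.get(activity.name)
        rule = activity.start_rule
        if day is None or rule.prerequisite is None:
            continue
        prerequisite_day = performed_on.get(rule.prerequisite)
        if prerequisite_day is None:
            violations.append(
                f"{activity.name}: performed without completed {rule.prerequisite}")
            continue
        delay = day - prerequisite_day
        if delay < rule.min_delay_days:
            violations.append(
                f"{activity.name}: {delay} day(s) after {rule.prerequisite}, "
                f"earlier than the {rule.min_delay_days}-day delay")
        elif rule.max_delay_days is not None and delay > rule.max_delay_days:
            violations.append(
                f"{activity.name}: {delay} day(s) after {rule.prerequisite}, "
                f"later than the {rule.max_delay_days}-day maximum")
    return violations


record = {"Informed Consent": 1, "Serum RPR": 4, "Fasting Blood Sugar": 9}
for message in find_violations(PROTOCOL, record):
    print(message)
```

For this record the RPR falls inside its window (3 days after consent) while the FBS is flagged, since it was drawn 8 days after consent. Eligibility Determination and Randomization could be added to the list in the same way, with a rule that also checks the prerequisite's outcome.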

Taking a historical perspective, we must commit to retiring antiquated, inaccurate models for clinical trials and transitioning to an elegant activity-centric model for clinical trials data. Doing so would solve many of the problems we continue to encounter and could enable much more powerful techniques to automate the clinical trials enterprise. In the same manner that we retired epicycles, deferents, and equants in astronomy almost 500 years ago, it's time to retire Supplemental Qualifiers, Related Records, and Findings About domains in our quest for a more elegant model for clinical trials data.

In closing, I invite you to review, dissect, and comment on this new model for study data, which is written in OWL (Web Ontology Language) and available for download on GitHub. Although it is still a draft, the core classes and relationships have remained quite stable throughout its development. Special thanks go to Tim Williams and PhUSE, who continue to play significant roles and provide tremendous support in the development of this ontology.

I look forward to hearing from you.










