Copernicus and Elegant Models

Figure: Nicolaus Copernicus (1473-1543)
In 1543, the year of his death, Nicolaus Copernicus published De Revolutionibus Orbium Coelestium. In English, the title roughly translates to On the Revolutions of the Celestial Spheres. Now considered his greatest scientific achievement, it established the Sun as the center of our solar system, with all the planets, including Earth, revolving around it. We now call this the heliocentric model of our solar system, or simply heliocentrism.

The prevailing model at the time held that the Earth was at the center of the universe. Also accepted was that the Earth was stationary and that other spheres, including the Sun, moved around it. This geocentric model had prevailed for well over a millennium, since the 2nd century A.D., and is most often attributed to Ptolemy, a Greek philosopher, astronomer and mathematician from what is now Alexandria, Egypt.

The transition to the new model didn't occur quickly or easily. Those who opposed Copernicanism often did so for religious or philosophical reasons, but eventually it became clear that the Copernican model more accurately, and more simply, predicted the movements of the heavenly bodies across the sky. It had become clear some time before that the Ptolemaic model could not explain certain observations, such as the fact that planets often seemed to stop moving completely or sometimes travel backwards across the sky. For this reason, fudge factors such as epicycles, deferents, and equants were added to the Ptolemaic model so that it more closely aligned with reality.

Figure: The Ptolemaic Model of the Solar System

An epicycle was a circle whose center travels along the circumference of a larger circle, the latter of which is known as a deferent. Together they helped explain why planets sometimes move backwards across the sky. An equant is the point from which each body sweeps out equal angles along the deferent in equal times. The center of the deferent is midway between the equant and Earth.

These adjustments, or corrections, were important to make the most use of the Ptolemaic model. In addition, church opposition to Copernicanism was often fierce, and initially its acceptance was slow. The argument went something like this:   

How can God the Almighty not place Earth at the center of the universe, since Earth was the most important celestial body of them all? 

But eventually astronomers retired the Ptolemaic model and Copernicanism prevailed. This ushered in  a golden age of astronomy during which scientific discoveries quickly took off and our understanding of the solar system greatly increased.

Figure: Fundamentals of the Ptolemaic Model
The take home lesson from this 16th century story is clear. Sometimes you have to retire older, inaccurate models for newer ones that are both simpler to implement, and more accurate. Modelers call these elegant models. So let's fast forward to the 21st century. Could we be experiencing a similar moment with regard to clinical trials data models? I think we are.

Today, we regard the experimental subject as the most important entity of a clinical trial. In fact, he or she is the most important entity, and all the effort to collect data and understand the effects of a new therapeutic agent exists with this entity in mind. The Subject is, and has been for decades, the center of the clinical trials universe, with all other concepts divided, misunderstood, and sometimes neglected in the model's periphery. But there is a problem with subjects at the center. Subjects are unpredictable. Odd, unexpected things happen to them, and often new, unexpected data are collected that have no room in the subject-centric model and nowhere to go. New constructs are developed to contain and explain them, things like Supplemental Qualifiers, Related Records, and Findings About domains (see Study Data Tabulation Model). These are the epicycles, deferents, and equants of the subject-centric clinical trials universe. In addition, we must recognize that the current model for observations (made up of findings, interventions, and events) is fundamentally flawed. We need to fix it, just as the fundamentally flawed Ptolemaic model needed to be redone (see my previous post on this topic). When we do so, the need for these fudge factors disappears, as you will see below.

Figure: "Copernican" View of Study Data
So what is the answer? We remove Subjects from the center of the clinical trial universe and replace them with Study Activities (an activity-centric model). Study Activities are, after all, critical for a study's success, and are easily linked to the subject to which the activity pertains. They become the unit of record for any study. Study Activities may be observations, assessments, or administrative activities (informed consent, randomization). Study Activities are more predictable, the metadata needed to fully describe them are generally more stable, and, by placing them at the center of the data model, we make them easier to represent, understand, and analyze. We also correct mistakes in how observations are modeled. The end result is a model that is not only simpler to implement, but more capable of representing study data and their true meaning. Yes, we have a more elegant model that is closer to reality.

But clinical trial data aren't spheres that orbit the sun. How do we make this more practical and implementable? The answer is we describe these new definitions and relationships as an ontology that is understandable to both subject matter experts and information systems. It goes something like this.

At the center of our universe is the Study Activity, defined as any activity associated with the planning, conduct, analysis, and interpretation of a study. We recognize different sub-types, such as study design activities (Arms, Epochs, Visits, etc.), administrative activities (informed consent, randomization), intervention activities (clinical observations, therapeutic interventions), and analysis activities. One highly important type of study activity is the Assessment, which needs to be added to the model. More on this later. Activities can be defined, protocol specified, planned, scheduled, or performed. Every activity has an outcome, which in the case of observations is the result. Activities also have start rules that link them to other activities (e.g. the High Dose Drug administration activity begins on the same day that the Randomization activity is completed and its outcome equals High Dose group; or the Blood Pressure activity begins 3 minutes after completion of the Change in Body Position activity from supine to sitting). Activities may also have sub-activities. For example, a simple Blood Pressure is made up of a Systolic BP measurement and a Diastolic BP measurement.
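To make the idea of an activity with a status, an outcome, and sub-activities more concrete, here is a minimal Python sketch. It is not taken from the ontology itself: the class name, field names, and status values are hypothetical, and the rule that a composite activity is complete only when all of its sub-activities are performed is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StudyActivity:
    """One node in an activity-centric model (hypothetical field names)."""
    name: str
    status: str = "planned"          # e.g. planned, scheduled, performed
    outcome: Optional[str] = None    # for an observation, this is the result
    sub_activities: List["StudyActivity"] = field(default_factory=list)

    def is_complete(self) -> bool:
        """Assumption: a composite is complete only when every part is."""
        if self.sub_activities:
            return all(sub.is_complete() for sub in self.sub_activities)
        return self.status == "performed"


# A simple Blood Pressure is made up of a systolic and a diastolic measurement.
systolic = StudyActivity("SystolicBP", status="performed", outcome="160 mmHg")
diastolic = StudyActivity("DiastolicBP")
blood_pressure = StudyActivity("BloodPressure",
                               sub_activities=[systolic, diastolic])

before = blood_pressure.is_complete()   # diastolic reading still missing

diastolic.status, diastolic.outcome = "performed", "110 mmHg"
after = blood_pressure.is_complete()

print(before, after)
```

The composite Blood Pressure activity carries no result of its own in this sketch; its state is derived entirely from its two sub-activities.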

So our model, viewed as a mind map, starts looking like this. The core classes are shown along with the important relationships.
Figure: Core Study Ontology

The next step is to correct the current modeling of observations, which in the Study Data Tabulation Model includes findings, interventions, and events. The following new definitions are proposed, which I believe align more closely with the way these terms are used in clinical medicine.  The definitions are written as Aristotelian Definitions whenever possible to facilitate development of machine readable/executable software.

Intervention:  A Study Activity that involves an interruption to the study subject's normal daily routine for the purpose of observing the subject's physical, psychological, or physiological state, and/or mitigating the effects of a medical condition.

There are three main sub-types of Interventions; two are covered here:

(Clinical) Observation:   An Intervention whose intent is to measure the physical, physiological, or psychological state of a Study Subject.

Please see my previous post for more thoughts about Observations.

Therapeutic Intervention:   An Intervention whose intent is to have an effect (e.g. treat, cure, mitigate, prevent) on a Medical Condition. (A drug administration is an example of a therapeutic intervention.)

There are other types of Interventions, but for the sake of brevity they are not covered here. Then we have a markedly updated definition of Events:

Event: An occurrence; something that happens. An event persists in time.

Followed by important types of Events:

Medical Condition:  An event that is a disease, injury, disorder, or transient physiologic state that interferes or may interfere with well-being. A medical condition persists in time.

Adverse Event:  A Medical Condition that emerges or worsens following a Medical Intervention, including the use of a drug. Note: there is no presumption of causality. Please refer to a previous post for more information about modeling Adverse Events.

There are other types of events, such as Diagnoses, Indications, Underlying Conditions, but for the sake of brevity we can stop here.

As previously stated, Assessments are an extremely important Study Activity that is notably missing from today's models, and which is sorely needed:

Assessment:  An Examination or Analysis of one or more clinical Observations to identify and/or characterize a Medical Condition.  (When formalized, this is also called an Adjudication.) 
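Because these definitions are Aristotelian (each term is a kind of some parent term plus a distinguishing feature), they map naturally onto a class hierarchy. The sketch below uses plain Python classes rather than OWL and is only an illustration of the definitions above; the class names are my own shorthand.

```python
class StudyActivity:
    """Any activity associated with planning, conducting, or analyzing a study."""


class Intervention(StudyActivity):
    """A Study Activity that interrupts the subject's normal daily routine."""


class Observation(Intervention):
    """An Intervention whose intent is to measure the subject's state."""


class TherapeuticIntervention(Intervention):
    """An Intervention whose intent is to have an effect on a Medical Condition."""


class Assessment(StudyActivity):
    """An analysis of one or more Observations to identify a Medical Condition."""


class Event:
    """An occurrence; something that happens and persists in time."""


class MedicalCondition(Event):
    """An Event that interferes or may interfere with well-being."""


class AdverseEvent(MedicalCondition):
    """A Medical Condition that emerges or worsens following an intervention."""


# An Observation is an Intervention, but an Adverse Event is not an Observation.
print(issubclass(Observation, Intervention))
print(issubclass(AdverseEvent, Observation))
```

The two checks at the end express the central claim of this section: observations and adverse events sit on different branches of the hierarchy.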

One can easily spend a great deal more time discussing assessments and how to distinguish them from observations, but instead I refer you to a previous post on this topic.

These critically important concepts are modeled as follows in our core ontology. We must recognize that Adverse Events are not Observations. We collect observations and then perform an assessment to determine whether those observations are related to, or help explain the presence of, an Adverse Event (a Medical Condition). Often this assessment is not captured in writing: it is done by the subject via self-assessment, or by the Investigator in the clinic, but an assessment must occur. That Assessments have no place in the current model is a notable omission, in my opinion. So the new high level model looks something like this:

Figure: Modeling Observations and Assessments

Consider a Subject who has a measured Blood Pressure in the clinic of 160/110 (a clinical observation). Does this mean that the Subject has a Medical Condition (Essential Hypertension) that requires treatment? No. Additional observations need to be recorded and analyzed (Assessed) to make that link. Is the person obese, and was the wrong BP cuff size used? Is the person nervous, with a transiently elevated BP due to anxiety? Very often, serial blood pressures in various settings can help sort this out. If the BP was recorded shortly after taking an experimental treatment, is this an Adverse Event of the treatment? No, for the same reasons. An Assessment needs to be done. Read more about Assessment in a previous post.
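The difference between an observation and an assessment can be sketched in a few lines of Python. Everything here is hypothetical: the class and function names are mine, and the thresholds and minimum number of readings are placeholders chosen only to make the example run. They are not clinical diagnostic criteria.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BloodPressure:
    """A single clinical observation."""
    systolic: int
    diastolic: int
    setting: str


@dataclass
class AssessmentResult:
    """Outcome of an assessment, linked to the observations it used."""
    observations_used: List[BloodPressure]
    medical_condition: Optional[str]


# Placeholder values for illustration only, NOT clinical criteria.
MIN_READINGS = 3
SYSTOLIC_LIMIT = 140
DIASTOLIC_LIMIT = 90


def assess_hypertension(readings: List[BloodPressure]) -> AssessmentResult:
    """Identify a condition only if enough readings are all elevated."""
    elevated = [r for r in readings
                if r.systolic >= SYSTOLIC_LIMIT or r.diastolic >= DIASTOLIC_LIMIT]
    persistent = len(readings) >= MIN_READINGS and len(elevated) == len(readings)
    condition = "Hypertension" if persistent else None
    return AssessmentResult(list(readings), condition)


single = [BloodPressure(160, 110, "clinic")]
serial = single + [BloodPressure(155, 100, "home"),
                   BloodPressure(150, 98, "clinic")]

print(assess_hypertension(single).medical_condition)
print(assess_hypertension(serial).medical_condition)
```

The point of the sketch is structural: the assessment is a separate record that names the observations it relied on, so a single elevated reading never becomes a medical condition by itself.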

By moving to an activity-centric model, expressing a protocol in a machine-readable/executable format is fairly simple. The protocol essentially becomes a list of all protocol-specified activities, each of which is richly linked to entities and other activities. For example, consider a simple trial consisting of the following screening activities: informed consent, serum RPR (test for syphilis), and fasting blood sugar (FBS). The protocol states that informed consent comes first, and once it is obtained, the RPR and FBS are to be obtained within 3 days, and no later than 7 days. The informed consent activity is linked to a Default Start Rule saying that it starts anytime. The RPR and FBS each have a start rule that is linked to the successful completion of the informed consent activity. Appropriate delays (3 days) and maximum delays (7 days) are captured in the start rule. Epochs and other study design activities are defined by start and stop rules of the sub-activities within each study design concept. In the case of the Screening Epoch, it begins when Informed Consent begins, and ends when yet another activity, the Eligibility Determination activity, is conducted. Those who are eligible (Eligibility Determination outcome equals true) can begin the Randomization activity. The Eligibility Determination activity in essence is the Start Rule for the Randomization activity, and so forth. Using this approach, complex protocol design features such as adaptive designs, interim analyses and early termination can be defined using precise start rules and can lead to automated assessments of study conduct and automated identification of protocol violations. For more details on this approach, see a previous post about StudyActivity Start Rules.
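As a rough sketch of how a maximum delay in a start rule could drive automated detection of protocol violations, consider the screening example above. The function name and the dates are invented for illustration; only the 7-day maximum delay comes from the example protocol.

```python
from datetime import date, timedelta

# Maximum delay from the example protocol: no later than 7 days after consent.
MAX_DELAY = timedelta(days=7)


def within_max_delay(prerequisite_completed: date,
                     target_started: date,
                     max_delay: timedelta = MAX_DELAY) -> bool:
    """True when the target started after the prerequisite, inside the window."""
    elapsed = target_started - prerequisite_completed
    return timedelta(0) <= elapsed <= max_delay


# Hypothetical dates for one subject.
consent_completed = date(2020, 1, 6)
performed = {
    "RPR": date(2020, 1, 9),    # 3 days after consent
    "FBS": date(2020, 1, 15),   # 9 days after consent
}

violations = [name for name, day in performed.items()
              if not within_max_delay(consent_completed, day)]
print(violations)
```

A real implementation would read the delay from the start rule attached to each activity rather than from a module-level constant.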

Taking a historical perspective, we must commit to retiring antiquated, inaccurate models for clinical trials and transitioning to an elegant activity-centric model for clinical trials data. It would solve many of the problems we continue to encounter and can enable much more powerful techniques to automate the clinical trials enterprise. In the same manner that we retired epicycles, deferents, and equants in astronomy almost 500 years ago, it's time to retire Supplemental Qualifiers, Related Records, and Findings About domains in our quest for a more elegant model for clinical trials data.

In closing, I invite you to review and dissect and comment on this new model for study data, which is written in OWL (Web Ontology Language) and available for download on GitHub. Although it is still a draft, the core fundamental classes and relationships have remained quite stable throughout its development. Special thanks go to Tim Williams and PhUSE, who continue to play significant roles and provide tremendous support in the development of this ontology.

I look forward to hearing from you.

Assessments (revisited)

Today I'd like to focus on Assessments in clinical medicine, including clinical research, with an emphasis on how to represent Assessment information optimally for data exchange. I've written about Assessments before. I'd like to revisit this topic because, as standards continue to evolve and improve, we as an industry continue to fumble how we handle assessment information. I think this creates unnecessary challenges and limitations in how we document and exchange assessment information. I continue to see widespread confusion between what is an assessment and what is an observation. It remains common to see a 'schedule of assessments' in a study protocol when we really mean a schedule of observations. True assessments are not observations, and yet they are critically important in understanding and analyzing adverse event reports or clinical trials. Standardizing assessment information remains a critical need in data standardization efforts to support automation.

First some working definitions. I use Aristotelian definitions whenever possible.

  • A (Clinical) Observation is an Intervention whose intent is to measure the physical, physiological, or psychological state of a Person. 
  • An Assessment is an Analysis of one or more Observations to identify and/or characterize (e.g. severity assessment) a Medical Condition. 
  • A Medical Condition is an Event that is a Disease, Injury, Disorder, or transient physiologic state (e.g. Pregnancy) that interferes or may interfere with well-being. 

These definitions lead to various corollaries.

  • An Observation is a finding (as defined by CDISC) but it is also an intervention, since the observer must intervene in a person's normal routine to make and record the observation. Sometimes the details of the intervention (device used, method, etc) can affect the observation results, so are therefore worth recording. 
  • An Adverse Event is not an Observation (as defined by CDISC). It is instead a Medical Condition that is temporally related to a therapeutic intervention, such as a drug administration. The relationship between an AE and an Observation is as follows: one or more observations may support the presence of an AE after a proper assessment. I note that the U.S. regulations use the word "occurrence," which in my opinion is synonymous with an event. 

Getting back to Assessments, physicians are trained from the earliest days in medical school that first one observes, then one assesses (i.e. interprets the observations) before deciding whether to treat. This makes up the familiar template of a patient encounter known as SOAP (Subjective observations, Objective observations, Assessments, Plan).

Often Assessments are not formally conducted or documented in clinical trials, so there are no assessment data to standardize. A temperature reading of 39 °C (an observation) is almost always assumed to be a fever (assessment result --> medical condition), unless there are additional data to suggest something else (a faulty digital thermometer, or evidence of the bizarre Munchausen Syndrome). Sometimes Assessment information is critical to ensure proper diagnosis and treatment, and in the case of clinical trials, treatment with the appropriate investigational drug. The Assessment, when formalized and standardized in the protocol, is called an Adjudication. (Note: this is not to be confused with the process of ensuring an accurate observation by having, say, an independent blinded reader look at an imaging study or pathological specimen to determine the accuracy of the observation result. The latter is simply a feature of an observation method to ensure quality measurements.)

Speaking of independent observers and assessors, third-party assessors (e.g. radiologists, pathologists) often provide an independent assessment of certain observations (e.g. x-ray, lab results) when the investigator is not qualified to make his or her own assessment of the findings. These reports typically contain two sections. The first describes the findings/observations (e.g. brain histology showing cortical atrophy, neurofibrillary tangles, amyloid plaques), followed by the assessment: findings are consistent with Alzheimer's Disease.

Because we currently confuse observations and assessments, we have no standard way to report assessment information. Currently sponsors use three possible approaches:
  1. Include assessment information in findings domains
  2. Include assessment information in supplemental findings domain
  3. Include assessment information in custom domains
As one can imagine, the variability in reporting Assessment information currently is a significant problem. At the very least, we need an Assessment domain where one can find at a minimum:
  • an Assessment ID
  • what observations were used in the assessment
  • who is/are the assessor(s) with a link to their qualifications
  • date/time of assessment
  • outcome of the assessment: i.e. cause of the observations, usually a Medical Condition, with a link to a medical conditions domain. 
  • severity of the medical condition at the time of the assessment
  • what method was used for the assessment (e.g. diagnostic criteria)
  • what method/scale was used to document severity
It is important to note that one can have multiple assessments for the same set of observations. In health care this is known as a second opinion. Sometimes that second assessment uses different assessment criteria that the assessor believes are more relevant. A clearer picture now emerges of a Person/Subject having one or more medical conditions that are identified via an assessment of observations (some assessments are well documented, others not so much or not at all), and changes in those medical conditions are documented via multiple observations and assessments at different points in time. This paradigm applies both to the indication (i.e. the disease/condition being treated in the clinical trial) and to new Medical Conditions (e.g. adverse events) that arise during the trial itself.
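The minimum content listed above can be pictured as a single record type. The following Python sketch is not an SDTM domain definition: the field names, identifiers, and values are all hypothetical and exist only to show how two assessments (a first and a second opinion) can point at the same set of observations.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AssessmentRecord:
    """One row of a hypothetical Assessment domain."""
    assessment_id: str
    observation_ids: List[str]           # observations used in the assessment
    assessor_ids: List[str]              # links to assessor qualifications
    assessed_at: str                     # ISO 8601 date/time
    medical_condition_id: Optional[str]  # link to a medical conditions domain
    severity: Optional[str]
    assessment_method: Optional[str]     # e.g. diagnostic criteria
    severity_scale: Optional[str]


first_opinion = AssessmentRecord(
    assessment_id="ASSESS-1",
    observation_ids=["OBS-1", "OBS-2"],
    assessor_ids=["INVESTIGATOR-1"],
    assessed_at="2020-01-06T10:00",
    medical_condition_id="MC-1",
    severity="MODERATE",
    assessment_method="Criteria A",
    severity_scale="Scale A",
)

# A second opinion: same observations, different assessor and criteria.
second_opinion = AssessmentRecord(
    assessment_id="ASSESS-2",
    observation_ids=["OBS-1", "OBS-2"],
    assessor_ids=["PATHOLOGIST-7"],
    assessed_at="2020-01-08T14:30",
    medical_condition_id="MC-2",
    severity=None,
    assessment_method="Criteria B",
    severity_scale=None,
)

same_evidence = set(first_opinion.observation_ids) == set(second_opinion.observation_ids)
print(same_evidence)
```

Keeping the observation links explicit is what lets a reviewer see that two differing conclusions were drawn from identical evidence.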

And let's remember, Death is not an adverse event, but rather one of various possible outcomes for an adverse event as the medical condition evolves over time.

As an immediate short-term solution, SDTM should add an Assessment domain which links Assessment information to the observations that were used in the assessment. This would be a big step forward.


Activity Start Rules in Study Protocols

Imagine a software tool that reads a study protocol, can analyze the data collected on any subject to date and determines what activity to perform next. Imagine all of the data collection errors and protocol violations that can be avoided by following the instructions from such a tool. From a regulatory perspective, imagine running an analysis that describes in sufficient detail whether a protocol was conducted properly and can automatically determine where protocol violations occurred. This is possible if we have machine-readable start rules for study activities.

I recently wrote about Study Rules. Today I continue that discussion by looking at Start Rules and how to represent them computationally in RDF (the Resource Description Framework) to enable the tools described above.

Start rules determine whether a StudyActivity can begin or not. The start rule describes the condition(s) present in the collected data that must be met for a StudyActivity to begin. If the condition is met, the StartRuleOutcome is true and the activity may begin. If the condition is not met, then the outcome is false.

All study activities have start rules, although they may not be explicitly stated in the protocol.  Often the start rule can be restated more explicitly. It is also useful to consider a "default" start rule which is always true (i.e. the condition to start the activity is always automatically met) and therefore the Activity can begin anytime. The very first activity in a study, usually "ObtainInformedConsent" can begin anytime. Its start rule is the default start rule. The start of the "ObtainInformedConsent" activity also marks the start of the Screening epoch (activity). Screening is a composite activity (i.e. made up of multiple sub-activities) whose start rule says begin Screening when the ObtainInformedConsent activity begins.

Start Rules can describe not only whether the activity can begin, but when it can or cannot begin relative to another activity. There are different types of start rules.

  1. A "Prerequisite Start" Rule (PRST) allows the target activity to begin when the prerequisite activity has started. 
  2. A "Prerequisite Complete" Rule (PRCO) allows the target activity to begin when the prerequisite activity is completed, regardless of the outcome. 
  3. A "Prerequisite Outcome" Rule (PROUT) allows the target activity to begin when the Prerequisite activity is complete and has a certain outcome or result. 
Start rules may be associated with a protocol-specified delay.

  1. A delay = 10 mins means wait 10 minutes after the start condition is met before performing the activity. 
  2. A maximum delay = 10 mins means wait no longer than 10 minutes before starting the activity. 
  3. A minimum delay = 10 mins means wait at least 10 minutes before performing the activity.
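The three rule types listed above can be evaluated with very little logic. The sketch below is an assumption-laden illustration in Python, not the RDF representation: the class names and status values are hypothetical, delays are left out, and treating a completed prerequisite as also satisfying a "Prerequisite Start" rule is my own reading.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Activity:
    name: str
    status: str = "planned"          # planned, started, or completed
    outcome: Optional[str] = None


@dataclass
class StartRule:
    rule_type: str                   # "PRST", "PRCO", or "PROUT"
    prerequisite: Activity
    expected_outcome: Optional[str] = None

    def outcome(self) -> bool:
        """Return the StartRuleOutcome for the current state of the data."""
        prereq = self.prerequisite
        if self.rule_type == "PRST":
            # Assumption: a completed activity has necessarily started.
            return prereq.status in ("started", "completed")
        if self.rule_type == "PRCO":
            return prereq.status == "completed"
        if self.rule_type == "PROUT":
            return (prereq.status == "completed"
                    and prereq.outcome == self.expected_outcome)
        raise ValueError(f"unknown rule type: {self.rule_type}")


consent = Activity("ObtainInformedConsent", status="started")
screening_rule = StartRule("PRST", consent)
rpr_rule = StartRule("PROUT", consent, expected_outcome="InformedConsent_GRANTED")

started_only = (screening_rule.outcome(), rpr_rule.outcome())

consent.status, consent.outcome = "completed", "InformedConsent_GRANTED"
after_consent = (screening_rule.outcome(), rpr_rule.outcome())

print(started_only, after_consent)
```

Here Screening may begin as soon as consent starts, while the RPR test has to wait until consent is complete with the expected outcome.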

Certain planned activities can be skipped altogether if a certain condition exists. For example, if the sex data collection activity results in Sex=M, then a Pregnancy Test can be skipped. This could be captured in a separate "Skip Rule," a topic for a future post (or better yet, a start rule for a start rule!). In this case, the start rule can be skipped and the activity becomes a "logically skipped" activity per protocol, and there is no violation.

So a start rule has the following properties/predicates:
:prerequisite (links to the prerequisite activity)
:prerequisiteExpectedOutcome (the expected outcome/result for the prerequisite activity)
:prerequisiteExpectedStatus (the expected status for the prerequisite activity)
:skipActivity (links to the activity that determines if the target activity can be skipped)
:skipOutcome (the expected outcome of the skip activity that allows the target activity to be skipped; note: this may be better coded in a separate skip rule)
:ruleDescription (a short textual description of the rule)
:ruleDescriptionLong (a long description)
:subRule (a link to another subordinate rule)
:delay, :delayMin, :delayMax (three timing attributes, as discussed above)

How does this look using RDF (the Resource Description Framework)?

I have created a "dummy" clinical trial for migraine prevention that starts with the InformedConsent activity and ends with a randomized treatment activity. During screening, subjects undergo [1] recording of the sex of the subject, [2] an RPR test for syphilis, and [3] a pregnancy test, if female. To be eligible for continued participation, one must have a negative RPR and a negative pregnancy test, if female. Those who are eligible are randomized to one of two treatments: LowDose 10mg daily or HighDose 20mg daily. Those randomized are given study medication to prevent their migraines.

First we link the activities to the subjects:
        rdf:type smm:Person ;
                  data:Observation_Sex_1 , 
                  data:Observation_RPRTest_1 ,               
                  data:randomization_BAL2_1 , data:ProductAdministration_1 , 
                  data:InformedConsent_ADULT_1 , data:Observation_PregnancyTest_1. 

Then we link a start rule for each activity. In the table below the predicate (not shown) is smm:startRule.

Except for the default start rule, which says the activity may begin at any time, each start rule has a prerequisite activity (one that must take place before the target activity). In our example, the rule for the pregnancy test checks whether the patient is female (i.e. does the Sex recording activity document Sex = F?). If yes, then the rule is triggered (outcome = true) and the test is tagged for execution. If Sex = M, then the rule outcome is "not applicable," since the test is not typically performed on male subjects.

Notice that the RPR Test and the Sex observation Test both have the same start rule, which requires that InformedConsent be complete with Outcome InformedConsent_GRANTED. Both tests become eligible for execution simultaneously.

Once all the screening activities have been conducted, the next activity "Eligibility Determination" is triggered and has a binary outcome: True (subject is eligible to continue) or False (subject may not continue). This Eligibility Determination activity is in fact the Start Rule for the Randomization activity. This web of inter-related activities is ideally represented in a graph and makes it quite easy to manage study conduct and monitor the trial for protocol violations. 
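To give a feel for how a tool might walk this web of activities and decide what to do next, here is a small Python sketch of the dummy trial. It is a simplification under stated assumptions: the dictionary layout, function name, and outcome strings are hypothetical, each rule has a single prerequisite, and the start rule for Eligibility Determination itself (which depends on several screening activities) is left out.

```python
from typing import Dict, List, Optional, Tuple

# activity -> (prerequisite activity, expected outcome of the prerequisite)
# A prerequisite of None stands for the default start rule (may begin anytime).
Rule = Tuple[Optional[str], Optional[str]]

START_RULES: Dict[str, Rule] = {
    "InformedConsent": (None, None),
    "Sex": ("InformedConsent", "GRANTED"),
    "RPRTest": ("InformedConsent", "GRANTED"),
    "PregnancyTest": ("Sex", "F"),
    "Randomization": ("EligibilityDetermination", "TRUE"),
}


def next_activities(outcomes: Dict[str, str]) -> List[str]:
    """List activities not yet performed whose start rule outcome is true."""
    ready = []
    for activity, (prerequisite, expected) in START_RULES.items():
        if activity in outcomes:
            continue  # already performed for this subject
        if prerequisite is None or outcomes.get(prerequisite) == expected:
            ready.append(activity)
    return ready


print(next_activities({}))
print(next_activities({"InformedConsent": "GRANTED"}))
print(next_activities({"InformedConsent": "GRANTED", "Sex": "M"}))
print(next_activities({"InformedConsent": "GRANTED", "Sex": "M",
                       "RPRTest": "NEGATIVE",
                       "EligibilityDetermination": "TRUE"}))
```

For a male subject the pregnancy test never becomes ready, which corresponds to the "not applicable" rule outcome described earlier rather than to a protocol violation.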


Modeling Adverse Event Information

What is an event? It's an occurrence; something that happens. There are countless examples: a baseball game, a wedding, a picnic. All are examples of an event. Let's also include adverse events. One common feature of events is that they persist in time. They have a beginning and an end. Are they observations? No. But we use observations to determine that an event is taking place: one makes numerous observations: the sunny day, the location in a park, the presence of a cooler full of food and drinks, a blanket to lie on the grass, a charcoal grill. Add them all up and you come up with a picnic. We then observe the absence of many of these observations to conclude the event is over. Pretty straightforward.

Adverse Events are no different. Symptoms or signs (observations) begin on a certain date and end on a certain date. One must interpret multiple observations over time to determine that an AE is taking place, or that an AE has resolved. This interpretation in clinical medicine is known as an Assessment, and it is the "A" in the medical encounter record known as a SOAP note. I wrote about modeling clinical data and the relevance of SOAP in a previous post. Although observations also technically have a beginning and an end (e.g. a venipuncture does not occur instantaneously), they should be considered for practical reasons to occur instantaneously. They are a "snapshot in time" of the subject's well-being, or lack thereof.

Another thing to keep in mind is that Adverse Events are Medical Conditions (disease, disorder, injury, transient physiological state that can impair health) that are temporally associated with an intervention of some kind (e.g. drug administration), and if noted for the first time in a Subject, it is called a Diagnosis. It's also important to have a qualified Assessor to establish the presence of the correct Adverse Event. Sometimes the Assessor is the patient, who assesses, for example, their headache patterns and concludes they have tension headaches and self-administers an over the counter analgesic. Being able to self-assess your medical condition is in fact a regulatory requirement for a drug to be sold over the counter. Makes sense. But often one needs a trained Assessor, e.g. physician, nurse, to determine that the correct AE is present. Sometimes that assessment is not done properly (or not documented properly) and then problems occur, and second opinions (re-assessments by new assessors) are necessary. Assessments are often associated with Assessment Criteria. These are rules that describe how observations are analyzed and interpreted to determine the presence and severity of a medical condition. Another useful example is a simple blood pressure measurement that is abnormally high, say 150/100 mmHg. Does a single high BP measurement imply that the person has an underlying medical condition known as hypertension? The answer is clearly NO. The proper assessment requires that serial BP measurements are conducted over a period of time to establish the persistence of a clinical event (in this case a disorder) known as hypertension.

So currently, Adverse Event reporting, whether in clinical trials or post-marketing safety monitoring, is hampered by the fact that the observations (that are used to assess the presence of an AE) and the AE itself are often mixed together, and the analyst must do his or her own Assessment after the fact. Take, for example, the following report of a patient who takes a dose of drug X and then 2 days later develops a sore throat, runny nose, nasal congestion, cough, sinus pain, and viral nasopharyngitis. Not all of these are AEs. The first five are in fact observations that support the presence of the sixth, the true medical condition at play here. Sometimes the observations don't clearly support the presence of a medical condition, in which case a "differential diagnosis" is developed, which is essentially a list of all the medical conditions that could possibly cause the observations, followed by a systematic collection of more observations to identify the correct diagnosis.
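The drug X report can be restructured so that the medical condition and its supporting observations are kept apart. The Python sketch below only shows the target shape of the data: the class and function names are hypothetical, and the set of terms treated as medical conditions is hand-labelled for this one example, standing in for the judgment of a qualified assessor.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class AdverseEventReport:
    """A medical condition plus the observations that support it."""
    medical_condition: str
    supporting_observations: List[str] = field(default_factory=list)


reported_terms = ["sore throat", "runny nose", "nasal congestion",
                  "cough", "sinus pain", "viral nasopharyngitis"]

# Hand-labelled for this example; in practice an assessor makes this call.
ASSESSED_AS_CONDITION: Set[str] = {"viral nasopharyngitis"}


def split_report(terms: List[str],
                 conditions: Set[str]) -> List[AdverseEventReport]:
    """Separate reported terms into conditions and supporting observations."""
    observations = [t for t in terms if t not in conditions]
    found = [t for t in terms if t in conditions]
    return [AdverseEventReport(c, list(observations)) for c in found]


reports = split_report(reported_terms, ASSESSED_AS_CONDITION)
for report in reports:
    print(report.medical_condition, "<-", report.supporting_observations)
```

Six reported terms collapse into one adverse event with five supporting observations, which is the count an analyst would want when tallying AEs.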

There is a strong desire within FDA and elsewhere to automate the detection of adverse events. This is quite a challenging task, but it should be made clear that the following must take place before any system or tool can succeed in adverse event detection.

  1. We need to distinguish observations from events
  2. We need qualified assessors to analyze/interpret the observation results
  3. As much as possible, we need to standardize the assessment process by documenting the assessment criteria necessary to identify an AE with high confidence. 

Figure: Adverse Event Identification and Characterization
Here is my proposal for a workable data model that can be used to automate AE detection some day. It should be made clear that it deviates from the SDTM and BRIDG notion of an event, as I don't believe these models have it quite right. Remember that observations must undergo an Assessment to determine if a medical condition / AE is present. Sometimes more than one Assessment is done (e.g. second opinions). Finally, observations don't get treated; rather, the medical conditions that cause the abnormal observations are the targets of treatment.


On Observations in Clinical Trials, or, "Did I get that observation right?"

I live in Florida, a state almost surrounded by water. How long is its coastline? How does it compare with the coastline of other states? So, like many others, I turn to ... Google. In a few seconds, I find these results posted on Wikipedia:

You can predict my reaction. How can the method make such a big difference in the results? The web site provides detailed information about each method and it becomes a relatively easy, though highly manual, task to determine which method is more appropriate for one's use case. The take home lesson is clear: the method of observation may affect the results.

Then there is the famous Heisenberg Uncertainty Principle in Physics, which states that the position and velocity of an object cannot both be measured exactly at the same time, even in theory. For large objects, like an automobile, the uncertainty is negligible, but for sub-atomic particles, this is a big deal. The fundamental reason behind the uncertainty is due in part to the act of making the observation, i.e. the method of observation. In other words, any attempt to measure precisely the velocity of an electron, for example, "will knock it about in an unpredictable way, so that a simultaneous measurement of its position has no validity."

Just so you don't think this concern is limited to physics and geography, consider this well-known medical school fact. A standard blood pressure cuff, when used on significantly obese individuals, will typically provide a falsely high reading when compared to the same observation performed using an over-sized cuff. So take note:

The method of observation may affect the results.

This gives rise to another "aha!" moment: Observations are Interventions. The observer must intervene in the subject's normal daily routine and execute a specific method of observation to obtain the observation result. Sometimes the method is innocuous, like answering a question on a questionnaire, but sometimes it can be quite invasive, like a cardiac catheterization to measure coronary artery diameters. Often the observation and results are combined with an interpretation of the observation(s) (i.e. an Assessment) to establish the presence and severity of a Medical Condition (e.g. Coronary Artery Disease). More and more, an Intervention to make an observation is combined with an attempt to alter the natural history of the Medical Condition (i.e. a "Therapeutic Intervention"), as in the case of a diagnostic cardiac catheterization during which a drug-eluting stent is inserted.

The bottom line is we need to recognize that observations are interventions whose main purpose is to measure the physical, physiological, or psychological state of an individual, and that the details of the method used to make the observation can be very important and may introduce bias in the results.

The take home message of this blog is: An observation doesn't just happen. Someone intervened to make it happen and the method of intervention can affect the results.

Both the SDTM and BRIDG consider observations (called "findings") as different from interventions. It's time to update that thinking. "Findings" are a type of intervention. Furthermore, SDTM considers findings, interventions, and events as different types of observations. I disagree. Events, for example, are not observations. This last statement is a topic for a future blog.

Thank you for your comments.


Do We Need a Study Data Reviewer's Guide?

As part of a robust study data standardization program, the U.S. FDA publishes the Study Data Technical Conformance Guide. The purpose of this document is to provide "technical specifications for sponsors for the submission of animal and human study data and related information in a standardized electronic format" for investigational and marketing applications. Section 2.2 of the guide recommends the submission of a Study Data Reviewer's Guide (SDRG) to "describe any special considerations or directions or conformance issues that may facilitate an FDA reviewer's use of the submitted data and may help the reviewer understand the relationships between the study report and the data." Although FDA doesn't recommend the use of any specific SDRG template, it references a standard template developed by the Pharmaceutical Users Software Exchange (PhUSE).

Let's take a close look at this template. The Purpose of the document is to provide "context for tabulation datasets and terminology that benefit from additional explanation beyond the Data Definitions document (define.xml).  In addition, this document provides a summary of SDTM conformance findings." 

Here is some of the information suggested for inclusion in the SDRG:

  1. Is the study ongoing? If so, describe the data cut or database status.
  2. Were SDTM used as sources for the analysis datasets?
  3. Do the submission datasets include screen failures?
  4. Were any domains planned but not submitted because no data were collected?
  5. A tabular listing of eligibility criteria that are not included in the IE domain

Before we tackle the question posed at the top of this post, let's ponder a broader question: why do we need standardized data? This one is easy. Standardized data enable process efficiencies and automation. In the case of clinical trials data, reviewers are instantly familiar with the structure of the data, because it is the same across all SDTM-based study datasets. This immediate familiarity with the data structure certainly leads to review process efficiencies. But it only starts there. A common structure and common vocabularies lead to the development of standard analyses that can be automated and reused across studies.

If standardized data lead to increased familiarity with data structures, then this should lead to a decrease in the additional materials needed to explain the data. But we now have yet another document to explain the data that we didn't have before. The fact that a document like the SDRG is needed at all implies that there are additional data, or additional meaning behind the data, that are not captured in the datasets.

If we had a truly semantically interoperable data exchange, there would be no need for an SDRG. The meaning behind the data would be with the data, not locked up somewhere else in a human-readable text document. In other words, the need for a Study Data Reviewer's Guide represents a failure of the data standards and/or the implementation of the data standards in achieving an adequate degree of semantically interoperable data exchange.

Sounds harsh? I believe this last statement is true. Let's look at some examples. The SDRG should describe if the study is ongoing. The data contain a study start date and study end date. If a study is ongoing, the study end date should be null. Because a null value for this variable could be due to other reasons, a separate variable (similar to the HL7 null flavor) can describe why the end date is null. Controlled vocabulary can describe the various possible reasons. This approach provides both a standard machine-readable and a human-interpretable way of knowing if the study is ongoing. One could even add an 'ongoing study' flag in the trial summary (TS) domain if desired.
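
As a rough illustration of the null-reason idea, here is a small Python sketch. The `EndDateNullReason` terms and the `StudyDates` structure are hypothetical; they are not actual HL7 null flavors or SDTM variables, just stand-ins for whatever controlled vocabulary would be agreed upon.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class EndDateNullReason(Enum):
    """Hypothetical controlled terms explaining a missing study end date."""
    ONGOING = "ongoing"
    UNKNOWN = "unknown"
    NOT_APPLICABLE = "not applicable"


@dataclass
class StudyDates:
    start: date
    end: Optional[date] = None
    end_null_reason: Optional[EndDateNullReason] = None


def is_ongoing(study: StudyDates) -> bool:
    # A null end date alone is ambiguous; the reason code removes the ambiguity.
    return study.end is None and study.end_null_reason is EndDateNullReason.ONGOING


ongoing_study = StudyDates(date(2015, 3, 1), None, EndDateNullReason.ONGOING)
finished_study = StudyDates(date(2014, 1, 15), date(2015, 6, 30))
```

With something like this in the data, a review tool could answer "is the study ongoing?" by calling `is_ongoing` instead of reading a text document.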

Here's another one: Were SDTM used as the source for analysis datasets? If one described each data point as a resource, each with a unique resource identifier (URI), then a system can easily determine where that resource came from. One could see that a data point in an analysis dataset is the same data point (i.e. resource) as what is in the SDTM. These URIs make traceability/data provenance analyses so much easier.

How about this one: Do the submission datasets include screen failures? Each subject should be linked to an administrative study activity called 'EligibilityTest' (or something similar), the possible outcomes of which are TRUE or FALSE. A subject with EligibilityTest=TRUE passed screening and is eligible to continue in the study. EligibilityTest=FALSE means the subject failed screening. A quick scan of the data for any subjects with EligibilityTest=FALSE would determine whether screen failures are present in the datasets. (Note that the rules for determining TRUE or FALSE are the eligibility criteria themselves, which have a bearing on the next example.)
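
The "quick scan" could be as simple as the following Python sketch. The record layout and the `subject_id` key are assumptions made for illustration; only the EligibilityTest outcome comes from the discussion above.

```python
from typing import Dict, List

# Hypothetical subject records, each linked to an EligibilityTest outcome.
subjects = [
    {"subject_id": "001", "EligibilityTest": True},
    {"subject_id": "002", "EligibilityTest": False},
    {"subject_id": "003", "EligibilityTest": True},
]


def screen_failures(records: List[Dict]) -> List[str]:
    """Return the IDs of subjects whose EligibilityTest outcome is FALSE."""
    return [r["subject_id"] for r in records if r["EligibilityTest"] is False]


def includes_screen_failures(records: List[Dict]) -> bool:
    """Answer the SDRG question directly from the data."""
    return len(screen_failures(records)) > 0
```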

Another example is: the SDRG should contain a tabular listing of eligibility criteria not found in IE (the inclusion/exclusion criteria dataset). All study activities should have well-described start rules. The start rules for determining whether the Randomization study activity can begin are themselves the eligibility criteria. A description of the Randomization start rule is incomplete without a listing of these criteria. Their presence in the data would make it unnecessary to repeat them in an SDRG.

So what is the answer to our initial question? If data standards and their implementation were adequate, we would not need an SDRG. The fact that we need SDRGs today should be a sign that our study datasets still lack important meaning that analysts need to interpret and analyze the data. It implies that more standards development and better implementation of standards are needed to increase semantically interoperable data exchange. The SDRG should eventually not be needed and should disappear. I think I'm not alone in wishing for the SDRG's eventual demise.

Please share your thoughts and ideas. 



Rules in Study Protocols

When you read a study protocol, you are bombarded by rules. Some are explicitly stated. Many are implicit and must be teased out. Rules are extremely important in ensuring that studies are conducted correctly. Rules are critical for a good study outcome. Unfortunately, we don't have a good way to standardize protocol rules. This makes it challenging to automate study conduct activities and to quickly analyze whether a study "followed the rules."

Let's dissect the components of a rule. A rule basically looks like this:

IF {condition A} THEN {Do B} ELSE {Do C}

The ELSE clause is optional and is assumed to default to "do nothing" if condition A is not met.
Rules can be complex:

IF {condition A} THEN {
  IF {condition E} THEN {Do F} ELSE {Do G}
}

Evaluating a Rule is an Activity whose outcome is binary: either the condition(s) is/are met ("true") or not met ("false"). One could argue for a 3rd category, not applicable, for cases where the reason to have a rule in the first place doesn't apply (e.g. when to conduct a pregnancy test in a male).
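
Here is one way rule evaluation with the optional third outcome might be sketched in Python. The `RuleOutcome` enum, the `evaluate` function, and the context keys (`sex`, `visit`) are all hypothetical names chosen for this example.

```python
from enum import Enum
from typing import Callable, Optional


class RuleOutcome(Enum):
    TRUE = "true"
    FALSE = "false"
    NOT_APPLICABLE = "not applicable"


def evaluate(context: dict,
             condition: Callable[[dict], bool],
             applies: Optional[Callable[[dict], bool]] = None) -> RuleOutcome:
    """Evaluate a rule condition; report NOT_APPLICABLE when the rule has no bearing."""
    if applies is not None and not applies(context):
        return RuleOutcome.NOT_APPLICABLE
    return RuleOutcome.TRUE if condition(context) else RuleOutcome.FALSE


# Hypothetical rule: IF {visit is screening} THEN {conduct pregnancy test},
# which only applies to female subjects.
def is_female(ctx: dict) -> bool:
    return ctx["sex"] == "F"


def is_screening_visit(ctx: dict) -> bool:
    return ctx["visit"] == "SCREENING"
```

For a male subject, `evaluate({"sex": "M", "visit": "SCREENING"}, is_screening_visit, is_female)` gives `RuleOutcome.NOT_APPLICABLE` rather than a misleading FALSE.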

In clinical studies, rules often depend on other activities. I call these prerequisite activities (PA). For example:
IF {migraine occurs} THEN {take study drug}
or a more precise way of expressing it:
IF {headache assessment = MIGRAINE} THEN {take study drug} 

In this case the prerequisite activity is a headache assessment and the condition is met when the headache assessment outcome indicates that a migraine is present. 

Regarding how prerequisite activities are evaluated, sometimes it is sufficient that the PA simply has begun (PA status = started) or completed (PA status = complete) or, more commonly, completed with a specific expected outcome (PA expected outcome = migraine). 

When looking at rules more closely, they can be expressed as start rules for other activities. Let's call these target activities (TA). 

Target Activity: StudyDrugAdministration_01
Start Rule: MigrainePresent_01
    Prerequisite Activity: HeadacheAssessment_01
    PA Expected ActivityOutcome: MIGRAINE

StudyDrugAdministration_01 is a planned activity that just sits there, waiting to be performed. As soon as the headache assessment is performed and its documented outcome is MIGRAINE, the rule outcome is set to TRUE and the target activity can begin.
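
The start-rule idea can be sketched in Python as follows. The `Activity` and `StartRule` classes and the status values ("planned", "started", "complete") are illustrative assumptions, not an established standard; the activity and rule names come from the example above.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed ordering of activity statuses, so "complete" also implies "started".
STATUS_ORDER = {"planned": 0, "started": 1, "complete": 2}


@dataclass
class Activity:
    name: str
    status: str = "planned"
    outcome: Optional[str] = None


@dataclass
class StartRule:
    name: str
    prerequisite: Activity
    required_status: str = "complete"
    expected_outcome: Optional[str] = None  # None means any outcome will do

    def is_met(self) -> bool:
        reached = STATUS_ORDER[self.prerequisite.status] >= STATUS_ORDER[self.required_status]
        if not reached:
            return False
        if self.expected_outcome is None:
            return True
        return self.prerequisite.outcome == self.expected_outcome


headache = Activity("HeadacheAssessment_01")
rule = StartRule("MigrainePresent_01", headache, "complete", "MIGRAINE")
target = Activity("StudyDrugAdministration_01")

before = rule.is_met()            # assessment not yet performed
headache.status, headache.outcome = "complete", "MIGRAINE"
after = rule.is_met()             # prerequisite complete with expected outcome
```

The target activity stays in "planned" status until `rule.is_met()` returns True, at which point a study conduct system could allow it to start.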

One can add qualifiers to the rule to describe exactly when the target activity is performed. For example, delay = 30 minutes means wait exactly thirty minutes after the condition is met before starting the activity; maxdelay = 60 minutes means wait no more than 60 minutes before starting the activity; and mindelay = 30 minutes means wait a minimum of 30 minutes before starting the activity.
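
A minimal Python sketch of how these qualifiers might translate into an allowed start window follows. The function name `start_window` and the convention of returning an (earliest, latest) pair are my own assumptions for illustration.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple


def start_window(condition_met_at: datetime,
                 delay: Optional[timedelta] = None,
                 mindelay: Optional[timedelta] = None,
                 maxdelay: Optional[timedelta] = None
                 ) -> Tuple[datetime, Optional[datetime]]:
    """Return (earliest, latest) allowed start times; latest is None if unbounded."""
    if delay is not None:
        # An exact delay collapses the window to a single point in time.
        exact = condition_met_at + delay
        return exact, exact
    earliest = condition_met_at + (mindelay if mindelay is not None else timedelta(0))
    latest = condition_met_at + maxdelay if maxdelay is not None else None
    return earliest, latest


met = datetime(2016, 5, 1, 12, 0)
exact_window = start_window(met, delay=timedelta(minutes=30))
bounded_window = start_window(met, mindelay=timedelta(minutes=30),
                              maxdelay=timedelta(minutes=60))
```

Checking whether an activity "followed the rules" then reduces to testing whether its actual start time falls inside the window.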

I have tried this paradigm in multiple scenarios and so far it seems to work (randomization, eligibility determination, delayed start, assessments). 

In a future post, I'd like to explore how these rules can be expressed computationally using RDF.