Extensible Code Lists: an RDF Solution

We are all familiar with code lists, or value sets as they are also called. These are the permissible values for a variable. For example, Race, as described by the U.S. Office of Management and Budget (OMB), can have 5 permissible values: White, Black or African American, Asian, American Indian or Alaska Native, and Native Hawaiian or other Pacific Islander.

Some standard code lists are incomplete, i.e. they don't capture the universe of possible values for a variable. These code lists are called extensible. The Sponsor may create custom terms and add them to the code list. Managing these is a challenge. Here is an idea that we are proposing for the PhUSE SDTM Data in RDF project that we are kicking off next week at the PhUSE Computational Science Symposium.  RDF has a unique advantage over other solutions in that it is designed to work with data that are distributed across the web. It can be used to integrate multiple dictionaries from multiple sources. Here's one way it can work. 

First one creates a study terminology ontology containing all the standard terminology concepts needed for clinical trials. It looks something like this:

One can see how to leverage other terminologies. For example, the Vital Signs class links to SDTM terminology expressed in RDF. In this case the resources shown here are for Diastolic Blood Pressure.

Now you create a second ontology for custom terms, which looks very similar to the first one:

In this example, the sponsor defined three custom flags for subjects who completed 8, 16, and 24 weeks of treatment, respectively. These are entered as custom:PopulationFlag analyses. Next, one imports the standard terminology ontology and uses the rdfs:subClassOf property to specify that the custom terms are subclasses of the standard concepts. So now it looks like this:

Looking at the code:PopulationFlag example, there are three standard population flags specified: Efficacy (EFF), Safety (SAF), and Intent to Treat (ITT). Furthermore there are the three custom flags as previously described.
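
To make the linkage concrete, here is a minimal Turtle sketch of the standard and custom population flags described above. The namespace IRIs, the individual names (e.g. custom:COMP8), and the labels for the custom flags are my assumptions for illustration only; they are not taken from the actual project ontology.

```turtle
# Namespace IRIs are placeholders (assumptions), not the project's real ones.
@prefix rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:    <http://www.w3.org/2002/07/owl#> .
@prefix code:   <http://example.org/study-terminology#> .
@prefix custom: <http://example.org/study123/custom-terminology#> .

# Standard terminology ontology: three standard population flags.
code:PopulationFlag rdf:type owl:Class .
code:EFF rdf:type code:PopulationFlag ; rdfs:label "Efficacy" .
code:SAF rdf:type code:PopulationFlag ; rdfs:label "Safety" .
code:ITT rdf:type code:PopulationFlag ; rdfs:label "Intent to Treat" .

# Custom terminology ontology: kept in its own namespace, linked by rdfs:subClassOf.
custom:PopulationFlag rdf:type owl:Class ;
     rdfs:subClassOf code:PopulationFlag .

# Hypothetical names for the three custom flags.
custom:COMP8  rdf:type custom:PopulationFlag ; rdfs:label "Completed 8 weeks of treatment" .
custom:COMP16 rdf:type custom:PopulationFlag ; rdfs:label "Completed 16 weeks of treatment" .
custom:COMP24 rdf:type custom:PopulationFlag ; rdfs:label "Completed 24 weeks of treatment" .
```

With RDFS subclass inference turned on, the three custom individuals are also members of code:PopulationFlag, so a tool asking for all population flags sees the standard and custom ones together. Dropping the custom file leaves only the three standard flags.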

The nice thing about this approach is that the custom terms exist independently from the standard terms and can be easily removed/ignored for the next study, yet they can be linked in this way to the standard terms so tools treat them the same. A SPARQL query looking for all members of the code:PopulationFlag class will return 6 individuals. For the next study, one can create a different set of custom terms. The "web" of study terminologies begins to look like the figure below. One can imagine a diverse library of controlled terms all available for implementation almost literally at one's fingertips.

One can link to other terminologies in the same way. Ideally, all the standard ontologies exist on the web and one merely links to them, thereby taking advantage of Linked Data principles.

I appreciate your comments. 


Temporal Concepts in Clinical Trials

As the saying goes, "timing is everything." This is no less true in clinical trials because knowing when activities occurred or how long they last often holds the key to proper interpretation of the data. Documenting temporal elements for activities in clinical trials is therefore crucial.

In the RDF world, we can leverage the work of others who have thought about this issue in great detail. It turns out that the World Wide Web Consortium (www.w3c.org) has developed a time ontology in OWL for anyone to use. It's rather simple and elegant and has useful application for our Study Ontology. Linking our study ontology with the W3C time ontology is a nice example of the benefits of Linked Data. The ontology goes like this...

A temporal entity can be either a time:Instant (a single point in time) or a time:Interval (has a duration). Intervals have properties like time:hasBeginning and time:hasEnd. These are not totally disjoint because one can consider an Instant as an Interval where the start and end Instants are the same, but this is a minor point.

For many Activities, such as a blood test or a vital signs measurement, all we really care about is the date/time it occurred: for all practical purposes, a time:Instant. Some activities do have a duration worth knowing about, so one can attach a time:Interval to them. The nice thing about an Interval is that it links the beginning instant to the end instant, so they go together. The time:Interval resource is the link that holds them together.
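
As a small illustration of the Instant case, here is how a single vital signs measurement might carry its date/time. This is only a sketch: the study: namespace IRI, the class study:Observation, the property study:occursAt, and the date/time value are my assumptions; time:Instant and time:inXSDDateTimeStamp come from the W3C time ontology.

```turtle
# The study: namespace and study:occursAt are hypothetical; the value is made up.
@prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd:   <http://www.w3.org/2001/XMLSchema#> .
@prefix time:  <http://www.w3.org/2006/time#> .
@prefix study: <http://example.org/study#> .

# The activity points at a single point in time.
study:VitalSigns1 rdf:type study:Observation ;
     study:occursAt study:VitalSigns1_Time .

# The Instant carries the actual date/time value.
study:VitalSigns1_Time rdf:type time:Instant ;
     time:inXSDDateTimeStamp "2014-03-10T09:30:00Z"^^xsd:dateTimeStamp .
```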

So let's look at some examples taken from the SDTM of some important Intervals and how they might look in RDF when we link to the W3C time ontology. As always, I use Turtle syntax as it's very human-readable:

study:ReferenceStudyInterval rdf:type time:Interval;
     time:hasBeginning sdtm:RFSTDTC ;
     time:hasEnd sdtm:RFENDTC .

study:ReferenceExposureInterval rdf:type time:Interval;
     time:hasBeginning sdtm:RFXSTDTC ;
     time:hasEnd sdtm:RFXENDTC .

and another important one:

study:Lifespan rdf:type time:Interval;
     time:hasBeginning sdtm:BRTHDTC ;
     time:hasEnd sdtm:DTHDTC .

Now here is where it gets fun. Let's say you want to derive RFXSTDTC and RFXENDTC (first and last day of exposure). Imagine your database has various time:Interval triples for each subject, each describing a fixed dose interval. Imagine in this example, Person1 participates in 3 fixed dosing intervals, as shown in the RDF as follows:

study:Person1 study:participatesIn study:DrugAdministration1, study:DrugAdministration2, study:DrugAdministration3 .

Each administration is associated with an interval: Interval1, Interval2, Interval3, each of which has a time:hasBeginning and a time:hasEnd date. One can write a SPARQL query to pull out the minimum (earliest) time:hasBeginning date and the maximum (latest) time:hasEnd date across all the drug administration intervals and thereby derive the two SDTM dates of interest automatically. The same can be done for RFPENDTC (reference participation end date). I can't tell you how often this date is wrong in actual study data submissions. A SPARQL query can identify all dates for all study activities associated with a Subject and pick out the maximum date, which happens to be the RFPENDTC. Best of all, these standard queries can exist as a resource on the web using SPIN (SPARQL Inference Notation) for anyone to use.
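
Here is a sketch of what the underlying triples for Person1 could look like. The property study:hasInterval, the namespace IRI, and all of the dates are made up for illustration; only the time: terms come from the W3C time ontology.

```turtle
# study:hasInterval and all dates are hypothetical, for illustration only.
@prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd:   <http://www.w3.org/2001/XMLSchema#> .
@prefix time:  <http://www.w3.org/2006/time#> .
@prefix study: <http://example.org/study#> .

study:Person1 study:participatesIn study:DrugAdministration1 ,
                                   study:DrugAdministration2 ,
                                   study:DrugAdministration3 .

study:DrugAdministration1 study:hasInterval study:Interval1 .
study:DrugAdministration2 study:hasInterval study:Interval2 .
study:DrugAdministration3 study:hasInterval study:Interval3 .

# Each fixed dose interval links its beginning Instant to its end Instant.
study:Interval1 rdf:type time:Interval ;
     time:hasBeginning [ rdf:type time:Instant ; time:inXSDDate "2014-01-06"^^xsd:date ] ;
     time:hasEnd       [ rdf:type time:Instant ; time:inXSDDate "2014-01-19"^^xsd:date ] .

study:Interval2 rdf:type time:Interval ;
     time:hasBeginning [ rdf:type time:Instant ; time:inXSDDate "2014-01-20"^^xsd:date ] ;
     time:hasEnd       [ rdf:type time:Instant ; time:inXSDDate "2014-02-02"^^xsd:date ] .

study:Interval3 rdf:type time:Interval ;
     time:hasBeginning [ rdf:type time:Instant ; time:inXSDDate "2014-02-03"^^xsd:date ] ;
     time:hasEnd       [ rdf:type time:Instant ; time:inXSDDate "2014-02-16"^^xsd:date ] .

# Earliest beginning here is 2014-01-06 (would become RFXSTDTC);
# latest end is 2014-02-16 (would become RFXENDTC).
```

A query using the SPARQL MIN and MAX aggregates over the beginning and end dates, grouped by subject, would pick out those two values without anyone having to enter them by hand.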

But first, you need study data in the RDF and a Study Ontology.


What's in a Name?

Standardizing clinical trial data is all about automation. Standard data enable automated processes that bring efficiency and less human error. But automating a process, for example, an analysis of a lab test across multiple subjects in a trial, requires computers and information systems to be able to unambiguously identify that lab test. This is called computable semantic interoperability (CSI). The key is "computable." It's not enough that a human can identify the lab test of interest, but computers need to do the same.  I previously wrote about the interoperability problem and I revisit it here today, focusing on test names.

There are two situations that impede CSI: [1] when the same Thing goes by two different names, or even more troublesome [2] when two different Things go by the same name. When I say Mustang do I mean the car, or the horse? Some describe the term Mustang as "overloaded" because it can represent more than one Thing. Issue #1 is addressed by controlled terminology. Synonyms can then be mapped to a controlled term that all agree to use. Issue #2 is more challenging, but it is avoidable by assigning different names to different things. I consider this a best practice to promote CSI.

As an example, let's look at the CDISC controlled term "glucose" (code C105585). The definition is "a measurement of the glucose in a biological specimen." The reality is that a serum glucose and a urine glucose are two completely different tests, having different clinical meaning and interpretation. I have been advocating for more granular lab test names for a long time so that computers can easily distinguish different tests. The counter-argument is that serum glucose is really two concepts: the specimen and the "thing" being measured (known as the component, or analyte in LOINC), and therefore should be represented as two different variables. In fact, the SDTM does have a separate field for specimen information (LBSPEC), and don't get me wrong, there is value in separate specimen information, but that doesn't diminish the need for different test names. The problem is, one has to tell or program a computer "if test=glucose, look at specimen information to pick out the correct glucose test." But what about another observation, say "Occurrence Indicator" (an FATEST as described in the Malaria Therapeutic Area User's Guide)? One must know to look at another field (FAOBJ) to understand that the occurrence is a fever, or a chill. Where to look for that additional data is not always obvious and varies by test. In the Malaria example, we have two different occurrences and they should each have their own name: Fever Indicator, Chills Indicator.

There are two problems with relying on other data fields to disambiguate an overloaded concept: [1] keeping track of which field disambiguates which test is onerous, and [2] new lab tests are being added all the time, so that bookkeeping never ends. (By the way, LOINC avoids this problem by assigning different codes to different tests and providing separate data fields for analyte, source, method, etc.)
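
In RDF terms, the idea of distinct names plus separate component and specimen fields might look like the sketch below. Everything in the lab: namespace (the class, the property names, the resource names) is my invention for illustration; it is not CDISC or LOINC content.

```turtle
# The lab: namespace and every term in it are hypothetical.
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix lab:  <http://example.org/lab#> .

# Two different tests get two different names (URIs) ...
lab:SerumGlucose rdf:type lab:LabTest ;
     rdfs:label "Serum Glucose" ;
     lab:hasComponent lab:Glucose ;
     lab:hasSpecimen  lab:Serum .

lab:UrineGlucose rdf:type lab:LabTest ;
     rdfs:label "Urine Glucose" ;
     lab:hasComponent lab:Glucose ;
     lab:hasSpecimen  lab:Urine .

# ... while the shared component is still available for anyone who wants it.
lab:Glucose rdfs:label "Glucose" .
lab:Serum   rdfs:label "Serum" .
lab:Urine   rdfs:label "Urine" .
```

A tool that pools results by test name can no longer mix the two, yet nothing is lost: the component and specimen remain separately queryable.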

This problem became clear to me when a colleague at FDA, who was using an automated analysis tool to analyze serum glucose levels among thousands of patients, was getting funny results. After quite some digging, she realized the tool was pooling serum and urine glucoses. She and I knew to look at LBSPEC. The tool, however, wasn't smart enough to do so. I wonder how many other analyses of other tests have this problem and go unrecognized.

So, in the interest of promoting true computable semantic interoperability without burdening data recipients with unnecessary algorithms to disambiguate overloaded terms, please remember to name different things differently. It can be that simple.


SDTM Data in RDF: Activities in Clinical Trials

PhUSE has approved a new project to evaluate and demonstrate the potential value of using RDF for SDTM data. It's called SDTM Data as RDF.  The project is headed by Tim Williams and myself and it will kick off at the upcoming PhUSE Computational Science Symposium just outside Washington DC.  As many of you know, I'm an advocate for using the RDF for study data. One of the goals of this new project is to develop a simple Study Ontology that, when combined with study data in RDF, can be used to generate high quality, highly standardized and valid SDTM datasets. If successful, it will address a major ongoing problem: high variability in SDTM implementation across studies and applications. 

To achieve that goal, we will develop a simple study ontology using OWL that will support SDTM dataset creation using standard SPARQL queries. It will leverage existing BRIDG classes as needed. We are starting with two domains: DM and VS and, if successful, the ontology will be extended to support other SDTM domains as well as non-standard data that currently wind up in SUPPQUAL or custom SDTM domains. If successful, the project outcome can provide a compelling reason to use RDF for study data today to solve a major SDTM implementation challenge that sponsors currently face. As the project progresses, I plan to discuss modeling challenges and how RDF/OWL can address them. Today I discuss Activities in clinical trials.   

A clinical trial is at its most fundamental construct a collection of activities and the rules that describe when those activities are performed. There are also rules that describe how those activities are grouped (e.g. into arms, visits, epochs, etc.) to facilitate study conduct and analysis.

Our mini Study Ontology divides study Activities into these subClasses:
  1. Observations -- symptoms, signs, tests, etc. that measure the physical, mental, or physiological state of a subject
  2. Analyses -- activities that take as input one or more Observations and generate analysis results
  3. Interventions -- activities that are performed on a subject with the usual intent of modifying or identifying a medical condition (e.g. drug administration, device implantation, surgery)
  4. Administrative Activities: e.g. informed consent, randomization, etc.
The Analysis class has a couple of interesting subclasses: 
  1. Assessment - this activity analyzes Observations and their results to identify (e.g. diagnose) and/or characterize (e.g. severity assessment) a Medical Condition.
  2. Rule - this activity analyzes Observations and their results to determine the start of another Activity. This includes eligibility criteria, which takes screening data and determines whether the subject advances to the next Activity, usually randomization. It also includes more generic start rules such as "take study medication within two hours of headache onset." 
All Activities have outcomes, so there is a class ActivityOutcome. For Observations, the outcome is the result. For Assessments, the outcome is the identification and characterization of a Medical Condition. Many tests in medicine combine Observation outcomes with Assessment outcomes. For example, consider a CT scan of the head. The Observation might be a 2 cm lesion in the right frontal lobe that enhances with contrast, and has mass effect (obliteration of the sulci and a shift of midline structures). The Assessment, performed by a trained Neuroradiologist, establishes the presence of a cerebral tumor. Additional observations and assessments are then needed to confirm the diagnosis and further characterize the tumor (e.g. Grade 3 Astrocytoma). 

The MedicalCondition class contains all the medical conditions that afflict the Subject (including past conditions). By medical condition I mean a disease (e.g. Epilepsy) or a disorder (e.g. Seizure) or a transient physiologic state that benefits from medical Interventions (e.g. Pregnancy). There are two main subclasses: Indication (the reason an Intervention is performed, such as a Drug Administration), and AdverseEvent (a medical condition that begins or worsens after an Intervention is performed). 

So the "core" study ontology looks like this (class view only):

     -- Person

You can imagine a host of properties/predicates that link these together; HumanStudySubject participatesIn Activity and Activity hasOutcome ActivityOutcome are just two.
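
Based only on the classes and predicates named in this post, a class-level sketch in Turtle could look like the following. The namespace IRI, the exact class spellings, and the domain/range statements are my assumptions; making HumanStudySubject a subclass of Person is also a guess, and the real ontology may well differ.

```turtle
# Namespace IRI, spellings, and domain/range choices are assumptions.
@prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:   <http://www.w3.org/2002/07/owl#> .
@prefix study: <http://example.org/study#> .

study:Person            rdf:type owl:Class .
study:HumanStudySubject rdf:type owl:Class ; rdfs:subClassOf study:Person .

# Activities and their four subclasses.
study:Activity               rdf:type owl:Class .
study:Observation            rdf:type owl:Class ; rdfs:subClassOf study:Activity .
study:Analysis               rdf:type owl:Class ; rdfs:subClassOf study:Activity .
study:Intervention           rdf:type owl:Class ; rdfs:subClassOf study:Activity .
study:AdministrativeActivity rdf:type owl:Class ; rdfs:subClassOf study:Activity .

# Two interesting kinds of Analysis.
study:Assessment rdf:type owl:Class ; rdfs:subClassOf study:Analysis .
study:Rule       rdf:type owl:Class ; rdfs:subClassOf study:Analysis .

# Outcomes and medical conditions.
study:ActivityOutcome  rdf:type owl:Class .
study:MedicalCondition rdf:type owl:Class .
study:Indication       rdf:type owl:Class ; rdfs:subClassOf study:MedicalCondition .
study:AdverseEvent     rdf:type owl:Class ; rdfs:subClassOf study:MedicalCondition .

# The two predicates mentioned in the text.
study:participatesIn rdf:type owl:ObjectProperty ;
     rdfs:domain study:HumanStudySubject ;
     rdfs:range  study:Activity .

study:hasOutcome rdf:type owl:ObjectProperty ;
     rdfs:domain study:Activity ;
     rdfs:range  study:ActivityOutcome .
```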

As the PhUSE project advances, we will be testing to see if all study activities will fit in this model. In the meantime, I welcome your comments here but please consider getting involved in the project. We can guarantee all of us will learn a lot and maybe find a better way of implementing the SDTM. And finally, come to the PhUSE CSS if you can! I hope to see you there. 


On Capturing Information about Medical Conditions

In today's post, I discuss Medical Conditions in some detail, with a focus on an important question: how do we best capture information about medical conditions as they evolve over time? This is an important question because understanding how medical conditions change over time is key to understanding how medical interventions affect those conditions. As usual, I focus on subject level clinical data collected in clinical trials but I think this is equally valid for other use cases.

I define a Medical Condition as a disease, injury, disorder, or transient physiologic state that interferes or may interfere with well-being. A medical condition persists in time. Medical conditions also evolve over time. The practice of medicine focuses on minimizing the impact of medical conditions on one's health.

So how do we best document the evolution of medical conditions over time? My thinking on this topic is heavily influenced by a very useful paper, which I encourage you to read. It's titled "Toward an Ontological Treatment of Disease and Diagnosis," by Richard H. Scheuermann, Ph.D., et al. It provides precise definitions for common terms such as Disorder, Pathological Process, Disease, etc. This precision, the authors argue, is important to enable automated analysis and reasoning across aggregated clinical data from multiple sources.

Their definition of Disease is: A disposition to undergo pathological processes that exists in an organism because of one or more disorders in that organism. A disorder in turn is something that is wrong with the body and is associated with a pathological process. For example, Epilepsy is a disease that disposes the individual to recurrent seizures (disorder/pathological process). As another example, consider Systemic Lupus Erythematosus, a disease that disposes the individual to multi-organ autoimmune damage that may be manifested by multiple disorders: dermatitis, arthritis, pericarditis, nephritis, etc. One has to understand the underlying disorders in order to fully understand the disease.

Notice that my definition of a medical condition includes both diseases and disorders, because sometimes one doesn't know the disease, rather just the disorder that is manifest, but it's useful to draw that distinction when one can. Understanding the disease can help select the best treatment for the underlying disorder. For example, the disorder may be a bone fracture from an injury, but additional observations may disclose the disease Osteoporosis, which would affect the treatment plan.  

So the clinical data flow in my mind goes something like this:  clinical observations give rise to an assessment to identify/characterize one or more disorders, which may enable the identification of a disease. 

One begins to see how to organize the data for maximal use downstream. Clinical observations are grouped together during an assessment to identify and categorize a disorder and possibly a disease. Both disorders and diseases are events that persist in time, so each is associated with a start and end date. Each also has a diagnosis date, i.e. the date an assessment first identified the condition. Severity is an observation that can be associated with both disorders and diseases, and that changes over time. 

What are the implications for the SDTM? In a previous post I argue for a single Medical Condition domain that describes each medical condition, past and present, for each subject. Each record is a medical condition (e.g. a disease or disorder) and has standard attributes such as start date, end date, diagnosis date, etc. One needs to group and link all the disorders that pertain to a given disease. So for schizophrenia, one would list all the known psychotic episodes for that subject and link them to the schizophrenia disease record. Same thing with Multiple Sclerosis. The M.S. record would link to all the known relapses (disorders). There would also be links to the observations that were used to characterize the disorder, and an optional link to the assessment (i.e. adjudication) record that contains details of that assessment. Each disease/disorder should have standard outcome measures (clinical observations) and validated methods to observe and document severity at given points in time. For example, Parkinson's Disease has the UPDRS (Unified Parkinson's Disease Rating Scale). Diabetes has the Hemoglobin A1C and others. Once again we need uniform resource identifiers (URIs) to facilitate linking all these data.
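
To show the kind of linking I have in mind, here is a rough Turtle sketch for the Multiple Sclerosis case. All class and property names (study:Disease, study:hasDisorder, study:characterizedBy, study:identifiedBy, and so on), the subject and record identifiers, and every date are made up for illustration; this is not an existing SDTM or PhUSE structure.

```turtle
# All names and dates below are hypothetical.
@prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:   <http://www.w3.org/2001/XMLSchema#> .
@prefix study: <http://example.org/study#> .

# The disease record, with its standard attributes and links to its disorders.
study:Subject1_MS rdf:type study:Disease ;
     rdfs:label "Multiple Sclerosis" ;
     study:afflicts      study:Subject1 ;
     study:startDate     "2009-11-02"^^xsd:date ;
     study:diagnosisDate "2010-05-14"^^xsd:date ;
     study:hasDisorder   study:Subject1_Relapse1 , study:Subject1_Relapse2 .

# Each relapse (disorder) has its own dates, plus links to the observations
# that characterize it and, optionally, to the assessment that identified it.
study:Subject1_Relapse1 rdf:type study:Disorder ;
     rdfs:label "MS relapse" ;
     study:startDate      "2009-11-02"^^xsd:date ;
     study:endDate        "2009-12-15"^^xsd:date ;
     study:characterizedBy study:Subject1_Obs17 , study:Subject1_Obs18 ;
     study:identifiedBy    study:Subject1_Assessment3 .

study:Subject1_Relapse2 rdf:type study:Disorder ;
     rdfs:label "MS relapse" ;
     study:startDate      "2012-03-20"^^xsd:date ;
     study:endDate        "2012-04-28"^^xsd:date ;
     study:characterizedBy study:Subject1_Obs42 .
```

Because every record has its own URI, following the links from disease to disorder to observation is a matter of walking the graph rather than joining on ad hoc keys.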

What we have now is the ability to plot all of a subject's clinical observations in a clinical trial over time, i.e. the patient profile. But this lacks important information on how assessments were performed to link observations to disorders and diseases and to assess changes to the disease over time. When one looks at the various Therapeutic Area User Guides, each takes a different approach in documenting changes to medical conditions over time. This proposed single approach I think is applicable for all therapeutic areas. Once we have a clear, standard representation of the changes to a medical condition over time, then I think it will be easier to automate analyses that look at the effects of various interventions, including experimental interventions of course.

As usual, I welcome your comments. 


Definition of Common Clinical Terms v2

Back in May I proposed some working definitions of common clinical terms. Since then additional thinking and feedback have led to some refinements. I'm reposting and cross-referencing the two versions. The changes are highlighted in bold and red.

We all use these words in clinical medicine: observations, assessments, diagnosis, medical condition, adverse event, outcome measure, endpoint. But what do they really mean? The published definitions are all over the map, often imprecise and inconsistent. I have conducted informal polls of medical reviewers at FDA and, guess what, they mean different things to different people. These are highly educated, highly experienced people. The same problem exists in academia and industry. I see this in the wide variability in how these words are used in study protocols.

Standardizing the definitions of these common terms has been a topic of interest to me. How can we automate the processing of clinical data if we can't all agree on definitions for these basic terms in clinical medicine? Using best practices for how to define things, I have come up with the following "working definitions" that I think are unambiguous and internally consistent, i.e. enabling humans and information systems to clearly tell them apart. I'd also like to think they are accurate in how they're used or should be used in clinical medicine. I present them here in no particular order other than some naturally flow from others.

1. Clinical Observation: a measure of the physical, physiological, or psychological state of a Person or individual. (Synonym=finding)

A Clinical Observation is ideally observed by a qualified individual, following a standard process, but without implying a cause. Many clinical observations simply reflect a normal physiological state. e.g. BP 120/80 mmHg.

1(a). Symptom: a Clinical Observation that can only be observed by the individual to whom the observation belongs (e.g. pain). Synonym: Subjective Observation

1(b). Sign: a Clinical Observation that can be observed by someone other than the individual (e.g. blood pressure). Synonym: Objective Observation.

Note: Signs can also be self-observed, for example, fingerstick glucose or blood pressure using an appropriate home monitoring device.

1(c). Outcome Measure: A Clinical Observation that is of interest for some research activity (e.g. clinical or epidemiological study). The outcome measure is intended to support one or more objectives in a research project.  e.g.: Hemoglobin A1C in a diabetes study.

1(d). Patient Reported Outcome (PRO) is an Outcome Measure that is also a Symptom.

2. Endpoint: A combination of 3 concepts: [1] one or more Outcome Measures, [2] a time element describing when the outcome measure is collected, [3] and an algorithm describing how the Outcome Measures are combined for analysis (optional). (Credit goes to Roomi Nusrat, M.D. for this one) Example: Percent change from baseline in HgbA1C measured at 12 weeks.

Note: I find Outcome Measure and Endpoint often used interchangeably. The former is a clinical concept (what's measured by the clinician), the latter is an analysis concept (what's plugged into a formula). Sometimes they are very close: e.g. Viral Load (outcome measure); Viral Load at 6 weeks (endpoint).

2(a). Composite Endpoint: an Endpoint with two or more distinct Outcome Measures.

3. Medical Condition: a disease, injury, disorder, or transient physiologic state that interferes or may interfere with well-being. A medical condition persists in time. 

Medical conditions are the target of medical interventions.  Medical conditions explain the presence of clinical observations. We often confuse a clinical observation (e.g. low serum sodium at a single point in time) with the medical condition that gives rise to it (e.g. hyponatremia). I write about this distinction in more detail in a previous post.

3(a). Adverse Event: An adverse Medical Condition that emerges or worsens following a Medical Intervention. Note: there is no presumption of causality. (Some medical conditions may not be considered adverse; see the Pregnancy discussion below.)

3(b). Adverse Reaction: An Adverse Event that is caused or worsened by a Medical Intervention. Here causality is presumed.

3(c). Treatment Emergent Adverse Event: An Adverse Event temporally associated with a specific Medical Intervention. It assumes that some algorithm or rule is defined to establish the temporal association.

3(d). Indication: A Medical Condition that is the target of a Medical Intervention; i.e. the reason the Medical Intervention is performed.

4. Medical Intervention: An activity intended to affect (e.g. treat, cure, prevent, diagnose, mitigate) a Medical Condition. e.g. Drug Administration, Surgery, Device implantation.

5. Assessment: An analysis of one or more Clinical Observations to characterize a Medical Condition.
Note: Sometimes assessment is used to mean the collection of a clinical observation. I would like to see us move away from this use as it is confusing. The clinical process is first to observe or measure clinical observations, then to assess one or more clinical observations to identify/characterize medical conditions.

6. Diagnosis: this is an overloaded term in clinical medicine. It has two definitions depending on whether it's used to mean a process or a thing.

Diagnosis (the process): An Assessment to identify the presence of a Medical Condition.
Diagnosis (the thing): A Medical Condition identified for the first time via an Assessment.

So we can say: Q. What did the diagnosis (diagnostic assessment process) show? A: Adult Onset Diabetes Mellitus.  OR
Q. What is the diagnosis? A: Adult Onset Diabetes Mellitus.

Note: Because Diagnosis (the thing) is a Medical Condition, one can define a start/onset date as the date of the first Clinical Observation associated with the condition, as well as the diagnosis date, which is the date the diagnostic assessment was complete, i.e. the date the Medical Condition was first identified via an assessment. As we all know, these are often not the same.

One interesting question is: can a Medical Condition be an Outcome Measure in a study? The way they are defined here, Outcome Measures are always Clinical Observations, not Medical Conditions. A stroke prevention study might define Stroke as the primary Outcome Measure, but is it really? A close inspection reveals that it's really the symptoms and signs of the stroke that are important (i.e. what we can observe/measure). Stroke is clinically very heterogeneous and can present in many different ways, so we need to describe what Clinical Observations are most likely indicative of a stroke (e.g. paralysis, numbness, visual loss, etc.). These, then, are the true Outcome Measures. So when we see a Medical Condition as an Outcome Measure, there needs to be an adjudication process (i.e. Assessment) to define the Clinical Observations that need to be measured and analyzed to assure the Medical Condition is present.

Pregnancy deserves special mention. Is it a medical condition? The definition of medical condition has been expanded to include pregnancy by adding the language "transient physiologic state." I think most clinicians will agree that pregnancy is a medical condition because it benefits from medical intervention (e.g. prenatal care) to minimize complications to the mother and unborn child. Is pregnancy an adverse event in a trial? It depends on whether it's considered "adverse" as in unexpected and undesired. That is a judgment call made by the assessor. 

I welcome comments to refine these and make them more useful. I think the interesting part comes in converting these definitions into OWL representations, to enable computers to reason across the data. This remains a research interest of mine. Maybe I'll get into that in a future post.
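
As a first taste of what that conversion might look like, here is a rough Turtle sketch of a few of the definitions above. The def: namespace and class names are placeholders of my own choosing, and the modeling choices (for example, expressing a Patient Reported Outcome as the intersection of Outcome Measure and Symptom) are just one possible reading of the definitions, not a finished ontology.

```turtle
# The def: namespace and all class names are hypothetical placeholders.
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix def:  <http://example.org/clinical-definitions#> .

# 1. Clinical Observation and its subtypes
def:ClinicalObservation rdf:type owl:Class .
def:Symptom        rdf:type owl:Class ; rdfs:subClassOf def:ClinicalObservation .
def:Sign           rdf:type owl:Class ; rdfs:subClassOf def:ClinicalObservation .
def:OutcomeMeasure rdf:type owl:Class ; rdfs:subClassOf def:ClinicalObservation .

# 1(d). A PRO is an Outcome Measure that is also a Symptom.
def:PatientReportedOutcome rdf:type owl:Class ;
     owl:equivalentClass [ rdf:type owl:Class ;
                           owl:intersectionOf ( def:OutcomeMeasure def:Symptom ) ] .

# 3. Medical Condition and its subtypes
def:MedicalCondition rdf:type owl:Class .
def:AdverseEvent     rdf:type owl:Class ; rdfs:subClassOf def:MedicalCondition .
def:AdverseReaction  rdf:type owl:Class ; rdfs:subClassOf def:AdverseEvent .
def:TreatmentEmergentAdverseEvent rdf:type owl:Class ; rdfs:subClassOf def:AdverseEvent .
def:Indication       rdf:type owl:Class ; rdfs:subClassOf def:MedicalCondition .

# 4-6. Intervention, Assessment, and Diagnosis (the process)
def:MedicalIntervention rdf:type owl:Class .
def:Assessment          rdf:type owl:Class .
def:DiagnosticProcess   rdf:type owl:Class ; rdfs:subClassOf def:Assessment .
```

With definitions like these, a reasoner can classify any observation typed as both an Outcome Measure and a Symptom as a Patient Reported Outcome without anyone asserting it explicitly.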


Rethinking the Three CDISC General Observation Classes

*Please note this post has been updated to reflect updates to some clinical definitions referenced in this post. The links have been updated as well. These do not change the overall message and conclusions. 

The CDISC Study Data Tabulation Model (SDTM) is now on version 1.4 with future versions already in design to accommodate data for additional therapeutic area requirements. As these data from more and more therapeutic areas are standardized, I see an almost endless cycle of new variables, new domains, version upgrades and their associated implementation costs and challenges. I think it is worth exploring improvements to the model so that new requirements could be incorporated more easily, perhaps as easily as adding new terms to dictionaries and decreasing the need for changes to the model or for nonstandard variables. But what does that improved model look like? Here are some thoughts. I certainly welcome comments.

First let's look at where we are today. The SDTM has been quite consistent over time in defining three general observation classes in clinical studies: Interventions, Findings, and Events. Here is how they are described in SDTM 1.4:

  • The Interventions class ... captures investigational, therapeutic and other treatments that are administered to the subject (with some actual or expected physiological effect) either as specified by the study protocol (e.g., “exposure”), coincident with the study assessment period (e.g., “concomitant medications”), or other substances self-administered by the subject (such as alcohol, tobacco, or caffeine). 
  • The Events class ... captures planned protocol milestones such as randomization and study completion, and occurrences, conditions, or incidents independent of planned study evaluations occurring during the trial (e.g., adverse events) or prior to the trial (e.g., medical history). 
  • The Findings class ... captures the observations resulting from planned evaluations to address specific tests or questions such as laboratory tests, ECG testing, and questions listed on questionnaires. The Findings class also includes a sub-type “Findings About” which is used to record findings related to observations in the Interventions or Events class. 
It turns out these definitions do not completely align with the way clinicians generally think about observations. Furthermore, this categorization does not follow well-established conventions for documenting, storing, and using clinical data in practice. I think it is useful to re-examine these concepts, because I believe it leads to a better and more useful data model.

In another post, I discuss definitions of common clinical terms. Here are two I'd like to revisit. 

Clinical Observation: a measure of the physical, physiological, or psychological state of a Person or individual. 
    Medical Condition:  a disease, injury, disorder, or transient physiologic state that interferes or may interfere with well-being. A medical condition persists in time. 

    How do these definitions work? Health care processes focus on identifying Medical Conditions that afflict patients. Once the Medical Condition is identified, one can then determine how best to deal with it. Sadly, patients don't walk into a clinic or hospital with a sign on their forehead saying "I have Multiple Sclerosis." The clinician acts as a detective, documenting clues that can lead to the correct diagnosis. Those clues are clinical observations. The clues must be put together, like a jigsaw puzzle, to determine the most likely diagnosis (i.e. medical condition) that afflicts the patient. This in turn determines the plan (interventions) to make the patient better or keep them from getting sick. 

This process gives rise to the clinical data lifecycle, which for many decades has been routinely documented in patient records. It goes by the mnemonic SOAP. Here are the SOAP components:

  • Subjective observations - what are the observations that the patient reports?
  • Objective observations - what are the observations that the clinician observes (which may include the use of tools, such as a BP cuff, ophthalmoscope, laboratory diagnostic device, or imaging device)?
  • Assessment - what medical condition is most likely associated with the observations? What are important attributes of the medical condition, such as severity, measured at the time the assessment is made?
  • Plan - how should the medical condition be treated? This usually involves some interventions (drug administration, surgery, device implantation, etc.)

If one applies the working definition of a clinical observation to the CDISC general observation classes, only the Findings class fits as a true clinical observation that would typically be documented in the "O" section of a SOAP note in a patient's record.

Let's now look at Interventions. The word Intervention comes from the verb to intervene, which is defined by the Merriam-Webster Dictionary as "to become involved in something ... in order to have an influence on what happens." As it is commonly used in health care, an intervention is some activity that is intended to change, alter, or otherwise affect a Medical Condition. Usually the intent is to treat, but other purposes of Interventions can be to prevent, cure, diagnose, or mitigate. Examples of Interventions include a drug administration, surgery, or device implantation.

It turns out that observations and interventions are very similar from a process perspective. Both have a performer, both are associated with some process for carrying out the activity, both may involve one or more devices, and both may involve collecting and analyzing a biospecimen. Observation and Intervention records therefore need to link to information about these other classes as needed. In fact, an observation is a type of intervention, because the observation wouldn't occur unless the observer took some intervening action. From a modeling perspective, it makes sense to treat observations as interventions. They are distinguished by the purpose or intent of the intervention: affecting or identifying a disease vs. observing the state of an individual.
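To make the idea concrete, here is a minimal Python sketch of a single Intervention record type in which an observation is simply an intervention whose purpose is to observe. The class name, field names, and the two example records are my own illustrative assumptions, not part of the SDTM or any CDISC standard.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Purpose(Enum):
    """Intent of an intervention (hypothetical two-value code list)."""
    OBSERVE = "observe"  # record the state of an individual
    AFFECT = "affect"    # treat, prevent, cure, or mitigate a condition


@dataclass
class Intervention:
    """One record type shared by observations and other interventions."""
    performer: str
    procedure: str
    purpose: Purpose
    device: Optional[str] = None       # link to device information
    biospecimen: Optional[str] = None  # link to biospecimen information
    result: Optional[str] = None       # filled in only for observations

    def is_observation(self) -> bool:
        # An observation is distinguished only by its purpose.
        return self.purpose is Purpose.OBSERVE


# Made-up example records for illustration only.
bp_check = Intervention(
    performer="Nurse A",
    procedure="blood pressure measurement",
    purpose=Purpose.OBSERVE,
    device="BP cuff",
    result="128/82 mmHg",
)
dosing = Intervention(
    performer="Nurse A",
    procedure="drug administration",
    purpose=Purpose.AFFECT,
)
```

Both records share the same attributes (performer, procedure, device, biospecimen); only the purpose and the presence of a result tell them apart.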

So what about CDISC Events? Except for certain administrative events in the SDTM, events are in fact Medical Conditions. Think of Medical History (MH) data: Hypertension, Diabetes Mellitus, Hypothyroidism. All medical conditions. Think of events that go in the Clinical Events (CE) domain or the Adverse Events (AE) domain: all Medical Conditions, or at least they should be. (Sometimes in practice, an observation about an event is confused with the event itself. More on this later.)

Medical Conditions all share common attributes. They persist in time, i.e. they have a start date and an end date (which is null if the condition is ongoing or its status is unknown). For practical reasons, the start date would be the date of the first clinical observation associated with the condition, although in reality the pathophysiology is usually well underway by the time the first symptom or sign appears. A medical condition also has a diagnosis date, which corresponds to the date of completion of the assessment that first identified the medical condition. This can be much later than the start date.
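As a rough illustration of these shared attributes, a Medical Condition record could be sketched in Python as follows. The class, its field names, and the example dates are assumptions made up for this sketch.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class MedicalCondition:
    name: str
    # Date of the first clinical observation associated with the condition.
    start_date: date
    # Date the assessment that first identified the condition was completed.
    diagnosis_date: Optional[date] = None
    # None (null) if the condition is ongoing or its status is unknown.
    end_date: Optional[date] = None

    def has_end_date(self) -> bool:
        return self.end_date is not None

    def days_from_start_to_diagnosis(self) -> Optional[int]:
        """Gap between first sign/symptom and diagnosis, if diagnosed."""
        if self.diagnosis_date is None:
            return None
        return (self.diagnosis_date - self.start_date).days


# Made-up dates for illustration only.
condition = MedicalCondition(
    name="Multiple Sclerosis",
    start_date=date(2013, 1, 1),
    diagnosis_date=date(2013, 1, 31),
)
```

Keeping the start date and the diagnosis date as separate attributes preserves the distinction drawn above: the first observed sign or symptom and the assessment that names the condition are different moments.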

Let's look at adverse events briefly. An AE is a medical condition that is temporally associated with an Intervention. It is identified after an assessment of one or more clinical observations. The assessor concludes that a medical condition is present. It is not an observation! In reality, the assessment is not often documented, so many people don't think of it. When the onset of an AE is critical information, e.g. renal graft failure following transplant, then a formal adjudication process may exist to ensure the right clinical observations were collected and the correct diagnostic criteria were applied during the assessment to conclude that renal graft failure has occurred.

In other cases, an observation about an AE is mistaken for the AE itself. Take the example of "hyponatremia." This is a clinical disorder characterized by low serum sodium, normal serum osmolality, and possibly a constellation of other clinical observations, such as change in mental status, seizures etc. Does a low serum sodium (observation) mean the patient has hyponatremia (the medical condition)? Maybe, maybe not. It requires an assessment by a qualified assessor to determine that association. Maybe the serum osmolality was abnormal; maybe it's lab error. One doesn't always need details of the assessment, but sometimes it's important.

Our high-level concept map looks like this:

Now we're starting to see a picture of what new and improved SDTM domains look like. These were discussed in a previous post.
1. Medical Condition domain: can contain everything you may want to know about the medical conditions that afflict the subject, e.g. start date, link to the observation record considered the first sign/symptom of the condition, date of diagnosis (with a link to the assessment record that led to the diagnosis), severity, toxicity grade, seriousness, end date, etc. Because medical conditions commonly fluctuate in severity or toxicity over time, a severity or toxicity rating is typically the severity or toxicity measured at the time the assessment is made. It is a finding about the medical condition.
2. Assessment domain: can contain everything you may want to know about an assessment: date of the assessment, assessor (link to qualifications), observations used for the assessment, link to diagnostic criteria used for the assessment, link to the rating scale used, and link to the outcome of the assessment, i.e. the medical condition record.
3. New and improved Interventions domain: can contain everything you may want to know about the intervention: date performed, purpose of the intervention (observe; affect), performer (link to qualifications), device used (link to device info), procedure/process done (link to procedure info), biospecimen collected and/or analyzed (link to biospecimen information), and observation result (if the intervention is an observation).
4. Planned Interventions domain: every medical condition is associated with a Plan (i.e. care plan). One can consider a Planned Interventions domain where each record is linked to the Medical Condition for which the planned intervention is intended. There should be an ability to link from the planned intervention to the actual intervention record(s).
There are lots of links across these new domains. These would be facilitated by having unique resource identifiers for each record in each new domain.
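The following Python sketch shows one way such links could work, assuming every record carries a unique identifier and refers to other records by that identifier. The example.org URIs, the property names, and the dummy records are all hypothetical; a real implementation would more likely use RDF resources than Python dictionaries.

```python
from typing import Dict, List

# Hypothetical in-memory store: one record per made-up unique identifier.
records: Dict[str, dict] = {
    "http://example.org/intervention/1": {
        "domain": "Intervention",
        "purpose": "observe",
        "procedure": "serum sodium measurement",
        "result": "low",
    },
    "http://example.org/assessment/1": {
        "domain": "Assessment",
        "observations": ["http://example.org/intervention/1"],
        "outcome": "http://example.org/condition/1",
    },
    "http://example.org/condition/1": {
        "domain": "MedicalCondition",
        "name": "hyponatremia",
        "first_observation": "http://example.org/intervention/1",
        "diagnosed_by": "http://example.org/assessment/1",
    },
    "http://example.org/planned/1": {
        "domain": "PlannedIntervention",
        "for_condition": "http://example.org/condition/1",
        "actual_interventions": [],  # filled in once the plan is carried out
    },
}


def resolve(uri: str) -> dict:
    """Follow a link by looking up the record with that identifier."""
    return records[uri]


def observations_behind(condition_uri: str) -> List[dict]:
    """Walk from a condition to its assessment to the observations used."""
    assessment = resolve(resolve(condition_uri)["diagnosed_by"])
    return [resolve(uri) for uri in assessment["observations"]]
```

With this layout, a question such as "which observations led to this diagnosis?" becomes a walk along the links: `observations_behind("http://example.org/condition/1")` returns the serum sodium record.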

If enough attributes are present for each suggested new domain, I hypothesize that new clinical data requirements can be more easily met using this model. As a next step, I would like to create some domains based on dummy data and explore what adding new data requirements might look like.

Thank you for reading and for your comments.