Thoughts on Medical Informatics: May 2016

This post has been superseded. Please refer to the updated post on the same topic.

We all use these words in clinical medicine: observations, assessments, diagnosis, medical condition, adverse event, outcome measure, endpoint. But what do they really mean? The published definitions are all over the map, often imprecise and inconsistent. I have conducted informal polls of medical reviewers at FDA and, guess what, they mean different things to different people. These are highly educated, highly experienced people. The same problem exists in academia and industry. I see this in the wide variability in how these words are used in study protocols.

Standardizing the definitions of these common terms has been a topic of interest to me. How can we automate the processing of clinical data if we can't all agree on definitions for these basic terms in clinical medicine? Using best practices for how to define things, I have come up with the following "working definitions" that I think are unambiguous and internally consistent, i.e. they enable humans and information systems to clearly tell them apart. I'd also like to think they are accurate in how they're used or should be used in clinical medicine. I present them here in no particular order, other than that some naturally flow from others.

1. Clinical Observation: a measure of the physical, physiological, or psychological state of an individual.

Note: A Clinical Observation is ideally observed by a qualified individual, following a standard process, but without implying a cause. Many clinical observations simply reflect a normal physiological state. e.g. BP 120/80 mmHg.

1(a). Symptom: a Clinical Observation that can only be observed by the patient (e.g. pain). Synonym: Subjective Observation

1(b). Sign: a Clinical Observation that can be observed by someone other than the patient (e.g. blood pressure). Synonym: Objective Observation.

Note: Signs can also be self-observed, for example, fingerstick glucose or blood pressure using an appropriate home monitoring device.

1(c). Outcome Measure: A Clinical Observation that is of interest for some research activity (e.g. clinical or epidemiological study). The outcome measure is intended to support one or more objectives in a research project. e.g.: Hemoglobin A1C in a diabetes study.

1(d). Patient Reported Outcome (PRO) is an Outcome Measure that is also a Symptom.
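
As a rough illustration of how definitions 1 through 1(d) relate to one another, here is a minimal Python sketch. The class name, field names, and example values are my own assumptions for illustration; they are not part of any standard.

```python
from dataclasses import dataclass
from enum import Enum


class Observer(Enum):
    PATIENT_ONLY = "patient_only"  # only the patient can observe it (Symptom)
    ANYONE = "anyone"              # others can observe it too (Sign)


@dataclass(frozen=True)
class ClinicalObservation:
    """Hypothetical record for definition 1; field names are illustrative."""
    name: str
    value: str
    observer: Observer
    research_interest: bool = False  # True when it supports a study objective

    @property
    def is_symptom(self) -> bool:
        # Definition 1(a)
        return self.observer is Observer.PATIENT_ONLY

    @property
    def is_sign(self) -> bool:
        # Definition 1(b)
        return self.observer is Observer.ANYONE

    @property
    def is_outcome_measure(self) -> bool:
        # Definition 1(c)
        return self.research_interest

    @property
    def is_patient_reported_outcome(self) -> bool:
        # Definition 1(d): an Outcome Measure that is also a Symptom
        return self.is_outcome_measure and self.is_symptom


# Made-up example values
pain = ClinicalObservation("pain score", "7/10", Observer.PATIENT_ONLY, research_interest=True)
a1c = ClinicalObservation("hemoglobin A1C", "7.2 %", Observer.ANYONE, research_interest=True)
bp = ClinicalObservation("blood pressure", "120/80 mmHg", Observer.ANYONE)
```

In this sketch the pain score is a PRO, the A1C is an Outcome Measure but not a PRO, and the blood pressure is simply a Sign.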

2. Endpoint: A combination of 3 concepts: [1] one or more Outcome Measures, [2] a time element describing when the Outcome Measure is collected, and [3] an algorithm describing how the Outcome Measures are combined for analysis (optional). (Credit goes to Roomi Nusrat, M.D. for this one.) Example: Percent change from baseline in HgbA1C measured at 12 weeks.

Note: I find Outcome Measure and Endpoint often used interchangeably. Sometimes they are very close: e.g. Viral Load (outcome measure); Viral Load at 6 weeks (endpoint).

2(a). Composite Endpoint: an Endpoint with two or more distinct Outcome Measures.
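
To make the three parts of an Endpoint concrete, here is a small Python sketch of the HgbA1C example. The class, the field names, and the numeric values are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class Endpoint:
    """Hypothetical structure for definition 2."""
    outcome_measures: Tuple[str, ...]                  # [1] one or more Outcome Measures
    timepoint_weeks: int                               # [2] when the measure is collected
    algorithm: Optional[Callable[..., float]] = None   # [3] optional analysis algorithm

    @property
    def is_composite(self) -> bool:
        # Definition 2(a): two or more distinct Outcome Measures
        return len(set(self.outcome_measures)) >= 2


def percent_change_from_baseline(baseline: float, follow_up: float) -> float:
    return (follow_up - baseline) / baseline * 100.0


hba1c_week12 = Endpoint(("HgbA1C",), 12, percent_change_from_baseline)
composite = Endpoint(("HgbA1C", "fasting plasma glucose"), 12)

# Made-up values: baseline 8.0 %, week 12 value 7.0 %
result = hba1c_week12.algorithm(8.0, 7.0)
print(result)  # -12.5
```

The Outcome Measure alone ("HgbA1C") says nothing about timing or analysis; the Endpoint adds both.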

3. Medical Condition: a disease, injury, or disorder that interferes with well-being. It is associated with a pathophysiology. It is also associated with a duration (i.e. an event).

Note: Medical conditions are the target of medical interventions (one notable exception is Pregnancy; though not a disorder it is the target of prenatal care to help prevent pregnancy-related complications). Medical conditions explain the presence of abnormal clinical observations. We often confuse a clinical observation (e.g. low serum sodium at a single point in time) with the medical condition that gives rise to it (e.g. hyponatremia). I write about this distinction in more detail in a previous post.

3(a). Adverse Event: A Medical Condition that emerges or worsens following a Medical Intervention. Note: there is no presumption of causality.

3(b). Adverse Reaction: An Adverse Event that is caused or worsened by a Medical Intervention. Here causality is presumed.

3(c). Treatment Emergent Adverse Event: [I'm putting this here as a placeholder as I'm still looking for a good definition. I think the key feature of a TE AE is that it is an Adverse Event associated with a specific Medical Intervention, with some algorithm defined to establish the temporal association. I welcome suggestions].

3(d). Indication: A Medical Condition that is the target of a Medical Intervention; i.e. the reason the Medical Intervention is performed.

4. Medical Intervention: An activity intended to affect (e.g. treat, cure, prevent, diagnose, mitigate) a Medical Condition. e.g. Drug Administration, Surgery, Device implantation.
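
Definitions 3(a) and 4 together suggest a simple temporal test for an Adverse Event. The sketch below is one possible reading, not a validated algorithm: the record types are hypothetical, and treating a same-day onset as "following" the intervention is my assumption.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class MedicalIntervention:
    name: str
    start: date


@dataclass(frozen=True)
class MedicalCondition:
    name: str
    onset: date
    worsened_on: Optional[date] = None  # date of a documented worsening, if any


def is_adverse_event(condition: MedicalCondition, intervention: MedicalIntervention) -> bool:
    """Definition 3(a): the condition emerges or worsens following the intervention.

    No causality is presumed. Same-day onset counts as "following" (an assumption).
    """
    emerged_after = condition.onset >= intervention.start
    worsened_after = (
        condition.worsened_on is not None
        and condition.worsened_on >= intervention.start
    )
    return emerged_after or worsened_after


# Made-up dates for illustration
drug = MedicalIntervention("study drug administration", date(2016, 3, 1))
rash = MedicalCondition("rash", onset=date(2016, 3, 10))
asthma = MedicalCondition("asthma", onset=date(2010, 6, 1))
migraine = MedicalCondition("migraine", onset=date(2012, 1, 1), worsened_on=date(2016, 4, 2))
```

Here the rash and the worsening migraine qualify as Adverse Events, while the stable pre-existing asthma does not. Deciding whether either is an Adverse Reaction (3(b)) would require a causality assessment that this sketch does not attempt.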

5. Assessment: An analysis of one or more Clinical Observations to characterize a Medical Condition.
Note: Sometimes assessment is used to mean the collection of a clinical observation. I would like to see us move away from this use as it is confusing. The clinical process is to first observe or measure clinical observations, and then to assess one or more of them to identify and characterize medical conditions.

6. Diagnosis: this is an overloaded term in clinical medicine. It has two definitions depending on whether it's used to mean a process or a thing.

Diagnosis (the process): An Assessment to identify the presence of a Medical Condition.
Diagnosis (the thing): A Medical Condition identified for the first time via an Assessment.

So we can say: Q. What did the diagnosis (diagnostic assessment process) show? A: Adult Onset Diabetes Mellitus. OR
Q. What is the diagnosis? A: Adult Onset Diabetes Mellitus.

Note: Because Diagnosis (the thing) is a Medical Condition, one can define a start/onset date as the date of the first Clinical Observation associated with the condition, as well as the diagnosis date, which is the date the diagnostic assessment was complete, i.e. the date the Medical Condition was first identified via an assessment. As we all know, these are often not the same.
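
The distinction between onset date and diagnosis date can be sketched in a few lines of Python. The function name and the dates are made up for illustration.

```python
from datetime import date
from typing import Iterable, Tuple


def onset_and_diagnosis(
    observation_dates: Iterable[date], assessment_completed: date
) -> Tuple[date, date, int]:
    """Return (onset date, diagnosis date, days between them).

    Onset is the date of the first Clinical Observation associated with the
    condition; diagnosis is the date the diagnostic assessment was complete.
    """
    onset = min(observation_dates)
    diagnosis = assessment_completed
    return onset, diagnosis, (diagnosis - onset).days


# Made-up dates: three associated observations, assessment completed later
onset, diagnosis, gap_days = onset_and_diagnosis(
    [date(2016, 2, 2), date(2016, 1, 10), date(2016, 3, 15)],
    date(2016, 3, 20),
)
print(onset, diagnosis, gap_days)  # 2016-01-10 2016-03-20 70
```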

One interesting question is whether a Medical Condition can be an Outcome Measure in a study. The way they are defined here, Outcome Measures are always Clinical Observations, not Medical Conditions. A stroke prevention study might define Stroke as the primary Outcome Measure, but is it really? A close inspection reveals that it's really the symptoms and signs of the stroke that are important (i.e. what we can observe/measure). Stroke is clinically very heterogeneous and can present in many different ways, so we need to describe which Clinical Observations are most likely indicative of a stroke (e.g. paralysis, numbness, visual loss, etc.). These, then, are the true Outcome Measures. So when we see a Medical Condition as an Outcome Measure, there needs to be an adjudication process (i.e. Assessment) to define the Clinical Observations that need to be measured and analyzed to ensure the Medical Condition is present.

I welcome comments to refine these and make them more useful. I think the interesting part comes in converting these definitions into OWL representations, to enable computers to reason across the data. This remains a research interest of mine. Maybe I'll get into that in a future post.

This post looks at some ways of improving the Study Data Tabulation Model (SDTM). Why? The Therapeutic Area (TA) standardization initiative has shown that new domains and variables are needed to standardize TA-specific clinical data, yet new domains and variables pose significant implementation challenges. An almost endless cycle of new domains and variables seems inevitable as more and more TAs are tackled. We need to step back and look critically at changes that would make implementation easier and more consistent across studies and sponsors. Others are expressing the need to slow down and make some corrections (see this post by W. Kubick).

SDTM in its current state is too brittle. It is not flexible enough to accommodate new clinical data requirements efficiently. I have suggested in a previous post that we need a new exchange standard, one that is based on a highly relational information model for study data. A more robust exchange standard can handle new clinical data requirements more easily, often with the simple addition of new terms to a standard terminology. However, getting there won't be easy. The suggestions here describe a possible interim solution. Some of the changes are minor. Others are quite major and likely to be controversial. I don't know if this is the right approach, but it makes sense to me as a consumer of clinical data for over 25 years. I believe the status quo is not sustainable. We need to do something.

In another post, I described the "clinical data lifecycle" that reflects how clinical data are generated and used in clinical practice, and how they are documented in a patient's medical record. I described the 4 parts of the traditional SOAP note of a clinical encounter, analogous to a subject visit in a clinical trial:

Subjective Observations
Objective Observations
Assessment
Plan

These describe the major categories of data for a new and improved SDTM. Let's examine each in some detail.

Clinical observations provide a snapshot in time of the physical, physiological, or psychological state of an individual. Subjective observations are those that only the individual can make (e.g. pain). Objective observations can be made by others. By themselves, observations are not attributed to any one specific cause or underlying medical condition. Many, in fact, simply measure the normal physiological state. Observations must undergo an assessment to identify and characterize one or more medical conditions that best explain the observations. The medical condition(s) form the basis for the care plan and its execution. The care plan is designed to address (e.g. treat, cure) the medical condition(s). When the care plan involves an experimental intervention as part of clinical research, then this forms the basis of a study protocol. This simple model for clinical data forms a continuous feedback loop with various stopping rules (not shown here).

But exactly how do we leverage this model to improve the SDTM? The goal is to make SDTM more stable and flexible, i.e. enable it to incorporate new clinical data content requirements more easily and (hopefully) greatly lower the need for new variables and new domains. This will make implementation easier and SDTM datasets more useful. By its very nature, this is a high level discussion that only scratches the surface of the changes that could be made. I admit that the devil is in the details.

We start with a proposal to improve cross-references.

1. SDTM should incorporate unique identifiers for each record in each domain.

Clinical data are richly inter-related and unique record identifiers simplify the ability to reference other relevant records. The --SEQ variable is currently used for this purpose but by itself is not unique. (Ideally, the identifier is globally unique, so that a record in one study can reference a record in another study. This brings us closer to creating a web of clinical data, which is an important component of the semantic web. But I digress.)
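
One way to picture such an identifier is to derive it from the fields that, taken together, already distinguish a record. The Python sketch below assumes that --SEQ is unique only within a subject and domain, so it combines STUDYID, DOMAIN, USUBJID, and --SEQ into a deterministic UUID. The namespace URL and the sample records are hypothetical, and this is only one of several possible schemes.

```python
import uuid

# Hypothetical namespace for illustration; a real one would be agreed upon centrally.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.org/hypothetical-study-records")

# Made-up records: every one of them has a sequence number of 1.
records = [
    {"STUDYID": "STUDY01", "DOMAIN": "LB", "USUBJID": "STUDY01-001", "LBSEQ": 1},
    {"STUDYID": "STUDY01", "DOMAIN": "LB", "USUBJID": "STUDY01-002", "LBSEQ": 1},
    {"STUDYID": "STUDY01", "DOMAIN": "VS", "USUBJID": "STUDY01-001", "VSSEQ": 1},
]


def seq_value(rec: dict) -> int:
    # The --SEQ variable is prefixed with the two-letter domain code.
    return rec[rec["DOMAIN"] + "SEQ"]


def record_id(rec: dict) -> str:
    """Build a deterministic, globally unique identifier for one record."""
    key = "/".join([rec["STUDYID"], rec["DOMAIN"], rec["USUBJID"], str(seq_value(rec))])
    return str(uuid.uuid5(NAMESPACE, key))


seqs = [seq_value(r) for r in records]
ids = [record_id(r) for r in records]
```

Because the identifier is computed from the key rather than generated at random, the same record always receives the same identifier, which is what makes cross-references stable.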

We need to refine the definition of observations. SDTM describes observations as findings, interventions, and events, and each corresponds to an individual row in a tabulation. If we agree that a clinical observation is a measure of the physical, physiological, or psychological state of an individual, then interventions and events are not observations. Clinical events are instead medical conditions that explain the clinical observations, not the observations themselves. For example, a low serum sodium measured via a lab test (observation) may indicate, after a proper assessment, the presence of hyponatremia due to a Syndrome of Inappropriate Anti-Diuretic Hormone (SIADH) secretion (medical condition). Medical conditions are identified and characterized through assessments of clinical observations. I discuss the importance of distinguishing clinical observations and medical conditions in another post.

So what about interventions? In the purest sense, all clinical observations are "interventions," since the act of observing is always associated with a process, protocol, or procedure. This process of observing "intervenes" in the subject's normal set of activities and can therefore be considered an "intervention" in the strictest sense. However, we generally limit the use of the term medical intervention to an activity whose intent is to affect in some way (e.g. treat) a medical condition.

Therefore, the important information classes to support the clinical data lifecycle are similar to what we have now: observations (i.e. findings in SDTM), interventions, and medical conditions (types of events), but they are defined somewhat differently than in the current SDTM. The next needed change is:

2. SDTM should add an assessment domain to capture important assessment information.

We've discussed that clinical observations serve as input to an assessment. Each assessment record should link to the clinical observations that serve as input, and the medical condition(s) that serve as output. Practically speaking, this is important because sometimes one needs information about that assessment to ensure the assessment is reliable, i.e. performed correctly and without bias. Who made the assessment? What are their qualifications? What observations were used in the assessment? Were the observations the appropriate ones? Are there any important observations that are missing and should have been done? When was the assessment performed? What process was followed (e.g. was there a formal adjudication process)? What criteria (e.g. diagnostic criteria, rating scale) were used? Currently there is no single, systematic way in the SDTM to capture the information associated with assessments. The end result is a patchwork of variables (many of them custom variables that wind up in SUPPQUAL's) that try to fill this gap. There is a recognized need to document adjudication data and this approach I think does that. As an aside, adverse events are adverse medical conditions that emerge or worsen following a medical intervention. These are identified by an assessment (often not documented) of relevant observations. Currently we routinely confuse an adverse event and the observations that support the presence of an adverse event. This can lead to erroneous interpretations of clinical observations.
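
To illustrate the kind of record this implies, here is a minimal Python sketch of an assessment that links input observations to output medical conditions. Every field name and identifier is hypothetical; none of them are existing SDTM variables.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Set


@dataclass
class AssessmentRecord:
    """Hypothetical assessment record; not an existing SDTM domain."""
    assessment_id: str
    assessor: str            # who made the assessment
    performed_on: date       # when it was performed
    criteria: str            # diagnostic criteria or rating scale used
    input_observation_ids: List[str] = field(default_factory=list)
    output_condition_ids: List[str] = field(default_factory=list)


def missing_inputs(assessment: AssessmentRecord, available_ids: Set[str]) -> List[str]:
    """List referenced observations that are absent from the submitted data."""
    return [oid for oid in assessment.input_observation_ids if oid not in available_ids]


# Made-up example: an assessment of low serum sodium leading to hyponatremia
assessment = AssessmentRecord(
    assessment_id="ASMT-001",
    assessor="investigator",
    performed_on=date(2016, 5, 2),
    criteria="hypothetical diagnostic criteria",
    input_observation_ids=["OBS-001", "OBS-002"],
    output_condition_ids=["COND-001"],
)

gaps = missing_inputs(assessment, {"OBS-001"})
print(gaps)  # ['OBS-002']
```

With explicit links in both directions, a reviewer can ask which observations supported a given condition and whether any of them are missing.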

The next suggested change is:

3. SDTM needs a single approach to describe clinical observations.

This is a big one, because newly identified clinical observations for therapeutic areas are triggering an ever-increasing number of new variables and domains, and this is not sustainable. Each observation is ideally recorded without bias (standard processes or protocols may exist to minimize bias) and without interpretation by the observer regarding its cause. The standard components of a clinical observation are well known and they define the metadata needed for each observation. These can help guide changes to the generic findings domain in the SDTM (the model, not the I.G.). For example, is the observation planned or protocol-specified? Is there a documented process or method for making the observation (and a link to that process)? Was a device used (link to information about the device)? Was a biospecimen collected and analyzed (link to biospecimen information)? Observations are often grouped or nested, so grouping and nesting information is needed.

Regarding terminology for clinical observations, the Logical Observation Identifiers Names and Codes (LOINC) should be adopted as the single terminology to describe a clinical observation. LOINC was developed for this purpose. The lab portion of LOINC is quite robust, but the clinical portion will likely need to grow over time to accommodate the range of clinical observations needed for clinical research. This means creating a robust process to register new clinical observations with LOINC as new codes are needed. Using LOINC will also help harmonize clinical data used for research with clinical data used for other purposes, since LOINC is used widely in health care. Ideally there is a single clinical observation domain containing sufficient metadata to allow subsetting in multiple ways for analysis purposes (e.g. lab, EKG, CT, MRI, etc.).
Clearly size limitations of an observations domain quickly become an issue, but these should diminish as data are routinely loaded and stored in databases prior to use, which can then deliver observation data to the analyst in manageable chunks. In the meantime, continued splitting into multiple clinical observation domains may be necessary, but only as necessary to meet data exchange and data management requirements.

4. SDTM should have a single approach to describe all Medical Conditions.

By medical condition I mean a disease, injury, or disorder that interferes with well-being. A medical condition is associated with an underlying pathophysiological process. It has a beginning and an end. Medical conditions are the target of medical interventions, whereas clinical observations are not. (Pregnancy is a notable exception in that it is not a disorder but is the target of medical intervention, i.e. prenatal care, to minimize complications.) The medical conditions domain contains past and current medical conditions, including medical conditions that emerge during the study.

Important information about each medical condition includes the following. Not all of this information will be known about every medical condition, but there should be a standard way of describing it when it is available.

Date of first symptom/sign (onset date)
Date of diagnosis (the date the medical condition was first identified via an assessment)
A link to the assessment record, if one exists
A link to the clinical observations that were used for the assessment
Date of resolution (end date, which is null for ongoing medical conditions)
Date of reporting (cutoff date)
Severity at the time of the last assessment
Change in severity since the previous assessment
Whether the condition is ongoing at the time of reporting
Whether the condition is the reason for the study (the indication)
Whether the condition is considered an adverse event, i.e. any adverse medical condition that emerges or worsens following a medical intervention
Whether there is a plan to address/treat the medical condition (with a link to the plan)

In the case of the indication, the plan is documented in the protocol. The plan links to the clinical observations and medical interventions performed following the execution of the plan. From the medical conditions domain, the medical history, adverse events, and clinical events domains can be derived using standard algorithms.
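
As a rough sketch of what such a derivation algorithm might look like, the Python below splits a list of medical conditions into medical history and adverse events by comparing dates against the first study intervention. The rule, the record type, and the dates are all assumptions for illustration; a real algorithm would need agreed conventions, and clinical events would need their own rule, which is not sketched here.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ConditionRecord:
    """Hypothetical row in a single medical conditions domain."""
    name: str
    onset: date
    worsened_on: Optional[date] = None


def derive_domains(
    conditions: List[ConditionRecord], first_intervention: date
) -> Dict[str, List[str]]:
    """Assumed rule: a condition that emerges or worsens on or after the first
    intervention is an adverse event; everything else is medical history."""
    derived: Dict[str, List[str]] = {"medical history": [], "adverse events": []}
    for cond in conditions:
        emerged = cond.onset >= first_intervention
        worsened = cond.worsened_on is not None and cond.worsened_on >= first_intervention
        target = "adverse events" if emerged or worsened else "medical history"
        derived[target].append(cond.name)
    return derived


# Made-up conditions and dates
conditions = [
    ConditionRecord("hypertension", date(2009, 5, 1)),
    ConditionRecord("hyponatremia", date(2016, 4, 12)),
    ConditionRecord("migraine", date(2012, 1, 1), worsened_on=date(2016, 4, 20)),
]
derived = derive_domains(conditions, first_intervention=date(2016, 4, 1))
```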

5. SDTM needs a single approach to describe planned medical interventions.

This includes a link to the medical condition that triggers the intervention, the reason for the intervention (treat, cure, mitigate, diagnose, prevent), the type of intervention (drug administration, device, surgery, etc.) and a link to the planned clinical observations to assess the effect of the intervention on the medical condition.

Following its execution, the outcome of the care plan is documented as additional clinical observations that lead to additional assessments in a continuous loop. The cycle ends according to stopping rules. Reasons for stopping include [1] death, [2] resolution of the medical condition, and [3] an arbitrary, protocol-specified stopping point.
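
The loop and its stopping rules can be sketched as follows. The visit structure, its keys, and the visit limit are hypothetical, and the order in which the rules are checked is my assumption.

```python
from typing import Dict, Iterable, Tuple


def run_care_cycle(visits: Iterable[Dict[str, bool]], max_visits: int) -> Tuple[str, int]:
    """Walk through visits until a stopping rule is met.

    Each visit stands for one pass through observe -> assess -> plan -> execute.
    Returns the reason for stopping and the number of visits completed.
    """
    seen = 0
    for seen, visit in enumerate(visits, start=1):
        if visit.get("died", False):
            return ("death", seen)
        if visit.get("resolved", False):
            return ("resolution of the medical condition", seen)
        if seen >= max_visits:
            return ("protocol-specified stop", seen)
    return ("no stopping rule met", seen)


# Made-up visit data: the condition resolves at the third visit
visits = [{"resolved": False}, {"resolved": False}, {"resolved": True}]

print(run_care_cycle(visits, max_visits=6))  # ('resolution of the medical condition', 3)
print(run_care_cycle(visits, max_visits=2))  # ('protocol-specified stop', 2)
```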

Some of these suggestions constitute major changes. But by organizing the data more closely with the way they are generated and used in clinical practice, the data become more useful and the ability to incorporate new clinical data content is improved without needing frequent upgrades to the model. Adding new clinical content should be as simple as adding new controlled terms to a dictionary. This is critically needed as clinical data content requirements continue to expand with no end in sight.

I appreciate your comments.

Thoughts on Medical Informatics

2016-05-23

Definitions of Common Clinical Terms

2016-05-18

Improving the Study Data Tabulation Model