Thoughts on Medical Informatics

2015-08-05

Modeling Clinical Data

There is a great deal of interest in how to model clinical data for numerous use cases, and I'm very interested in clinical data modeling myself. Let me start by saying that I am not a modeler. I have worked with modelers for many years, so I understand a little about how modeling works, but I'm still learning.

The purpose of this post is to review how clinical data are generated and used, to help inform best practices for clinical data modeling and to facilitate the management and use of clinical data across multiple use cases, such as patient care, clinical research, and public health.

Any medical student will agree: a major component of the standard medical school curriculum is devoted to understanding, organizing, documenting, and using clinical data. Clinical data are grouped by date and patient encounter. The well-known mnemonic every medical student learns for documenting those encounters is SOAP:

Subjective Observations
Objective Observations
Assessment
Plan (and its execution)

These four steps are then repeated for the next encounter in an almost endless cycle. One can imagine the stopping rules, but I won't go over them here. This cycle describes what I call the “clinical data lifecycle.” The healthcare provider first collects subjective and objective observations on the patient, then analyzes, interprets, or assesses those observations. This assessment identifies one or more medical conditions and their important attributes (e.g. severity, change from last assessment). The provider then develops a plan to address the medical conditions, and the plan is executed. The cycle repeats at the next encounter to determine the effect of the plan on the medical conditions and to identify any new conditions.

The Clinical Data Lifecycle

From an information management perspective, one can state the following.

Observations

Observations are collected and recorded, ideally without interpretation or bias
The target of the observation is the patient or a part of the patient (e.g. biospecimen)
The observer can be a person (investigator, patient, caregiver) or a device (EKG, MRI)
Sometimes the observer uses a device to perform an observation (e.g. blood pressure)
Sometimes information about the observer is important (expertise, reliability, etc.)
Sometimes information about the device is important (e.g. accuracy, precision, blood pressure cuff size)
The observer always follows a procedure, process, or “protocol” associated with the observation
Sometimes the procedure is not important to document (e.g. visually examine the skin)
Sometimes information about the procedure is important (e.g. rules for measuring tumor size on a medical image)
A more formal “protocol” to conduct the observation is sometimes developed to decrease variability and minimize bias
Sometimes a biospecimen is collected and/or processed
Sometimes information about the biospecimen is important (e.g. hemolyzed blood sample)
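To make the list above concrete, here is a minimal Python sketch of an observation record, assuming standard-library dataclasses. The class and field names (Observation, Observer, Device, Specimen) are hypothetical illustrations, not taken from BRIDG or any other published model. Optional fields stand for the items the list marks as "sometimes important", and there is deliberately no field for interpretation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Observer:
    """Who or what made the observation (hypothetical structure)."""
    kind: str                             # "person" (investigator, patient, caregiver) or "device"
    description: str
    qualifications: Optional[str] = None  # sometimes important (expertise, reliability)


@dataclass
class Device:
    """A device the observer used, with properties that can affect the result."""
    name: str
    properties: dict = field(default_factory=dict)  # e.g. accuracy, precision, cuff size


@dataclass
class Specimen:
    """A biospecimen collected and/or processed for the observation."""
    specimen_type: str
    condition: Optional[str] = None       # e.g. "hemolyzed"


@dataclass
class Observation:
    """A recorded observation; intentionally there is no interpretation field."""
    patient_id: str
    target: str                           # the patient or a part of the patient
    name: str
    value: object
    unit: Optional[str]
    observer: Observer
    procedure: Optional[str] = None       # left empty when not worth documenting
    device: Optional[Device] = None
    specimen: Optional[Specimen] = None


# Hypothetical example: a blood pressure where the cuff size matters.
bp = Observation(
    patient_id="P001",
    target="patient",
    name="systolic blood pressure",
    value=142,
    unit="mmHg",
    observer=Observer(kind="person", description="clinic nurse"),
    procedure="seated, after 5 minutes of rest",
    device=Device(name="sphygmomanometer", properties={"cuff_size": "large adult"}),
)
print(bp.name, bp.value, bp.unit, bp.device.properties["cuff_size"])
```

Keeping the device, procedure, and specimen details as optional attached records (rather than extra columns on every observation) is one design choice among several; it lets the record stay small when those details are unimportant.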

Assessments

Observations are analyzed/interpreted/assessed by qualified entities
Often the assessor is a physician, but not always
Sometimes information about the entity is important (qualifications, etc.)
Entity could be a device (EKG machines can perform automated interpretations)
The results of the assessments are generally medical conditions and their properties (severity, change from previous assessment, etc.)
The assessment may consist of a formal adjudication process
Currently, we sometimes confuse observations and assessments
Adverse events (AEs) are not observations; rather, they are medical conditions identified following an assessment of observations. A temporal association with an intervention is a necessary component of an AE
An abnormal laboratory finding is not an adverse event; an assessment is necessary to make that determination, and it often involves looking at other related observations
As an example, the observations might be: low serum sodium, previous exposure to drug X, serum osmolality, urine specific gravity, and urine sodium
The assessment would be a new Medical Condition (an AE): Syndrome of Inappropriate Antidiuretic Hormone Secretion (SIADH), possibly due to drug X
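The SIADH example can be sketched in Python to show the distinction: the assessment is its own record type that points back to the observations it relied on, rather than being stored as one more observation. This is a minimal sketch under stated assumptions: the class names, the dates, and the qualitative result values are hypothetical; previous exposure to drug X is modeled as a separate Exposure record; and the temporal check encodes only the necessary condition named above (onset after exposure), not a full causality assessment.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Observation:
    obs_id: str
    name: str
    result: str
    obs_date: date


@dataclass(frozen=True)
class Exposure:
    """Exposure to an intervention, e.g. a drug (hypothetical structure)."""
    product: str
    start: date


@dataclass
class Assessment:
    """The outcome of assessing observations: a medical condition, not another observation."""
    condition: str
    assessor: str
    evidence: List[Observation]            # the observations the assessor relied on
    onset: date
    suspected_cause: Optional[Exposure] = None

    def temporally_associated(self) -> bool:
        # Necessary (not sufficient) for an AE: onset on or after the start of exposure.
        return self.suspected_cause is not None and self.onset >= self.suspected_cause.start


# Hypothetical dates and qualitative results, for illustration only.
drug_x = Exposure(product="drug X", start=date(2015, 7, 1))
labs = [
    Observation("o1", "serum sodium", "low", date(2015, 7, 20)),
    Observation("o2", "serum osmolality", "low", date(2015, 7, 20)),
    Observation("o3", "urine specific gravity", "inappropriately high", date(2015, 7, 20)),
    Observation("o4", "urine sodium", "elevated", date(2015, 7, 20)),
]
siadh = Assessment(
    condition="SIADH, possibly due to drug X",
    assessor="treating physician",
    evidence=labs,
    onset=date(2015, 7, 20),
    suspected_cause=drug_x,
)
print(siadh.condition, [o.obs_id for o in siadh.evidence], siadh.temporally_associated())
```

Note that the low serum sodium observation on its own carries no AE flag; only the Assessment does.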

Plan

In healthcare, this is the patient care plan
In investigational studies, this is the protocol
The plan is intended to affect in some way the medical conditions identified from the assessment
Often it involves the administration of a medical product
Often the intent is to treat, but it can also be to prevent, diagnose, mitigate, cure, or even induce medical conditions

The cycle now repeats. Additional observations are then collected to assess:

How well the plan was executed and
The effect of the executed plan on the Medical Conditions

The following mind map captures these concepts and relationships. I think they hold true for all patient care settings, including clinical trials. I suggest it form a core “backbone” of any information model involving clinical data about a patient. In my experience, this captures how clinical data are generated and used in practice.

Clinical Data Concept Map

I recently reviewed the BRIDG 4.0 model, which was balloted in HL7 this past May. I plan to discuss BRIDG in more detail in future posts, but suffice it to say that I identified deficiencies in how clinical data are modeled in BRIDG. For example, the results of assessments are modeled as additional observations, although in clinical medicine the two are quite distinct: assessment results lead to interventions in a care plan; observations do not. There is an ongoing ballot reconciliation process to address these and other concerns identified during the ballot, and I encourage those interested to participate in those calls. We particularly need individuals with clinical subject matter expertise. Details of the teleconferences are available on the HL7 website.
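The distinction argued here (assessment results drive interventions; observations do not) can be enforced at the type level. The Python sketch below is a hypothetical illustration, not how BRIDG or any other model represents it: a planned intervention must point at an assessed MedicalCondition, and pointing it at a raw Observation is rejected. The severity value is illustrative.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Observation:
    name: str
    value: str


@dataclass(frozen=True)
class MedicalCondition:
    """Result of an assessment, kept as its own type rather than as another Observation."""
    name: str
    severity: str


@dataclass
class PlannedIntervention:
    description: str
    addresses: List[MedicalCondition]

    def __post_init__(self) -> None:
        # Interventions are justified by assessed conditions; raw observations are refused.
        for target in self.addresses:
            if not isinstance(target, MedicalCondition):
                raise TypeError(f"plan must address a MedicalCondition, got {type(target).__name__}")


low_sodium = Observation("serum sodium", "low")
siadh = MedicalCondition("SIADH", severity="moderate")
plan = PlannedIntervention("fluid restriction", addresses=[siadh])

try:
    PlannedIntervention("fluid restriction", addresses=[low_sodium])
    rejected = False
except TypeError as err:
    rejected = True
    print(err)  # plan must address a MedicalCondition, got Observation
```

If assessment results were stored as just more observations, this check could not be written, because the model would have no way to tell the two apart.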

2015-08-04

Clinical Research and Healthcare Information Silos

As a neurologist, I was a healthcare provider for over 25 years. As a medical officer at the Food and Drug Administration, I reviewed clinical trials and was involved in the standardization of clinical trial data for almost 20 years. These two roles gave me a unique perspective on the relationship between clinical research and healthcare.

Many view clinical research and healthcare as two separate enterprises. There are numerous reasons one might reach this conclusion. I believe this view is fundamentally incorrect. Clinical research is part of healthcare. I consider it similar to another medical specialty. I’d like to examine this point further, and explore its implications.

Healthcare has very consistent, repeatable processes. From a patient’s perspective, one goes to the hospital or clinic to receive medical care. The healthcare provider takes a history and physical, may order tests, renders a diagnosis, develops a care plan, administers treatment, and collects additional clinical data to measure the effects of the treatment. Occasionally, a patient may be advised to enroll, or may elect to enroll, in a clinical trial. The same fundamental high-level processes still apply: the study protocol functions as a detailed care plan for investigational subjects, and the clinical investigator functions as another medical specialist (a ‘clinical trial’ specialist) within the healthcare enterprise. The important difference, of course, is that the patient consents to have their clinical data used to inform a population-based risk/benefit analysis of the investigational intervention.

I often hear “but clinical research is different. It has different, often more detailed, data requirements.” My reply is that this is no different from the varying clinical data requirements that exist among the numerous clinical specialties. An ophthalmologist has different, more detailed data requirements pertaining to ocular health than does, for example, the family physician. The clinical investigator, as another specialist, is no different. In all use cases, the clinical data that are collected are fundamentally the same, and their meaning (semantics) is identical. A blood pressure, hemoglobin measurement, electrocardiogram, or medical imaging report means the same thing regardless of the main reason for its collection.

From a physician’s perspective, the data collected for a clinical trial is just as relevant for the patient’s care as clinical data collected outside a trial setting. The current reality is that healthcare and clinical research are silos. The clinical trial data are often physically separate from other healthcare data. As a physician, I want access to all the clinical data for a patient, regardless of why it was collected, to provide the best possible care (recognizing that some data may initially be blinded per protocol).

Consider this scenario: a patient is enrolled in a clinical trial and has a protocol-specified complete blood count (CBC) drawn on Thursday. The following Saturday, the patient arrives in the emergency room (ER) with acute gastrointestinal bleeding. The ER physician needs the results of the CBC drawn two days earlier to help assess the severity of the bleeding. She also needs as much information as possible about the experimental treatment the patient is on (or may be on, in the case of a blinded study) to assess the potential effects of the investigational treatment on the bleeding episode. The clinical investigator, on the other hand, needs information about the ER visit to fully assess potential causality from the investigational treatment. These are all relevant clinical questions that can only be answered with complete integration of the clinical data from the investigational study with the patient’s health record.

As electronic health records (EHRs) are deployed nationwide for every American, all clinical data about a patient, regardless of why it’s collected, should be accessible in the patient’s EHR. This way, all clinical information is available to providers to help make the best informed medical decisions for the patient.

This brings me to my main point. All clinical data exchanged about a patient should adhere to a single set of clinical data standards, particularly data standards supported by EHR systems, to achieve full and useful integration of all clinical data. In the U.S., the Office of the National Coordinator for Health IT (ONC) within the Department of Health and Human Services (HHS), together with the Health Information Technology Standards Committee (HITSC), establishes nationwide interoperability standards for EHR systems.

There are many ongoing clinical data standardization efforts, some of them duplicative and competing. I believe now is the time for all implementers of standardized clinical data in the U.S. to converge on HHS standards. This will maximize the reuse of all clinical data about a patient, which is in the best interests of healthcare, clinical research, and public health. Not doing so will only perpetuate and worsen the existing information silos, making clinical data less useful. This is detrimental to patients everywhere.

2015-08-02

The Future of Study Data Exchange

The CDISC Study Data Tabulation Model (SDTM) is the FDA-supported exchange standard for subject-level clinical trial tabulation data. I have been working with the SDTM since it was first developed by CDISC, and I think I know it pretty well.

It has served the Agency well over the years, but its limitations are becoming increasingly problematic. No data standard is perfect, but SDTM has some serious limitations. One of the biggest problems with the SDTM is that it is used as both an exchange standard and an analysis standard. As I elaborate below, the requirements for the two use cases are different and often competing. The result is that the SDTM is pulled in two opposite directions and cannot do both optimally.

What do I mean by an exchange standard? A reasonable working definition is a standard way of exchanging data between computer systems. An exchange standard often describes the standard data elements and relationships necessary to achieve the unambiguous exchange of information between different information systems.

Exchange standards exist to support interoperability. HIMSS (the Healthcare Information and Management Systems Society) defines interoperability as:

The ability of different information technology systems and software applications to communicate, exchange data, and use the information that has been exchanged.

An analysis standard describes a standard presentation or view of the data to support analysis. It includes extraction, transformation, and derivations of the exchanged (i.e. submitted) data.

A file format (e.g. SAS transport file, XML, MS Word, PDF) is not an exchange standard. Data based on an exchange standard need to be serialized using a file format, e.g. SDTM + XPT, Structured Product Labeling (SPL) + XML, Consolidated CDA + XML.
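The difference between content and file format can be shown in a few lines of standard-library Python: the same records are serialized to CSV (standing in here for SAS transport/XPT, which the standard library cannot write) and to XML, and both parse back to identical content. The variable names follow SDTM vital signs (VS) naming, but the records are hypothetical and the output is not a conformant SDTM dataset.

```python
import csv
import io
import xml.etree.ElementTree as ET

FIELDS = ["USUBJID", "VSTESTCD", "VSORRES", "VSDTC"]

# Hypothetical vital-signs records (content), independent of any file format.
records = [
    {"USUBJID": "001", "VSTESTCD": "SYSBP", "VSORRES": "142", "VSDTC": "2015-07-20"},
    {"USUBJID": "002", "VSTESTCD": "SYSBP", "VSORRES": "128", "VSDTC": "2015-07-21"},
]


def to_csv(rows):
    """Serialize records as CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def from_csv(text):
    """Parse CSV text back into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def to_xml(rows):
    """Serialize the same records as XML text."""
    root = ET.Element("dataset")
    for row in rows:
        rec = ET.SubElement(root, "record")
        for name in FIELDS:
            ET.SubElement(rec, name).text = row[name]
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    """Parse XML text back into a list of dicts."""
    root = ET.fromstring(text)
    return [{child.tag: child.text for child in rec} for rec in root]


# Different bytes on disk, identical content once parsed.
print(from_csv(to_csv(records)) == records, from_xml(to_xml(records)) == records)
```

What makes the exchange unambiguous is the agreed content (the variables, their meaning, their allowed values); the file format only carries it.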

A good exchange standard promotes machine readability and process automation at the expense of end-user (human) readability; the data must then be transformed to make them user friendly. A good analysis standard promotes human readability and usability, and use of the data with analysis tools.

A useful analogy is to consider data as pieces of furniture. The exchange standard describes the standard container to move the furniture between two places. The size and shape of the container are dictated by what one is moving: the contents. The container is designed for efficiency in moving furniture and to avoid damage or loss. Everything you need is there, but unpacking is necessary for the furniture to be useful. The analysis standard describes the arrangement of the furnishings in the house. One may have to assemble certain pieces of furniture before they can be used. The arrangement maximizes the use of the furnishings (e.g. all cooking appliances go in the kitchen).

Here is a high level comparison of the requirements for each type of standard. One can see the differences. The challenges of meeting both are clear.

Exchange Standard	Analysis Standard
No duplication of data/records (minimize errors, avoid inconsistencies, minimize file size)	Duplicate data (e.g. demographics and treatment assignment in each dataset)
No derived data (minimize errors, avoid data inconsistencies, promote data integrity)	Derived data (facilitate analysis)
Very standard structure (highly normalized, facilitate data validation and storage)	Variable structure to fit the analysis
Numeric codes for coded terms (machine readable, facilitate data validation)	No numeric codes (human understandable)
Multi-dimensional (to facilitate creation of relational databases and reports of interest)	Two-dimensional (tabular) containing only relationships of interest for specific analysis (easier for analysis tools)
Standard character date fields (ISO 8601: ensures all information systems can understand them: semantic interoperability)	Numeric date fields (facilitates date calculations, intervals, time-to-event analyses)
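Several rows of the table can be illustrated with a small Python sketch that turns exchange-style records into an analysis-style view: coded terms are decoded, demographics are duplicated onto each row, ISO 8601 character dates become numeric dates (days since 1 January 1960, the SAS convention), and a study day is derived. The numeric severity code list and the AESTDT variable name are hypothetical (SDTM itself exchanges severity as text); the study day rule shown (day 1 is the reference date and there is no day 0) follows the SDTM study day convention.

```python
from datetime import date

# Exchange-style data (hypothetical): no duplication, no derived values,
# numeric codes for coded terms, ISO 8601 character dates.
SEVERITY_CODES = {1: "MILD", 2: "MODERATE", 3: "SEVERE"}  # hypothetical code list
demographics = {"001": {"SEX": "F", "ARM": "Drug X", "RFSTDTC": "2015-07-01"}}
adverse_events = [
    {"USUBJID": "001", "AETERM": "HEADACHE", "AESEV": 1, "AESTDTC": "2015-07-03"},
    {"USUBJID": "001", "AETERM": "NAUSEA", "AESEV": 2, "AESTDTC": "2015-06-30"},
]

SAS_EPOCH = date(1960, 1, 1)  # SAS date values count days from this date


def study_day(event: date, reference: date) -> int:
    """Study day with day 1 on the reference date and no day 0."""
    delta = (event - reference).days
    return delta + 1 if delta >= 0 else delta


def analysis_view(aes, dm):
    """Build a flat, human-friendly table from the exchange-style records."""
    rows = []
    for ae in aes:
        subj = dm[ae["USUBJID"]]                  # demographics duplicated onto every row
        start = date.fromisoformat(ae["AESTDTC"])  # character ISO 8601 -> date
        ref = date.fromisoformat(subj["RFSTDTC"])
        rows.append({
            **ae,
            "SEX": subj["SEX"],
            "ARM": subj["ARM"],
            "AESEV": SEVERITY_CODES[ae["AESEV"]],  # numeric code -> readable label
            "AESTDT": (start - SAS_EPOCH).days,    # numeric date for interval math
            "AESTDY": study_day(start, ref),       # derived variable
        })
    return rows


for row in analysis_view(adverse_events, demographics):
    print(row["AETERM"], row["ARM"], row["AESEV"], row["AESTDT"], row["AESTDY"])
```

Every value in the analysis view can be recomputed from the exchange records, which is why the exchange side does not need to carry the duplicated or derived columns.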

As an exchange standard, SDTM is used to exchange data between the applicant and FDA; in CDER, it is also used to load study data into the Janus Clinical Trials Repository (CTR). As an analysis standard, SDTM is used to perform standard basic analyses of the data (demographics, adverse events, etc.) and to explore the data. (CDER implements certain standard analyses as part of the CTR environment.)

Because of the differing and competing requirements of data exchange and analysis, SDTM is being pulled in two directions and cannot perform either role optimally. We must choose which one is more important. (Already the FDA recognizes that standard SDTM datasets do not support standard analyses optimally and has developed modifications or “enhanced” SDTM views to improve support for the analysis use case.)

I believe there is an increasing view that for data exchange we need to move to a more modern, relational exchange standard for clinical trial data, one that is based on a more robust information model, e.g. BRIDG+XML, FHIR, or semantic web standards like RDF/OWL (information modeling is out of scope for today’s post, but I plan to discuss this topic in the future). As one small example, there is no need to repeat STUDYID thousands of times in a study data submission, once for each record in a domain; the STUDYID should be provided once and referenced as needed. A more significant example is that SDTM lacks relationships between planned and performed observations, making it challenging to analyze the data for protocol compliance and violations. (SDTM provides a workaround, the PV domain, but it is not ideal.)
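Both examples, a study identifier stated once and explicit links between planned and performed observations, can be sketched in Python. Everything here (class names, the visit schedule, the window rule) is a hypothetical illustration rather than a proposal for BRIDG, FHIR, or RDF/OWL structures; it shows how a link from each performed observation to the planned observation it fulfills turns a compliance check into a simple lookup.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class PlannedObservation:
    plan_id: str
    test: str
    target_day: int      # protocol-specified study day
    window: int          # allowed deviation from target_day, in days


@dataclass(frozen=True)
class PerformedObservation:
    subject_id: str
    test: str
    study_day: int
    fulfills: Optional[str] = None   # plan_id this observation carries out, if any


@dataclass
class Submission:
    study_id: str                    # stated once, not repeated on every record
    schedule: List[PlannedObservation]
    performed: List[PerformedObservation]


def compliance(schedule: List[PlannedObservation],
               performed: List[PerformedObservation]) -> Dict[str, str]:
    """Classify each planned observation for a single subject."""
    by_plan = {p.fulfills: p for p in performed if p.fulfills is not None}
    status: Dict[str, str] = {}
    for planned in schedule:
        done = by_plan.get(planned.plan_id)
        if done is None:
            status[planned.plan_id] = "missed"
        elif abs(done.study_day - planned.target_day) > planned.window:
            status[planned.plan_id] = "out of window"
        else:
            status[planned.plan_id] = "done"
    return status


submission = Submission(
    study_id="ABC-123",
    schedule=[
        PlannedObservation("P1", "CBC", target_day=1, window=0),
        PlannedObservation("P2", "CBC", target_day=15, window=2),
        PlannedObservation("P3", "CBC", target_day=29, window=2),
    ],
    performed=[
        PerformedObservation("001", "CBC", study_day=1, fulfills="P1"),
        PerformedObservation("001", "CBC", study_day=19, fulfills="P2"),
    ],
)
print(submission.study_id, compliance(submission.schedule, submission.performed))
```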

So what do I think is the future of the SDTM?

SDTM as an exchange standard should be retired, and replaced by a “next generation” exchange standard described above. We need a broad conversation with multiple stakeholders to discuss what this should be. FDA can play a leadership role in that conversation, as it started to do by holding a public meeting on this topic in 2012.

SDTM as an analysis standard should remain and expand to make it more analysis friendly (i.e. follow the direction that the “enhanced SDTM” has already forged) and eventually merge with ADaM (the CDISC Analysis Data Model; there are already interesting discussions on how that might work). In the future, SDTM should be a standard report for analysis that a database (e.g. CTR) produces.

Once liberated from its role as an exchange standard, future SDTM versions can become even more useful: e.g. add core demographic and treatment variables to all domains; provide numeric dates. These are some of the changes that FDA is already implementing in its “enhanced SDTM” specification for internal use.

So for the future of SDTM, I see two options.

Option A. This is basically what we are doing today. I call this the bicycle approach.

Incremental improvements to SDTM
Add additional variables
Add additional domains
Update implementation guide
Update validation rules
Slow adoption by sponsors & FDA
Inconsistent implementations
Redesign databases, tools as needed
Repeat the above for the next set of requirements

This is slow and inefficient, like riding a bicycle from Washington, DC to California. You can do it, but it is slow. There is a better way.

Option B. I call this the race car approach.

Adopt the next generation standard for study data exchange
BRIDG-based XML, FHIR, RDF/OWL?
Based on a well-structured information model: A better model that more realistically reflects clinical trial data and how they are related to each other and to the protocol; with a single standard representation for clinical observations
Incorporating new data and analysis requirements will be quicker, cheaper, and more efficient for both sponsors and FDA
Often involves just adding new controlled terms to a terminology
The underlying data model, implementation guides, database structure, conformance validation rules, and tools need not change (when they do, changes should be infrequent and minor)
Invest in tools to transform the data any which way
Generate any data view of interest from the enterprise data warehouse: e.g. enhanced-SDTM, ADaM, future analysis views X,Y,Z
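Option B can be illustrated with Python's built-in sqlite3 module as a stand-in for an enterprise data warehouse. In this hypothetical schema, all observations share one generic table, a new data requirement is met by adding a controlled term rather than a column, and an analysis view is generated from the normalized tables on demand. The table names, the NEWTEST term, and the values are assumptions for illustration only.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE study (study_id TEXT PRIMARY KEY);
    CREATE TABLE subject (
        subject_id TEXT PRIMARY KEY,
        study_id TEXT REFERENCES study(study_id),
        sex TEXT,
        arm TEXT
    );
    -- Controlled terminology: one row per kind of observation.
    CREATE TABLE terminology (code TEXT PRIMARY KEY, label TEXT, unit TEXT);
    -- One generic table for every kind of clinical observation.
    CREATE TABLE observation (
        subject_id TEXT REFERENCES subject(subject_id),
        code TEXT REFERENCES terminology(code),
        value REAL,
        obs_date TEXT  -- ISO 8601 character date
    );
    -- Analysis-friendly view: denormalized, decoded, with a numeric date.
    CREATE VIEW analysis_view AS
    SELECT s.study_id, o.subject_id, s.sex, s.arm, t.label, o.value, t.unit,
           CAST(julianday(o.obs_date) - julianday('1960-01-01') AS INTEGER) AS numeric_date
    FROM observation AS o
    JOIN subject AS s ON s.subject_id = o.subject_id
    JOIN terminology AS t ON t.code = o.code;
""")

# Hypothetical data.
con.execute("INSERT INTO study VALUES ('ABC-123')")
con.execute("INSERT INTO subject VALUES ('001', 'ABC-123', 'F', 'Drug X')")
con.execute("INSERT INTO terminology VALUES ('HGB', 'Hemoglobin', 'g/dL')")
con.execute("INSERT INTO observation VALUES ('001', 'HGB', 12.1, '2015-07-20')")

# A new requirement: add a controlled term and data; no schema or view change.
con.execute("INSERT INTO terminology VALUES ('NEWTEST', 'Hypothetical new test', 'units')")
con.execute("INSERT INTO observation VALUES ('001', 'NEWTEST', 3.0, '2015-07-20')")

for row in con.execute("SELECT * FROM analysis_view ORDER BY label"):
    print(row)
```

The view definition did not have to change when the new term was added, and other views (an enhanced-SDTM-like layout, an ADaM-like layout) could be defined over the same tables in the same way.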

Having said this, we must recognize the tremendous amount of resources being devoted today towards implementing the SDTM as an exchange standard. Any transition strategy must take into account the natural lifecycle of systems and processes being used to support current operations. A new exchange standard needs to be introduced gradually, with a sufficient overlap period to minimize disruption. Any content standards (e.g. standard data elements, terminology) developed and implemented during Option A are readily reusable in Option B. They are not lost.

In conclusion, SDTM cannot fulfill both data exchange and data analysis roles optimally, and SDTM as an exchange standard should be retired. In the short term, however, replacing it would be too disruptive, so the exchange use case for SDTM must continue to be supported for now. In the long term, once liberated from its role as an exchange standard, SDTM has a very bright future as an analysis standard. Long-term planning, including a sensible transition strategy, should move toward a next generation exchange standard for study data.