The CDISC Study Data Tabulation Model (SDTM) is the FDA-supported exchange standard for subject-level clinical trial tabulation data. I have been working with the SDTM since it was first developed by CDISC, and I think I know it pretty well.
It has served the Agency well over the years, but its limitations are becoming increasingly problematic. No data standard is perfect, but SDTM has some serious limitations. One of the biggest is that it is used as both an exchange standard and an analysis standard. As I elaborate below, the requirements for the two use cases are different and often competing. The result is that SDTM is pulled in two opposite directions and cannot do both optimally.
What do I mean by an exchange standard? A reasonable working
definition is a standard way of exchanging data between
computer systems. An exchange standard often describes standard data elements,
and relationships necessary to achieve the unambiguous exchange of information
between different information systems.
Exchange standards exist to support interoperability. HIMSS (the Healthcare Information and Management Systems Society) defines interoperability as: "the ability of different information technology systems and software applications to communicate, exchange data, and use the information that has been exchanged."
An analysis standard describes a standard presentation or
view of the data to support analysis. It includes extraction,
transformation, and derivations of the exchanged (i.e. submitted) data.
A file format (e.g. SAS transport file, XML, MS Word, PDF) is not an exchange standard. Data based on an exchange standard need to be serialized using a file format, e.g. SDTM + XPT, Structured Product Labeling (SPL) + XML, Consolidated CDA + XML.
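To make the content/format distinction concrete, here is a minimal sketch (the domain name, variable names, and values are illustrative, not taken from any implementation guide) showing one record serialized in two different file formats. The content is the same; only the serialization differs.

```python
import json
import xml.etree.ElementTree as ET

# One illustrative demographics-style record (names loosely echo SDTM's
# DM domain; the study and subject identifiers are made up).
record = {"STUDYID": "ABC-001", "USUBJID": "ABC-001-0001",
          "SEX": "F", "BRTHDTC": "1970-04-02"}

# Serialization 1: the same content as JSON.
as_json = json.dumps(record)

# Serialization 2: the same content as XML.
root = ET.Element("DM")
for name, value in record.items():
    ET.SubElement(root, name).text = value
as_xml = ET.tostring(root, encoding="unicode")

print(as_json)
print(as_xml)
```

Either file can be parsed back into the identical record, which is the point: the exchange standard defines the content and its meaning, while the file format is an interchangeable carrier.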
A good exchange standard promotes machine readability and process automation, at the expense of end-user (human) readability. A good analysis standard promotes human readability and usability: the data are transformed to make them user friendly and directly usable with analysis tools.
A useful analogy is to consider data as pieces of furniture. The exchange standard describes the standard container used to move the furniture between two places. The size and shape of the container are dictated by what one is moving: the contents. The container is designed for efficiency in moving furniture and to avoid damage or loss. Everything you need is there, but unpacking is necessary before the furniture is useful. The analysis standard describes the arrangement of the furnishings in the house. One may have to assemble certain pieces of furniture before they can be used. The arrangement maximizes the usefulness of the furnishings (e.g. all cooking appliances go in the kitchen).
Here is a high-level comparison of the requirements for each type of standard. The differences, and the challenge of meeting both at once, are clear.
| Exchange standard | Analysis standard |
| --- | --- |
| No duplication of data/records (minimizes errors, avoids inconsistencies, minimizes file size) | Duplicate data (e.g. demographics and treatment assignment in each dataset) |
| No derived data (minimizes errors, avoids data inconsistencies, promotes data integrity) | Derived data (facilitates analysis) |
| Very standard structure (highly normalized; facilitates data validation and storage) | Variable structure to fit the analysis |
| Numeric codes for coded terms (machine readable; facilitate data validation) | No numeric codes (human understandable) |
| Multi-dimensional (facilitates creation of relational databases and reports of interest) | Two-dimensional (tabular), containing only the relationships of interest for a specific analysis (easier for analysis tools) |
| Standard character date fields (ISO 8601 ensures all information systems can understand them: semantic interoperability) | Numeric date fields (facilitate date calculations, intervals, time-to-event analyses) |
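The date row can be illustrated with a short Python sketch (the variable names and dates are illustrative): ISO 8601 character dates are unambiguous for exchange, but they must be converted to native date values before intervals or time-to-event quantities can be computed.

```python
from datetime import date

# Exchange form: ISO 8601 character strings, unambiguous across systems.
rfstdtc = "2012-01-15"   # illustrative "reference start date"
rfendtc = "2012-03-01"   # illustrative "reference end date"

# Analysis form: parse into native dates so arithmetic works.
start = date.fromisoformat(rfstdtc)
end = date.fromisoformat(rfendtc)

# Interval in days -- trivial once the dates are numeric objects,
# awkward while they remain character strings.
interval_days = (end - start).days
print(interval_days)  # 46 (2012 is a leap year, so February has 29 days)
```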
As an exchange standard, SDTM is used to exchange data between the applicant and FDA. In CDER, it is also used to load study data into the Janus Clinical Trials Repository (CTR). As an analysis standard, SDTM is used to perform standard basic analyses of the data (demographics, adverse events, etc.) and to explore the data. CDER implements certain standard analyses as part of the CTR environment.
Because of the differing and competing requirements of data
exchange and analysis, SDTM is being pulled in two directions and cannot
perform either role optimally. We must choose which one is more important.
(Already the FDA recognizes that standard SDTM datasets do not support standard
analyses optimally and has developed modifications or “enhanced” SDTM views to improve support for the analysis use case.)
I believe there is an increasing view that for data exchange we need to move to a more
modern, relational exchange standard for clinical trial data; one that is based
on a more robust information model, e.g. BRIDG+XML; FHIR; or semantic web standards like RDF/OWL
(information modeling is out of scope for today’s post but I plan to discuss
this topic in the future). As one small example, there is no need for STUDYID to be repeated thousands of times in a study data submission, once for each record in a domain; it should be provided once and referenced as needed. A more significant example: SDTM lacks relationships between planned and performed observations, making it difficult to analyze the data for protocol compliance and violations. (SDTM provides a work-around, the DV domain, but this is not ideal.)
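The STUDYID point can be sketched with plain data structures (illustrative records, not a real SDTM dataset): in the flat form every record repeats the study identifier, while in a normalized form it is stated once and the flat view can be regenerated on demand.

```python
# Flat, SDTM-style form: STUDYID repeated on every record.
flat = [
    {"STUDYID": "ABC-001", "USUBJID": "ABC-001-0001", "AETERM": "HEADACHE"},
    {"STUDYID": "ABC-001", "USUBJID": "ABC-001-0002", "AETERM": "NAUSEA"},
]

# Normalized form: the study identifier appears exactly once; the
# records belong to it by containment rather than by repetition.
normalized = {
    "STUDYID": "ABC-001",
    "records": [
        {"USUBJID": "ABC-001-0001", "AETERM": "HEADACHE"},
        {"USUBJID": "ABC-001-0002", "AETERM": "NAUSEA"},
    ],
}

# Nothing is lost: the flat view is derivable from the normalized one.
regenerated = [{"STUDYID": normalized["STUDYID"], **r}
               for r in normalized["records"]]
print(regenerated == flat)  # True
```

This is exactly the trade-off in the comparison table above: the normalized form suits exchange (no duplication), while the flat form suits analysis tools that expect self-contained tables.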
So what do I think is the future of the SDTM?
SDTM as an exchange standard should be retired, and replaced
by a “next generation” exchange standard described above. We need a broad conversation with multiple stakeholders to discuss what this should be. FDA can
play a leadership role in that conversation, as it started to do by holding a
public meeting on this topic in 2012.
SDTM as an analysis standard should remain and expand to become more analysis friendly (i.e. follow the direction that the “enhanced SDTM” has already forged) and eventually merge with ADaM (the CDISC Analysis Data Model; there are already interesting discussions on how that might work). In the future, SDTM should be a standard report for analysis that a database (e.g. the CTR) produces.
Once liberated from its role as an exchange standard, future
SDTM versions can become even more useful: e.g. add core demographic and
treatment variables to all domains; provide numeric dates. These are some of the
changes that FDA is already implementing in its “enhanced SDTM” specification
for internal use.
So for the future of SDTM, I see two options.
Option A. This is basically what we are doing today. I call this the bicycle
approach.
- Incremental improvements to SDTM
- Add additional variables
- Add additional domains
- Update implementation guide
- Update validation rules
- Slow adoption by sponsors & FDA
- Inconsistent implementations
- Redesign databases, tools as needed
- Repeat the above for the next set of requirements
This is slow and inefficient; like
riding a bicycle from Washington DC to California. You can do it, but it is
slow. There is a better way.
Option B. I call this the race car approach.
- Adopt the next generation standard for study data exchange
  - BRIDG-based XML, FHIR, RDF/OWL?
  - Based on a well-structured information model: one that more realistically reflects clinical trial data and how they relate to each other and to the protocol, with a single standard representation for clinical observations
- Incorporating new data and analysis requirements will be quicker, cheaper, and more efficient for both sponsors and FDA
  - Often involves just adding new controlled terms to a terminology
  - The underlying data model, implementation guides, database structure, conformance validation rules, and tools need not change (and when they do, the changes should be infrequent and minor)
- Invest in tools to transform the data any which way
- Generate any data view of interest from the enterprise data warehouse: e.g. enhanced SDTM, ADaM, future analysis views X, Y, Z
Having said this, we must recognize the tremendous amount of
resources being devoted today towards implementing the SDTM as an exchange
standard. Any transition strategy must take into account the natural
lifecycle of systems and processes being used to support current operations. A
new exchange standard needs to be introduced gradually, with a sufficient
overlap period to minimize disruption. Any content standards (e.g. standard
data elements, terminology) developed and implemented during Option A are readily
reusable in Option B. They are not lost.
In conclusion, SDTM cannot fulfill both the data exchange and the data analysis roles optimally. Long term, SDTM as an exchange standard should be retired; short term, replacing it is too disruptive, so the exchange use case must continue to be supported for now. Once liberated from its role as an exchange standard, SDTM’s future as an analysis standard is very bright. Long-term planning, including a sensible transition strategy, should move towards a next generation exchange standard for study data.
Armando, it's interesting that you refer to the exchange and analysis roles, as those were what mattered in your role at the FDA. There is a whole other role within sponsor organizations, which revolves around being able to use the model operationally to get the work done on a day-to-day basis. Unfortunately, I believe SDTM falls short of fulfilling this role as well. I'll quote something you presented many years ago: the world is not flat, and neither is clinical data. One of my colleagues has begun calling the data multi-dimensional, and we need a model to support those relationships. I think we can leverage the great work done within SDTM and the other standards to move it to the race car.
Chris, thank you for your perspective. The operational use case is one that I had not considered. Would it be useful to document high-level requirements for this use case? Has this already been done? I believe ODM was developed to support this use case, and I'm curious to hear from others how well that is working; I have no personal experience in that regard. Others have correctly pointed out that there is no benefit in moving towards a more multi-dimensional, multi-relational data model if the relationships are not captured at the point of data collection (e.g. the CRF). This is very true... but I do think that a multi-relational data model will enable the design of smarter eCRFs that can capture those relationships automatically "behind the scenes."
This was recommended reading for the ODM v2 discussion. Consider Project Data Sphere: a sponsor de-identified patient-level data and is making them available to other researchers.