So here I propose an approach to limit certain types of changes. But first, I discuss fundamental principles about exchange and analysis standards. These principles explain the reasoning behind my suggested approach.
First of all, what is an exchange standard versus an analysis standard? Here are some working definitions:
Exchange Standard
- A standard way of exchanging data between computer systems. It describes the information classes/attributes/variables/data elements, and relationships necessary to achieve the unambiguous exchange of information between different information systems (interoperability).
Analysis Standard
- A standard presentation of the data intended to support analysis. It includes extraction, transformation, and derivations of the exchanged data.
Exchange and analysis standards meet very specific and different needs:
- A good exchange standard promotes machine readability, understandability, and business process automation at the expense of end user (human) readability
- A good analysis standard promotes human readability and use/analysis of the data using analysis tools. It maintains strict traceability with the exchanged data
As an analogy, consider data as pieces of furniture. The exchange standard is the truck that moves the furniture from point A to point B. The size and configuration of the truck depend on the contents. The furniture is packed so it can be moved efficiently; it cannot readily be used in its packed state. The analysis standard describes how to organize the furniture at its destination to maximize its usefulness: a dining room table goes in the dining room, a desk goes in the den, and so on. Furniture may need to be assembled (i.e. transformed) from its packed state into its useful state.
Because of the different use cases associated with each type of standard, it comes as no surprise that their individual requirements are very different, and are often competing or contradictory. Here are some examples:
| Exchange Standard | Analysis Standard |
| --- | --- |
| No duplication of data/records (minimize errors, minimize file size) | Duplicate data (e.g. demographics and treatment assignment in each dataset) |
| No derived data (minimize errors, avoid data inconsistencies, promote data integrity) | Derived data (facilitate analysis) |
| Very standard structure (highly normalized, facilitates data validation and storage) | Variable structure to fit the analysis |
| Numeric codes for coded terms (machine readable, facilitate data validation) | No numeric codes (human understandable) |
| Multi-dimensional (to facilitate creation of relational databases and reports of interest) | Two-dimensional (tabular), containing only the relationships of interest for a specific analysis (easier for analysis tools) |
| Standard character date fields (ISO 8601: ensures all information systems can understand them) | Numeric date fields (facilitate date calculations, intervals, time-to-event analyses) |
So where does the SDTM fall? It was originally designed as an exchange standard for study data submissions, and that is certainly its main purpose today. However, because similar data from multiple subjects are collected together into domains, SDTM datasets are also suitable for simple analyses. In fact, FDA reviewers have been performing simple analyses on SDTM datasets for years, and more recently the FDA has developed standard analysis panels that use native SDTM datasets as input. So the answer is: SDTM is both an exchange standard and an analysis standard (for simple, standard analyses; ADaM data are used for complex and study-specific analyses).
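As an illustration of the kind of simple analysis that works directly on a native SDTM domain, here is a small sketch using pandas and a toy AE (Adverse Events) dataset. The column names are standard SDTM variables, but the data and the particular tabulation are purely illustrative, not an FDA analysis panel.

```python
import pandas as pd

# A toy AE domain; real submissions arrive as transport files, but the
# column names below are standard SDTM variables.
ae = pd.DataFrame({
    "USUBJID": ["01-001", "01-001", "01-002", "01-003"],
    "AEDECOD": ["HEADACHE", "NAUSEA", "HEADACHE", "HEADACHE"],
    "AESEV":   ["MILD", "MODERATE", "MILD", "SEVERE"],
})

# A "simple analysis" straight off the exchanged structure:
# number of subjects reporting each dictionary-derived term.
subjects_per_term = (
    ae.groupby("AEDECOD")["USUBJID"].nunique().sort_values(ascending=False)
)
print(subjects_per_term)
```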
Here is the important observation from the above discussion: since the requirements for exchange and analysis are different, SDTM is being pulled in two directions and cannot perform either role optimally. We must choose which role is more important. How often have we heard the debate: should SDTM contain derived data? Well, it depends on which use case you think is more important. As an analysis standard, of course it should contain derived data. As an exchange standard, this is generally not a good idea.
In an ideal world, the SDTM as an exchange standard would be retired and replaced by a next-generation exchange standard based on a more robust relational data model that strictly meets data exchange requirements (e.g. BRIDG+XML, HL7 FHIR, RDF/OWL). The data would come in, be loaded into a data warehouse, and be transformed by tools into useful, manageable chunks for the analysts. This future state would free the existing SDTM to take on more analysis- and end-user-friendly features (e.g. non-standard variables in parent domains, numeric dates, demographic/treatment arm information in each domain), but these enhanced SDTM views would be generated by tooling.
The reality is that we do not live in an ideal world, and SDTM as an exchange standard is here to stay for the foreseeable future. Therefore, for the foreseeable future, changes to the SDTM should stress data exchange requirements. Changes intended to make SDTM more end-user friendly for analysis should be resisted, as they will invariably make data exchange more complicated and costly. Requirements to make SDTM data more user-friendly should instead be met by better tooling on the recipient's side, to facilitate the post-submission data transformations needed for analysis. Going back to the furniture analogy, it is not efficient to ship the dining table and chairs in the exact arrangement they will be used at the destination; rather, we need more and better furniture unpackers at the destination.
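As a sketch of what such recipient-side tooling might look like, the following hypothetical helper builds an analysis-friendly view from native DM and AE domains by duplicating the treatment arm onto each adverse-event record and deriving a real date value from the ISO 8601 character date. The function name and the derived column AESTDT are my own illustrative choices, not part of any standard; the SDTM inputs themselves are left untouched.

```python
import pandas as pd

def unpack_for_analysis(ae: pd.DataFrame, dm: pd.DataFrame) -> pd.DataFrame:
    """Derive an analysis-friendly view from native SDTM AE and DM domains:
    copy the treatment arm onto each AE record and add a real date value."""
    view = ae.merge(dm[["USUBJID", "ARM"]], on="USUBJID", how="left")
    view["AESTDT"] = pd.to_datetime(view["AESTDTC"], errors="coerce")
    return view

# Toy domains with standard SDTM variable names; the values are illustrative.
dm = pd.DataFrame({"USUBJID": ["01-001", "01-002"],
                   "ARM": ["Drug A", "Placebo"]})
ae = pd.DataFrame({"USUBJID": ["01-001", "01-002"],
                   "AEDECOD": ["HEADACHE", "NAUSEA"],
                   "AESTDTC": ["2023-06-15", "2023-07-02"]})

print(unpack_for_analysis(ae, dm))
```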
So, I propose the following algorithm to evaluate proposed structural changes to the SDTM (i.e. those that do not add new content). As always, I welcome your thoughts.