Study Data Exchange: The Unsustainable Path

The current state of study data exchange, based on a tabular representation of study data, is in trouble. The CFAST experience is demonstrating that the requirements for new therapeutic areas (TAs) result in an ever-increasing need for new variables and domains. The burden on industry and FDA to adapt to these changes is too great. Rapid changes in implementation guides (IGs) and the data model itself are quickly exceeding organizations' ability to keep up. As new IG versions emerge, the TA user guides will need to be updated, not to mention the validation rules, databases, etc. The resources needed to do that are quite formidable.

I believe the solution is to adopt a more robust, relational data model for study data exchange that is capable of incorporating new requirements easily, often by simply adding new terms to standard terminologies, rather than adding new variables and domains. The time to do this is now because the current approach is not sustainable. For example, the latest version supported by the most recently published FDA validation rules is SDTM v1.3/SDTM IG 3.1.3, yet SDTM v1.4 is already published and work is well underway on SDTM v1.5. When I was at FDA, I was the chair of the Change Control Board (CCB) for the validation rules and I can say that it is extremely challenging to keep up with this rate of change, even if the Agency moves towards a paradigm of updating and maintaining just the business rules, as is currently planned.

Let me further illustrate the challenges of sustaining the current path with a simple analogy. Imagine that we want to exchange a person's contact information, similar to what is stored in an electronic address book. What should the exchange standard look like for these data? Here's a simple data model for contact information:

Assume that we add controlled terminology for certain concepts such as State, Zip Code, and Area Code, and we have a perfectly reasonable and functional model. However, how do we handle a new requirement, such as exchanging both home and business addresses, phone numbers, and email addresses? The current model doesn't support this requirement, so we update the model. We introduce new variables for home and business addresses, phone numbers, and emails. We get something like this:

Problem solved. Requirement met. We can group these data elements into logical groupings or "domains" and suddenly we start seeing the similarity to the SDTM:

However, validation rules, databases, and tools developed under the first model all need to be updated to accommodate the second model. This takes time and expense.
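To make that cost concrete, here is a minimal sketch of a column-level validation rule. The column names are hypothetical stand-ins for the two flat models, not taken from any real standard; the point is only that a rule written against the first column list fails on data that is valid under the second.

```python
# Hypothetical column sets for the two flat models; names are illustrative only.
MODEL_V1 = {"name", "address", "city", "state", "zip", "phone", "email"}
MODEL_V2 = {
    "name",
    "home_address", "home_city", "home_state", "home_zip", "home_phone", "home_email",
    "bus_address", "bus_city", "bus_state", "bus_zip", "bus_phone", "bus_email",
}


def unexpected_columns(record: dict, allowed: set) -> set:
    """Column-level validation rule: return any variable the model does not define."""
    return set(record) - allowed


# A record that is valid under the second model.
v2_record = {"name": "Ann Lee", "home_phone": "301-555-0100", "bus_phone": "301-555-0199"}

# The rule written for the first model flags the new variables...
print(sorted(unexpected_columns(v2_record, MODEL_V1)))
# ...and only passes once the rule is rewritten against the new column list.
print(sorted(unexpected_columns(v2_record, MODEL_V2)))
```

Every tool that hard-codes the column list has the same problem, which is why each model revision ripples through validators, databases, and loaders.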

Yet new requirements continue to emerge. Now we want to exchange mobile phone information, so we update the model yet again to add new variables and a new MB domain:

And the cycle repeats for new TAs: update the IG, the validation rules, databases, tools.

There is a better approach: adopt a more robust relational data model as shown here:

Furthermore, we introduce controlled terminology for email type, address type, and phone type (e.g. home, work, mobile). We can then accommodate all the requirements described thus far without changing the model, simply by adding new controlled terms to these concepts. More importantly, future requirements are also incorporated easily. Let's say many of the contacts are corporate executives with summer homes and we want to capture second home information. We add a new controlled term to the Address Type concept (i.e. second home) and we're done. No changes to IGs, validation rules, databases, etc. are necessarily needed.
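As a rough sketch of this idea, the snippet below builds a small relational model in an in-memory SQLite database. The table and column names are my own assumptions for illustration, and only the address part of the model is shown. Address Type is a controlled terminology table, so supporting second homes is a single row insert, and the schema before and after is identical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce the terminology constraint
conn.executescript("""
CREATE TABLE contact (contact_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE address_type (term TEXT PRIMARY KEY);  -- controlled terminology
CREATE TABLE address (
    address_id   INTEGER PRIMARY KEY,
    contact_id   INTEGER NOT NULL REFERENCES contact(contact_id),
    address_type TEXT NOT NULL REFERENCES address_type(term),
    street TEXT, city TEXT, state TEXT, zip TEXT
);
""")
conn.executemany("INSERT INTO address_type VALUES (?)", [("home",), ("work",)])
conn.execute("INSERT INTO contact VALUES (1, 'Ann Lee')")


def table_definitions(c):
    """Return the CREATE TABLE statements, i.e. the model itself."""
    query = "SELECT sql FROM sqlite_master WHERE type = 'table' ORDER BY name"
    return [row[0] for row in c.execute(query)]


schema_before = table_definitions(conn)

# A term outside the controlled terminology is rejected by the existing rule.
try:
    conn.execute("INSERT INTO address (contact_id, address_type) VALUES (1, 'yacht')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# New requirement: second homes. Only the terminology changes, not the model.
conn.execute("INSERT INTO address_type VALUES ('second home')")
conn.execute(
    "INSERT INTO address (contact_id, address_type, street, city, state, zip) "
    "VALUES (1, 'second home', '1 Shore Rd', 'Lewes', 'DE', '19958')"
)

schema_after = table_definitions(conn)
print(rejected, schema_before == schema_after)
```

The foreign key doubles as the validation rule: it never mentions specific address types, so it does not need to be rewritten when a new term is added.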

Does the more robust relational data model solve all our problems? No. New requirements may yet emerge that necessitate changes to the underlying model (e.g. birth date, marital status), but the goal is to design the relational model to be as flexible as possible, so that such changes are needed as infrequently as possible and the implementation burden is minimized.

Getting back to study data exchange. I believe we are in dire need of a new, more robust data model that represents all clinical observations and assessments in a single standard representation so that new clinical data requirements for additional therapeutic areas can be incorporated easily by adding new terms to a dictionary and without having to change the underlying model. The current approach is not sustainable and I foresee the entire therapeutic area standardization effort collapsing under its own weight. I think it is already starting to happen.


  1. Armando, I always appreciate your candid statements. I support the CFAST Mission; however, as you and others (http://waynekubick.com/2016/01/18/dear-cfast-please-slow-down-to-catch-up/) have noted, the CFAST initiative has exposed challenges with various aspects of the underlying model as well as its distribution (i.e. versioning). You are correct, the resources required to simply “keep up” are indeed quite formidable. Further, in my experience as both a sponsor employee and consultant, I have seen not only the therapeutic area standardization effort but also the foundational standardization effort being questioned.

    In my opinion, “industry” - sponsors, regulatory agencies, technology providers, standards development organizations, etc. - needs to have more candid, public, transparent discussions regarding industry-wide standards. I know one will find support for standards, but I also know that, like this blog, many will question the current state, its sustainability, and, frankly, its value in day-to-day operations.

    The PhUSE/FDA Computational Science Symposium will be hosting a panel discussion entitled “Next Generation Standards and Analysis - How do we support product review in 10 years?” Perhaps that can be the springboard to the broader discussion I referenced above. In my opinion, the industry simply cannot afford to be having the same discussions for the next ten years.

