2017-03-14

Extensible Code Lists: an RDF Solution

We are all familiar with code lists, or value sets as they are also called. These are permissible values for a variable. For example Race, as described by the U.S. Office of Management and Budget (OMB), can have 5 permissible values: White, Black or African American, Asian, American Indian or Alaska Native, and Native Hawaiian or other Pacific Islander.

Some standard code lists are incomplete, i.e. they don't capture the universe of possible values for a variable. These code lists are called extensible. The Sponsor may create custom terms and add them to the code list. Managing these is a challenge. Here is an idea that we are proposing for the PhUSE SDTM Data in RDF project that we are kicking off next week at the PhUSE Computational Science Symposium.  RDF has a unique advantage over other solutions in that it is designed to work with data that are distributed across the web. It can be used to integrate multiple dictionaries from multiple sources. Here's one way it can work. 

First one creates a study terminology ontology containing all the standard terminology concepts needed for clinical trials. It looks something like this:


One can see how to leverage other terminologies. For example, the Vital Signs class links to SDTM terminology expressed in RDF. In this case the resources shown here are for Diastolic Blood Pressure.

Now you create a second ontology for custom terms, which looks very similar to the first one:


In this example, the sponsor performed three custom flags for subjects who completed 8, 16, and 24 weeks of treatment, respectively. These are entered as custom:PopulationFlag analyses. Next, one imports the standard terminology ontology and specifies using the rdfs:subClassOf property that the custom terms are sub-classes of the standard concepts. So now it looks like this:


Looking at the code:PopulationFlag example, there are three standard population flags specified: Efficacy (EFF), Safety (SAF), and Intent to Treat (ITT). Furthermore there are the three custom flags as previously described.

The nice thing about this approach is that the custom terms exist independently from the standard terms and can be easily removed/ignored for the next study, yet they can be linked in this way to the standard terms so tools treat them the same. A SPARQL query looking for all members of the code:PopulationFlag class will return 6 individuals. For the next study, one can create a different set of custom terms. The "web" of study terminologies begins to look like the figure below. One can imagine a diverse library of controlled terms all available for implementation almost literally at one's fingertips.

One can link to other terminologies in the same way. Ideally, all the standard ontologies exist on the web and one merely links to them, thereby taking advantage of Linked Data principles.

I appreciate your comments. 

No comments:

Post a Comment