Here I discuss a simple, practical application of the RDF to demonstrate how it can solve a real problem now. In a prior post, I described the ideal role of the SDTM as a standard report, generated from a database, that is used for analysis. This example automates the creation of an SDTM demographics domain from an RDF database (called a triple store). First, I create a simple ontology of a study. Next, I use it to generate sample study data in RDF. I then store the ontology and the data in a simple RDF triple store database (a "knowledgebase") and use SPARQL (the RDF query language) to query the database and generate an SDTM demographics domain. I also discuss how this strategy can be used to validate the data. I used a commercial, off-the-shelf (COTS) product: TopBraid Composer. The RDF file used for this exercise is available for download in Turtle format.
First I created a mini study ontology, containing only the classes and properties needed for this small exercise. You'll recognize many of the classes from BRIDG and the SDTM. I added a new class called SDTM_Domain which will contain a resource for each instance of an SDTM domain/dataset.
|Mini Study Ontology - Classes|
I then created the properties. First are the object properties (in blue), which relate two classes to each other; then the datatype properties (in green), which describe the data:
|Mini Study Ontology - Properties|
For example, the :conductedAt property relates the Study_Site class to the Country class. It enables asserting, for example, that Site 0001 was conducted in Germany. These relationships are captured as domain and range information for each property using the standard rdfs:domain and rdfs:range properties. Another example: the :age property has the :Subject class as its domain and the datatype xsd:integer as its range.
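In Turtle syntax, these two property declarations might look roughly like the following sketch (the `:` namespace URI is hypothetical; the class and property names come from the mini ontology described above):

```turtle
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix :     <http://example.org/mini-study#> .   # hypothetical namespace

# Object property: relates a study site to the country where it was conducted
:conductedAt a owl:ObjectProperty ;
    rdfs:domain :Study_Site ;
    rdfs:range  :Country .

# Datatype property: a subject's age, whose values must be integers
:age a owl:DatatypeProperty ;
    rdfs:domain :Subject ;
    rdfs:range  xsd:integer .
```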
Using this ontology, I populated the triple store with dummy instance data for 10 subjects (you'll see the number 10 next to :Study_Subject in the diagram above, indicating the database has 10 study subject instances). Similarly, I entered 4 instances of study sites, 2 instances of Sex (male, female), etc.
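To make this concrete, here is a hedged sketch of what one site and one subject might look like as instance data in Turtle. The identifiers and the :enrolledAt property are illustrative assumptions, not copied from the actual file:

```turtle
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix :    <http://example.org/mini-study#> .   # hypothetical namespace

# Hypothetical instances; names are illustrative only
:Germany a :Country .

:Site_0001 a :Study_Site ;
    :conductedAt :Germany .

:Subject_01 a :Study_Subject ;
    :enrolledAt :Site_0001 ;        # assumed property linking subject to site
    :age "34"^^xsd:integer ;
    :sex :Male .
```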
Finally, I created a single instance of an SDTM demographics domain, calling it :demographics:
|Demographics domain Resource|
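A minimal sketch of that resource in Turtle might look like this (the :domainCode property and the label are assumptions; only the :demographics name and the :SDTM_Domain class come from the ontology described above):

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix :     <http://example.org/mini-study#> .   # hypothetical namespace

:demographics a :SDTM_Domain ;
    rdfs:label "Demographics" ;
    :domainCode "DM" .    # assumed property for the two-letter SDTM domain code
```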
|SPARQL Query to generate DM Domain|
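The query in the figure is not reproduced here, but a query of this kind, written against the assumed property names sketched above, might look like the following. The result variables echo SDTM DM variable names; the use of the subject URI as USUBJID is a simplification:

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX :     <http://example.org/mini-study#>

SELECT ?USUBJID ?AGE ?SEX ?COUNTRY
WHERE {
    ?subject a :Study_Subject ;
             :age ?AGE ;
             :sex ?sexRes ;
             :enrolledAt ?site .        # assumed subject-to-site property
    ?site :conductedAt ?countryRes .
    ?sexRes     rdfs:label ?SEX .       # assumes sex instances carry labels
    ?countryRes rdfs:label ?COUNTRY .
    BIND(STR(?subject) AS ?USUBJID)     # simplistic subject identifier
}
ORDER BY ?USUBJID
```

Each row of the result set corresponds to one record in the DM dataset, which is the sense in which the SDTM becomes a report generated from the database.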
|DM Domain generated from RDF data|
Note that the assertion

:age rdfs:range xsd:integer .

which says that permissible values for age are integers, also represents an executable validation rule in the knowledgebase. If one enters a non-integer value for age, a reasoner can identify and surface the contradiction, which is essentially a validation report. More complex validation rules can be constructed in the RDF: positive integers, cardinality constraints, value set constraints, etc.
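For illustration, here are hedged sketches of such richer constraints expressed in OWL/Turtle, covering the three kinds just mentioned (positive integers, value sets, cardinality); the exact modeling choices are assumptions:

```turtle
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix :     <http://example.org/mini-study#> .   # hypothetical namespace

# Datatype constraint: ages must be positive integers
:age rdfs:range xsd:positiveInteger .

# Value-set constraint: Sex is exactly the enumeration {male, female}
:Sex owl:equivalentClass [ a owl:Class ;
    owl:oneOf ( :Male :Female ) ] .

# Cardinality constraint: every study subject has exactly one sex value
:Study_Subject rdfs:subClassOf [ a owl:Restriction ;
    owl:onProperty  :sex ;
    owl:cardinality "1"^^xsd:nonNegativeInteger ] .
```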
As a next step, it would be useful to redo this example using the publicly available CDISC Foundational Standards in RDF specification. I had planned to do this but haven't gotten around to it yet.
Longer term, these datasets should be "generate-able" from a BRIDG ontology. I believe an OWL representation of BRIDG will pave the way to generating even more useful reports for analysis (see another post: BRIDG as a Computable Ontology). One could then populate the knowledgebase with, or incorporate by reference, more and more validation rules, and even FDA policy statements expressed in the RDF. This would automate the ability not only to detect invalid data, but also to detect data that doesn't conform to the FDA study data submission policies described in guidances and regulations.
In summary, the RDF provides the capability to implement practical solutions now, offering an alternate mechanism to automatically generate SDTM datasets using simple COTS tools, while at the same time providing the flexibility to grow the knowledgebase to support more and more solutions, such as:
- Generate all SDTM datasets
- Validate the data
- Determine conformance with FDA submission policies
- Generate more useful views/reports for analysis
Once this capability in the knowledgebase is fully developed, one would no longer need to exchange the tabular reports, but could exchange the RDF data themselves. Or better yet, given the distributed, "linked data" capabilities of the semantic web, the recipient could simply be granted access to the RDF data on the web.