As the saying goes, "timing is everything." This is no less true in clinical trials because knowing when activities occurred or how long they last often holds the key to proper interpretation of the data. Documenting temporal elements for activities in clinical trials is therefore crucial.
In the RDF world, we can leverage the work of others who have thought about this issue in great detail. It turns out that the World Wide Web Consortium (www.w3c.org), has developed a time ontology in OWL for anyone to use. It's rather simple and elegant and has useful application for our Study Ontology. Linking our study ontology with the w3c time ontology is a nice example of the benefits of Linked Data. The ontology goes like this....
A temporal entity can be either a time:Instant (a single point in time) or a time:Interval (has a duration). Intervals have properties like time:hasBeginning and time:hasEnd. These are not totally disjoint because one can consider an Instant as an Interval where the start and end Instants are the same, but this is a minor point.
For many Activities, such as a blood test or a vital signs measurement, all we really care about is the date/time it occurred. For all practical purposes, a time:Instant. Some activities do have a duration worth knowing about, so one can attach a time:Interval to them. The nice thing about Intervals is that it links the beginning instant to the end instant ... they go together. The time:Interval resource is the link that holds them together.
So let's look at some examples taken from the SDTM of some important Intervals and how they might look in the RDF when we link to the w3c time ontology. As always, I use Turtle syntax as it's very human-readable:
study:ReferenceStudyInterval rdf:type time:Interval;
time:hasBeginning sdtm:RFSTDTC ;
time:hasEnd sdtm:RFENDTC .
study:ReferenceExpsosureInterval rdf:type time:Interval;
time:hasBeginning sdtm:RFXSTDTC ;
time:hasEnd sdtm:RFXENDTC .
and another important one:
study:Lifespan rdf:type time:Interval;
time:hasBeginning sdtm:BRTHDTC ;
time:hasEnd sdtm:DTHDTC .
Now here is where it gets fun. Let's say you want to derive RFXSTDTC and RFXENDTC (first and last day of exposure). Imagine your database has various time:Interval triples for each subject, each describing a fixed dose interval. Imagine in this example, Person1 participates in 3 fixed dosing intervals, as shown in the RDF as follows:
study:Person1 study:participatesIn study:DrugAdministration1, study:DrugAdministration2,
study:DrugAdministration3.
Each administration is associated with an interval: Interval1, Interval2, Interval3, each of which has a time:Beginning and time:End date. One can write a SPARQL query to pull out the minimum (earliest) time:hasBeginning date and the maximum (latest) time:hasEnd date for all the drug administration intervals and thereby derive automatically the two SDTM dates of interest. The same can be done for RFPENDTC (reference participation end date). I can't tell you how often this date is wrong in actual study data submissions. A SPARQL query can identify all dates for all study activities associated with a Subject and pick out the maximum date, which happens to be the RFPENDTC. Best of all, these standard queries can exist as a resource on the web using SPIN (SPARQL Inference Notation) for anyone to use.
But first, you need study data in the RDF and a Study Ontology.
No comments:
Post a Comment