2015-10-07

Managing Study Workflow using RDF

In a previous post, I discussed the owl:Restriction class and how it can be used to define and identify eligible subjects in a study. Here I illustrate other uses of this class to help manage the study workflow.

A computable (that is, machine-readable and machine-interpretable) study protocol is fundamental to automating the processes that collect and analyze study data. A computable protocol provides sufficient detail for information systems to determine what is supposed to happen and to help identify gaps in protocol execution. It would enhance protocol compliance and enable automated analyses that look for protocol violations and their impact on the study objectives.

At its core, a study protocol is a collection of activities to be performed on study subjects. It describes what data are collected, and the rules that determine when those activities occur. Current standards fall short in representing this workflow in a computable format. I explore here a possible solution using RDF and OWL. I must emphasize I am not an RDF expert so the discussion is by its nature at a high level. The details are my best attempt to describe how it might work. I hope that experts in semantic web technologies will read and comment on this proposed solution, and weigh in on the possible benefits of this approach. You will see slightly different uses of the owl:Restriction class that I discussed previously. 

The proposed approach is straightforward. If you think of a protocol as a collection of activities, each activity has a rule determining when it can begin. The system must be able to identify which activity is "next" and inform the site what to do next. The system must also have the ability to document digitally when an activity ends (i.e. when the activity is considered complete). Each activity has a unique ID so that it can reference other activities. In RDF this is the URI (uniform resource identifier). So a start rule can say Start B when A is complete and StudyDay is 12-17. There will always be one activity with no start rule, so that's the one the system presents first for execution. As activities are marked complete, the next activity is enabled for execution, and so on. Now for each subject, you throw all of these activities and their associated rules in a bowl and pick out individual activities until you find which one you do next. Computers are great at doing this sort of thing quickly. You repeat until all activities that were supposed to be done are completed. Then the study is over for that subject. The grouping of activities into arms, epochs, elements, etc. for any given subject for analysis purposes can all be derived.
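The "bowl of activities" idea can be sketched in a few lines of Python. This is only an illustration of the selection loop, not the RDF solution itself. The identifiers t:A and t:B, the dictionary of start rules, and the function names are all assumptions made up for this sketch; the rule for t:B is the "Start B when A is complete and StudyDay is 12-17" example from the paragraph above.

```python
def always(completed, study_day):
    """Start rule for an activity that has no prerequisites."""
    return True


# Hypothetical start rules, keyed by activity URI.
start_rules = {
    # "Start B when A is complete and StudyDay is 12-17"
    "t:B": lambda completed, study_day: "t:A" in completed and 12 <= study_day <= 17,
    # A has no start rule, so it can always begin.
    "t:A": always,
}


def next_activity(start_rules, completed, study_day):
    """Return the first activity that is not yet complete and whose start rule holds."""
    for uri, rule in start_rules.items():
        if uri not in completed and rule(completed, study_day):
            return uri
    return None  # nothing can be started right now


completed = set()
first = next_activity(start_rules, completed, study_day=1)       # the activity with no start rule
completed.add(first)                                             # document that it ended
too_early = next_activity(start_rules, completed, study_day=5)   # B's day window is not open yet
in_window = next_activity(start_rules, completed, study_day=12)  # now B can start
```

The order in which the rules are stored does not matter: t:B is listed first, but it cannot be picked until t:A is complete and the study day falls inside the window.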

My idea to use RDF to represent this workflow emerged while reading the book “Semantic Web for the Working Ontologist” (an excellent book, by the way; I highly recommend it). I came across an example in Chapter 11, on Basic OWL, that describes how to model Questions and Answers in OWL (you'll find it on page 222). As I read it, a light went on, although in reality I had to read it several times to fully understand it. I thought, this example is entirely relevant to clinical trials. The exercise describes a group of questions (activities) that are generally asked in sequence, but some questions are asked only when other questions have been answered in a certain way. That is, some questions have prerequisites that must be met before the system can ask them. So there are concepts for a Question, an Answered Question, and an Enabled Question (one that is ready to be asked by the system). Only when a particular answer is obtained and documented does another question become an Enabled Question.

So how does this apply to clinical trials? Here's how it might work. Let's review a few common study activities.

  • Obtain informed consent
  • Collect screening clinical observations
  • Determine eligibility
  • Allocate to Treatment (Drug or Placebo)
  • Administer Treatment (Drug)
  • Collect a treatment-related clinical observation on Study Day 14.

Most rules can typically be expressed as conditional (if… then…) statements, e.g. IF informed consent is signed THEN conduct screening activities. Some activities do not have start rules. For example, “obtain informed consent” has no start rule. This activity is performed by default at the start of most studies. Some activities depend on the study day or are triggered by the occurrence of another activity or event: IF StudyDay = 7 THEN perform a complete blood count. IF <headache occurs> THEN <administer study drug>. Some activities are unplanned and also have no start rules (e.g. unplanned clinical observations in response to an unexpected change in the patient’s medical condition). These are added as the study progresses. Activities can also be grouped and nested. Rules to determine eligibility are in effect rules on whether to start the Allocation activity, since only eligible subjects are allocated to experimental treatment.

The basic structure of study activities can be represented by classes and properties in OWL. The basic schema for the study activities is as follows. Throughout this example I use the namespace s: to refer to elements (classes, properties) that relate to an ontology of a study, and the namespace t: to refer to the elements of the particular example study used here (i.e. instance data).  A particular study will have Study Activities and Study Activity Outcomes.  A StudyActivity is considered completed when an Outcome is documented in the system. An outcome could be "unable to complete" if the subject is lost to follow-up, for example. 

First, we establish the StudyActivity and StudyActivityOutcome as OWL classes.

s:StudyActivity a owl:Class.
s:StudyActivityOutcome a owl:Class.

Typical activities and possible outcomes are the following:

Activity                     ActivityOutcome
Informed Consent             InformedConsentSigned
Screening                    Sub-activities completed
     Hemoglobin              14.0 g/dL
     Hematocrit              42%
     Platelet Count          300K/mm3
     Fasting Blood Glucose   102 mg/dL
     Eligibility Assessment  True (i.e. Eligible Subject)
Randomization                Placebo
Substance Administration     Placebo once daily for 3 months

Before a StudyActivity is performed, the Outcome might be one of several outcome options. We therefore define a class called StudyActivityOutcomeOption as a subclass of StudyActivityOutcome. We define a property called s:hasOption as follows. It describes the list of possible outcomes relevant to that activity. For observations, it defines the value set. 

s:StudyActivityOutcomeOption rdfs:subClassOf s:StudyActivityOutcome.
s:hasOption a owl:ObjectProperty;
           rdfs:domain s:StudyActivity;
           rdfs:range s:StudyActivityOutcomeOption.

When a StudyActivity is complete, it has a documented outcome, which is one of its outcome options. In other words, the selected outcomes are a subset of the outcome options. So we define a class called StudyActivitySelectedOutcome as a subclass of StudyActivityOutcomeOption. We also define a property called s:hasOutcome. If a StudyActivity has a documented outcome, it is considered a CompletedStudyActivity, a subclass of StudyActivity. The RDF looks like this:

s:StudyActivitySelectedOutcome rdfs:subClassOf s:StudyActivityOutcomeOption.
s:CompletedStudyActivity a owl:Class.
s:CompletedStudyActivity rdfs:subClassOf s:StudyActivity.
s:hasOutcome a owl:ObjectProperty;
           rdfs:domain s:CompletedStudyActivity;
           rdfs:range s:StudyActivitySelectedOutcome.

I think s:hasOutcome is a subproperty of s:hasOption. (I defer discussion of subproperties here, but having reviewed the definition of a subproperty, I think this is true.)

Because certain activities cannot begin until other activities are complete, we need the concept of a CompletedStudyActivity, which was declared above as a subclass of StudyActivity.

One way a study activity is complete (i.e. is a CompletedStudyActivity) is if it has a documented s:StudyActivityOutcome through the s:hasOutcome property. That is, it is any StudyActivity for which the database holds a triple of the form :StudyActivity s:hasOutcome :StudyActivityOutcomeOption. Given the meaning of rdfs:domain and rdfs:range for the s:hasOutcome property, this makes the study activity a completed study activity and the outcome option a study activity selected outcome.
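To make the domain and range inference concrete, here is a small Python sketch that mimics what an RDFS reasoner would conclude from a single s:hasOutcome triple. It is not a reasoner and does not use an RDF library; triples are plain tuples, and the DOMAIN and RANGE dictionaries simply restate the s:hasOutcome definition given earlier. The function name infer_types is an assumption for illustration.

```python
RDF_TYPE = "rdf:type"

# Restates the rdfs:domain and rdfs:range of s:hasOutcome from the schema.
DOMAIN = {"s:hasOutcome": "s:CompletedStudyActivity"}
RANGE = {"s:hasOutcome": "s:StudyActivitySelectedOutcome"}


def infer_types(triples):
    """Apply the RDFS domain and range rules once and return the inferred type triples."""
    inferred = set()
    for subject, predicate, obj in triples:
        if predicate in DOMAIN:
            # The subject of the property belongs to the property's domain class.
            inferred.add((subject, RDF_TYPE, DOMAIN[predicate]))
        if predicate in RANGE:
            # The object of the property belongs to the property's range class.
            inferred.add((obj, RDF_TYPE, RANGE[predicate]))
    return inferred


data = {("t:InformedConsent", "s:hasOutcome", "t:InformedConsentSigned")}
inferred = infer_types(data)
for triple in sorted(inferred):
    print(triple)
```

Nobody has to assert that the activity is complete. Recording the outcome is enough, and the two class memberships follow from the schema.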

A StudyActivity can have one or more subactivities, which themselves are each a StudyActivity. So we define a class called SubActivity as a subclass of StudyActivity, and we define a property called s:hasSubActivity as follows:

s:hasSubActivity a owl:ObjectProperty;
           rdfs:domain s:StudyActivity;
           rdfs:range s:SubActivity.

Any StudyActivity that appears as the object (the value) of the s:hasSubActivity property is automatically inferred to be a SubActivity, because of the property's rdfs:range.

Another way a StudyActivity is complete is if all of its SubActivities are complete. This can be expressed as a restriction on the values that the s:hasSubActivity property can have: 

s:CompletedStudyActivity owl:equivalentClass [a owl:Restriction;
                 owl:onProperty s:hasSubActivity;
                 owl:allValuesFrom s:CompletedStudyActivity].

I think the use of an owl:allValuesFrom restriction here means that a StudyActivity is a CompletedStudyActivity if every one of its SubActivities is a CompletedStudyActivity.
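The intended logic can be sketched in Python as a recursive check. Two assumptions are worth stating loudly. First, the sketch treats the list of sub-activities as complete (a closed world), whereas an OWL reasoner works under the open-world assumption and will not conclude that "all values" are completed unless the set of values is somehow closed. Second, the sketch deliberately refuses to call an activity complete when it has no sub-activities and no outcome, which is a design choice of this illustration. The sub-activity names under t:Screening are hypothetical instance names based on the table of typical activities.

```python
def is_completed(activity, outcomes, sub_activities):
    """An activity is complete if it has a documented outcome,
    or if it has sub-activities and every one of them is complete."""
    if activity in outcomes:
        return True
    children = sub_activities.get(activity, [])
    # Guard against vacuous truth: with no children, all() would return True.
    return bool(children) and all(
        is_completed(child, outcomes, sub_activities) for child in children
    )


# Hypothetical instance data (closed-world: this list is assumed to be exhaustive).
sub_activities = {"t:Screening": ["t:Hemoglobin", "t:Hematocrit", "t:PlateletCount"]}
outcomes = {"t:Hemoglobin": "recorded", "t:Hematocrit": "recorded"}

before = is_completed("t:Screening", outcomes, sub_activities)  # one result still missing
outcomes["t:PlateletCount"] = "recorded"
after = is_completed("t:Screening", outcomes, sub_activities)   # all sub-activities documented
```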

Activities that cannot begin until other activities are complete are said to have prerequisites. We therefore define a new property called s:hasPrerequisite:

s:hasPrerequisite a owl:ObjectProperty;
            rdfs:domain s:StudyActivity;
            rdfs:range s:StudyActivityOutcomeOption.

It's important to define the prerequisite as an outcome option (and not the activity itself) to allow branching. Depending on the selected outcome, different activities may subsequently be enabled for a particular subject.

Any StudyActivity for which all of its prerequisites have been met (i.e. every prerequisite is a StudyActivitySelectedOutcome) becomes an EnabledStudyActivity. These are the activities that are performed next in a study. EnabledStudyActivity is defined as an OWL restriction class on the property s:hasPrerequisite.

s:EnabledStudyActivity a owl:Class.
s:EnabledStudyActivity rdfs:subClassOf s:StudyActivity.
s:EnabledStudyActivity owl:equivalentClass [a owl:Restriction;
           owl:onProperty s:hasPrerequisite;
           owl:allValuesFrom s:StudyActivitySelectedOutcome].

(note lines 2&3 have been corrected since the original posting to address a reader comment)

I think this means that a StudyActivity is an EnabledStudyActivity if all of its prerequisites are documented StudyActivitySelectedOutcome(s) (i.e. all values of the hasPrerequisite property for that activity come from the StudyActivitySelectedOutcome class).
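Here is a Python sketch of that reading of the rule, again as an illustration rather than an OWL implementation. It assumes the prerequisite list for each activity is complete. The instance names, including t:Eligible, t:NotEligible and t:ScreenFailureFollowUp, are hypothetical and are there only to show how prerequisites on outcome options allow branching.

```python
def is_enabled(activity, prerequisites, selected_outcomes):
    """An activity is enabled when every prerequisite outcome option has been selected.
    An activity with no prerequisites is enabled because all() of an empty list is True."""
    required = prerequisites.get(activity, [])
    return all(option in selected_outcomes for option in required)


# Hypothetical prerequisites, expressed as outcome options rather than activities.
prerequisites = {
    "t:Screening": ["t:InformedConsentSigned"],
    "t:Randomization": ["t:Eligible"],
    "t:ScreenFailureFollowUp": ["t:NotEligible"],
}

selected_outcomes = set()
consent_enabled = is_enabled("t:InformedConsent", prerequisites, selected_outcomes)
screening_before = is_enabled("t:Screening", prerequisites, selected_outcomes)

selected_outcomes.add("t:InformedConsentSigned")
screening_after = is_enabled("t:Screening", prerequisites, selected_outcomes)

# Branching: the eligibility outcome decides which activity comes next.
selected_outcomes.add("t:Eligible")
randomization_enabled = is_enabled("t:Randomization", prerequisites, selected_outcomes)
failure_enabled = is_enabled("t:ScreenFailureFollowUp", prerequisites, selected_outcomes)
```

Because the prerequisite is the outcome option and not the activity, the same eligibility assessment can enable randomization for one subject and a different path for another.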

So let's look at some instance activities for any given subject in my hypothetical study.

  • Obtain informed consent
  • Collect screening clinical observations
  • Determine eligibility
  • Allocate to Drug Treatment
  • Administer Drug Treatment
  • Collect a treatment-related clinical observation on Study Day 14.

The very first activity, informed consent, has no prerequisites. This can be represented in the system as follows:

t:InformedConsent a s:StudyActivity.
t:InformedConsent a [a owl:Restriction;
           owl:onProperty s:hasPrerequisite;
           owl:cardinality 0].

It basically says that the InformedConsent activity has no prerequisites. For reasons I don't fully understand, the definition of an EnabledStudyActivity (which is defined using an owl:allValuesFrom restriction) will include the InformedConsent activity because of the way the allValuesFrom restriction is evaluated for empty sets. So the system determines InformedConsent is Enabled and surfaces it in the system for action.

Once the appropriate triple is entered in the database saying that the activity is completed, i.e.,

t:InformedConsentX s:hasOutcome t:InformedConsentSigned.

the system infers it is a CompletedStudyActivity.

Meanwhile, the t:Screening activity has as its prerequisite in the system the t:InformedConsentSigned activity outcome. As soon as the system detects that the informed consent was signed, it infers that t:Screening is an EnabledStudyActivity and presents it to the study team for action.

Now t:Screening has multiple sub-activities as documented in the database using the s:hasSubActivity property. When every sub-activity has documented triples entered in the database in the form :subactivityX s:hasOutcome t:selected-outcomeX, the system infers that t:Screening is a CompletedStudyActivity.

Now here's one piece I haven't yet figured out. The system needs to automatically generate and enter a triple in the database stating that the screening activity hasOutcome a selected outcome. This is necessary to enable the next study activity: determine eligibility. I welcome any ideas on how to do this.
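One possible approach, offered only as a sketch, is to handle this step with a forward-chaining rule applied outside the OWL reasoner: whenever every sub-activity of a parent has an outcome, add a triple giving the parent its own "sub-activities completed" outcome. The Python below illustrates the rule on plain tuples. The function name, the rollup_outcome mapping, and the outcome URI t:ScreeningSubActivitiesCompleted are assumptions made up for this example, and the rule assumes the list of sub-activities in the data is complete.

```python
HAS_OUTCOME = "s:hasOutcome"
HAS_SUB = "s:hasSubActivity"


def materialize_parent_outcomes(triples, rollup_outcome):
    """Add (parent, s:hasOutcome, rollup) once all of the parent's sub-activities
    have an outcome. Repeats until nothing new is added, so nesting is handled."""
    triples = set(triples)
    changed = True
    while changed:
        changed = False
        with_outcome = {s for s, p, o in triples if p == HAS_OUTCOME}
        children = {}
        for s, p, o in triples:
            if p == HAS_SUB:
                children.setdefault(s, set()).add(o)
        for parent, kids in children.items():
            ready = kids <= with_outcome  # every sub-activity has an outcome
            if ready and parent not in with_outcome and parent in rollup_outcome:
                triples.add((parent, HAS_OUTCOME, rollup_outcome[parent]))
                changed = True
    return triples


# Hypothetical rollup outcome for the screening activity.
rollup_outcome = {"t:Screening": "t:ScreeningSubActivitiesCompleted"}
data = {
    ("t:Screening", HAS_SUB, "t:Hemoglobin"),
    ("t:Screening", HAS_SUB, "t:Hematocrit"),
    ("t:Hemoglobin", HAS_OUTCOME, "t:HemoglobinResult"),
}
partial = materialize_parent_outcomes(data, rollup_outcome)  # nothing added yet
data.add(("t:Hematocrit", HAS_OUTCOME, "t:HematocritResult"))
full = materialize_parent_outcomes(data, rollup_outcome)     # screening outcome added
```

The generated outcome can then serve as the prerequisite of the next activity, in the same way that t:InformedConsentSigned enables screening.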

In summary, it seems possible to use OWL to model an executable study workflow. I would appreciate comments from OWL experts on whether this approach would work. I think it would be informative to get this ontology documented and working using dummy instance data. However, I find the logic rather complex and I honestly am not sure if I have it right. It may very well need substantial modification, but the overall strategy makes sense to me. I'd also like to figure out how to weave in study days in the workflow, as many activities depend on the study day. I welcome thoughts on this issue.