Eligibility Criteria, Screen Failures, and another RDF Success Story

It's T minus 6 weeks  (approximately) for the PhUSE 13th Annual Conference in Edinburgh, Scotland and I'm beaming with excitement. I'm involved in two study data projects using RDF and both are going very well. The first one, in collaboration with Tim Williams and an enthusiastic project team of PhUSE volunteers is called Clinical Trials Data in RDF, which among its various goals, will demonstrate how study data in RDF can be used to automatically generate highly standards conforming, submission quality SDTM datasets.

But it's the second paper that I want to discuss today. It's called "Managing Study Workflow Using the RDF." The paper is in pre-publication status so I can't share today, but I plan to post a copy here after the conference. I include the Abstract below.

In a nutshell, the paper describes how one can represent study activity start rules using SPIN (SPARQL Inference Notation), a type of RDF, to identify which study activity(-ies) are to be performed next based on what activities have already been performed. Well it turns out that the start rule for the Randomization activity in a typical double blind trial is in fact the Eligibility Criteria for the study. Here it is, in an executable form that, when combined with a standard COTS (commercial off the shelf) RDF inferencing engine can automatically determine eligibility. How cool is that?

A typical eligibility rule consists of multiple subrules all of which themselves must be TRUE for the overall rule to be true (e.g. AGE must be >=18 years AND RPR must be negative AND Pregnancy Test must be negative AND etc.); exclusion criteria can be negated and added as a subrule. The ontology also describes how to skip subrules that can logically be skipped (e.g. Pregnancy Test must be negative in a male subject). The end result is that identifying an Eligible Subject is automatic and performed simply by entering the screening test results in the knowledgebase. (Think of a knowledgebase as an RDF database).

Without going into the details (wait for the paper!), the rule points to all the screening activities that matter, checks each one for the expected outcome/result, and returns a TRUE or FALSE response if the conditions of the rule are or are not met. If the rule outcome is TRUE, the subject is eligible and the Randomization activity is enabled. If the rule is FALSE, then just the opposite. The paper describes the data from eight hypothetical subjects that were screened for a hypothetical study with just a few screening tests/criteria. The ontology correctly came up with the correct eligibility outcome for all eight.

But there is more....by adding a few more simple SPIN rules to the ontology, the inferencing engine can readily provide a list of all Screen Failures, and the tests that caused them to fail. It can also identify the tests that were logically skipped and therefore ignored for eligibility testing purposes. Do you want to determine which Screening Failure subjects received study medication? Another SPIN rule can do that too. The possibilities are quite exciting. It makes RDF, in my humble opinion, a strong candidate for representing clinical trial data during study conduct. No other standard that I know of supports this kind of automation "out of the box." in RDF, the model and the implementation of the model are the same!! And, once one is ready to submit, you press another button, and submission quality SDTM datasets are generated (which the first project I mentioned intends to demonstrate).

For more details, contact me, or wait until after the PhUSE meeting in October for the full paper.

A clinical study is fundamentally a collection of activities that are performed according to protocol-specified rules. It is not unusual for a single subject to undergo hundreds of study-related activities. Managing this workflow is a formidable challenge. The investigator must ensure that all activities are conducted at the right time and in the correct sequence, sometimes skipping activities that logically need not be done. It is not surprising that errors occur.

This paper explores the use of the Resource Description Framework (RDF) and related standards to automate the management of a study workflow. It describes how protocol information can be expressed in the RDF in a computable way, such that an information system can easily identify which activities have been performed, determine which activities should be performed next, and which can be logically skipped. The use of this approach has the potential to improve how studies are conducted, resulting in better compliance and better data.


Quality Data in Clinical Trials, Part 2

It's been two years since I wrote about quality data in clinical trials. As I re-read that post now, I agree with most of what I said, but it's time to update my thinking based on experience gained since then with study data validation processes.

I made the point that there are two types of validation rules: conformance rules (to data standards) and business rules, a.k.a. data quality checks. I had suggested that conformance rules are best managed by the standards development organization. The fact is that sponsors and FDA support multiple standards (SDTM, MedDRA, CDISC Terminology, WHO Drug Dictionary) so it's up to FDA to manage the collective set of conformance rules across the broad data standards landscape with regard to regulatory study data submissions.

The division between conformance rules and business rules is still quite important. They serve different functions. Ensuring conformance to standards enables automation. Ensuring data quality enable meeting the study objectives. One can assess data quality on legacy data. It is a slow, manual process. Standardized data enable automated data quality checks than can more easily uncover serious data content issues that can impede analysis and review.

As a former FDA reviewer, and a big proponent of data standards, I can honestly say that FDA reviewers care very little about data standards issues. Their overriding concern is that the data be sufficiently standardized so they can run standard automated analyses on the data. The analyses drive the level of standardization needed by the organization. These analyses include the automated data quality checks. One cannot determine if AGE < 0  (a data quality check), if AGE is called something else or is located in the wrong domain (conformance rule).

It's like driving a car. You want to get from point A to point B quickly (minimize data quality issues), you don't really care what's under the hood (standards conformance issues). That is for mechanics (or data analysts) to worry about.

FDA now has a robust service to assess study "Data Fitness" (being described as data that are fit for use). Data Fitness combines both conformance and business rules. They are not split, and the reviewer is left to deal with data conformance issues, which they care little about, as they can be quite technical and there is generally a manual work-around, along with the data quality issues, which are of most importance to them and have the biggest impact on the review. Combining the two is a mistake. I believe Data Fitness as a concept should be retired and the service split into two: Standards Conformance, and Data Quality. The Data Quality assessment service should only be performed on data that have passed the minimum level of conformance needed by the organization. If a study fails conformance testing, it wasn't standardized properly and those errors need to be corrected. In the new era requiring the use of data standards, FDA reviewers should not be burdened with data that do not pass a minimum level of data standards conformance.

Consider this hypothetical scenario as an example to drive home my point. FDA requires sponsors to submit a study report supporting the safety and effectiveness of a drug. The report should be submitted digitally using the PDF standard.  The report arrives and the file cannot be opened using the review tool (i.e. Acrobat) because of 10 errors in PDF implementation (not realistic in today's day and age, but possible nonetheless). Those 10 errors are provided in a validation report to the reviewer for action. The reviewer doesn't care about the technical details of implementing PDF. They want a file that opens and is readable within Acrobat. Let us all agree that the reviewer should not be burdened evaluating and managing standards non-conformance issues.

If you replace study report with study data, and PDF with SDTM, this scenario is exactly what is happening today. But somehow that practice remains acceptable. Why? Well because there are other "tools" (akin to a simple text editors in the document world) that allow reviewers to work with non-conformant data, albeit at much reduced efficiency. These "workarounds" for non-standard study data are all too prevalent and acceptable. With time this needs to change to take full advantage of standardized data for both the Sponsor and FDA alike.

My future state for standardized study data submissions look like this: study data arrive, they undergo standards conformance validation using pass/fail criteria. Those that pass go to the reviewer and the regulatory review clock starts. Those that fail are returned for correction. (The conformance rules are publicly available so that conformance errors can be identified and corrected before submission.) During the filing period, automated data quality checks are performed and that report goes to the reviewer. Deficiencies result in possible information requests. Serious data quality deficiencies may form the basis of a refuse to file action.

Finally, let's retire use of the term "DataFit" in favor of what we really mean: Standards Conformance or Data Quality. Let's not muddle these two important issues any longer.