2015-08-08

BRIDG as a Computable Ontology

In a recent post, I discussed using the Resource Description Framework (RDF) and the Web Ontology Language (OWL) for biomedical information. In this post, I discuss how we can start applying semantic web modeling principles in BRIDG to bring greater semantic clarity to the model. I believe BRIDG should evolve into a computable ontology; that is, the canonical representation of the model would one day be in OWL, and other representations that implementers find useful would be derived from it. This brings other benefits, such as easily incorporating other biomedical ontologies by reference, but that is a topic for a future post. For various reasons, we are unlikely to realize this vision in the short term. However, we can take incremental steps in that direction. Today I discuss how to refine BRIDG definitions so that they can be represented computationally in OWL and support computer-assisted reasoning. We can start by expressing the definition of a class, as far as possible, as restrictions on properties of other defined BRIDG classes. We should avoid introducing concepts in these definitions that are not already defined elsewhere in the model.

As an example, let’s review the definitions in BRIDG 4.0 of classes related to a Research Project. As I look at the existing definitions, it is not always clear how these concepts are related to each other or distinguishable from one another. Here is the current state: 

BRIDG Class: Project
BRIDG Definition: A set of coordinated activities that is intended to achieve one or more objectives.
BRIDG Examples: The Cancer Genome Atlas (TCGA)
The Breast and Colon Cancer Family Registries
My Comment: The definition is quite broad. It can include things like construction projects, I.T. projects, etc. This is fine insofar as it forms the basis for the definition of other sub-classes relevant to the scope of BRIDG.

BRIDG Class: ResearchProject
BRIDG Generalization: Project
BRIDG Definition: A set of coordinated activities that is intended to test one or more hypotheses or lead to discoveries.
BRIDG Examples: A project to identify genetic biomarkers for cancer prognosis
A phase 2 clinical trial to test whether an experimental treatment is effective. 
An epidemiological study to determine whether there is a correlation between an exposure and a disease.
My Comment: Project is a generalization of ResearchProject; therefore, the definition of ResearchProject should describe what properties of a Project make it a ResearchProject.
My Proposed Definition: A Project whose objectives include the testing of one or more biomedical hypotheses or the generation of biomedical discoveries (i.e. new hypotheses)
Discussion: It’s clear from the current BRIDG definition that a Project has objectives. A ResearchProject should be a Project whose objectives are restricted to certain biomedical research objectives: biomedical hypothesis testing or new hypothesis generation. 

BRIDG Class: Experiment
BRIDG Generalization: ResearchProject
BRIDG Definition:  A formal investigation, typically not subject to governmental oversight and regulation, that is intended to test hypotheses or lead to discoveries
BRIDG Examples: Gene expression experiment intended to discover novel genetic biomarkers.
Physicochemical characterization of nanoparticles.
My Comment: The definition should relate Experiment to ResearchProject. We should drop “formal investigation” since it is not defined elsewhere in BRIDG and causes ambiguity. We should drop “typically not subject to governmental oversight and regulation” since it means an experiment in one jurisdiction may not be an experiment in another jurisdiction, or the same experiment today may not be an experiment tomorrow if government regulations change. The definition of Experiment should be based on intrinsic properties of the concept.
My Proposed Definition: A ResearchProject whose objectives include the generation of biomedical discoveries (e.g. new biomedical hypotheses)
My Additional Comment: This proposed definition clearly distinguishes a ResearchProject from an Experiment. Whereas a ResearchProject may be designed to test or generate hypotheses, Experiments are designed only to generate hypotheses. Further discussion is needed among the life sciences community to determine whether this distinction is useful in practice. Regardless, we should identify the properties of a ResearchProject that make it an Experiment. If we can't, then maybe we are dealing with synonyms.
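
The two rules applied so far (define a class as a restriction on its generalization, and use only terms the model already defines) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the property name hasObjective and the classes HypothesisTesting and HypothesisGeneration are hypothetical names that BRIDG 4.0 does not necessarily contain, rendering the definitions as EquivalentTo axioms is one possible reading of the proposed wording, and the "current" Experiment entry simplifies the existing definition by treating "formal investigation" as if it were the parent term.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class ClassDefinition:
    """A class defined as its generalization plus one 'some' restriction."""
    name: str
    parent: Optional[str] = None    # generalization, e.g. "Project"
    prop: Optional[str] = None      # restricted property (hypothetical name)
    fillers: Tuple[str, ...] = ()   # classes the property value may belong to

    def to_manchester(self) -> str:
        """Render the definition as an OWL Manchester syntax class frame."""
        lines = [f"Class: {self.name}"]
        if self.parent and self.prop and self.fillers:
            filler = " or ".join(self.fillers)
            if len(self.fillers) > 1:
                filler = f"({filler})"
            lines.append(
                f"    EquivalentTo: {self.parent} and ({self.prop} some {filler})"
            )
        elif self.parent:
            lines.append(f"    SubClassOf: {self.parent}")
        return "\n".join(lines)


def build_model(*definitions: ClassDefinition) -> Dict[str, ClassDefinition]:
    """Index definitions by class name."""
    return {d.name: d for d in definitions}


def undefined_terms(model: Dict[str, ClassDefinition]) -> List[str]:
    """Return class names that definitions use but the model never defines.

    Only parents and fillers are checked; property names are not.
    """
    missing = set()
    for definition in model.values():
        for term in (definition.parent, *definition.fillers):
            if term is not None and term not in model:
                missing.add(term)
    return sorted(missing)


# Hypothetical objective classes, introduced only for this illustration.
BASE = (
    ClassDefinition("Project"),
    ClassDefinition("HypothesisTesting"),
    ClassDefinition("HypothesisGeneration"),
    ClassDefinition("ResearchProject", "Project", "hasObjective",
                    ("HypothesisTesting", "HypothesisGeneration")),
)

# Proposed definition: a ResearchProject restricted by its objectives.
proposed = build_model(
    *BASE,
    ClassDefinition("Experiment", "ResearchProject", "hasObjective",
                    ("HypothesisGeneration",)),
)

# Simplified stand-in for the current definition, which leans on a term
# ("formal investigation") that the model does not define.
current = build_model(*BASE, ClassDefinition("Experiment", "FormalInvestigation"))

print(proposed["ResearchProject"].to_manchester())
print(proposed["Experiment"].to_manchester())
print("undefined in proposed model:", undefined_terms(proposed))
print("undefined in current model:", undefined_terms(current))
```

A check like undefined_terms is the kind of mechanical review that becomes possible once definitions are written in terms of other classes rather than free text.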

BRIDG Class: StudyProtocol
BRIDG Generalization: ResearchProject
BRIDG Definition: A discrete, structured plan (that persists over time) of a formal investigation to assess the utility, impact, pharmacological, physiological, and/or psychological effects of a particular treatment, procedure, drug, device, biologic, food product, cosmetic, care plan, or subject characteristic.
BRIDG EXAMPLE(S):
ClinicalTrials.gov study NCT01632332 Vaccine Therapy in Treating Patients With Previously Treated Stage II-III HER2-Positive Breast Cancer. The study protocol includes the elements identified in the NOTE(S) section.
BRIDG NOTE(S):
The term "protocol" is somewhat overloaded and must be qualified to provide semantic context.  Therefore the term "study protocol" was chosen to disambiguate it from other protocols. The notion of a study protocol includes (but is not limited to) the design, statistical considerations, activities to test a particular hypothesis or answer a particular question that is the basis of the study, characteristics, specifications, objective(s), background, pre-study/study/post-study portions of the plan (including the design, methodology, statistical considerations, organization).  The study may be of any type that involves subjects, including prevention, therapeutic, interventional or observational.  Subjects involved in the study protocol may be biological entities (human, animal, specimen, tissue, organ, etc.) or products. The study protocol is related to other supporting documents, including (but not limited to) informed consent documents, case report forms (CRFs), regulatory and approval documentation, correlative studies, etc. (via the inherited association to DocumentVersionRelationship).  The complete notion of the study protocol is represented in BRIDG by the classes StudyProtocol, StudyProtocolVersion, StudyProtocolDocument, StudyProtocolDocumentVersion, StudyConduct and all their associations.
- The StudyProtocol class represents the content of the study protocol which includes characteristics and plan of the study which can be distilled into or abstracted from a version of the study protocol document and can exist even before the information is put into document form.
- The StudyProtocolVersion class represents the details of the study protocol that may change over time.
- The StudyProtocolDocument class represents the document form of the study protocol and is a grouping of the various study protocol document versions.
- The StudyProtocolDocumentVersion class represents the document form of the study protocol version and is the details of the study protocol document that may change over time.
- The StudyConduct class represents the execution of a study based on a study protocol definition which includes the scheduled and performed activities that are subject-specific as well as study-level and site-level activities.
My Comment: I see numerous problems with this class and its definition: [1] StudyProtocol is the plan for a project, not the project itself, which is a Study. The generalization to ResearchProject is incorrect. The generalization should be to ResearchProjectPlan, if that class existed. (This distinction is semantically important in the same way a recipe is different from the meal or the blueprint is different from the building.); [2] The definition includes the concept of a “formal investigation.” This is not defined elsewhere in BRIDG and results in ambiguity.  It should be removed; [3] the definition should relate to a plan for a ResearchProject and what specific attributes/properties of that plan make it a StudyProtocol.

Before considering a more precise definition for StudyProtocol, we need to define a Study in BRIDG and also a ResearchProjectPlan (I intentionally defer proposing a definition for this new class here, but I welcome comments). StudyProtocol can then be defined as the ResearchProjectPlan for a Study.
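
The plan-versus-project distinction can be sketched as a small type hierarchy. This is a minimal sketch, not a BRIDG artifact: ResearchProjectPlan and Study do not exist in BRIDG 4.0, and the plan_for attribute is a hypothetical name for the link between a plan and the project it describes.

```python
class Project:
    """A set of coordinated activities with one or more objectives."""

    def __init__(self, name: str) -> None:
        self.name = name


class ResearchProject(Project):
    """A Project with hypothesis-testing or hypothesis-generating objectives."""


class Study(ResearchProject):
    """A ResearchProject whose objectives include testing hypotheses."""


class ResearchProjectPlan:
    """The plan for a ResearchProject; deliberately NOT a Project subclass."""

    def __init__(self, plan_for: ResearchProject) -> None:
        if not isinstance(plan_for, ResearchProject):
            raise TypeError("a ResearchProjectPlan must plan a ResearchProject")
        self.plan_for = plan_for  # hypothetical association name


class StudyProtocol(ResearchProjectPlan):
    """The ResearchProjectPlan for a Study."""

    def __init__(self, plan_for: Study) -> None:
        if not isinstance(plan_for, Study):
            raise TypeError("a StudyProtocol must plan a Study")
        super().__init__(plan_for)


study = Study("Example hypothesis-testing study")
protocol = StudyProtocol(study)

# The recipe is not the meal: the protocol refers to a Study but is not one.
print(isinstance(protocol, ResearchProject))
print(protocol.plan_for.name)
```

With this shape, StudyProtocol generalizes to ResearchProjectPlan rather than to ResearchProject, which is the correction argued for in point [1] above.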

Proposed New BRIDG Class: Study
Generalization: ResearchProject
My Proposed Definition: A ResearchProject whose objectives include testing or confirming biomedical hypothesis(-es). The notion of a study includes (but is not limited to) the design, the planned and performed activities, and the analysis plan and its execution and documentation, to test a particular hypothesis or answer a particular question that is the basis of the study.  The study may be of any type that involves subjects, including prevention, therapeutic, interventional or observational objectives on the subject.  Subjects involved in the study may be biological entities (human, animal, specimen, tissue, organ, etc.) or products.
My Comment: I think the lack of a Study class in BRIDG is a big gap that needs to be filled. We have had extensive discussions on this in the BRIDG Working Group calls, and there are historical reasons why a Study class was not added. Nonetheless, I think BRIDG needs a Study class, and the proposed definition clearly distinguishes a Study from an Experiment. Whereas a Study is performed to test biomedical hypotheses, an Experiment is performed to generate biomedical hypotheses (i.e. discoveries). It’s important to note that these are not disjoint classes. Since a ResearchProject may have multiple objectives, some to test hypotheses and others to generate hypotheses, a particular ResearchProject could be both a Study and an Experiment. I don’t see this as a problem; OWL routinely handles classes that are not disjoint.
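
To show what non-disjoint classification looks like in practice, here is a minimal sketch that infers class membership from a project's objectives. It mimics what an OWL reasoner would do with the proposed definitions, but it is plain Python rather than a reasoner, and the Objective enumeration is a hypothetical simplification of how objectives might be typed.

```python
from enum import Enum
from typing import Iterable, Set


class Objective(Enum):
    """Hypothetical objective types, not taken from BRIDG."""
    TEST_HYPOTHESIS = "test a biomedical hypothesis"
    GENERATE_HYPOTHESIS = "generate a biomedical hypothesis"
    OTHER = "any non-research objective"


def classify(objectives: Iterable[Objective]) -> Set[str]:
    """Return every class a Project belongs to, given its objectives."""
    found = set(objectives)
    classes = {"Project"}
    if found & {Objective.TEST_HYPOTHESIS, Objective.GENERATE_HYPOTHESIS}:
        classes.add("ResearchProject")
    if Objective.TEST_HYPOTHESIS in found:
        classes.add("Study")
    if Objective.GENERATE_HYPOTHESIS in found:
        classes.add("Experiment")
    return classes


# A project with both kinds of objective is a Study and an Experiment.
mixed = classify([Objective.TEST_HYPOTHESIS, Objective.GENERATE_HYPOTHESIS])
print(sorted(mixed))

# A project with no research objective stays a plain Project.
print(sorted(classify([Objective.OTHER])))
```

Because membership is derived from properties rather than asserted, nothing special is needed to handle the overlap: the same project simply satisfies both definitions.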

My Additional Note: The existing BRIDG subclasses InVitroCharacterization, InVivoCharacterization, and PhysicoChemicalCharacterization, which currently apply only to Experiment, can apply to both a Study and an Experiment. The key distinction is whether the objectives are hypothesis generating or hypothesis testing. The distinction between the subclasses depends on properties of the ExperimentalUnit of the Study or Experiment.

One potential problem with these proposed definitions is that certain research projects that are typically called “studies” (e.g. early nonclinical studies during drug development) may not be studies under this definition, because they are hypothesis-generating experiments. Any complex domain will contain terms that different stakeholders use differently. I think developing an ontology with semantically precise definitions is an important step toward gradually reducing this kind of variability.

In summary, I think this approach is what is needed to achieve the level of semantic clarity required to establish BRIDG as a computable ontology. Unfortunately, it requires a manual review of all existing definitions, which is highly labor-intensive. I think such a review will reveal classes that can be collapsed or reclassified as subclasses of other classes, hopefully resulting in a simpler model that is easier to understand, maintain, and implement.

4 comments:

  1. Excellent blog! I look forward to reading all posts.

    The lack of a Study class is striking, and it has also been discussed in the PhUSE groups working on a semantic web representation of the Protocol and of the Development program.

    I think it can be useful to investigate the use of a "full-fledged" ontology: the Ontology for Biomedical Investigations http://obi-ontology.org/ e.g. Study http://purl.obolibrary.org/obo/OBI_0000073 and Study Interventions http://purl.obolibrary.org/obo/OBI_0000931

    Such a Study class can also be related to the "simplistic" schema that Google/Yahoo/Microsoft use to improve search by defining the types of entities described on web pages, for example https://schema.org/MedicalObservationalStudy

  2. Armando, I think you highlight one large gap, but there are others throughout BRIDG that make me wonder whether it is the right way forward in the longer term. As Kerstin points out, should we be exploring what is done outside of our world and looking at other medical models and industries to identify a more flexible approach? Just a thought.

  3. Kerstin, Chris, thank you for your comments. I agree we need to explore and evaluate other ontologies. I recently became aware of the OBO Foundry http://www.obofoundry.org and there seems to be a lot of good work there. They reference the OBI. I also found the NCBO BioPortal http://bioportal.bioontology.org. If nothing else, RDF seems to offer a common language that makes it easier to compare, combine, and incorporate by reference other models.

  4. Ontologies are built on top of RDF.
    OBI is written in OWL, which just means it can use the richer logical expressions that OWL provides through its predefined relations and classes. An OWL ontology is itself expressed in RDF, so you can look at RDF as a simplified OWL. That is the technical basis for using other ontologies, such as OBI.
