In a recent post, I discussed using the Resource
Description Framework (RDF) and the Web Ontology Language (OWL) for biomedical
information. In this post, I discuss how we can start applying semantic web
modeling principles in BRIDG to bring greater semantic clarity to the model. I
believe BRIDG should evolve into a computable ontology; i.e., the canonical representation of the model
would one day be in OWL, and other representations that implementers
find useful could be derived from it. This brings other benefits, such as easily incorporating other biomedical ontologies by reference, but that's a topic for a future post. For various reasons, we are unlikely to realize this vision in the short term. However, we
can take incremental steps in that direction. Today I discuss how to refine
BRIDG definitions to enable their computational representation in OWL to facilitate computer-assisted reasoning. We can start by expressing the definition of a class (as much as we can) as
restrictions on properties of other defined BRIDG classes. We should avoid introducing new concepts in these definitions that haven't already been defined elsewhere in the model.
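The second guideline can be checked mechanically. The sketch below is only an illustration in Python, not part of any BRIDG tooling: the dictionary layout and the function name `undefined_concepts` are hypothetical, and the entries are a simplified reading of the BRIDG 4.0 definitions reviewed in this post.

```python
# Hypothetical record of which concepts each class definition relies on.
# Keys are classes defined in the model; values are the concepts that the
# class's textual definition refers to (simplified for illustration).
definitions = {
    "Project": [],
    "ResearchProject": ["Project"],
    "Experiment": ["formal investigation"],
}


def undefined_concepts(definitions):
    """Return {class name: [concepts used in its definition but not defined]}."""
    problems = {}
    for name, used in definitions.items():
        missing = [concept for concept in used if concept not in definitions]
        if missing:
            problems[name] = missing
    return problems


print(undefined_concepts(definitions))
# {'Experiment': ['formal investigation']}
```

A definition that passes this check refers only to concepts the model already defines, which is a precondition for expressing it as restrictions on other classes.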
As an example, let’s review the definitions in
BRIDG 4.0 of classes related to a Research Project. As I look at the existing definitions, it is not always clear how these concepts are related to each
other or distinguishable from one another. Here is the current state:
BRIDG Class: Project
BRIDG Definition: A set of coordinated
activities that is intended to achieve one or more objectives.
BRIDG Examples: The Cancer Genome Atlas
(TCGA)
The Breast and Colon Cancer Family Registries
My Comment: The definition is quite broad.
It can include things like construction projects, I.T. projects, etc. This is
fine insofar as it forms the basis for the definition of other sub-classes
relevant to the scope of BRIDG.
BRIDG Class: ResearchProject
BRIDG Generalization: Project
BRIDG Definition: A set of coordinated
activities that is intended to test one or more hypotheses or lead to
discoveries.
BRIDG Examples: A project to identify
genetic biomarkers for cancer prognosis
A phase 2 clinical trial to test whether an
experimental treatment is effective.
An epidemiological study to determine whether there
is a correlation between an exposure and a disease.
My Comment: Project is a generalization of
ResearchProject; therefore, the definition of ResearchProject should describe
what properties of a Project make it a ResearchProject.
My Proposed Definition: A Project
whose objectives include the testing of one or more biomedical hypotheses or
the generation of biomedical discoveries (i.e. new hypotheses)
Discussion: It’s clear from the current BRIDG
definition that a Project has objectives. A ResearchProject
should be a Project whose objectives include at least one
biomedical research objective: biomedical hypothesis testing or new hypothesis
generation.
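To show what defining a class by a restriction on a property can look like, here is a minimal Python sketch. It is not OWL and not a BRIDG artifact: the names `ObjectiveKind`, `Project.objectives`, and `is_research_project` are assumptions made for illustration, and "objectives include" is read as "at least one objective is of a research kind".

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class ObjectiveKind(Enum):
    """Hypothetical kinds of objective a Project may have."""
    HYPOTHESIS_TESTING = auto()
    HYPOTHESIS_GENERATION = auto()
    OTHER = auto()  # e.g. a construction or I.T. objective


@dataclass
class Project:
    name: str
    objectives: list = field(default_factory=list)


# The kinds of objective that make a Project a ResearchProject.
RESEARCH_KINDS = {
    ObjectiveKind.HYPOTHESIS_TESTING,
    ObjectiveKind.HYPOTHESIS_GENERATION,
}


def is_research_project(project):
    """A Project is a ResearchProject if at least one objective is a research kind."""
    return any(kind in RESEARCH_KINDS for kind in project.objectives)


trial = Project("Phase 2 clinical trial", [ObjectiveKind.HYPOTHESIS_TESTING])
building = Project("Construction project", [ObjectiveKind.OTHER])

print(is_research_project(trial))     # True
print(is_research_project(building))  # False
```

Membership in ResearchProject is computed from a property of Project rather than asserted, which is the behavior a reasoner would provide for an equivalent OWL restriction.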
BRIDG Class: Experiment
BRIDG Generalization:
ResearchProject
BRIDG Definition: A formal
investigation, typically not subject to governmental oversight and regulation,
that is intended to test hypotheses or lead to discoveries
BRIDG Examples: Gene expression
experiment intended to discover novel genetic biomarkers.
Physicochemical characterization of nanoparticles.
My Comment: The definition should relate
Experiment to ResearchProject. We should drop “formal investigation” since it
is not defined elsewhere in BRIDG and causes ambiguity. We should also drop
“typically not subject to governmental oversight and regulation” since it means
an experiment in one jurisdiction may not be an experiment in another
jurisdiction, or the same experiment today may not be an experiment tomorrow if government regulations change. The definition of Experiment should be based on intrinsic
properties of the concept.
My Proposed Definition: A ResearchProject
whose objectives include the generation of biomedical discoveries (e.g. new
biomedical hypotheses)
My Additional Comment: This
proposed definition clearly distinguishes a ResearchProject from an Experiment.
Whereas a ResearchProject may be designed to test or generate hypotheses, Experiments are designed only to generate hypotheses. Further
discussion is needed among the life sciences community to determine whether
this distinction is useful in practice. Regardless, we should identify
the properties of a ResearchProject that make it an Experiment. If we can't, then
maybe we are dealing with synonyms.
BRIDG Class: StudyProtocol
BRIDG Generalization:
ResearchProject
BRIDG Definition: A discrete, structured
plan (that persists over time) of a formal investigation to assess the utility,
impact, pharmacological, physiological, and/or psychological effects of a
particular treatment, procedure, drug, device, biologic, food product,
cosmetic, care plan, or subject characteristic.
BRIDG EXAMPLE(S):
ClinicalTrials.gov study NCT01632332 Vaccine
Therapy in Treating Patients With Previously Treated Stage II-III HER2-Positive
Breast Cancer. The study protocol includes the elements identified in the
NOTE(S) section.
BRIDG NOTE(S):
The term "protocol" is somewhat
overloaded and must be qualified to provide semantic context. Therefore
the term "study protocol" was chosen to disambiguate it from other
protocols. The notion of a study protocol includes (but is not limited to) the
design, statistical considerations, activities to test a particular hypothesis
or answer a particular question that is the basis of the study,
characteristics, specifications, objective(s), background,
pre-study/study/post-study portions of the plan (including the design,
methodology, statistical considerations, organization). The study may be
of any type that involves subjects, including prevention, therapeutic, interventional
or observational. Subjects involved in the study protocol may be
biological entities (human, animal, specimen, tissue, organ, etc.) or products.
The study protocol is related to other supporting documents, including (but not
limited to) informed consent documents, case report forms (CRFs), regulatory
and approval documentation, correlative studies, etc. (via the inherited
association to DocumentVersionRelationship). The complete notion of the
study protocol is represented in BRIDG by the classes StudyProtocol,
StudyProtocolVersion, StudyProtocolDocument, StudyProtocolDocumentVersion,
StudyConduct and all their associations.
- The StudyProtocol class represents the content of
the study protocol which includes characteristics and plan of the study which
can be distilled into or abstracted from a version of the study protocol
document and can exist even before the information is put into document form.
- The StudyProtocolVersion class represents the
details of the study protocol that may change over time.
- The StudyProtocolDocument class represents the
document form of the study protocol and is a grouping of the various study
protocol document versions.
- The StudyProtocolDocumentVersion class represents
the document form of the study protocol version and is the details of the study
protocol document that may change over time.
- The StudyConduct class represents the execution
of a study based on a study protocol definition which includes the scheduled
and performed activities that are subject-specific as well as study-level and
site-level activities.
My Comment: I see numerous problems with
this class and its definition: [1] StudyProtocol is the plan for a project, not
the project itself, which is a Study. The generalization to ResearchProject
is therefore incorrect. The generalization should be to ResearchProjectPlan, if
that class existed. (This distinction is semantically important in the same way
a recipe is different from the meal or the blueprint is different from the
building.) [2] The definition includes the concept of a “formal
investigation.” This is not defined elsewhere in BRIDG and results in
ambiguity. It should be removed. [3] The definition should relate to a
plan for a ResearchProject and state what specific attributes/properties of
that plan make it a StudyProtocol.
Before considering a more precise definition for
StudyProtocol, we need to define a Study in BRIDG and also a ResearchProjectPlan (I intentionally defer proposing a definition for this new class here, but I welcome comments).
StudyProtocol can then be defined as the
ResearchProjectPlan for a Study.
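The plan-versus-project distinction can be sketched in Python as two unrelated types joined by a reference. Everything here is hypothetical: ResearchProjectPlan does not exist in BRIDG, the `plan_for` attribute is an invented name, and the `is_study` flag is only a placeholder for whatever properties end up defining a Study.

```python
from dataclasses import dataclass


@dataclass
class ResearchProject:
    name: str
    is_study: bool = False  # placeholder for the properties that define a Study


@dataclass
class ResearchProjectPlan:
    """A plan describes a project; it is deliberately not a subclass of it."""
    title: str
    plan_for: ResearchProject  # the project this plan describes


def is_study_protocol(plan):
    """A StudyProtocol is a ResearchProjectPlan whose project is a Study."""
    return plan.plan_for.is_study


trial = ResearchProject("Vaccine therapy trial", is_study=True)
protocol = ResearchProjectPlan("Protocol for the vaccine therapy trial", plan_for=trial)

print(is_study_protocol(protocol))            # True
print(isinstance(protocol, ResearchProject))  # False: the recipe is not the meal
```

Keeping the plan outside the ResearchProject hierarchy means nothing that is true only of projects, such as performed activities, is inherited by the plan.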
Proposed New BRIDG Class: Study
Generalization: ResearchProject
My Proposed Definition: A ResearchProject
whose objectives include testing or confirming one or more biomedical hypotheses. The
notion of a study includes (but is not limited to) the design, the planned and
performed activities, and the analysis plan and its execution and
documentation, to test a particular hypothesis or answer a particular question
that is the basis of the study. The study may be of any type that
involves subjects, including prevention, therapeutic, interventional or
observational objectives on the subject. Subjects involved in the study
may be biological entities (human, animal, specimen, tissue, organ, etc.) or
products.
My Comment: I think the lack of a
Study class in BRIDG is a big gap that needs to be filled. We have had extensive discussions on
this in the BRIDG Working Group calls, and there are historical reasons why a
Study class was not added. Nonetheless, I think BRIDG needs a Study class, and the proposed definition clearly distinguishes a Study from an Experiment.
Whereas a Study is performed to test biomedical hypotheses, an Experiment is
performed to generate biomedical hypotheses (i.e. discoveries). It’s important
to note that these are not disjoint classes. Since a ResearchProject may have
multiple objectives, some to test hypotheses, others to generate hypotheses, a
particular ResearchProject could be both a Study and an Experiment. I don’t see
this as a problem; OWL routinely handles classes that are not disjoint.
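The overlap between Study and Experiment can be illustrated with a small Python sketch that returns every class a project belongs to, instead of forcing a single answer. The function name `classify` and the string labels "test" and "generate" are assumptions for illustration; the class names follow the proposed definitions in this post.

```python
def classify(objectives):
    """Return the set of all class names that apply, given a set of objective labels.

    "test" stands for hypothesis testing and "generate" for hypothesis
    generation. The classes are not disjoint, so more than one may apply.
    """
    classes = {"Project"}
    testing = "test" in objectives
    generating = "generate" in objectives
    if testing or generating:
        classes.add("ResearchProject")
    if testing:
        classes.add("Study")
    if generating:
        classes.add("Experiment")
    return classes


print(sorted(classify({"test", "generate"})))
# ['Experiment', 'Project', 'ResearchProject', 'Study']
print(sorted(classify({"generate"})))
# ['Experiment', 'Project', 'ResearchProject']
```

A project with both kinds of objective is classified as both a Study and an Experiment, and nothing in the logic has to choose between them.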
My Additional Note: The
existing BRIDG subclasses InVitroCharacterization, InVivoCharacterization,
and PhysicoChemicalCharacterization, which currently apply only to Experiment, can apply to both a Study and an
Experiment. The key distinction between a Study and an Experiment is whether the objectives are hypothesis
generating or hypothesis testing. The distinction between the subclasses depends
on properties of the ExperimentalUnit of the Study or Experiment.
One potential problem with these proposed definitions is that certain research projects that are typically called “studies” (e.g. early nonclinical studies during drug development) may not be studies based on this definition because they are hypothesis-generating experiments. Any complex domain will contain concepts that different stakeholders use differently. I think developing an ontology with semantically precise definitions is an important step toward gradually reducing this type of variability over time.
In summary, I think this approach is what is needed to reach the level of semantic clarity required to establish BRIDG as a computable ontology. Unfortunately, it requires a manual review of all existing definitions, which is hugely labor intensive. I think such a review will reveal classes that can be collapsed or re-classified as subclasses of other classes, hopefully resulting in a simpler model that will be easier to understand, maintain, and implement.
Excellent blog! I look forward to reading all posts.
The lack of a Study class is striking and is also discussed in the PhUSE groups for a semantic web representation of the Protocol and of the Development program.
I think it can be useful to investigate the use of a "full-fledged" ontology, the Ontology for Biomedical Investigations (OBI) http://obi-ontology.org/, e.g. Study http://purl.obolibrary.org/obo/OBI_0000073 and Study Interventions http://purl.obolibrary.org/obo/OBI_0000931
Such a Study class can also be related to the "simplistic" schema that Google/Yahoo/Microsoft use to improve search by defining the types of entities described on webpages, for example https://schema.org/MedicalObservationalStudy
Armando, I think you highlight one large gap, but there are others throughout BRIDG that make me wonder if it is the right way forward in the longer term. As Kerstin points out, should we be exploring what is done outside of our world and looking at other medical models and industries to identify a more flexible approach? Just a thought.
Kerstin, Chris, thank you for your comments. I agree we need to explore and evaluate other ontologies. I recently became aware of the OBO http://www.obofoundry.org and there seems to be a lot of good work there. They reference the OBI. I also found the NCBO Bioportal http://bioportal.bioontology.org. If nothing else, RDF seems to represent a common language that makes it easier to compare, combine, and incorporate by reference other models.
Ontology is built on top of RDF.
OBI is written in OWL, which just means that it can use the richer logical expressions that OWL provides through its predefined relations and classes. An OWL ontology is instantiated as RDF, so you can look at RDF as a simplified OWL. That is the technical basis for using other ontologies, such as OBI.