One of my favorite quotes is by David M. Eddy, MD, a physician, mathematician, and thought
leader. He said:
“The complexity of modern medicine exceeds the inherent
limitations of the unaided human mind.”
When I was a first-year
neurology resident, I remember trying to memorize all the
important clinical features of the peripheral neuropathies. When a patient presents with a new-onset
neuropathy, this knowledge is essential for determining the type
of neuropathy, and therefore the treatment. At the time there were over 100 known peripheral neuropathies. I was quickly overwhelmed. There
was no way I could manage all this knowledge in my head. Without it, I could
easily overlook a rare neuropathy. I needed a computer program that would take
as input all the clinical features of the patient’s neuropathy and provide back
an ordered list of possible neuropathies, starting with the most likely to the
least likely. I actually wrote a rudimentary program using Visual Basic to help
me manage this knowledge.
It was clear to me then that doctors need
computers to better treat patients. As a Neurologist, I remain fascinated by the interaction between our brains and these marvelous machines.
We have come a long
way in leveraging information technology to improve patient care, but we still have a long
way to go. The biggest hurdle that I see is that so much knowledge remains
stored in our brains or in free-text form in medical textbooks and journals. It is not accessible to computers and information
systems in ways that enable them to reason and create new knowledge as our brains do now. How can we capture knowledge in a way that is searchable, usable (by both humans and computers), and persistent?
It starts with a common understanding of what
knowledge is and how we use it. It turns out a lot has been written on this topic. I like the following diagram:
The Knowledge Pyramid
First we collect data about the world around us. We then organize and summarize the data and put
it into context: this is information. Then we analyze the information to create
knowledge. Then we use that knowledge to make good decisions: that is wisdom.
Taking these concepts a bit further:
Data: raw, unorganized facts that need to be processed. Data can seem simple, random, and useless until they are organized.
Information: When data are processed, organized, structured, or presented in a given context so as to make them useful, they are called information.
Knowledge: is what we know (or think we know). Think of this as the map of the world we build inside our brains. Like a physical map, it helps us know where things are, but it contains more than that. It also contains our beliefs and expectations: “If I do this, I will probably get that.” Crucially, the brain links all these things together into a giant network of ideas, memories, predictions, beliefs, etc. We base our decisions on this “model”, not on the real world itself. Our brains constantly update this model from the signals coming through our sensory organs. The brain uses two sources to build this knowledge: data and information.
Wisdom: is the ability to make correct judgments and decisions based on knowledge. It is often an intangible quality gained through our experiences in life.
In clinical
research, an example of data would be study data tabulations that can be
represented using the CDISC Study Data Tabulation Model (SDTM). An example of
information might be the Study Report. The FDA receives data and information,
reviews and analyzes them, and generates reviews. The reviews represent knowledge about
what the Agency thinks about the safety, efficacy, and quality of the product. The
decision to approve a new product represents wisdom: taking the right action
based on present (and often previous) knowledge.
How can we recruit computers to help
automate this process of converting data to information to knowledge and
eventually to wisdom? We start by making data, information, and knowledge computable. We need
computable definitions of important concepts (e.g., classes) so that computers can reason on
our behalf. We then distill knowledge into simple three-part statements using computationally defined concepts. These statements are called triples (:Subject :Property :Object). Next we need rules for how to process these statements. Here is a simple
example:
- :Apple :isA :Fruit.
- :McIntosh :isA :Apple.
- Rule: IF A :isA B AND B :isA C, THEN A :isA C
So a computer can reason:
- :McIntosh :isA :Fruit.
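The reasoning step above can be sketched in a few lines of Python. This is only an illustration, assuming triples are held as plain tuples; the function name `infer_is_a` is hypothetical, and real Semantic Web tools use dedicated reasoners rather than a hand-written loop.

```python
# Triples are stored as (subject, property, object) tuples.
facts = {
    ("Apple", "isA", "Fruit"),
    ("McIntosh", "isA", "Apple"),
}


def infer_is_a(triples):
    """Apply the rule 'IF A isA B AND B isA C, THEN A isA C' until nothing new appears."""
    known = set(triples)
    while True:
        new = {
            (a, "isA", c)
            for (a, p1, b) in known if p1 == "isA"
            for (b2, p2, c) in known if p2 == "isA" and b2 == b
        } - known
        if not new:
            return known
        known |= new


inferred = infer_is_a(facts)
print(("McIntosh", "isA", "Fruit") in inferred)
```

The loop keeps applying the rule until no new statements are produced, so longer chains (for example a variety of McIntosh) would be resolved the same way.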
Another important
step in making knowledge computable is to express the definition of a class in terms of restrictions on properties of other classes. Here’s an example.
Let’s define Entity as a physical object that has/had/will have existence. What
are some of its properties? One can imagine:
- :Size
- :Weight
- :Shape
- :Color
- :Life (Yes/No)
- :Manufactured (Yes/No)
Let’s now define BiologicEntity as
an Entity that is/was/will be living.
Computationally, if A is a member of the Entity class and A has the property
:Life=Yes, then A is also a BiologicEntity.
We have defined BiologicEntity by restricting the permissible value of a property
(:Life) of another class (Entity) such that only an Entity with :Life=Yes is considered a BiologicEntity. This provides an unambiguous definition of BiologicEntity from a computational standpoint.
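As a rough sketch, the restriction can be written as a simple membership test. The dictionaries below are hypothetical Entity records invented for illustration; only the property names (Life, Manufactured) come from the list above.

```python
def is_biologic_entity(entity):
    """An Entity counts as a BiologicEntity when its Life property is 'Yes'."""
    return entity.get("Life") == "Yes"


# Hypothetical Entity records, for illustration only.
entities = [
    {"name": "oak tree", "Life": "Yes", "Manufactured": "No"},
    {"name": "hammer", "Life": "No", "Manufactured": "Yes"},
]

biologic = [e["name"] for e in entities if is_biologic_entity(e)]
print(biologic)
```

The class is never assigned by hand here: membership follows entirely from the value of one property, which is the point of a computable definition.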
Let’s now say a
BiologicEntity has these additional properties:
- :species
- :birthdate
- :deathdate
- :sex
Let’s now define Person as a
BiologicEntity whose :species property is restricted to Homo sapiens. We now have a computationally unambiguous definition of a
Person.
Now things get
interesting. Let’s say we’re enrolling subjects for an oral contraceptive
study. We can define an EligibleSubject as any Person with certain
restrictions on two properties:
- :age >= 18 and
- :sex = female
The computer can search the EHR system and
identify Persons who are also eligible subjects. This is a simple example of how computable
definitions can aid in subject recruitment for a trial. In this case, we express eligibility criteria as restrictions on properties of the Person class.
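Here is a minimal sketch of that search, under several assumptions: the records are made-up stand-ins for EHR data, :age is derived from the :birthdate property as of a chosen date, and the function names are hypothetical.

```python
from datetime import date


def age_in_years(birthdate, today):
    """Whole years elapsed between birthdate and today."""
    had_birthday = (today.month, today.day) >= (birthdate.month, birthdate.day)
    return today.year - birthdate.year - (0 if had_birthday else 1)


def is_person(entity):
    """A Person is a BiologicEntity whose species is Homo sapiens."""
    return entity.get("Life") == "Yes" and entity.get("species") == "Homo sapiens"


def is_eligible_subject(entity, today):
    """An EligibleSubject is a Person with age >= 18 and sex = female."""
    return (
        is_person(entity)
        and entity.get("sex") == "female"
        and age_in_years(entity["birthdate"], today) >= 18
    )


# Hypothetical records, for illustration only.
records = [
    {"id": "P1", "Life": "Yes", "species": "Homo sapiens", "sex": "female", "birthdate": date(1990, 5, 1)},
    {"id": "P2", "Life": "Yes", "species": "Homo sapiens", "sex": "male", "birthdate": date(1985, 3, 12)},
    {"id": "P3", "Life": "Yes", "species": "Homo sapiens", "sex": "female", "birthdate": date(2015, 8, 30)},
]

as_of = date(2024, 1, 1)
eligible = [r["id"] for r in records if is_eligible_subject(r, as_of)]
print(eligible)
```

Each eligibility criterion is one more restriction stacked on the Person definition, so adding or changing a criterion means editing a single condition.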
Using the same
strategy, we can define the chain Entity → ManufacturedEntity → Drug → EffectiveDrug as restrictions on properties of other classes. What
properties of a ManufacturedEntity make it a Drug? What properties of a Drug make
it an EffectiveDrug? These may not be so easy to define, but the strategy is the same: use computers to identify an EffectiveDrug based on all the knowledge about the drug that is expressed in this way.
Technically, for all
of this to work, we need a single data standard that can represent all
data, information, and knowledge. That standard already exists: the Resource Description Framework (RDF). It is the foundation of the Semantic Web. Computable
definitions can be expressed in OWL (Web Ontology Language), an ontology language built on RDF.
Semantic Web tools exist to query and reason across RDF data from multiple
sources.
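To give a feel for what RDF looks like on the page, here is a small sketch that writes the apple statements in N-Triples, one of the line-based RDF serializations. The `http://example.org/` namespace and the `to_ntriples` helper are assumptions made for illustration; a real project would more likely use an RDF library and a standard vocabulary term such as `rdfs:subClassOf` rather than a home-made `isA` property.

```python
BASE = "http://example.org/"  # hypothetical namespace, chosen only for illustration


def to_ntriples(triples, base=BASE):
    """Render (subject, property, object) name tuples as N-Triples lines."""
    return "\n".join(
        f"<{base}{s}> <{base}{p}> <{base}{o}> ." for (s, p, o) in triples
    )


facts = [("Apple", "isA", "Fruit"), ("McIntosh", "isA", "Apple")]
ntriples = to_ntriples(facts)
print(ntriples)
```

Every subject, property, and object becomes a full identifier, which is what lets statements from different sources be merged and queried together.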
I am learning as
much as I can about Semantic Web standards. Already I am convinced that this technology is
the best suited to take data from multiple sources and automate their
conversion to information and knowledge, and maybe even wisdom. A lot of work is being done in this area. But we need more data, information, and knowledge expressed in RDF, and more individuals working on how to leverage this technology to solve current challenges. I’ll definitely
talk more about these standards in future posts as they relate to clinical
research data and information.
In the meantime, I encourage you to learn
more about the Semantic Web and its standards. Here are some interesting resources:
- The Semantic Web in Academic Medicine
- The Yosemite Project
- Bioportal
- The Open Biological and Biomedical Ontologies
- Protégé: an open source ontology editor