Computable Medical Knowledge: The Resource Description Framework

One of my favorite quotes is from David M. Eddy, MD, a physician, mathematician, and thought leader. He said:

“The complexity of modern medicine exceeds the inherent limitations of the unaided human mind.”

When I was a first-year neurology resident, I remember trying to memorize all the important clinical features of the peripheral neuropathies. When confronted with a patient with a new-onset neuropathy, this knowledge is essential in determining the type of neuropathy, and therefore the treatment. At the time there were over 100 known peripheral neuropathies. I was quickly overwhelmed. There was no way I could manage all this knowledge in my head, and without it I could easily overlook a rare neuropathy. I needed a computer program that would take as input all the clinical features of the patient’s neuropathy and return a list of possible neuropathies, ordered from most likely to least likely. I actually wrote a rudimentary program in Visual Basic to help me manage this knowledge.

It was clear to me then that doctors need computers to better treat patients. As a neurologist, I remain fascinated by the interaction between our brains and these marvelous machines.

We have come a long way in leveraging information technology to improve patient care, but we still have a long way to go. The biggest hurdle I see is that so much knowledge remains stored in our brains or as free text in medical textbooks and journals. It is not accessible to computers and information systems in ways that enable them to reason and create new knowledge as our brains do now. How can we capture knowledge in a way that is searchable, usable (by both humans and computers), and persistent?

It starts with a common understanding of what knowledge is and how we use it. It turns out a lot has been written on this topic. I like the following diagram:
[Figure: The Knowledge Pyramid]

First we collect data about the world around us. We then organize and summarize the data and put it into context: this is information. Then we analyze the information to create knowledge. Then we use that knowledge to make good decisions: that is wisdom.

Taking these concepts a bit further:
Data: raw, unorganized facts that need to be processed. Data can seem simple, random, and useless until they are organized.
Information: data that have been processed, organized, structured, or presented in a given context so as to make them useful.
Knowledge: what we know (or think we know). Think of this as the map of the world we build inside our brains. Like a physical map, it helps us know where things are, but it contains more than that. It also contains our beliefs and expectations: “If I do this, I will probably get that.” Crucially, the brain links all these things together into a giant network of ideas, memories, predictions, beliefs, etc. It is on this “model”, not the real world itself, that we base our decisions. Our brains constantly update this model from the signals coming through our sensory organs. The brain builds this knowledge from two sources: data and information.
Wisdom: the ability to make correct judgments and decisions based on knowledge. It is often an intangible quality gained through our experiences in life.

In clinical research, an example of data would be study data tabulations that can be represented using the CDISC Study Data Tabulation Model (SDTM). An example of information might be the Study Report. The FDA receives data and information, analyzes them, and generates reviews. The reviews represent knowledge: what the Agency thinks about the safety, efficacy, and quality of the product. The decision to approve a new product represents wisdom: taking the right action based on present (and often previous) knowledge.

How can we recruit computers to help automate this process of converting data to information to knowledge and eventually to wisdom? We start by making data, information, and knowledge computable. We need computable definitions of important concepts (e.g. classes) so that computers can reason on our behalf. We then distill knowledge into simple three-part statements using computationally defined concepts. These statements are called triples (:Subject :Property :Object). Next we need rules for how to process these statements. Here is a simple example:
  • :Apple :isA :Fruit.
  • :McIntosh :isA :Apple.
  • Rule: IF A :isA B AND B :isA C, THEN A :isA C  
So a computer can reason:
  • :McIntosh :isA :Fruit. 
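The inference above can be sketched in a few lines of Python. This is only an illustration of the rule, not how a Semantic Web reasoner is actually implemented; the tuple representation and the function name `infer_is_a` are assumptions made for the example.

```python
# Triples are (subject, property, object) tuples; names mirror the example above.
facts = {
    (":Apple", ":isA", ":Fruit"),
    (":McIntosh", ":isA", ":Apple"),
}

def infer_is_a(triples):
    """Apply 'IF A :isA B AND B :isA C, THEN A :isA C' until nothing new appears."""
    known = set(triples)
    while True:
        new = {
            (a, ":isA", c)
            for (a, p1, b) in known if p1 == ":isA"
            for (b2, p2, c) in known if p2 == ":isA" and b2 == b
        }
        if new <= known:  # no new triples: the rule has nothing left to add
            return known
        known |= new

# Triples the computer derived that were not stated explicitly.
inferred = infer_is_a(facts) - facts
print(inferred)  # {(':McIntosh', ':isA', ':Fruit')}
```

The loop repeats because one inferred triple can enable another: with a longer chain (say, a specific cultivar of McIntosh), a single pass would not reach :Fruit.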

Another important step in making knowledge computable is to express the definition of a class in terms of restrictions on properties of other classes. Here’s an example.

Let’s define Entity as a physical object that has/had/will have existence. What are some of its properties? One can imagine:
  • :Size
  • :Weight
  • :Shape
  • :Color
  • :Life  (Yes/No)
  • :Manufactured (Yes/No)
Let’s now define BiologicEntity as an Entity that is/was/will be living. Computationally, if A is a member of the Entity class and A has property :Life=Yes, then A is also a BiologicEntity.

We have defined BiologicEntity by restricting the permissible value of a property (:Life) of another class (Entity) such that only an Entity with :Life=Yes is considered a BiologicEntity. This provides an unambiguous definition of BiologicEntity from a computational standpoint.

Let’s now say a BiologicEntity has these additional properties:
  • :species
  • :birthdate
  • :deathdate
  • :sex

Let’s now define Person as a BiologicEntity whose :species property is restricted to Homo sapiens. We now have a computationally unambiguous definition of a Person.
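To make the idea of “class defined by a restriction” concrete, here is a minimal Python sketch. It is not OWL; the dictionary layout, the function name `classes_of`, and the sample individuals are all hypothetical, chosen only to mirror the Entity, BiologicEntity, and Person definitions above.

```python
# Each derived class = a parent class + a required value for one property.
CLASS_DEFINITIONS = {
    "BiologicEntity": ("Entity", "Life", "Yes"),
    "Person": ("BiologicEntity", "species", "Homo sapiens"),
}

def classes_of(individual):
    """Return every class the individual belongs to, based only on its properties."""
    classes = {"Entity"}  # assumption: everything passed in is already known to be an Entity
    changed = True
    while changed:  # keep going until no further class can be added
        changed = False
        for name, (parent, prop, value) in CLASS_DEFINITIONS.items():
            if name not in classes and parent in classes and individual.get(prop) == value:
                classes.add(name)
                changed = True
    return classes

# Hypothetical individuals described only by property values.
patient = {"Life": "Yes", "species": "Homo sapiens", "sex": "female"}
oak = {"Life": "Yes", "species": "Quercus alba"}
rock = {"Life": "No", "Manufactured": "No"}

print(classes_of(patient))
print(classes_of(oak))
print(classes_of(rock))
```

Nobody told the program that the patient is a Person; it worked that out from the property values, which is the point of a computable definition.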

Now things get interesting. Let’s say we’re enrolling subjects for an oral contraceptive study. We can define an EligibleSubject as any Person with certain restrictions on two properties:
  • :age >= 18 and 
  • :sex = female
The computer can search the EHR system and identify Persons who are also eligible subjects. This is a simple example of how computable definitions can aid in subject recruitment for a trial. In this case, we express eligibility criteria as restrictions on properties of the Person class.
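As a rough sketch of what that search might look like, the Python below filters a list of records using the two restrictions. The `Person` fields, the record identifiers, and the in-memory list standing in for an EHR system are all made up for illustration; a real EHR query would look quite different.

```python
from dataclasses import dataclass

@dataclass
class Person:
    identifier: str  # hypothetical record ID
    age: int
    sex: str

def is_eligible_subject(person: Person) -> bool:
    """EligibleSubject = Person restricted to :age >= 18 and :sex = female."""
    return person.age >= 18 and person.sex == "female"

# Stand-in for an EHR system: a handful of invented records.
ehr_records = [
    Person("P001", 34, "female"),
    Person("P002", 17, "female"),
    Person("P003", 45, "male"),
    Person("P004", 18, "female"),
]

eligible = [p.identifier for p in ehr_records if is_eligible_subject(p)]
print(eligible)  # ['P001', 'P004']
```

Adding another criterion means adding another restriction to the definition, not rewriting the search.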

Using the same strategy, we can define the chain Entity → ManufacturedEntity → Drug → EffectiveDrug as restrictions on properties of other classes. What properties of a ManufacturedEntity make it a Drug? What properties of a Drug make it an EffectiveDrug? These may not be so easy to define, but the strategy is the same: use computers to identify an EffectiveDrug based on all the knowledge about the drug that is expressed in this way.

Technically, for all of this to work, we need a single data standard that can represent all data, information, and knowledge. That standard already exists: RDF (the Resource Description Framework). It is the foundation for the Semantic Web. Computable definitions can be expressed in an RDF-based language called OWL (Web Ontology Language). Semantic Web tools exist to query and reason across RDF data from multiple sources.
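Here is a toy illustration of why a single triple format helps when combining sources. It uses plain Python sets rather than real RDF, and the two “sources”, the subject identifier, and the `match` helper are hypothetical. In practice one would use an RDF library (for example rdflib in Python) and a query language such as SPARQL.

```python
# Two hypothetical sources that both express their content as triples.
trial_database = {
    (":Subject001", ":enrolledIn", ":StudyA"),
    (":Subject001", ":sex", "female"),
}
ehr_system = {
    (":Subject001", ":age", "34"),
    (":Subject002", ":age", "17"),
}

# Because both sources share one shape, merging them is just a set union.
merged = trial_database | ehr_system

def match(triples, subject=None, prop=None, obj=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return {
        t for t in triples
        if (subject is None or t[0] == subject)
        and (prop is None or t[1] == prop)
        and (obj is None or t[2] == obj)
    }

# Everything known about :Subject001, regardless of which source said it.
about_subject = match(merged, subject=":Subject001")
print(len(about_subject))  # 3
```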

I am learning as much as I can about Semantic Web standards. Already I am convinced that this technology is the best suited to take data from multiple sources and automate the conversion to information and knowledge, and maybe even wisdom. There is a lot of work being done in this area. But we need more data, information, and knowledge expressed in RDF, and more individuals working on how to leverage this technology to solve current challenges. I’ll definitely talk more about these standards in future posts as they relate to clinical research data and information.

In the meantime, I encourage you to learn more about the semantic web and its standards. Here are some interesting resources.
