2017-06-26

The Semantic Web Way of Thinking

Before I knew anything about the Resource Description Framework (RDF) and the Semantic Web, I would say that I had a traditional way of thinking about the world. Take a clinical trial for instance. First you have a research idea or hypothesis, then you design a trial to test that hypothesis, then you write the protocol, recruit subjects, conduct the trial, analyze the data, and reach conclusions. All of these steps are important but I failed to see any commonality in any fundamental way. For example, writing the protocol and analyzing the data are very different processes needing very different skill sets. How could they possibly be similar?

Enter the RDF and suddenly everything is related in some way with everything else. It may be obvious but no less profound to notice that everything in the world is a type of "Thing." The way we come to understand the world via the scientific process is to classify Things, group Things, separate Things into different buckets or classes based on their properties. Here's an obvious example, written in pseudo-RDF-turtle syntax.

A Car is a subClassOf Thing.
A Red Car is a subClassOf Car.

How does a computer know a car is a red car? Well, one can define a property of Car called Color and one of the options for Color is Red.

A Red Car is [any Car with Color value = Red].

So I can ask the computer to find every Red Car and it knows to look for those with Color property is Red. This is very straightforward, almost to the point of being insultingly simple. But wait...

To make scientific discoveries, we first identify what properties are important for the type of Thing we are studying, and we measure those properties. Let's say you have an investigational drug A and you want to know if it's effective for Multiple Sclerosis. You have the following assertions.

DrugA is a subClassOf Thing.
EffectiveDrug is a subClassOf Drug.

How can a computer discover that DrugA is an EffectiveDrug?  In the same way as the Red Car example, there are properties of DrugA that semantic web tools can analyze to determine that the drug is an EffectiveDrug. Sometimes those properties are difficult to define, or difficult to measure, but the principle is the same.

So, getting back to the different steps in the lifecycle of a study, they are also Things that we can call Activities. There are rules that determine when Activities begin and end, and rules that determine which Activity is performed next. One can define a property of Activities called State or Status (e.g. not yet started, ongoing, completed, aborted). So the lifecycle of a study is broken down to a series of activities, each with its own properties: hasState; hasStart Rule; hasEnd Rule. Suddenly processes that look very different now look very similar.

This is the semantic web way of thinking. The universe is made up of things: similar things and different things. All are grouped together and distinguished from one another by their properties. The challenge is identifying those properties that matter, documenting them, and using semantic web tools to do the grouping and sorting for us. This is how the Semantic Web can work for us and help us make new discoveries.