Mapping Terminologies using RDF

I received an email this morning from a colleague describing how RDF could be used to map local terms in a company's information system to standard terms for regulatory submission. It reminded me of a recent conversation on terminology mapping.

Just before leaving FDA, I was involved in a lengthy conversation about the challenges in the post-marketing world in mapping adverse events to MedDRA. The FDA expends a tremendous amount of resources coding post-marketing adverse event reports using MedDRA. MedDRA is the terminology adopted for this use case by the International Conference on Harmonisation (ICH), of which FDA is a member.

The problem is magnified because Electronic Health Record systems in the U.S. don't use MedDRA. The Office of the National Coordinator for Health Information Technology has adopted SNOMED CT and ICD-9 for medical problems/conditions (which include adverse events; see my recent post on this topic).

Needless to say, the FDA could use a mapping from SNOMED CT and ICD-9 to MedDRA. This is not an easy task, but assuming such a mapping existed, how could one implement it easily? Here RDF provides a solution.

First, the terminologies must exist in RDF format. I recently came across this web site: the NCBO BioPortal, which makes common medical terminologies available as ontologies. I have not evaluated it thoroughly, but it certainly looks promising.

Then comes the hard part... identifying concepts across terminologies that mean the same thing.

Then comes the easy part... making links across ontologies using the owl:sameAs property. Here's an example. Let's assume the terminologies already assert the following in RDF:
meddra:10027599 rdf:type meddra:MedDRAConcept .
meddra:10027599 rdfs:label "Migraine" .
snomed:37796009 rdf:type snomed:SNOMEDConcept .
snomed:37796009 rdfs:label "Migraine (Disorder)" .

One asserts the following triple in the database:

           meddra:10027599 owl:sameAs snomed:37796009.

Then, as MedWatch reports roll in from EHRs with adverse events coded in SNOMED CT, one simply loads and stores the report with the SNOMED code in the knowledgebase. Any reviewer or analyst who queries the knowledgebase using the MedDRA term will automatically retrieve all the reports with the SNOMED code, because the system treats the two codes as the same concept.
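To make the effect of that owl:sameAs triple concrete, here is a plain-Python sketch (not a real triple store; the report IDs and the second SNOMED code are made up for illustration). It groups identifiers linked by owl:sameAs into equivalence classes, so a query with the MedDRA code retrieves a report that arrived with the SNOMED code:

```python
# Mapping triples: each pair is an owl:sameAs assertion between two codes.
SAME_AS = [("meddra:10027599", "snomed:37796009")]

def equivalence_classes(pairs):
    """Return a canonicalizer that maps each code to a class representative."""
    canon = {}
    def find(x):
        while canon.get(x, x) != x:
            x = canon[x]
        return x
    for a, b in pairs:
        canon.setdefault(a, a)
        canon.setdefault(b, b)
        canon[find(b)] = find(a)   # merge the two equivalence classes
    return find

find = equivalence_classes(SAME_AS)

# Incoming MedWatch reports, coded with SNOMED CT by the sending EHR
# (hypothetical report IDs; the second code is an unmapped concept).
reports = [
    {"id": "R-001", "adverse_event": "snomed:37796009"},
    {"id": "R-002", "adverse_event": "snomed:22298006"},
]

def reports_for(code):
    """Return IDs of reports whose adverse event is the same concept as `code`."""
    target = find(code)
    return [r["id"] for r in reports if find(r["adverse_event"]) == target]

# A reviewer queries with the MedDRA code and still gets the SNOMED-coded report.
print(reports_for("meddra:10027599"))  # -> ['R-001']
```

A real RDF store with OWL reasoning enabled does this merging for you; the sketch just shows why the retrieval "automatically" works.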

A benefit of this approach is that one doesn't have to map the entire terminology; it can be done incrementally. A large percentage of reports refer to a relatively small number of concepts. Those can be mapped first, leaving a small manual mapping process for the less common terms as they come in. This would be a huge improvement over today's process.
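The incremental strategy amounts to prioritizing mapping work by how often each unmapped concept actually shows up. A small sketch (the codes and counts are made up):

```python
from collections import Counter

# Hypothetical stream of SNOMED codes from incoming reports.
incoming_codes = (["snomed:37796009"] * 50
                  + ["snomed:22298006"] * 30
                  + ["snomed:271807003"] * 2)

# Codes that already have an owl:sameAs link to a MedDRA concept.
already_mapped = {"snomed:37796009"}

# Count only the unmapped concepts; map the most frequent ones first.
counts = Counter(c for c in incoming_codes if c not in already_mapped)
for code, n in counts.most_common():
    print(code, n)   # highest-volume unmapped concepts come out first
```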

Another benefit is that one could leverage other organizations' mappings. If those are posted on the web in RDF, they can easily be imported and used. One would need additional metadata, such as provenance information, to help determine whether the mapping is reliable. We all do this manually now, pretty routinely, when evaluating information on the web: I'm more likely to trust a news report from www.cnn.com than from nationalenquirer.com, for example. An organization could develop a list of "trusted sources" for mapping information, or could conduct multiple searches, using different mappings from different sources, to see how they affect the search results.

The possibilities boggle the mind.


  1. Excellent post. Provenance, and justification, for mappings is a key topic. Check out: A Justification-based Semantic Framework for Representing, Evaluating and Utilizing Terminology Mappings (slides: http://www.slideshare.net/kerfors/cim2014 and MIE paper: http://ebooks.iospress.nl/volumearticle/37557)

  2. We cannot simply use owl:sameAs to link two individuals (nodes in an RDF graph).
    Please read this paper: 'When owl:sameAs isn't the Same: An Analysis of Identity Links on the Semantic Web' ( tw.rpi.edu/media/latest/261.pdf )
    In OWL 2, a better, more explicit relation for declaring that two individuals are identical is owl:sameIndividual.

    Again, when it comes to mapping, there are differences, just as Kerstin pointed out in the slides.

    I would propose using SKOS and the Similarity Ontology proposed in the paper above.

    However, neither SKOS nor the Similarity Ontology addresses the provenance of a mapping. We need that piece.

  3. Thank you for the feedback. I am new to SKOS, but as I learn more about it I agree that it is a better approach than owl:sameAs. I understand it provides more nuanced mappings, such as skos:broader and skos:narrower. I will take a look at the Similarity Ontology.
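To illustrate the nuance the comments are after, here is a sketch with made-up codes (note that for cross-terminology mapping SKOS actually offers skos:exactMatch, skos:broadMatch, and skos:narrowMatch; skos:broader/narrower are for links within one scheme). Unlike owl:sameAs, these let a query decide whether to expand through inexact matches:

```python
# Hypothetical SKOS mapping triples between MedDRA and SNOMED CT codes.
mappings = [
    ("meddra:10027599", "skos:exactMatch", "snomed:37796009"),
    ("meddra:10019211", "skos:broadMatch", "snomed:25064002"),
]

def expand(code, include_inexact=False):
    """Codes equivalent to `code`; optionally include broader/narrower matches."""
    wanted = {"skos:exactMatch"}
    if include_inexact:
        wanted |= {"skos:broadMatch", "skos:narrowMatch"}
    out = {code}
    for s, p, o in mappings:
        if p in wanted and (s == code or o == code):
            out |= {s, o}
    return out

# Exact matches are always retrieved; broad matches only on request.
print(sorted(expand("meddra:10027599")))
print(sorted(expand("meddra:10019211", include_inexact=True)))
```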