2017-03-01

What's in a Name?

Standardizing clinical trial data is all about automation. Standard data enable automated processes that bring efficiency and less human error. But automating a process, for example, an analysis of a lab test across multiple subjects in a trial, requires computers and information systems to be able to unambiguously identify that lab test. This is called computable semantic interoperability (CSI). The key is "computable." It's not enough that a human can identify the lab test of interest, but computers need to do the same.  I previously wrote about the interoperability problem and I revisit it here today, focusing on test names.

There are two situations that impede CSI: [1] when the same Thing goes by two different names, or even more troublesome [2] when two different Things go by the same name.  When I say Mustang do I mean the car, or the horse? Some describe the term Mustang is "overloaded" because it can represent more than one Thing. Issue #1 is addressed by controlled terminology. Synonyms can then be mapped to a controlled term that all agree to use. Issue #2 is more challenging, but it is avoidable by assigning different names to different things. I consider this a best practice to promote CSI.

As an example, let's look at the CDISC controlled term "glucose" (code C105585). The definition is "a measurement of the glucose in a biological specimen." The reality is that a serum glucose and a urine glucose are two completely different tests, having different clinical meaning and interpretation. I have been advocating for more granular lab test names for a long time so that computers can easily distinguish different tests. The counter-argument is that serum glucose is really two concepts: the specimen and the "thing" being measured (known as the component, or analyte in LOINC), and therefore should be represented as two different variables. In fact, the SDTM does have a separate field for specimen information (LBSPEC), and don't get me wrong, there is value is separate specimen information, but that doesn't diminish the need for different test names. The problem is, one has to tell or program a computer "if test=glucose, look at specimen information to pick out the correct glucose test." But what about another observation, say "Occurrence Indicator" (an FATEST as described in the Malaria Therapeutic Area User's Guide). One must know to look at another field (FAOBJ) to understand that the occurrence is a fever, or a chill. Where to look for that additional data is not always obvious and varies by test. In the Malaria example, we have two different occurrences and they should each have their own name: Fever Indicator, Chills Indicator.

There are two problems with relying on other data fields to disambiguate an overloaded concept: [1] keeping track of which field to disambiguate which test is onerous, and [2] new lab tests are being added all the time. (By the way, LOINC avoids this problem by assigning different codes to different tests and providing separate data fields for analyte, source, method, etc.)

This problem became clear to me when a colleague at FDA, who was using an automated analysis tool and was analyzing serum glucose levels among thousands of patients and was getting funny results. After quite some digging, she realized the tool was pooling serum and urine glucoses. She and I knew to look at LBSPEC. The tool, however, wasn't smart enough to do so. I wonder how many other analyses of other tests have this problem and go unrecognized.

So, in the interest of promoting true computable semantic interoperability without burdening data recipients with unnecessary algorithms to disambiguate overloaded terms, please remember to name different things differently. It can be that simple.






No comments:

Post a Comment