Everyone involved in planning, conducting, and analyzing clinical trials wants quality data, but what do we mean by “quality” data in clinical trials? Here is a definition I ran across that I particularly like. I don’t know who wrote it; if you do, please leave a comment:
> In clinical trials, quality data support the objectives and analyses described in the protocol and accurately reflect a subject’s experience related to those objectives.
In the time I have worked helping to standardize clinical trial data, I have continued to run across the misconception that standardizing data somehow makes it higher quality. It is important to understand that
Standardized Data does NOT equal quality data. However,
Standardized data makes it easier to assess data quality.
Let’s explore these two statements further.
Quality data are all about what is collected, available for analysis, reported, and, from FDA’s perspective, submitted for regulatory review. What gets collected, analyzed, and reported depends on good science and regulatory policy.
High Quality data are the result of many factors:
- Good protocol design
- Good study execution
- Good data collection and management processes
- Qualified research staff
- Others…
Standardized data is all about how to structure the data to make it more useful. In the SDTM, it means standard domain names, file names, standard tabular structures, standard terminology, and standard data types.
Well-standardized data are the result of good:
- Data standards
- Understanding of the data standards
- Implementation of the data standards
- Conformance to the data standards
Here are some examples of poorly standardized, high quality data:
Determining quality is a slow, manual process that requires visual inspection of the data. Because the structure, variable names, and terminology vary from study to study, one cannot write a reusable software program to automatically check that the age of a subject is reasonable.
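A minimal sketch of the problem, using hypothetical legacy layouts (the column names and records below are invented for illustration): a check written against one study’s layout quietly breaks on the next study’s.

```python
def age_is_present(record):
    """A check written against study_a's layout: look for a column named 'Age'."""
    return record.get("Age") is not None

# Three hypothetical legacy studies, each representing age differently
study_a = {"Age": 34}           # column named "Age", in years
study_b = {"AGE_MONTHS": 408}   # different name and unit
study_c = {"pt_age": None}      # different name again, value actually missing

print(age_is_present(study_a))  # True
print(age_is_present(study_b))  # False -- wrongly flags present data as missing
print(age_is_present(study_c))  # False -- the right answer, but only by accident
```

With standardized data, every study would expose the same AGE variable, and one check would serve them all.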
Here are some examples of well standardized, but poor quality data:
Because the data are standardized, we can now automate quality checks. We can write a computer program to check data quality for us:
- If AGE is <missing> then generate quality alert report
- If [AGE > 120 and AGEU = YEARS] then generate quality alert report
Also, these quality checks can be built into the electronic data capture (EDC) instrument to improve the data collection processes that do affect quality.
We are surrounded by data quality checks at the point of data collection. We’ve become used to them. How often do you fill out an online form and try to submit it, only to be alerted that a particular data field is not filled out properly?
*[Image: Data quality check on usps.com]*
We need more quality checks in clinical trials data collection processes.
So what about validation rules? There are two types: conformance rules (how well the data conform to the data standards) and quality checks (sometimes also called business rules).
Conformance rules depend on the standard. If the standard changes, the rules may change.
Quality checks are data standards independent. They make sense whether the data are standardized or not. An age less than 0 is a data quality problem, whether we are dealing with legacy data or standardized data. Standardized data do however enable data quality checks. This is a huge benefit.
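The distinction can be made concrete with a small sketch. The terminology subset below is illustrative, not the full CDISC controlled terminology list, and the record layout is assumed:

```python
# Illustrative subset of age-unit controlled terminology (not the full list)
STANDARD_AGE_UNITS = {"YEARS", "MONTHS", "WEEKS", "DAYS", "HOURS"}

def conformance_violations(record):
    """Conformance rule: AGEU must come from the standard's controlled
    terminology. If the standard changes, this rule changes with it."""
    issues = []
    if record.get("AGEU") not in STANDARD_AGE_UNITS:
        issues.append("AGEU not in controlled terminology")
    return issues

def quality_violations(record):
    """Quality check: a negative age is a data problem in any dataset,
    standardized or legacy. This rule is standards-independent."""
    issues = []
    age = record.get("AGE")
    if age is not None and age < 0:
        issues.append("AGE is negative")
    return issues

rec = {"AGE": -3, "AGEU": "Yrs"}
print(conformance_violations(rec))  # nonstandard unit spelling fails conformance
print(quality_violations(rec))      # a negative age fails quality regardless
```

Note that fixing the conformance issue (spelling the unit as YEARS) would not fix the quality issue, and vice versa; the two kinds of rules are independent.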
Conformance rules are best managed by the standards development organization that creates and maintains the standard. They understand what it means to be standards-conformant. The users of the data best maintain quality checks. They understand the implications if data are missing or of poor quality.
In the case of SDTM submissions to FDA, the current set of validation rules contains a mixture of conformance rules and quality checks. It makes sense to move to a governance model where CDISC manages the conformance rules and FDA manages the quality checks.