Ability to store QA check results - hicdep.org

Forums
Search Forum
Advanced SearchTopicsPosts

Prev Go to previous topic

Go to next topic

Last Post 14 Jan 2013 12:00 AM by SuperUser Account

Ability to store QA check results

10 Replies

Sort:


You are not authorized to post a reply.

Author

Messages

SuperUser Account

Basic Member

Posts:289

22 Nov 2012 12:00 AM
Hello On behalf of the D:A:D datamanagers I would like to suggest a format for Error checks and the result on running them. And also a possibility of capturing feedback such as 'verified' for outliers. A QA check table should include: 'Error Code' - the QA-check Identifier (e.g. 'VW004' for the weight out of acceptable range) the 'unique identifier' from the table that triggered the error (e.g. PATIENT+VIS_D for tblVIS) And the 'Verified' field to allow for feedback / Explanations the verified field should be numeric with coding: 1 = Verified (in source-doc), 2 = Plausible, .. and any other values we could find relevant An example of using the "Plausible" category is: A patient weighs 139 kg at a visit. This is confirmed. At the next visit the record says the patient weighs 140 kg. This could then be marked as plausible based on the prior knowledge of the patient. Likewise very high/low lab values for a patient that have had previous verified outliers could then be marked as plausible. The idea of keeping this Error-Table would be to limit the error-lists created at subsequent mergers. We do not need to keep checking outliers that have already been confirmed. I hope you will consider making this part of the HICDEP protocol. Best Regards, Rikke S. Brandt Datamanger for D:A:D at the Copenhagen HIV Centre.

SuperUser Account

Basic Member

Posts:289

22 Nov 2012 12:00 AM
Thank you Rikke - I fully support this proposal. Thanks, Bruno.

SuperUser Account

Basic Member

Posts:289

22 Nov 2012 12:00 AM

I have constructed a tabular form of the proposal:

Note: Fields marked bold form the unique identifier for a record of the table.

Field name	Format	Description
CHECK_ID	character	The QA check identifier
ROWKEY	character Encoding of multi-part keys needs further consideration	Unique identifier of row that failed the check
VERIFIED	numeric: 1 = Verified 2 = Plausible ...	Outcome from review of this error
COMMENT	text	Optional explanation of outcome

As noted above, the encoding of the ROWKEY field needs clarification: The content of this field may be a concatenation of multiple fields, e.g. PATIENT and TEST_ID from tblLAB_RES. At the same time, it should be possible to unambiguously decode the ROWKEY value into the set of original field values so that the original observation may be determined based on the ROWKEY. To make use of data in the HICDEP format, a common encoding/decoding schema for the ROWKEY across all cohorts is necessary, but is made difficult by the presence of arbitrary characters in PATIENT and TEST_ID.

I recommend we discuss this at the next TC.

SuperUser Account

Basic Member

Posts:289

23 Nov 2012 12:00 AM
In DAD at the moment the ROWKEY variable is converted to multiple fields instead of just one. So in fact the in the example I used to describe the issue would have two fields - PATIENT and VIS_D But I do realise that could be quite challenging if even more fields would go into the unique definition. I would be happy to be part of the next TC so that I could hear what ideas are out there. Thank you very much, Rikke

SuperUser Account

Basic Member

Posts:289

27 Nov 2012 12:00 AM

A solution with multiple fields sounds easier to handle than my suggestion. The current HICDEP draft would require 5 such fields, as the longest unique identifier, from tblLAB_RES_LVL_2, uses 5 fields. The fields would be filled from lowest (ROWKEY1) to highest (ROWKEY5), in the order as they appear in the HICDEP specification of the source table, leaving the trailing superfluous fields blank (NULL for databases and . for SAS, Stata etc).

The table would be as follows:

Note: Fields marked bold form the unique identifier for a record of the table.

Field name	Format	Description
CHECK_ID	character	The QA check identifier
ROWKEY1	character	First field of unique identifier of row that failed the check
ROWKEY2	character	Second field of unique identifier of row that failed the check
ROWKEY3	character	Third field of unique identifier of row that failed the check
ROWKEY4	character	Fourth field of unique identifier of row that failed the check
ROWKEY5	character	Fifth field of unique identifier of row that failed the check
VERIFIED	numeric: 1 = Verified 2 = Plausible ...	Outcome from review of this error
COMMENT	text	Optional explanation of outcome

Should the table containing the row also be provided, or is that specified implicitly by the check code? Some checks compare values from multiple tables, but the checks always refer to a specific table and row as the check violator.

Also, are there

Suggestions for other values of the VERIFIED field?
Suggestions for a table name?

SuperUser Account

Basic Member

Posts:289

28 Nov 2012 12:00 AM
You could either have one discrepancy table for each table that you run error checks on e.g. tblBAS_DISCREP, tblVIS_DISCREP etc. This solution would allow for less fields for the tables with fewer ID-fields - and also allows you to name the ROWKEY1-ROWKEY5 as the actual fields they relate to e.g. Patient, VIS_D etc Or you need a field to show what table the error is related to. This would be neccesary to decode what ROWKEY1-ROWKEY5 translates to. And at the present relaese of HICDEP the error codes can be the same for multiple tables. (It is neccesary to know which table had the duplicate records.) If we choose this option, perhaps we can use the name tblDISCREPANCY/tblDISCREP / Rikke

SuperUser Account

Basic Member

Posts:289

17 Dec 2012 12:00 AM
Dear, I do support the proposal to create a table for discrepancy table(s). However, I am not too fond of the code values for the VERIFIED field. I would say that the code "Verified" without any further details (verified and resolved, verified and confirmed as is) would not be very informative, and the code "Plausible" is too ambiguous. I would like to suggest to use the following coding values instead: 1 = Resolved (with data update) 2 = Confirmed as is (i.e. not a problem) 3 = Cannot be resolved (but incorrect) 4 = Data management SEC Thanks Monique

SuperUser Account

Basic Member

Posts:289

18 Dec 2012 12:00 AM
I agree with Monique that codes could be more informative. What is SEC?? Bruno

SuperUser Account

Basic Member

Posts:289

08 Jan 2013 12:00 AM
SEC = Self Evident Correction

SuperUser Account

Basic Member

Posts:289

11 Jan 2013 12:00 AM
Coming back to the issue of one or many tables, either solution has up- and downsides: For a single table -All keys must be strings, making queries difficult -The PK-column names will be generic, making it more difficult to tell which column corresponds to which PK-column in the target tables. +All QA check outcomes are in one place, making it easier to summarize the status/quality of data checks For multiple tables -No precise specification of the table is possible as every discrepancy table will be different +Each discrepancy table will use the same PK-columns as its target table, resulting in easier joins and no guessing as to the key correspondancies Although my personal preferance would be for the second option, this is mostly a technical issue in the end, so if either option is easier for Rikke to implement in the DDM tool I would choose that.

SuperUser Account

Basic Member

Posts:289

14 Jan 2013 12:00 AM
I will see what makes most sense within the tool. Regards, Rikke

You are not authorized to post a reply.

Forums Default Group HICDEP Development