HICDEP 1.60 release candidate - hicdep.org - Wiki


Last Post 21 Oct 2012 12:00 AM by SuperUser Account

HICDEP 1.60 release candidate

5 Replies


SuperUser Account

Basic Member

Posts:289

29 Sep 2012 12:00 AM
After a few more changes in the last week, the HICDEP 1.60 specification is now in release candidate state. The PDF has been updated.

To reiterate, the release process is as follows:

1. WP4 members should forward the release candidate to any data managers or researchers they know who might have valuable feedback.
2. As there is no TC scheduled, WP4 members should reply to this discussion with an acknowledgement of the changes, whether any issues are found or not.
3. In 3 weeks, the wiki pages will be frozen. If any issues were found or a TC was requested during that time, a TC will be scheduled for a week after.
4. If the release candidate is accepted it will be finalized and released, and a new set of pages will be created for the next draft version.

Best regards,
Mark

UPDATE: As noted below, the release has been postponed to Tuesday 23.10.2012 to resolve remaining comments.

SuperUser Account

Basic Member

Posts:289

12 Oct 2012 12:00 AM
Please find below the input I have received from one of our COHERE cohort data managers on the HICDEP release version 1.60. It concerns the coding of missing date values:

"I have just one comment on the missing data for the date. Why is "NK" not used for the missing day or month instead of "15" and "07" respectively, and NK for a completely missing date? A lot of data management software uses this kind of missing data code!"

SuperUser Account

Basic Member

Posts:289

17 Oct 2012 12:00 AM
Please find below the comments of Linda Wittkop on HICDEP 1.60:

Dear all,

Many thanks for the new version of HICDEP 1.60.

LAB_RES file - Regarding HIV-1 subtypes: I was wondering whether there should be additional information of which gene was used for subtype classification (gag, pol, env)?

I am also wondering whether there should be an additional variable for those patients having an entry in this table and no entry in the table LAB_RES_LVL_2, indicating that if there is no entry then it means that no resistance mutations were found? Or that there must be an entry in this case for the library used, because otherwise it must be considered as a missing value?

LAB_RNA file: Should there not also be a check for RNA_V < -1 and RNA_L missing? Even when the SOP allows the detection limit as a negative value, we need to make assumptions for patients with values of RNA_V = -13 or -6 etc. and missing RNA_L. Does this mean that these patients had 13 or 6 copies, or does it mean that these patients had less than 13 or 6 copies, respectively?

There are also a lot of entries with the form RNA_V=-49 and RNA_L=50. Should the value of RNA_V not be -1 or -50 in this case?

Best regards,
Linda

SuperUser Account

Basic Member

Posts:289

19 Oct 2012 12:00 AM
Thank you for the input. I am unfamiliar with this “NK” code, but looking at a number of popular database systems, statistical software packages and programming languages, many seem to be unable to encode missing date components unless the entire date is missing. In the case of imprecise dates, I would suggest that the alternative specification of precision be used instead. The HICDEP specification currently lacks a naming convention for such additional fields; I would suggest forming the precision field's name by adding a “_P” suffix to the corresponding date field's name. I will look into the possibility of clarifying this convention, and would be interested in hearing anyone else's opinion on this issue.
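To make the suggestion above concrete, here is a minimal Python sketch of how a partially known date could be stored as a full date plus a separate precision field. It rests on several assumptions: the placeholder values 15 and 07 are the ones mentioned earlier in this thread, the precision codes "D", "M", "Y" and "U" are illustrative only and should be checked against the specification, the function names are hypothetical, and the thread does not say what to store when both month and day are missing, so the sketch simply applies both placeholders.

```python
from datetime import date
from typing import Optional, Tuple

# Placeholder values mentioned in the thread for missing components.
PLACEHOLDER_DAY = 15    # substituted when the day is unknown
PLACEHOLDER_MONTH = 7   # substituted when the month is unknown


def encode_partial_date(
    year: Optional[int],
    month: Optional[int] = None,
    day: Optional[int] = None,
) -> Tuple[Optional[date], str]:
    """Return (stored date, precision code) for a possibly incomplete date.

    The precision codes are assumptions made for this sketch:
    "D" = exact to the day, "M" = exact to the month,
    "Y" = exact to the year, "U" = entirely unknown.
    """
    if year is None:
        return None, "U"
    if month is None:
        # Assumption: both placeholders are applied when only the year is known.
        return date(year, PLACEHOLDER_MONTH, PLACEHOLDER_DAY), "Y"
    if day is None:
        return date(year, month, PLACEHOLDER_DAY), "M"
    return date(year, month, day), "D"


def precision_field_name(date_field: str) -> str:
    """Apply the proposed convention: the date field's name plus a "_P" suffix."""
    return date_field + "_P"
```

With this approach the date column always holds a valid date that any database or statistics package can parse, while the companion column (for a hypothetical date field `VIS_D` that would be `VIS_D_P`) records how much of the date was actually known.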

SuperUser Account

Basic Member

Posts:289

19 Oct 2012 12:00 AM
The release of HICDEP 1.60 has been postponed to Tuesday 23.10.2012 to resolve the above comments.

SuperUser Account

Basic Member

Posts:289

21 Oct 2012 12:00 AM
"I was wondering whether there should be additional information of which gene was used for subtype classification (gag, pol, env)?"

So far we have kept the subtype format simple. Yet it may be necessary to normalise this as you propose, basically creating a specific subtype table that links the subtype to the gene(s) used for subtype determination.

"I am also wondering whether there should be an additional variable for those patients having an entry in this table and no entry in the table LAB_RES_LVL_2 indicating that if there is no entry then it means that no resistance mutations were found? Or that there must be an entry in this case for the library used because otherwise it must be considered as missing value?"

For the next version we should try to think of a code to indicate success/failure of sequencing, which in turn could be used to indicate that no mutations have been found (successful sequencing with no mutations and no sequence should then be a check). I find it difficult to have a variable that on its own is used to indicate that no resistance mutations were found. That would also mean that we should be working with a specific reference sequence and/or a specific mutation list, which in both cases would be a challenge across cohorts.

"LAB_RNA file: Should there not also be a check for RNA_V < -1 and RNA_L missing?"

We will change the following check (tblLAB_RNA, Within Table, RW002: "RNA_V=-1 and RNA_L missing", YES) to say "less than 0" instead of "-1".

"Even when the SOP allows detection limit as negative value: We need to make assumptions for patients with values of RNA_V = -13 or -6 etc. and missing RNA_L. Does this mean that these patients had 13 or 6 copies or does it mean that these patients had less than 13 or 6 copies, respectively? There are also a lot of entries with the form RNA_V=-49 and RNA_L=50. Should the value of RNA_V not be -1 or -50 in this case?"

As a general rule we propose that as much of the original information from the source data as possible is kept. If the local coding is 49 for something below a limit of 50, we ask the cohorts to submit -49. If the actual reading said 13 or 6 and these values are below the detection limit, we propose that the data to submit should be -13 and -6. In both cases the cohort should state the detection limit. If the assumption at the source was that a reading of 50 with a detection limit of 50 is below the limit, we suggest that the data should be presented as -50. In all cases, regardless of value, anything negative is below the detection limit (as specified at source or at the cohort level), and as far as possible the detection limit should be stated.

/Jesper
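The rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function and class names are hypothetical, missing values are represented as `None`, and the warning function mirrors the revised RW002 check (RNA_V less than 0 and RNA_L missing) as it is described in this thread rather than as it may finally appear in the specification.

```python
from typing import NamedTuple, Optional


class RnaReading(NamedTuple):
    """Interpretation of one RNA_V / RNA_L pair (hypothetical helper type)."""
    below_limit: bool        # True when RNA_V is negative
    value: float             # magnitude of the submitted value
    limit: Optional[float]   # detection limit (RNA_L), if stated


def interpret_rna(rna_v: float, rna_l: Optional[float] = None) -> RnaReading:
    """Any negative RNA_V means 'below the detection limit', regardless of value.

    The magnitude is kept because it preserves the source coding
    (for example -49 for a local code of 49 under a limit of 50).
    """
    return RnaReading(below_limit=rna_v < 0, value=abs(rna_v), limit=rna_l)


def missing_limit_warning(rna_v: float, rna_l: Optional[float]) -> bool:
    """Revised check as described in the thread: RNA_V < 0 and RNA_L missing."""
    return rna_v < 0 and rna_l is None
```

For example, `missing_limit_warning(-13, None)` flags the record because the detection limit is not stated, whereas `interpret_rna(-49, 50)` is accepted and read as "below a detection limit of 50, coded locally as 49".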
