Validating Transformation to CDISC SDTM and ADaM

You have just finished transforming your data from your operational clinical database into SDTM and ADaM. Now how do you go about validating it? In general, you want a method that reviews both the structure against the guidelines and the values of the data, to ensure that nothing has changed from what was collected. The following steps use SAS tools to validate and ensure the integrity of your final CDISC data.

  1. Transformation Model Validation - A transformation model documents the source data and how it was transformed, confirming the mapping between source and destination variables.

  2. Data Value Subset Review - An automated report prints a subset of the data before and after the transformation for review and validation.  This may catch truncation.

  3. Categorical Aggregate Review - An automated summary report compares the frequency counts of categorical variables before and after the transformation to verify that the counts are the same.  This catches missing or dropped values.

  4. Continuous Aggregate Review - An automated summary report compares the min, max, median, and counts of continuous variables before and after the transformation to verify that they are the same.  This catches missing or dropped values.

  5. CDISC Rules PROC CDISC - SAS tools such as PROC CDISC report a short list of deviations from the guidelines.  This review is applied programmatically and a report is generated.

  6. Variable Lengths - All variable lengths are evaluated and a report is generated with recommendations on standardizing lengths for variables across all data to adhere to standards.

  7. Deviation Summary - A summary report documents all deviations and their resolutions.

  8. Test Plan - A formal test plan document records all the related tests and deviations.

  9. CDISC Builder Rule Test - An 18-criterion checklist.  The criteria are listed here:
    1. Required Fields: Required identifier variables include DOMAIN, USUBJID, STUDYID and --SEQ.
    2. Subject Variable: (4.1.2.3) For variable names, labels and comments, use the word "Subject" when referring to "patients" or "healthy volunteers".
    3. Variable Length: (4.1.2.1) Variable names are limited to 8 characters with labels up to 40 characters.
    4. Yes/No: (4.1.3.7) Variables where the response is Yes or No (Y/N) should normally be populated for both Yes and No responses.
    5. Date Time Format: (4.1.4.1) Date or datetime values must be in ISO 8601 format.
    6. Study Day Variable: (4.1.4.4) The study day variable has the name --DY.
    7. Variable Names: (3.2.3) If a variable name matches a CDISC variable name, the associated label has to match.
    8. Variable Label: (3.2.3) If a variable label matches a CDISC label, the associated variable name has to match.
    9. Variable Type: (3.2.3) If a variable matches a CDISC variable, the associated type has to match.
    10. Dataset Names: (3.2.3) If a dataset name matches a CDISC dataset name, the associated dataset label has to match.
    11. Dataset Labels: (3.2.3) If a dataset label matches a CDISC dataset label, the associated dataset name has to match.
    12. Abbreviations: (10.3.1) (10.4) The following abbreviations are suggested for variable names and datasets.

      Acronym   Descriptive Text
      AE        Adverse Events
      AU        Autopsy
      BM        Bone Mineral Density (BMD) Data
      BR        Biopsy
      CM        Concomitant Meds
      CO        Comments
      DA        Drug Accountability
      DC        Disease Characteristics
      DM        Demographics
      DS        Disposition
      DV        Protocol Deviations
      EE        EEG
      EG        ECG
      EX        Exposure
      HU        Healthcare Resource Utilization
      IE        Inclusion/Exclusion
      IM        Imaging
      LB        Laboratory Data
      MB        Microbiology Specimens
      MH        Medical History
      ML        Meal Data
      MS        Microbiology Susceptibility
      OM        Organ Measurements
      PC        PK Concentration
      PE        Physical Exam
      PP        PK Parameters
      PG        Pharmacogenomics
      QS        Questionnaires
      SC        Subject Characteristics
      SE        Subject Elements
      SG        Surgery
      SK        Skin Test
      SL        Sleep (Polysomnography) Data
      SS        Signs and Symptoms
      ST        Stress (Exercise) Test Data
      SU        Substance Use
      SV        Subject Visits
      TA        Trial Arms
      TE        Trial Elements
      TI        Trial Inclusion/Exclusion Criteria
      TS        Trial Summary
      TV        Trial Visits
      VS        Vital Signs
      ACN       ACTION
      ADJ       ADJUSTMENT
      AD        ANALYSIS DATASET
      BL        BASELINE
      BRTH      BIRTH
      BOD       BODY
      CAN       CANCER
      CAT       CATEGORY
      C         CHARACTER
      CND       CONDITION
      CLAS      CLASS
      CD        CODE
      COM       COMMENT
      CON       CONCOMITANT
      CONG      CONGENITAL
      DTC       DATE TIME - CHARACTER
      DY        DAY
      DTH       DEATH
      DECOD     DECODE
      DRV       DERIVED
      DESC      DESCRIPTION
      DISAB     DISABILITY
      DOS       DOSE
      DOS       DOSAGE
      DOSE      DOSE
      DOSE      DOSAGE
      DUR       DURATION
      EL        ELAPSED
      ET        ELEMENT
      EM        EMERGENT
      END       END
      EN        END
      ETHNIC    ETHNICITY
      X         EXTERNAL
      EVAL      EVALUATOR
      EVL       EVALUATION
      FAST      FASTING
      FN        FILENAME
      FL        FLAG
      FRM       FORMULATION, FORM
      FREQ      FREQUENCY
      GR        GRADE
      GRP       GROUP
      HI        HIGHER LIMIT
      HOSP      HOSPITALIZATION
      ID        IDENTIFIER
      INDC      INDICATION
      INDC      INDICATOR
      INT       INTERVAL
      INTP      INTERPRETATION
      INV       INVESTIGATOR
      LIFE      LIFE-THREATENING
      LOC       LOCATION
      LOINC     LOINC CODE
      LO        LOWER LIMIT
      MIE       MEDICALLY-IMPORTANT EVENT
      NAM       NAME
      NST       NON-STUDY THERAPY
      NR        NORMAL RANGE
      ND        NOT DONE
      NUM       NUMBER
      N         NUMERIC
      ONGO      ONGOING
      ORD       ORDER
      ORIG      ORIGIN
      OR        ORIGINAL
      OTH       OTHER
      O         OTHER
      OUT       OUTCOME
      OD        OVERDOSE
      PARM      PARAMETER
      PATT      PATTERN
      POP       POPULATION
      POS       POSITION
      QUAL      QUALIFIER
      REAS      REASON
      REF       REFERENCE
      RF        REFERENCE
      RGM       REGIMEN
      REL       RELATED
      R         RELATED
      REL       RELATIONSHIP
      R         RELATIONSHIP
      RES       RESULT
      RL        RULE
      SEQ       SEQUENCE
      S         SERIOUS
      SER       SERIOUS
      SEV       SEVERITY
      SPEC      SPECIMEN
      SPC       SPECIMEN
      SPEC      SPONSOR
      SPC       SPONSOR
      ST        STANDARD
      STD       STANDARD
      ST        START
      STD       START
      STAT      STATUS
      SCAT      SUBCATEGORY
      SUBJ      SUBJECT
      SUPP      SUPPLEMENTAL
      SYS       SYSTEM
      TXT       TEXT
      TM        TIME
      TPT       TIMEPOINT
      TOT       TOTAL
      TOX       TOXICITY
      TRANS     TRANSITION
      TRT       TREATMENT
      U         UNIT
      U         UNIQUE
      UP        UNPLANNED
      VAR       VARIABLE
      VAL       VALUE
      V         VEHICLE

    13. SEQ Values: When the --SEQ variable is used, it must have unique values for each USUBJID within each domain.
    14. Label Casing: For dataset labels and variable labels, all nontrivial words (more than three characters) must start with a capital letter, with the rest of the characters lowercase.
    15. Required Values: (4.1.1.5) For required fields such as the ones specified in number 1, check to see if there are values.  If any values are missing, report the observation number where each is missing.
    16. Similar Parentheses:  For labels with matching values inside parentheses such as (Yes/No) within the same dataset, check that the variables have the same type and length.  If not, report the differences.
    17. Required Variables: (4.1.1.5) A Required variable is any variable that is basic to the identification of a data record (i.e., essential key variables and a topic variable) or is necessary to make the record meaningful. Required variables should always be included in the dataset and cannot be null for any record.
    18. Expected Variable: (4.1.1.5) An Expected variable is any variable necessary to make a record useful in the context of a specific domain. Columns for Expected variables are assumed to be present in each submitted dataset even if some values are null.
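
To make the aggregate reviews in steps 3 and 4 concrete, here is a minimal sketch of the before/after comparison. The post relies on SAS tools for this; plain Python is used here only for illustration, and the record layout (lists of dictionaries), the function names, and the variable names `sex`, `SEX`, `weight` and `WEIGHT` are all assumptions. It also assumes the categorical values are coded the same way on both sides, so any recoding would need to be applied before comparing.

```python
from collections import Counter
from statistics import median


def categorical_diff(source_rows, target_rows, source_var, target_var):
    """Map each value whose frequency changed to (source_count, target_count)."""
    src = Counter(row.get(source_var) for row in source_rows)
    tgt = Counter(row.get(target_var) for row in target_rows)
    return {value: (src[value], tgt[value])
            for value in set(src) | set(tgt)
            if src[value] != tgt[value]}


def continuous_summary(rows, var):
    """Return n, min, max and median of the non-missing values of var."""
    values = [row[var] for row in rows if row.get(var) is not None]
    if not values:
        return {"n": 0, "min": None, "max": None, "median": None}
    return {"n": len(values), "min": min(values),
            "max": max(values), "median": median(values)}


def continuous_diff(source_rows, target_rows, source_var, target_var):
    """Map each statistic that changed to (source_value, target_value)."""
    src = continuous_summary(source_rows, source_var)
    tgt = continuous_summary(target_rows, target_var)
    return {stat: (src[stat], tgt[stat]) for stat in src if src[stat] != tgt[stat]}


# Hypothetical records before (source) and after (target) the transformation.
source = [{"sex": "M", "weight": 70.5},
          {"sex": "F", "weight": 62.0},
          {"sex": "F", "weight": 80.2}]
target = [{"SEX": "M", "WEIGHT": 70.5},
          {"SEX": "F", "WEIGHT": 62.0},
          {"SEX": None, "WEIGHT": 80.0}]  # one SEX value dropped, one weight altered

sex_changes = categorical_diff(source, target, "sex", "SEX")
weight_changes = continuous_diff(source, target, "weight", "WEIGHT")
print(sex_changes)
print(weight_changes)
```

An empty result from either function means the aggregates agree. Note that matching aggregates do not prove every record is unchanged, which is why the subset review in step 2 is still useful.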
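
Criteria 1, 13 and 15 of the checklist (required fields, unique --SEQ values, and required values) lend themselves to a simple record-level check. The following is only a sketch of how such a check might look, not the CDISC Builder implementation: the function name `check_domain`, the list-of-dictionaries record layout, and the sample study and subject identifiers are hypothetical.

```python
from collections import Counter


def check_domain(rows, domain):
    """Return a list of findings for one domain's records (illustrative only)."""
    seq_var = f"{domain}SEQ"
    required = ["STUDYID", "DOMAIN", "USUBJID", seq_var]
    findings = []

    # Criterion 1: every required identifier variable exists in the dataset.
    present = set().union(*(row.keys() for row in rows))
    for var in required:
        if var not in present:
            findings.append(f"{var}: required variable is missing")

    # Criterion 15: required variables have a value on every observation.
    for obs, row in enumerate(rows, start=1):
        for var in required:
            if var in present and row.get(var) in (None, ""):
                findings.append(f"{var}: missing value at observation {obs}")

    # Criterion 13: --SEQ is unique for each USUBJID within the domain.
    seen = Counter((row.get("USUBJID"), row.get(seq_var))
                   for row in rows if row.get(seq_var) not in (None, ""))
    for (usubjid, seq), count in seen.items():
        if count > 1:
            findings.append(
                f"{seq_var}: value {seq} repeated {count} times for USUBJID {usubjid}")
    return findings


# Hypothetical AE records with one duplicated AESEQ and one blank USUBJID.
ae_rows = [
    {"STUDYID": "ABC123", "DOMAIN": "AE", "USUBJID": "ABC123-001", "AESEQ": 1},
    {"STUDYID": "ABC123", "DOMAIN": "AE", "USUBJID": "ABC123-001", "AESEQ": 1},
    {"STUDYID": "ABC123", "DOMAIN": "AE", "USUBJID": "", "AESEQ": 2},
]
ae_findings = check_domain(ae_rows, "AE")
for finding in ae_findings:
    print(finding)
```

Reporting the observation number, as criterion 15 asks, makes each finding traceable back to a specific record.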
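
The metadata criteria can be sketched in a similar way. The example below covers criterion 3 (name and label lengths), criterion 14 (label casing) and criterion 5 (ISO 8601 format). The function names are hypothetical, and the regular expression is a deliberately simplified assumption: it accepts only the shapes YYYY, YYYY-MM, YYYY-MM-DD and YYYY-MM-DDThh:mm[:ss], and it does not verify that the month, day, or time values are in range.

```python
import re

# Shape check only: year, optionally followed by month, day, and time parts.
ISO8601_SHAPE = re.compile(r"\d{4}(-\d{2}(-\d{2}(T\d{2}(:\d{2}(:\d{2})?)?)?)?)?")


def is_iso8601(value):
    """True if value has one of the simplified ISO 8601 shapes above."""
    return ISO8601_SHAPE.fullmatch(value) is not None


def check_variable(name, label):
    """Return findings for one variable's name and label (illustrative only)."""
    findings = []
    # Criterion 3: names up to 8 characters, labels up to 40 characters.
    if len(name) > 8:
        findings.append(f"{name}: name is longer than 8 characters")
    if len(label) > 40:
        findings.append(f"{name}: label is longer than 40 characters")
    # Criterion 14: words of more than three letters are capitalized,
    # with the remaining letters lowercase.
    for word in re.findall(r"[A-Za-z]+", label):
        if len(word) > 3 and word != word.capitalize():
            findings.append(f"{name}: label word '{word}' should be '{word.capitalize()}'")
    return findings


good = check_variable("AESTDTC", "Start Date/Time of Adverse Event")
bad = check_variable("AESTARTDATE", "start date")
print(good)
print(bad)
print(is_iso8601("2006-03-15T13:45"), is_iso8601("15MAR2006"))
```

Applied literally, the casing rule also flags all-uppercase acronyms of more than three letters, so a real check would probably need an exception list.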
