See Clinical Data Acceptance Testing Procedure and CDM: Edit Checks, Teaching an Old Dog New Tricks training sessions.
See CDM: Edit Checks - Teaching Old Dogs New Tricks Training Video
Clinical Data Management
Good Clinical Data Management Practices, Version 4, October 2005
SCDM: Data Basics Newsletters SmartSheet Tool
Managing Data in Clinical Research [Presentation]
Quality Management in Clinical Trials
5 data management best practices to help you do data right blog
What is necessary to provide good clinical data for a clinical trial?
Data Cleanup Presentation and Tools [Course Notes]
Macro to Conduct Consistency Checks [Presentation]
Best Practices in Data Management [Reference]
1. Clinical Trial Laboratory Data Management using the SAS® System, Marianne Hack
4. Create Your Customized Case Report Form (CRF) Tracking System, Tikiri Karunasundera
5. Missing Pages Report, David Gray, Zhuo Chen [Missing expected scheduled visits and case report form pages, macro]
6. Annotate CRFs and Keep Track of them, Rohit Banga
7. Data Management Metrics: how to use our instruments, Cristina Anselm [Key Performance Index]
8. A SAS Macro for Rapid Identification of Missing Data Records, Kim Truett [Impute]
9. A Visual Approach to Monitoring Case Report Form Submission During Clinical Trials, Rebecca Horney, Karen Jones, Annette Wiseman [Missing expected case report form pages]
10. A SAS Macro for Rapid Identification of Missing Data Records, Kim Truett
11. Data in the Doughnut Hole: Using SAS® to Report on What is NOT There, Sarah Woodruff
12. Standardizing Data Cuts, Mercidita Navarro [Presentation]
13. Data management in clinical research: An overview
Data Quality (Edit Checks)
Type of Data Issue |
Brief Description (Take advantage of SAS Procedure's strengths) |
Codelist: Acceptable Values |
Values are one of the valid values for variable (PROC FREQ) |
Positive Values | Values are non-negative |
Consistency Across Variables |
Values are consistent across multiple variables, ex. sequence of dates (PROC FREQ) |
Consistency Across Data sets |
Values are consistent across multiple data sets, ex. same variable in multiple datasets (PROC SQL) |
Consistency Across Lab Dates | For multiple source of lab data, Lab Dates (Min, Max) of each lab analyte should be on same date |
Consistency Across Lab Analytes | For multiple lab analytes, check for any missing expected lab analyte |
Consistency Across Lab Analytes units |
For multiple lab analytes, check for any inconsistent lab analyte units |
Non-duplicate Records | Each record is unique and not duplicated by key variable (PROC SORT) |
Overlapping Records | Records that inappropriately overlap. For example, treatment records that overlap cycles in an oncology study (LAG()) |
Protocol Compliance Rules | Study specific logic-based check to confirm data compliance, ex. lab conversion (IF-THEN) |
Range check | Values are within specified range (PROC MEANS) |
Required Value |
Value is non-missing, use PROC FORMAT to group non-missing and missing values (PROC FREQ) |
Unique Value | Values are unique such as key variables (PROC FREQ) |
Missing Schedule Visits |
For important lab visits, identify any missing lab visits based on scheduled and expected visits |
Missing Data Report |
Confirm non-missing and missing variable combinations (PROC FORMAT). |
Transcription Errors | Data from excel files, for example, are not correctly saved in datasets |
Based on the Data Management Plan, below is a list of best practice SAS functions and programming techniques used to check quality of all source data (datasets, raw/derived variables and records) at both across datasets as well as individual datasets. Make sure to confirm no false positives (finding an issue that is not valid), and no false negatives (not finding a valid issue). Generally, the primary edit checks also serve as data acceptance testing during database lock. See also SAS Enterprise Guide's Characterize Data task and SAS Functions.
Data Quality can be broken into two components - Content Correctness (Accuracy, Completeness, Balancing, Consistency, Precision, Granularity, Currency, Duration, Retention, Continuity, Precedence), Structure Integrity (Identity, Cardinality, Reference, Inheritance, Values, Dependency)
Check Condition (REPORT IF ...) |
Clinical SAS Example (... THEN DO; <...> END;) |
---|---|
1a. General Tips - i) Report for each issue - data set name, patient number, visit date, variables checked, supporting variables and subset condition, ii) Apply inverse of good data condition, iii) Use CATX(' ', ' ', N1, C1) to concatenate text, numeric or character variables into a query text variable to be displayed. Identify if global solution, which effects all sites such as age calculation, or local solution, which is focused on selected sites, is required. 1b. Scope Key Performance Indicator - Summary level measurement, at least one failed patient out of x edit checks, = # of failed edit checks/Total # of edit checks 1c. Impact Detail level measurement – systematic or localized problem based on # of patients affected?, Overall = # of failed records/(Total # of records x Total # of edit checks) and By edit check = # of failed records/Total # of records |
IF ^(GOOD DATA CONDITION) THEN REPORT ISSUE displaying key and condition variables |
2. Character General Tips | Compare as UPCASE(<CHAR_VAR>) = 'UPPER CHARACTERS' |
3. Confirm issues found and no false positives | TITLE "<WHERE CONDITION>"; PROC FREQ; TABLES VAR1*VAR2 / LIST MISSING; |
4. Missing required single variable (numeric or character). The MISSING() function returns number greater than 0 if at least one missing value exists. |
IF MISSING(<NUMERIC OR CHARACTER VAR>) THEN display key and missing variables |
5. Missing required two or more variables (numeric or character). The NMISS()/CMISS() function returns the number of missing numeric/character values in a list of values. | IF CMISS(<NUMERIC OR CHARACTER VARS>) NE 0 THEN display key and missing variables |
6. Missing minimum number of numeric variables | IF NMISS(<6 NUMERIC VARS>) LT 5 THEN display key and numeric vars |
7. Below a minimum numeric value | IF <NUMERIC VAR> > . AND <NUMERIC_VAR> < 35 THEN display key and numeric variables |
8. Numeric values outside of range |
IF <NUMERIC_VAR> NOT BETWEEN <MIN> AND <MAX> AND <NUMERIC_VAR> IS NOT MISSING THEN display key and numeric variables |
9. One numeric variable value is larger than another numeric variable value | IF <NUMERIC_VAR1> > <NUMERIC_VAR2> THEN display key and numeric vars |
10. Codelist check for simple characters or words - A character variable is not a valid value based on list of selected values. Single or word valid values can be applied. TRIM() removes all spaces so 'ABC' will be 'BASELINEWEEK2WEEK4' for example. Note that these are all case sensitive. VERIFY() value is not equal to 0 if variable has a value outside of 'ABC' codelist. INDEXW()=0 is better for confirming valid words in a list of words. |
IF <CHARACTER_VAR> ^IN ('A', 'B', 'C') THEN display key and char vars IF VERIFY(TRIM(<CHARACTER_VAR), 'ABC') ^=0 THEN display key and char vars IF INDEXW('BASELINE WEEK2 WEEK4', COMPRESS(VISIT)) = 0 THEN display key and visit variables |
11. A numeric variable is not a valid value in a list of selected values. | IF <NUMERIC_VAR> ^IN (5, 10, 15) THEN display key and numeric vars |
12. A variable does not equal at least one other list of variables | ARRAY OEDATES {*} OEVISDAT1 - OEVISDAT13; DO i=1 to HBOUND(OEDATES); IF CMISS(OEDATES{i}, OCPROCDT)=0 and OEDATES{i} = OCPROCDT THEN OCPRODTC=VNAME(OEDATES{i}); END; |
13. A specific character string does not exist in a variable | IF INDEX(<CHARACTER_VAR>, 'SEARCH_TEXT') = 0 THEN display key and char vars |
14. Start date is after end date | IF <START_DATE> > <END_DATE> THEN display key and date vars |
15. Confirm a list of all unique values for a variable | PROC FREQ; TABLES < VARS > / LIST MISSING; RUN; |
16. A numeric value exists in a character variable | IF ANYALNUM(<CHARACTER_VAR>) ^= 0 THEN display key and char vars |
17. A character value exists in a character variable | IF ANYALPHA(<CHARACTER_VAR>) ^= 0 THEN display key and char vars |
18. Group valid character values to identify invalid values | PROC FORMAT; VALUE $GENDER ''<valid value1>'', "<valid value2>" = 'Valid' ' '='Missing' OTHER='Miscoded'; QUIT; |
19. Group valid numeric values to identify invalid values | PROC FORMAT; VALUE AGE 0 - 100 = 'Valid' .='Missing' OTHER='Miscoded'; QUIT; |
20. Check for duplicate records | PROC SORT NODUPKEY; BY _ALL_; RUN; |
21. Checking across two visits, objective is to compare values on the same record - See SAS Paper |
Use one or more techniques in DATA Step - RETAIN, FIRST., LAST., BY, LAG() |
22. Checking across two or more visits, create new variable names for each visit by patient, use PREFIX option The alternative for creating several variable is within a DATA step, to subset for each visit and rename each new variable |
PROC TRANSPOSE DATA=bln OUT=BLN3A PREFIX=VISDAT; BY USUBJID; VAR VISDAT; WHERE VISDAT > .; RUN; |
23. Converting multiple to single record per patient using an option below. Add _single as suffix and keep selected variables.
e) FIRST. or LAST. in a DATA Step to get first non-missing value based on grouping variables |
PROC SORT DATA=OE NODUPKEY OUT=OE_SINGLE (KEEP=USUBJID OCUPROC OCUPROCE AEANY); BY USUBJID OCUPROC OCUPROCE; RUN; |
24. At least one record of many per patient matches condition. Out of many records, if at least record per patient matches condition then the patient passes the check. All other records for that patient are ignored. |
1. Apply required condition as WHERE dataset option to first exclude any non-checked records. 2. Initially assign CHECKFLG to missing for each patient. RETAIN CHECKFLG; IF FIRST.SUBJID THEN CHECKFLG=. 3. Apply IF-THEN, ELSE IF-THEN conditions to assign CHECKFLG variable for each record: 0 for pass, . and 1 for fail condition. 4. Apply descending sort order on CHECKFLG variable to order 1 before 0 and '.' records by patient. 5. IF FIRST.SUBJID THEN OUTPUT to save first record per patient. 6. Report if any patient has CEHCKFLG = 1. |
25. Visits outside of protocol visit windows - based on STUDY_DAY, assign visit numbers and then apply VISITT format to identify multiple visits within the visit window as well as outside visits. Then use PROC SQL to group by VISIT and select MIN(STUDY_DAY) to get first visit or MAX(STUDY_DAY) to get the last visit within the window. |
if &stdyvar = 1 then &visvar = 1; PROC FORMAT; VALUE VISITT 1 - 3 = 'VISIT 2' 5 - 10 = 'VISIT 5' .='Missing' OTHER='Outside of Window'; QUIT; |
26. Joining many-to-many records by dates such as AE and CM to compare dates from two datasets, see PROC SQL. For one-to-one join of multiple variables, use DATA Step. | proc sql; create table ae_meds as select a.subject_id, a.ae_start, a.ae_stop, a.adverse_event, c.cm_start, c.cm_stop, c.conmed from aes as a left join conmeds as c on (a.subject_id = c.subject_id) and ( (a.ae_start <= c.cm_start <= a.ae_stop) or (a.ae_start <= c.cm_stop <= a.ae_stop) or ((c.cm_start < a.ae_start) and (a.ae_stop < c.cm_stop)) ); quit; |
27. Assure next followup visit date is after previous visit date. This assumes the dataset is sorted by visitNum which should be in sequence order. |
proc sort data=OE(keep=siteID subjID usubjid visitNum visit visdat) data dpquery_6; length prevDate 8 prevVisit $20 ; if first.subjectID then do; |
28. For indexed systems, use multiple SET statements with KEY=/UNIQUE option to use key variables to automatically and correctly join datasets. Not that this technique keep all records from the first master dataset, without the KEY= option, such as that for IF A. Four Components to the Model A. SET master dataset with WHERE dataset option and w/o KEY=. Note that master dataset must have defined all subsequent key variables in any following SET datasets for this technique to work. B. SET datasets with KEY= C. Just after the SET dataset to check, IF _IORC_ = %SYSRC(_DSENOM) THEN to capture non-matching records D. ELSE DO to capture matching variables from other datasets for all records in master SET dataset |
DATA CHECK; SET MALES2 (where=(sex='M')); SET CLASSID KEY=_PRIMKEY/UNIQUE; IF AGE <= 14; |
29. When displaying records based on a summary condition, which approach is best? - for identifying matches based on existing records, SUBQUERY is better to use since the condition selects existing records. For identifying non-matches based on missing records, EXCEPT is better to use since the condition selects missing or records that do not exist. |
title 'Check for missing important baseline records'; proc sql; select unique usubjid, paramcd from BZ1_adam.adbmk where paramcd > '' except select unique usubjid, paramcd from BZ1_adam.adbmk where avisit='BASELINE'; quit; |
EDC: Oracle Inform Support
1. Standards for Clinical Data Quality and Compliance Checks, Sunil Gupta
2. Clinical-Data Acceptance Testing Procedure, Sunil Gupta
3. Saving Trees: Managing the Edit Check Process Electronically
Tim Kelly
5. Risk-Based Approach to SAS® Program Validation, Keith Benze
6. Consistent Variables + Consistent Checks = Cleaner Data, Greg Silva
7. DATA CLEANING: LONGITUDINAL STUDY CROSS-VISIT CHECKS, Lauren Parlett
8. Let SAS© Improve Your CDISC Data Quality, Wayne Zhong [CDISC]
9. Data Edit-checks Integration using ODS Tagset, Niraj J. Pandya, Vinodh Paida
10. Data Cleaning 101, Ron Cody (Book Examples) (PennState Online) (Book Insert)
11. Longitudinal Data Techniques: Looking Across Observations, Ron Cody
12. Identifying Continuity in Longitudinal Data, Merle Hamburger, Thomas Sukalac
13. Consistency Check: QC Across Outputs for Inconsistencies John Morrill, David Austin
14. Get With the Validation Program –Successful Steps to Edit Check Failures, Michael Rea
15. Datacut Strategies: What, why and how, Hiren Naygandhi, [Presentation] [Data Cutoff]
16. Know Your Data While Writing the SAS Code, Shu-Min Chuang [Data Cutoff]
20. ICH E6 rewritten to reflect recent GCP inspection findings article [Course] [Webinar]
21. Electronic Data Capture—Pros and Cons
22. FDA Guidance for Industry Electronic Source Data in Clinical Investigations
23. Electronic Source Data in Clinical Studies [eSource solutions]
24. The Beginner’s Guide to an Electronic Data Capture (EDC) System
25. GCP101 Good Clinical Practices OR “Why we do What we do the Way we do it“, Elaine Dempsey
28. Data Quality Management, The Most Critical Initiative You Can Implement, Jonathan Geiger [Presentation]
29. SAS® Data Quality – A Technology Overview Eric Hunley
30. Data Cleaning 101: An Analyst’s Perspective, Anca Tilea, Deanna Chyn
31. Getting to the Point: Data Scrubbing and Cleaning 101, Sean Bair
32. CLEAN UP YOUR ACT: DATA CLEANING WITH SAS, Mel Widawski
33. From data management to statistical programming: a real life change, Benjamin Lerbour
36. CHECKING OUT-OF-RANGE DATA, Aileen Yam
39. PROC DATACHK Revisited: The DATACHK Macro, Jeffrey Abolafia
40. MAINTAINING QUALITY CONTROL OVER INPUT DATA WITH SAS SOFTWARE, R. Whitfield
41. eSource Implementation in Clinical Research: A Data Management Perspective
42. Pre-Data Checks for SDTM Development, Abhinav Srivastva
43. Get Your Hands Dirty Cleaning Your Data with SAS® Data Quality Server, Faron Kincheloe
44. What Makes A Data-Savvy Manager blog
45. Preemptive DATA CLEANING: Techniques, Malachy Foley [Completeness]
46. How to STRIP Your Data: Five Go-To Steps to Assure Data Quality, Michael Santema, Fagen Xie [Completeness]
48. The Art of Defensive Programming: Coping with Unseen Data, Philip Holland [Presentation]
49. Troublemarker Records [Presentation]
51. In-Database Data Quality – Performance for Big Data, Charlotte Crain, Mike Frost, and Scott Gidley [Completeness, Accuracy]
52. Garbage In, Gourmet Out: How to Leverage the Power of the SAS® Quality Knowledge Base, Brian Rineer
53. An Efficient Tool for Clinical Data Check, Chao Su, Shunbing Zhao, Cynthia He
54. A time saving approach to track data issues, Aishhwaryapriya Elamathivadivambigai
55. Automate Clinical Trial Data Issue Checking and Tracking, Dale LeSueur and Krishna Avula
56. DATA CLEANING: LONGITUDINAL STUDY CROSS-VISIT CHECKS, Lauren Parlett [ALLCOMB]
57. Advanced Array Applications in Clinical Data Manipulation, Zaizai Lu, David Shen
58. Data Quality Checker, Ryan Finch
59. Improving CDISC SDTM Data Quality & Compliance Right from the Beginning, Bharat Chaudhary, Padamsimh Balekundri [Presentation]
60. Clinical Database acceptance: what statistical review checks are necessary to validate a database? Genevieve Jehl [Checklist]
61. Keeping Track of Database Changes During Database Lock, Sanjiv Ramalingam [Proc Compare]
63. Handling Interim and Incomplete Data in a Clinical Trials Setting, Paul Stutzman [Macros]
65. PROC SQL: A Powerful Tool to Improve Your Data Quality Keh-Dong Shiang
66. Cleaning Data with Just a Handful of SAS® Functions Ben Cochran
67. Data Cleaning and Base SAS Functions, Caroline Bahler
69. Handling Dynamic Variable Types in SAS, Venkat Lajapathirajan
70. LET SAS® CLEANSE YOUR DIRTY DATA, Kaushal Chaudhary
71. Clinical and Vendor Database Harmony; Can’t we all just get along?, Brian Armstrong, Renée Kerwin
73. Arbovirus, Varicella and More: Using SAS® for Reconciliation of Disease Counts, Misty Ann Johnson
74. Cleaning and transforming data for delivery of analytic datasets, Christopher Schacherer
75. Enterprise, Prepare for Analysis! Using SAS® Data Management to Prepare Data for Analysis, Bob Janka
76. Be Prepared: An Introduction to SAS Data Preparation, Mary Queen
77. Risk-Based Monitoring (RBM) and Centralized Statistical Monitoring(CSM) ‘Data Quality Analysis’
78. Creating a single instance / multitenant eClinical Platform
79. Best Practices for Submission of Event Adjudication data [PhUSE White paper]
82. Looking for columns with all missing values, Binoy Varghese
83. Dropping variables from a large SAS® data set when all their values are missing, Gwen Babcock
84. Missing Values in SAS, Magnus Mengelbier
85. Statistical Analysis Plan – Clinical Programming Reviewers Guide, Xiaoyin Zhong, Dan Schramek
86. Building Quality into the SAS System, Frank Lassiter
MEDDRA Papers [Presentation] [Who Drug]
1. Can Coding MedDRA and WHO Drug be as Easy as a Google Search?, Sy Truong, Na Li
2. MedDRA Dictionary: Reporting Version Updates Using SAS® and Excel, Richard Zhou, Denis Michel
3. MedDRA data as SAS formats, Jim Groeneveld
4. Data Extraction & MedDRA: A SAS®Coding Shortcut, Fang Dong, Saket Badola
5. Applying SMQs to Adverse Event Data, John van Bemmelen [Standardized Metadata Queries]
9. Recursive Programming Applications in Base SAS. Jinson Erinjeri and Pratap Kunwar [SMQ]
12. SMQ SAS® Dataset Macro, Mi Young Kwon, Ishan Shah [SMQ]
RAVE Papers [Medidata]
1. Sharing and Building Sponsor-Specific CRFs – No Paper Required, Scott Bahlavooni, Retha Gerber [Presentation]
2. Measuring the Benefits of Standards, Gary Fleming, Michaela Jahn
3. Automatic generating blankcrf.pdf for Rave Study, Haiqiang Luo, Yong Cao
5. Innovative approach to building an adaptive trial design in Medidata Rave, Joris Bondt
6. Pre-Data Checks for SDTM Development, Abhinav Srivastva
7. No Solution to Auto-generating acrf.pdf? Try to Use GROOVY Procedure!, Yin-Jhen Yan, Kyle Chang
SAS Life Science Analytical Framework [LSAF]
2. What’s your data flow in SAS® LSAF (How to code in LSAF), Shuang Fu
3. Making the most of SAS® Jobs in LSAF, Sonali Garg
6. Incorporating Pinnacle21® With LSAF, Sonali Garg, Sandeep Juneja, Aleksey Vinokurov