Data Validation - QC/QA/Data Quality Techniques  

          Training Videos:

Applying SAS Program Validation Techniques 

Below is a collection of validation strategies and SAS® papers on data validation techniques.  See also CDM, PROC COMPARE, Six Sigma in SAS Business Intelligence and ETL, SAS Debugging, and Efficiency and Performance Survey. Mind map. For Quality Assurance, see also Metadata.


Quality Control

 Inherent in day-to-day work as part of a process to deliver a useful product

 Self-critiquing

 Documents what went right, what went wrong and what you did about it

 Steps may be updated based on QA results


Quality Assurance

 An oversight function based on random or risk-based sample selection

 Independent of the process

 Offers critical comment to challenge the quality of the system

 Assures that the glue is in place to hold it all together




In general, the adaptive validation strategies below are suggested only for large lab lists, tables and graphs.  Almost all other lists, tables or graphs, such as demographics and vitals, require 100% parallel programming and a check of 100% of all pages.  Since these outputs are generally expected to have fewer than 5 pages, this validation is not expected to be time consuming.  In addition, 100% of the data values in all derived datasets are still compared.

Consult your statistician and the FDA to ensure that appropriate sampling techniques are applied with acceptable type I and type II error rates.  In general, while 100% parallel programming is required in over 80% of cases, not every page of results necessarily needs to be validated.  For the pages that are randomly selected, 100% of the values on those pages are compared.

If the random sample validation fails by finding a difference, then the production table is reproduced to correct the issue and a larger random sample size is required in the second pass.  This sequential acceptance testing, which increases the random sample size with each pass, continues for as long as differences are found.  If, however, there are no differences or issues, then subsequent passes are not required.


Risk Level


Adaptive Validation Strategies for lab lists, tables and graphs


Code Review, 10%-20% random sample pages


25% to 100% random sample pages based on Sampling Techniques
  1st pass 25% of random sample pages
  2nd pass 50% of random sample pages if differences/issues found in 1st pass
  3rd pass 75% of random sample pages if differences/issues found in 2nd pass
  4th pass 100% of all pages if differences/issues found in 3rd pass


100% of all pages, Parallel Programming (Most time consuming)
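
The escalating passes (25%, 50%, 75%, 100%) can be sketched as a small loop. This is only an illustration in Python, not a validated procedure. The function names and the `page_matches` callback are hypothetical, and in SAS the sampling step would more typically be done with PROC SURVEYSELECT. The sketch assumes that each pass draws a fresh random sample and that the production table is corrected outside the loop between passes.

```python
import math
import random

# Pass fractions taken from the strategy above: 25%, 50%, 75%, 100%.
PASS_FRACTIONS = (0.25, 0.50, 0.75, 1.00)


def select_pages(total_pages, fraction, rng):
    """Return a sorted random sample of 1-based page numbers for one pass."""
    n = min(total_pages, max(1, math.ceil(total_pages * fraction)))
    return sorted(rng.sample(range(1, total_pages + 1), n))


def run_adaptive_validation(total_pages, page_matches, seed=None):
    """Run escalating passes until one pass finds no differences.

    page_matches(page, pass_number) is a hypothetical callback that compares
    100% of the values on one page and returns True when the production and
    QC results agree. Returns a list of tuples:
    (pass_number, pages_checked, pages_with_differences).
    """
    rng = random.Random(seed)
    history = []
    for pass_number, fraction in enumerate(PASS_FRACTIONS, start=1):
        pages = select_pages(total_pages, fraction, rng)
        failed = [p for p in pages if not page_matches(p, pass_number)]
        history.append((pass_number, pages, failed))
        if not failed:
            break  # clean pass: no further passes are required
        # Otherwise the production table is corrected and rerun before the
        # next, larger pass (that step happens outside this sketch).
    return history
```

Recording the seed together with the selected page numbers in the validation file makes the sample reproducible if the selection is later questioned.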





Validation Techniques

1. Specifications/Table Shells > 2. Source Program > 3. Output > 4. QC Program



Program header, programming style and intermediate steps

Validation Excel file - program, log and output. See SAS Debugging.

Method - parallel programming against single source specification/table shells, code review and results spot check 

Macros - preventative programming and user message for each invalid parameter, keyword parameters with defaults, metadata checks and global macro variables, and document sections 

Code Review tips - identify incorrect logic or process flow, compliance with SOPs, modularity, meeting specifications





PROC COMPARE datasets.  Generally, 100% data values are compared.

SDTM: Compare corresponding related variables, e.g. PROC FREQ of LBSTRESN*LBSTRESC

Method - group related variables together, such as the sequence of all date variables, to confirm consistency across variables when browsing the dataset. Use a dataset shell macro to control the order of variables.

Data issues - missing values, invalid values, zero values and out-of-range values
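
The data issues listed above (missing, invalid, zero and out of range) can be counted programmatically before any visual review. The following is a minimal Python sketch under stated assumptions: the record layout, the variable name `LBSTRESN` in the usage, and the plausibility limits `low` and `high` are all hypothetical, and a lone "." is treated as missing on the assumption that the data were exported from SAS.

```python
import math


def classify_value(value, low, high):
    """Classify one result as 'missing', 'invalid', 'zero',
    'out_of_range' or 'ok'. low and high are assumed plausibility limits."""
    # None, blank and a lone "." (SAS-style missing) count as missing.
    if value is None or str(value).strip() in ("", "."):
        return "missing"
    try:
        number = float(value)
    except (TypeError, ValueError):
        return "invalid"  # not convertible to a number
    if math.isnan(number):
        return "missing"
    if math.isinf(number):
        return "invalid"
    if number == 0:
        return "zero"
    if number < low or number > high:
        return "out_of_range"
    return "ok"


def summarize_issues(records, variable, low, high):
    """Count each classification for one variable across a list of dicts."""
    counts = {}
    for record in records:
        label = classify_value(record.get(variable), low, high)
        counts[label] = counts.get(label, 0) + 1
    return counts


# Hypothetical usage with a made-up lab result variable and limits.
example = [{"LBSTRESN": 5.2}, {"LBSTRESN": None}, {"LBSTRESN": 0}]
summary = summarize_issues(example, "LBSTRESN", low=1, high=10)
```

Zero is reported separately from out of range because a zero result is often a data entry placeholder and not a true measurement, so it usually deserves its own query.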




Other Listings, Datasets, PROC REPORT (detail lists), RTF Compare/QC Dataset from OUT= option.

Confirm: specification with variable names, any subset conditions, correct units, description of Other or Comment from SUPPXX dataset, column alignments (left, center and right), case-sensitivity, sort order, message for missing records 







Other Tables such as complement Tables (Prior Con Meds and Con Meds), Listings, PROC TABULATE, RTF Compare/QC Dataset from OUT= option.  For % calculations, make sure to delete duplicate records for the denominator variable with PROC SORT and NODUPKEY. 

Confirm: specification with variable names, any subset conditions, correct units, column alignments (left, center and right), case-sensitivity and sort order
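
The denominator advice above (one record per subject before computing a percentage) can be illustrated as follows. In SAS this is done with PROC SORT and NODUPKEY as noted. The Python sketch below gets the same effect with sets, and the key names `USUBJID` and `TRT` are assumptions chosen only for illustration.

```python
def percent_with_event(population, events, subject_key="USUBJID", group_key="TRT"):
    """Percent of subjects per group with at least one event.

    Each subject is counted once in the denominator and once in the
    numerator, no matter how many records the subject has.
    """
    denominators = {}
    for row in population:
        denominators.setdefault(row[group_key], set()).add(row[subject_key])

    numerators = {}
    for row in events:
        numerators.setdefault(row[group_key], set()).add(row[subject_key])

    return {
        group: 100.0 * len(numerators.get(group, set()) & subjects) / len(subjects)
        for group, subjects in denominators.items()
    }


# Hypothetical data: subject S1 appears twice in both inputs.
population = [
    {"USUBJID": "S1", "TRT": "A"},
    {"USUBJID": "S1", "TRT": "A"},
    {"USUBJID": "S2", "TRT": "A"},
    {"USUBJID": "S3", "TRT": "B"},
]
events = [
    {"USUBJID": "S1", "TRT": "A"},
    {"USUBJID": "S1", "TRT": "A"},
]
result = percent_with_event(population, events)
```

Without the deduplication, group A would have a denominator of 3 records instead of 2 subjects, which is the error the NODUPKEY step is meant to prevent.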




PROC TRANSPOSE Listings for raw, min and max data values, Tables, PROC SGPLOT, consistent y-axis scale within lab tests as needed for valid dose group comparisons, SAS Enterprise Guide

Visual QC Steps: text placement, subset labels, titles, footnotes, source dataset compare, x and y axes



Validation Plan The 5 Most Important Clinical SAS Programming Validation Steps, Brian Shilling [QC Checklist]

PROC COMPARE Validation: Let SAS do the comparison for you, Lara E.H. Guttadauro

                        Validating Listing Output: A Better Way, Hunter Vega, James Kniffen Jr


MPRINT/MFILE Have it both ways: Macros that produce publication-quality tables and stand-alone code, Linda Collins, Lisa Brooks, Michael Rea, Alan Hopkins

Macros Saving QC Time for Production Tables, Linfeng Xu, Sunil Gupta

RTF Processing Save Those Eyes: A Quality-Control Utility for Checking RTF Output Immediately And Accurately, Michiel Hagendoorn, Jonathan Squire, Johnny Tai

RTF2DATA Utility A Utility Macro to convert RTF Table to SAS Dataset; Prove QC Quality - Create SAS® Dataset from RTF File

RTF_READ Decoding RTF Files


Regulatory Compliance (Validation Presentation)

1. Quality Testing for Programs Used in Regulatory Submission, Xiaohui Wang, Elaine Czarnecki

2. External Data Utility (EDU), a SAS® Tool to Manage Data from Acquisition to Database Release, Margaret Hung, Mohit Goel, Sheila Moody

3. Good Programming Practices in Clinical Trial – a Check Program, John H. Adams

4. Could Have, Would Have, Should Have! Adopting Good Programming Standards, Not Practices, to Survive An Audit, Vincent J. Amoruccio [QC Checklist]

5. Macros to Create Quality Control (QC) Documents, Xiaoyu Liu, Hong (Ellen) Xiao

6. SAS® Macros to Help Relieve Common Program Documentation Pain, Chris Hord, Jay Zhou

7. CSI: San Antonio – Common SAS Issues in Our Programs and Tips for Better Investigation of Your SAS Code, Rachel Brown

8. The Validator: A Macro to Validate Parameters, Steven Wilson

9. Is Your Code Complex? Here’s One Way to Tell, Mike Harris

10. SAS File Design: Guidelines for Statisticians and Data Managers, Douglas Zirbel

11. An Annotated Guide: Using Proc Tabulate And Proc Summary to Validate SAS Code, Russ Lavery

12. Mission Possible: Your Assignment is to Validate Output for a Study, Susan Fehrer Coulson, Kevin R. Coulson [QC Checklist]

13. PROGRAMMER’S SAFETY KIT: Important Points to Remember While Programming or Validating Safety Tables, Sneha Sarmukadam, Sandeep Sawant [Table Checklists]

14. Validation, SAS, and the Systems Development Life Cycle: An Oxymoron?, Neil Howard, Michelle Gayar

15. Designing Validation and Verification Activities as a Staff Development Tool, John Gorden

16. Communicating Standards: A Code Review Experience, David Scocca

17. When Good Looks Aren’t Enough, Lisa Eckler

18. Validating Analysis Data Set without Double Programming - An Alternative Way to Validate the Analysis Data Set, Linfeng Xu, Christina Scienski

19. Quality Assurance: Best Practices in Clinical SAS® Programming, Parag Shiralkar [QA Checklist]

20. SAS Programmer’s check list-Quick checks to be done before the statistical reports go off the SAS Programmer’s table, Thomas T. Joseph, Babruvahan Hottengada

21. TLF Validation Etiquette: What to say, When to Say, How to Say, Why to Say, Karen Walker

22. Quick Checks for Quick Review, Gauri Khat [ADSL, ADAE, ADEX vitals]

23. Consistency Check: QC Across Outputs for Inconsistencies, John Morrill, David Austin [Cytokine]

24. A Strategy for Managing Data Integrity Using SAS, Brett Peterson

25. Beyond Double Programming ----SAS® Programming By Design (PBD) with Soop, Laiju Zhang

26. Managing 21 CFR Part 11 Compliance: Using Checksums on Open Systems, Carey Smoak, Mario Widel [FDA Guide]

27. Find / Track / Check and Close, Using SAS to Streamline SDTM Validation Including the Hyperlinks, David Tillery, Qiang Zhai, Lily Peng

28. Data-driven Validation Rules: Custom Data Validation Without Custom Programming, Don Hopkins [Metadata, Proc SQL]

29. What auditors want, Cedric Marchand, Angelo Tinazzi [Presentation, Technical Interview, QA Checklist]

30. Using Proc Contents Output to Perform Quality Control Checks on SDTM Datasets, Jennifer Srivastava [VARNUM]

31. A Well-Formatted and Easy-to-Navigate Solution for Submitting SAS Source Code in NDA Submission, Jeff Xia, Lugang Xie [QA Checklist]

32. Software Validation in Clinical Trial Reporting [Presentation]

33. Is Your Output Telling the Truth? Tips and Tricks in Verifying SAS Outputs, Angelo Tinazzi, Sonia Colombini, Lisa Comarella, Marta Zanus [Checklist]

34. QC:manual vs program – a personal view, Helen Nicholson

35. Quality Control Programming: A Lost Art?, Amber Randall and William Coar

36. Compliance Readiness for the New Millennium How does your SAS Environment Measure Up?, Steven Light, Andy Siegel

37. Tips for efficient CDISC eCRT production, Lanting Li, Yu Zhu, Huan Zhu


39. Playing Detective: Hints and Tips for Independent Programming QC, Bethan Thomas

40. How to QC your own programs, Kevin Lee

41. Handbook on Data Quality Assessment Methods and Tools [Book]

42. Data Quality Management, Data Cleansing, and Discrepancy Reporting, Jenine Milum

43. An Efficient Report Checking Method, Xuejing Mao, Mario Widel

44. Data Matching, David Johnson, Wendy Dickinson

45. Basic Defensive Programming Techniques, Baoxian Lan, Daniel Tsui

46. Automated or Manual Validation: Which One is for You?, Richann Watson, Patty Johnson

47. Seeing the Forest for the Trees: Part Deux of Defensive Coding by Example, Nancy Brucken and Donna E Levy

48. The Art of Defensive Programming: Coping with Unseen Data, Philip Holland [Presentation]

49. Check Your Data: Tools for Automating Data Assessment, Paul Stutzman

50. Beyond “Just fix it!” Application of Root Cause Analysis Methodology in SAS® Programming, Nagadip Rao

51. Statistician’s secret weapon: 20 ways of detecting raw data issues, Lixiang Larry Liu

52. Common Mistakes by Programmers & Remedies, Venkata Sairam Veeramalla

53. Clinical Study Report Review: Statistician’s Approach, Amita Dalvi [Checklist]

54. Cue the word QC – All you need to know [Presentation]

55. Automated Validation of Complex Clinical Trials Made Easy, Richann Watson, Josh Horstman [Macro]

56. How to Speed Up Your Validation Process Without Really Trying, Alice Cheng, Michael Wise, Justina Flavin

57. Set Yourself Free–Use ODS Report Writing Technology in SAS Enterprise Guide Instead of Dynamic Data Exchange in PC SAS, Robert Richard Springborn


59. Developing and Implementing a Comprehensive Clinical QA Audit Program


61. FDA Inspection Preparation Guide

62. Data Integrity: One step before SDTM, Pavan Kathula, Sonal Torawane

63. Let’s Check Data Integrity Using Statistical (SAS®) Programmers with SAS, Harivardhan Jampala

65. Why clinical trials are terminated, Teodore Pak, Maria Rodriguez, Frederick Roth

66. A SAS® Programmer's Guide to Project- and Program-Level Quality Control, Paul Gorrell

67. Best Practices for Quality Control and Validation [PhUSE White Paper]

69. QC made easy using macros, Prashanthi Selvakumar

70. Quality Control With SAS Numeric Data, Paul Gorrell

71. Ensuring Consistency Across CDISC Dataset Programming Processes, Jennifer Fulton

72. Using SAS to Ease the Proofing of Messy Text, Nat Wooding, Richard Valley [Spell]

73. Spelling Checker Utility in SAS® using VBA Macro and SAS® Functions, Ajay Gupta

74. Macro to Conduct Consistency Checks, Walter Hufford [QC Macro]





Log Check

1. The Automatic Detection of Problems in the SAS Log, MaryAnne D. Hope

2. Search Your LOG Files for Problems the Easy Way: Use LOGCHK.SAS, Michael A. Walega

   Download LOGCHK macro

3. Programming Tips and Examples for Your Toolkit, John Morrill, Kristi Wiser

4. Programming Tips and Examples for Your Toolkit, II, John Morrill

5. Programming Tips and Examples for Your Toolkit, III, John Morrill

6. The Implementation of Automatic Obvious Data Modifications (ODMs) on Late Phase Trials Using SAS PROC SQL, Kathleen Kushner, Philip Pellicone

7. A Utility Program for Checking SAS Log Files, Carey Smoak  
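
Log checking utilities of the kind listed above generally scan the SAS log for problem messages. The following is a minimal Python sketch of that idea. It is not the LOGCHK macro or any of the listed programs. The pattern list is an assumed starting point and should be extended to match your own SOPs, and the function names are hypothetical.

```python
import re

# Assumed starter list of SAS log message patterns; extend as needed.
PROBLEM_PATTERNS = [
    r"^ERROR",
    r"^WARNING",
    r"^NOTE: .*uninitialized",
    r"^NOTE: MERGE statement has more than one data set with repeated BY values",
    r"^NOTE: .*values have been converted to",
    r"^NOTE: Invalid",
    r"^NOTE: Division by zero",
]
_PROBLEM_RE = re.compile("|".join(f"(?:{p})" for p in PROBLEM_PATTERNS))


def scan_log_lines(lines):
    """Return (line_number, text) for every log line matching a pattern."""
    findings = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if _PROBLEM_RE.search(text):
            findings.append((number, text))
    return findings


def scan_log_file(path, encoding="utf-8"):
    """Scan a log file on disk; undecodable bytes are replaced, not fatal."""
    with open(path, encoding=encoding, errors="replace") as handle:
        return scan_log_lines(handle)
```

An empty result means only that none of the listed patterns matched. It does not replace reading the log when a program is new or has changed substantially.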
