Training Videos: |
Let's go vertical to compare all differences by subject
When comparing datasets, there are three main levels that can impact differences. It is also helpful to know if the two datasets are expected to be the same or not to validate the dataset. For example an updated dataset is expected to have more records than an older dataset but may not have data values changed or a new dataset is expected to the same as older dataset except for the addition of a new variable.
1. Differences in record count - patient selection and count, visits and dates.
2. Differences in variable count - missing variables.
3. Differences in data values - data values.
For validation of any dataset, PROC COMPARE is your first choice. Not only are datasets compared but also listings and summary reports. For summary reports from PROC REPORT for example, plan to save results in output datasets.
Use the power of PROC COMPARE to not only compare variable attributes, but also the accuracy of data values. If misalignment of key variables occurs, then you should correct this before any other matches can be checked.
See 'Characterize Data' in SAS Enterprise Guide to apply to both source and qc datasets. See also QC process.
80% - Most Common Utilized |
20% - Less Utilized |
QC purpose to compare datasets for all variables and all records using common statements/options |
QC purpose to compare datasets for selected variables or records using custom options/traffic lighting or from summary reports, Confirm dataset updates, Save to output datasets |
Order of Severity |
Tips and Seven Types of PROC COMPARE Matches |
---|---|
Tips |
BASE = Generally source dataset shown in the first column COMPARE = Generally qc dataset shown in the second column Useful options: LISTVAR LISTOBS LISTEQUALVAR ID statement to compare and display by key variables (requires presorting both datasets) BY statement to compare by key variables WHERE DATA SET option to compare subset records only VAR to compare selected variables only, exclude to compare all variables WITH to list different variable names in COMPARE datast (DROP=) dataset option to select variables not to compare Three Possible Options 1) Two datasets match 2) Two datasets do not match in variables (count, attributes, etc.) 3) Two datasets do not match in records (count, etc.) Six Types of Proc Compare Reports 1) Data Set Summary 2) Variables Summary 3) Observation Summary 4) Values Comparison Summary 5) Value Comparison Results 6) Summary Statistics |
1 (Ideal) |
Exact match - For all variables and for all records, no difference in two datasets |
2 | Inconsistent Sorting of Records - Need to proc sort both datasets by key variables (Most important, First to correct) |
3 |
Misalignment of Key Variables using ID statement - Need to correct variable name / type / length / label |
4 |
Record Count mismatch - Need to view list of records in one dataset but not in the other dataset and update record selection condition / add / delete records as needed, test with subsets |
5 |
Variable mismatch - Need to view the list of variables in one dataset but not in the other dataset and add / delete variables, test with VAR and WITH if different names. If you want a list of BASE variables but not in the COMPARE data set, use LISTBASEVAR and if you want a list of the COMPARE variables but not in the BASE data set, use LISTCOMPVAR option. |
6 |
Data Values mismatch - Need to view source dataset with filter / sort to identify correct values if needed, update variable assignments, consider: case-sensitive, spaces, and rounding. Instead of using exact match options such as LISTALL METHOD=ABSOLUTE CRITERION=0.01, consider changing settings to be more flexible: METHOD=RELATIVE CRITERION=0.00001 As an option, apply temporary FORMAT BINARY16. format for numeric variables or $BINARY16. format for character variables to display the actual stored value representation. |
7 |
Variable Attributes mismatch - Need to update variable attributes (name / type / length / label) (Generally length and label mismatch may be acceptable, not not name or type) |
proc sort data=emp95_byidnum; by idnum; run;
proc sort data=emp96_byidnum; by idnum; run;
* Example of syntax for saving differences in output dataset;
proc compare data=&oldlib.ProcContentsByStudy compare=&oldlib.ProcContentsByStudy
out=DiffByTableVar outnoequal outbase outcomp outdif noprint;
by TableName Varname;
run;
Beginner Paper "How Does Your Data Compare?" SAS’s COMPARE PROCEDURE,
Jenna Heyen
HOW Proc Compare - Worth Another Look!, Christianna Williams [NOVALUES WARNING NOPRINT options, OUT=]
Macro Tool An Abbreviated PROC COMPARE with Traffic Lighting, William Conover [ODS]
SAS® Reference PROC COMPARE Syntax
3. Finally-An Easy Way To Compare Two SAS Files!, Doug Zirbel (Download SAS Macro)
4. Comparing Two SAS Data Sets: Start At the Top, Ann Olmsted [binary]
5. The Supplement of the COMPARE Procedure, WeiQin Pan
6. How Does Your Data Compare? SAS's COMPARE Procedure, Jenna Heyen
7. Don’t Get Blindsided by PROC COMPARE, Joshua Horstman, Roger Muller
8. VALIDATION: Let SAS do the comparisons for you, Lara E.H. Guttadauro
9. Go Compare: Flagging up some underused options in PROC COMPARE, Michael Auld [SDTMs]
10. 3N Validation to Validate Proc Compare Output, Amarnath Vijayarangan [Macro, e-poster]
11. How to Speed Up Your Validation Process Without Really Trying?, Alice Cheng [binary, &sysinfo]
12. DARE TO COMPARE Tailoring PROC COMPARE Output, Maria Reiss [Updates]
14. Finding All Differences in two SAS® libraries using Proc Compare, Bharat Kumar Janapala
15. %DIFF: A SAS Macro to Compare Documents in Word or ASCII Format, Michelle Xu and Jay Zhou [Macro]
17. Automated Validation of Complex Clinical Trials Made Easy, Richann Watson, Josh Horstman
18. Quality Control of Clinical Data Listings with Proc Compare, Robert Bikwemu, Nicole Wallstedt
19. Proc Compare to Validate Datasets, Angelina Cecilia Casas [Transpose]
20. Demystifying PROC COMPARE : A Programming Approach, Stanley Fogleman
21. DARE TO COMPARE Tailoring PROC COMPARE Output, Maria Reiss
22. PROC COMPARE: What Did I Inadvertently Change?, Melissa Pfeiffer [VAR WITH]
23. Comparing Datasets: Using PROC COMPARE and Other Helpful Tools, Deb Cassidy
24. Quality Control and Validation – More than Just PROC COMPARE, Evgeny Starostin, Michael Rimler [Checklist]
25. Programming Solutions when Developing a Database Compare Macro, Michael Rimler [Presentation]
26. Meta Data That You (Probably) Didn’t Know That You Had: A Beginners’ Guide to Using SAS Dictionaries and Automatic Macro Variables, Richard F. Pless [&syslibrc, &sysfilrc, &sysinfo]
27. Deciphering PROC COMPARE Codes: The Use of the bAND Function, Joseph Hinson, Margaret Coughlin [sysinfo, macro]
28. A Few Quick and Efficient Ways to Compare Data, Abraham Pulavarti [Binomial, &sysinfo, Macro]
30. Validation Summary using SYSINFO, Srinivas Vanam, Mahipal Vanam, Shravani Vanam [Macro]
31. Macro utility to compare multiple SAS data sets, Krish Krishnan [Macro]
32. Comparing 2 SAS Data Sets: An Alternative to Using PROC COMPARE, Kevin King [Alternative]
33. Proc Compare: Wonderful Procedure!, Anusuiya Ghanghas, Rajinder Kumar [OUT=]
34. Non Printable & Special Characters: Problems and how to overcome them, Sridhar Dodlapati, Praveen Lakkaraju, Naresh Tulluru and Zemin Zeng [Clean character variables before proc compare]
35. A Collection of Items from a Programmers’ Notebook, David Franklin, Cecilia Mauldin