PROC COMPARE   

          Training Videos:

How to Proc Compare your Output 

Alternative method to compare datasets without Proc Compare

Let's go vertical to compare all differences by subject 


When comparing datasets, there are three main levels that can impact differences.  It is also helpful to know if the two datasets are expected to be the same or not to validate the dataset.  For example an updated dataset is expected to have more records than an older dataset but may not have data values changed or a new dataset is expected to the same as older dataset except for the addition of a new variable.

1. Differences in record count - patient selection and count, visits and dates. 

2. Differences in variable count - missing variables.

3. Differences in data values - data values.

For validation of any dataset, PROC COMPARE is your first choice.  Not only are datasets compared but also listings and summary reports.  For summary reports from PROC REPORT for example, plan to save results in output datasets.  

Use the power of PROC COMPARE to not only compare variable attributes, but also the accuracy of data values.  If misalignment of key variables occurs, then you should correct this before any other matches can be checked. 

See 'Characterize Data' in SAS Enterprise Guide to apply to both source and qc datasets.  See also QC process.

 
80% - Most Common Utilized

20% - Less Utilized 

QC purpose to compare datasets for all variables and all records using common statements/options

QC purpose to compare datasets for selected variables or records using custom options/traffic lighting or from summary reports, Confirm dataset updates, Save to output datasets


 Order of Severity


Tips and Seven Types of PROC COMPARE Matches 

See Gallery of Proc Compare Reports 

 Tips BASE = Generally source dataset shown in the first column

COMPARE
= Generally qc dataset shown in the second column

Useful options: LISTVAR LISTOBS LISTEQUALVAR

ID statement to compare and display by key variables (requires presorting both datasets)

BY
statement to compare by key variables

WHERE
 DATA SET option to compare subset records only

VAR
to compare selected variables only, exclude to compare all variables

WITH
to list different variable names in COMPARE datast

(DROP=) dataset option to select variables not to compare

Three Possible Options
1) Two datasets match
2) Two datasets do not match in variables (count, attributes, etc.)
3) Two datasets do not match in records (count, etc.)

Six Types of Proc Compare Reports

1) Data Set Summary 

2) Variables Summary

3) Observation Summary

4) Values Comparison Summary

5) Value Comparison Results

6) Summary Statistics

 1

(Ideal)

 Exact match
- For all variables and for all records, no difference in two datasets

 2  Inconsistent Sorting of Records

- Need to proc sort both datasets by key variables
(Most important, First to correct)

 

3

 Misalignment of Key Variables using ID statement

- Need to correct variable name / type / length / label 

 

4

 Record Count mismatch

- Need to view list of records in one dataset but not in the other dataset and update record selection condition / add / delete records as needed, test with subsets

 


5

 Variable mismatch

- Need to view the list of variables in one dataset but not in the other dataset and add / delete variables, test with VAR and WITH if different names.  

If you want a list of BASE variables but not in the COMPARE data set, use LISTBASEVAR and if you want a list of the COMPARE variables but not in the BASE data set, use LISTCOMPVAR option.

 


6

 Data Values mismatch

- Need to view source dataset with filter / sort to identify correct values if needed, update variable assignments, consider: case-sensitive, spaces, and rounding.  

Instead of using exact match options such as LISTALL METHOD=ABSOLUTE CRITERION=0.01, consider changing settings to be more flexible: METHOD=RELATIVE CRITERION=0.00001

As an option, apply temporary FORMAT BINARY16. format for numeric variables or $BINARY16. format for character variables to display the actual stored value representation.

 

7

 Variable Attributes mismatch

- Need to update variable attributes (name / type / length / label) (Generally length and label mismatch may be acceptable, not not name or type)

proc sort data=emp95_byidnum; by idnum; run; 

proc sort data=emp96_byidnum; by idnum; run; 


title 'Comparing Observations that Have Matching IDNUMs';
proc compare base=emp95_byidnum compare=emp96_byidnum listall;
id idnum;
run;


* Example of syntax for saving differences in output dataset;

proc compare data=&oldlib.ProcContentsByStudy  compare=&oldlib.ProcContentsByStudy

 out=DiffByTableVar  outnoequal outbase outcomp outdif noprint;

 by TableName Varname;

run;


 


______________________________________________

Beginner Paper "How Does Your Data Compare?" SAS’s COMPARE PROCEDURE,
Jenna Heyen

HOW Proc Compare - Worth Another Look!, Christianna Williams [NOVALUES WARNING NOPRINT options, OUT=]

Macro Tool An Abbreviated PROC COMPARE with Traffic Lighting, William Conover [ODS]

SAS® Reference PROC COMPARE Syntax

______________________________________________

1. Proving It Works: Using PROC COMPARE to Verify an Analysis Converted into SAS Software,
Lauren Haworth

2. An Easy, Concise Way to Summarize Multiple PROC COMPAREs Using the SYSINFO Macro Variable, Lex Fennell

3. Finally-An Easy Way To Compare Two SAS Files!, Doug Zirbel (Download SAS Macro)

4. Comparing Two SAS Data Sets: Start At the Top, Ann Olmsted [binary]

5. The Supplement of the COMPARE Procedure, WeiQin Pan

6. How Does Your Data Compare? SAS's COMPARE Procedure, Jenna Heyen

7. Don’t Get Blindsided by PROC COMPARE, Joshua Horstman, Roger Muller

8. VALIDATION: Let SAS do the comparisons for you, Lara E.H. Guttadauro

9. Go Compare: Flagging up some underused options in PROC COMPARE, Michael Auld [SDTMs]

10. 3N Validation to Validate Proc Compare Output, Amarnath Vijayarangan [Macro, e-poster]

11. How to Speed Up Your Validation Process Without Really Trying?, Alice Cheng [binary, &sysinfo]

12. DARE TO COMPARE Tailoring PROC COMPARE Output, Maria Reiss [Updates]

13. Using PROC COMPARE and PROC CONTENTS to Compare SAS Data Set Values and Characteristics, Tim Lehman and Ann Lehman

14. Finding All Differences in two SAS® libraries using Proc Compare, Bharat Kumar Janapala

15. %DIFF: A SAS Macro to Compare Documents in Word or ASCII Format, Michelle Xu and Jay Zhou [Macro]

16. Comparing ODS RTF Output Files in Batch Using SAS (or, 500 ODS RTF Documents to Compare in 15 minutes!), David Franklin

17. Automated Validation of Complex Clinical Trials Made Easy, Richann Watson, Josh Horstman

18. Quality Control of Clinical Data Listings with Proc Compare, Robert Bikwemu, Nicole Wallstedt

19. Proc Compare to Validate Datasets, Angelina Cecilia Casas [Transpose]

20. Demystifying PROC COMPARE : A Programming Approach, Stanley Fogleman

21. DARE TO COMPARE Tailoring PROC COMPARE Output, Maria Reiss

22. PROC COMPARE: What Did I Inadvertently Change?, Melissa Pfeiffer [VAR WITH]

23. Comparing Datasets: Using PROC COMPARE and Other Helpful Tools, Deb Cassidy

24. Quality Control and Validation – More than Just PROC COMPARE, Evgeny Starostin, Michael Rimler [Checklist]

25. Programming Solutions when Developing a Database Compare Macro, Michael Rimler [Presentation]

26. Meta Data That You (Probably) Didn’t Know That You Had: A Beginners’ Guide to Using SAS Dictionaries and Automatic Macro Variables, Richard F. Pless [&syslibrc, &sysfilrc, &sysinfo]


27. Deciphering PROC COMPARE Codes: The Use of the bAND Function, Joseph Hinson, Margaret Coughlin [sysinfo, macro]

28. A Few Quick and Efficient Ways to Compare Data, Abraham Pulavarti [Binomial, &sysinfo, Macro]

29. Obtaining an Automated Summary of PROC COMPARE Results: &SYSINFO and VB Script, Umesh Gautam, Jeff Roberts

30. Validation Summary using SYSINFO, Srinivas Vanam, Mahipal Vanam, Shravani Vanam [Macro]

31. Macro utility to compare multiple SAS data sets, Krish Krishnan [Macro]

32. Comparing 2 SAS Data Sets: An Alternative to Using PROC COMPARE, Kevin King [Alternative]

33. Proc Compare: Wonderful Procedure!, Anusuiya Ghanghas, Rajinder Kumar [OUT=]


34. Non Printable & Special Characters: Problems and how to overcome them, Sridhar Dodlapati, Praveen Lakkaraju, Naresh Tulluru and Zemin Zeng [Clean character variables before proc compare]

35. A Collection of Items from a Programmers’ Notebook, David Franklin, Cecilia Mauldin

  • data _null_;
  •  x=4097;
  •  y=put(x,binary16.);
  •  put x= y=;
  • run;
  • x=4097 y=0001000000000001 
Powered by Wild Apricot Membership Software