Training Videos: |
Basic simple program efficiency steps as well as more complex strategies and techniques are included in this section. Before implementing an upgrade, make sure to benchmark measurements. In addition, SAS® system option papers are also included. See also SAS Business Intelligence and ETL, SAS Version 9.2 for hashing, SAS Certification exam and Proc SQL. See Compare and Conquer SAS Programming Techniques.
Before deciding to use SAS Indexes, it makes sense to evaluate the advantages and disadvantages. In general, over 80% of SAS users do not use SAS indexes. Keep in mind that SAS Indexes are for very specialized purpose and come with some conditions and at a price. You must first know your data very well and the types and quantity of daily data transactions to make a decision.
USE SAS Indexes if .. (See Basic paper, FAQ paper) |
DO NOT Use SAS Indexes if ... |
1. You want to reduce processing time by accessing one dataset using the direct access method instead of the normal sequential access method. Indexes save time by allowing SAS to read starting from the middle of the dataset instead of starting from the first record and reading the entire dataset when returning matching conditions. |
1. You do not want to maintain SAS indexes since using indexes creates a separate file which gets deleted if the dataset is recreated without recreating the index. |
2. You plan to index on few key variables in one dataset which are used for subsetting records, for example. Simple index is indexing one variable, which is the most popular, and complex index is two or more variables, where more than two variables is rare. Datasets can have multiple simple and complex indexes. Simple index files are the same name as the key variable and complex index file names need to be specified. |
2. You do not have at least 10% of input dataset size as extra storage to save the separate index file which is the dataset name with .SAS7BNDX extension for simple indexes. |
3. You expect your data, or indexed variables, to be stable and not change often since this requires recreating the index file. | 3. You expect to select more than 25% of all records when subsetting since this makes indexes very inefficient. Ideally, indexed values will be less than 10% of the records. GENDER, for example, is a poor index variable. |
4. You are processing VERY LARGE datasets, such as with over 10,000 records, on repetitive or transactional type processing, grouping, sorting for merging or subsetting datasets. Generally indexed variables are using these statements: BY, WHERE, SET/MODIFY and KEY option in DATA steps or SAS Procedures. |
4. Your dataset is small since without index will be faster. Using indexes on small datasets is overkill. |
5. You continue to use PROC SORT to run up to 50% faster. Using SAS Index does not prevent you from using PROC SORT. | 5. If you are using Hash objects since using Hash objects is about 10% faster than indexes. |
6. You are fine with the required ascending sort order of the indexed variables. Using DESCENDING and NOTSORTED options in PROC SORT will not use indexes. | 6. You plan to use selected SAS Procedures such as PROC APPEND since using indexes is not compatible with these procedures. |
SAS Procedure examples for creating SAS Indexes |
Description |
data sales.bighits (index=(cdnumber / unique)); set olddata.oldhits; < additional SAS statements > run; |
1. Using the INDEX DATA set option in a DATA step to create a simple index of CDNUMBER variable with same name file on the SALES.BIGHITS dataset. One or more options for creating index include: UNIQUE and NOMISS. Note that the INDEX DATA set option can also be applied in PROC SORT, for example. This makes it easier for recreating indexes whenever the dataset sets updated. |
data sales.bighits (index=(numname=(cdnumber artistname) / nomiss)); set olddata.oldhits; < additional SAS statements > run; |
2. Using the INDEX DATA set option in a DATA step to create a complex index named NUMNAME which is based two variables CDNUMBER ARTISTNAME on the SALES.BIGHITS dataset. One or more options for creating index include: UNIQUE and NOMISS. Note that the INDEX DATA set option can also be applied in PROC SORT, for example. This makes it easier for recreating indexes whenever the dataset sets updated. |
proc sql; create unique index cdnumber on sales.bighits; quit; |
3. Using PROC SQL to create a simple index with the same name file CDNUMBER on the SALES.BIGHITS dataset. One or more options for creating index include: UNIQUE and NOMISS. |
proc sql; create index numname on sales.bighits(cdnumber artistname); quit; |
4. Using PROC SQL to create a complex index named NUMNAME which is two variables CDNUMBER ARTISTNAME on the SALES.BIGHITS dataset. One or more options for creating index include: UNIQUE and NOMISS. |
proc datasets library=sales; modify bighits; index create cdnumber / unique; run; |
5. Using PROC DATASETS to create a simple index with the same name file CDNUMBER on the SALES.BIGHITS dataset. One or more options for creating index include: UNIQUE and NOMISS. |
proc datasets library=sales; modify bighits; index create numname=(cdnumber artistname) / nomiss; run; |
6. Using PROC DATASETS to create a complex index named NUMNAME which is two variables CDNUMBER ARTISTNAME on the SALES.BIGHITS dataset. One or more options for creating index include: UNIQUE and NOMISS. |
data work.direct; do age=13,14; do until (eof); set class key=age end=eof; if _IORC_=0 then do; /* 0 indicates a match was found */ put _all_; output; end; else _ERROR_=0; /* if no match, reset the error flag and continue */ end; end; stop; /* stops future searching on ages 13 and 14 have been found */ run; |
7. Using the DATA Step to use a simple index with the KEY= option for direct access method of the AGE key variable. SAS automatically knows an simple index exits on AGE. For composite index, such as COUNTRY_STATE, SAS can select records based on two or more variables. This is similar to using the POINT= option with record #. |
data work.driver; age=13; output; age=14; output; run; data work.direct; set work.driver; /* <- sequential access & implicit loop */ do until (eof); /* <- explicit loop */ set work.class key=age end=eof; /* <- direct access */ if _IORC_=0 then do; /* 0 indicates a match was found */ put _all_; output; end; else _ERROR_=0; /* if no match, reset the error flag and continue */ end; run; |
8. Using DATA Step to use a simple index with the KEY= option with values in another dataset using direct access method of the AGE key variable. |
data illinois; set indexlib.prodindx (idxname=state); where state eq 'Illinois' and product in ('BED' 'DESK'); run; |
9. User has option to override SAS in using the default index using the IDXNAME= DATA set option. In WHERE statements, users also have the option to force or turn off using SAS index with the IDXWHERE= option. |
Balance of Efficiency Resources |
---|
Reducing CPU often results in reducing Data Storage and Input/Output but may increase Memory usage |
80% improvements can be expected with simple programming techniques 20% improvements can be expected with fine tuning, optimal combination of resources, index, system options, hardware upgrades, datawarehouse strategies such as summary-level datasets |
Programming Techniques - using Proc SQL for multi-tasking or using PROC DATASETS to modify data set structure |
CPU Time - using KEEP or DROP to limit number of variables or PROC SORT options such as SORTEDBY= and PRESORTED options |
Data Storage - using LENGTH to reduce variable size or SAS View instead of a SAS Dataset |
Input/Output - using WHERE data set option to limit number of records |
Memory - reading only the records, variables or datasets required |
DATA Step vs. Proc SQL Differences |
In general, simple merges are faster with DATA Steps and complex merges are faster with Proc SQL. This is because DATA Step uses sequential reads while Proc SQL first creates a Cartesian Product and then subsets the dataset. |
Dataset Factors - Size, Static, Relationships between datasets, Density of Matches, and Outside SAS Access. |
Operating System - Windows and Unix (generally similar), MVS and VMS (DATA step is generally faster). |
One-Stop Reporting - Proc SQL is faster than 2 Proc Sorts, DATA merge, 1 Proc Sort and 1 Proc Print. |
Improve Proc SQL by up to 2X - When joining tables, process pre-sorted tables as compared to processing unsorted tables. |
Animated Guide Speed Merges: resource use by common procedures, Russ Lavery
Proc SQL Advantages |
Multi-tasking and sorting with one pass instead of multiple passes SELECT name, sex (Memory) FROM sashelp.class (CPU with SORTEDBY= option when joining tables) |
SQL Pass-Through for native database processing |
Validate syntax before executing |
Better programming style with concise and self-documenting code |
(SORTEDBY = ) dataset option to take prevent re-sorting dataset |
Utilize dictionary tables to access summary level data |
SAS® Paper Dear Miss SASAnswers: A Guide to Efficient PROC SQL Coding
SAS® Savvy Resource SAS Programming Efficiency Factors Summary Sheet
Compare Program Efficiency with DATA Step
Working the System: Our Best SAS Options, Patrick Thornton, Luliana Barbalau
Save/Retrieve SAS Systems Options Are Your Friends, Edward Heaton
Beginner SAS Programmer |
Beginner Paper and Survey SAS® Performance Tuning Strategies and Techniques, Kirk Lafler
The How and When of Efficient Programming, Frank DiIorio [Presentation]
SAS® Index Paper The Basics of Using SAS Indexes, Michael Raithel [Presentation]
Advanced SAS Programmer |
Advanced Parallel Process Techniques Multi-Process Connect Paper, John Cohen
System AutoExec, Setup.SAS file, Config file and Batch processing
Changing SAS System Option Settings
1. USING AUTOEXEC.SAS TO CUSTOMIZE SAS SESSIONS, Jennifer Price
2. %SETUP – A Utility Macro to Build Directory Structures and a SAS Initialization Program, Guowei Wu [Macro]
3. A SASautos Companion: Reusing Macros, Ronald Fehd
4. Run All Your SAS® Programs in One Program: Automatically, Ling Y. Chen, Steven A. Gilbert
5. Batch Processing with SAS®: Beyond Running Programs Overnight, Lisa Sanbonmatsu
6. More Than Batch – A Production SAS® Framework, Denis Cogswell
7. Interactive SAS® Mode vs. Batch Mode, Irina Walsh
8. Double-Clicking a SAS® File: What Happens Next?, Sandy Gibbs, Michele Mosca, and Donna Bennett
10. How Do I …. ? Some Beginners FAQs, Peter Eberhardt, Audrey Yeo
11. SAS® System Options: The True Heroes of Macro Debugging Kevin Russell and Russ Tyndall
12. An Autoexec Companion, Allocating Location Names during Startup, Ronald Fehd
General Papers
1. SAS Program Efficiency for Beginners, Bruce Gilsen
2. Summarizing Data with Base SAS® PROCs, Debbie Buck
4. The Most Important Efficiency Techniques, Bob Virgile
5. Efficiency Considerations Using the SAS® System, Rick Langston
6. Efficiency Techniques for Accessing Large Data Files, By AndrewWilcox
7. Efficiency Techniques for Beginning PROC SQL Users, Kirk Lafler
8.The DATA Statement: Efficiency Techniques, S. David Riba
9. Efficient Techniques and Tips in Handling Large Datasets, Jason Shilong Kuang
11. Make Your Life a Little Easier: A Collection of SAS Macro Utilities, Pete Lund [NONOTES]
13. Keeping Your Data in Step - Utilizing Efficiencies, Michael Sadof
14. SAS® Macro Autocall and %Include, Jie Huang and Tracy Lin
15. Think FAST! Use Memory Tables (Hashing) for Faster Merging, Gregg Snell [Index]
16. Don't Be a Litterbug: Best Practices for Using Temporary Files in SAS, Richard Langston
17. WHATlTDO: DOCUMENTATION AUTOMATICALLY PRODUCED BY YOUR SAS PROGRAM, Karen Crandall
18. Run All Your SAS® Programs in One Program: Automatically, Ling Y. Chen, Steven A. Gilbert
19. Efficient Statistical Programming? – These Small Steps Make the Difference, Keiko Powers
21. What’s In A Name; Describing SAS File Types, Randall Cates [Index]
22. How Many Observations Are In My Data Set?, Jack Hamilton
23. Keep it Clean, Cecilia Mauldin
24. Merge vs. Join vs. Hash Objects: A Comparison using "Big" Medical Device Data, James Johnson
25. Secrets of Efficient SAS® Coding Techniques, Andrew Kuligowski, Swati Agarwal
26. GAINING EFFICIENCY WITH SAS® SOFTWARE, Kirk Lafler
27. Efficient SAS Coding Techniques: Measuring the Effectiveness of Intuitive Approaches, Frank Dilorio
28. Proper Housekeeping – Developing the Perfect “Maid” to Clean your SAS® Environment, Chuck Bininger
29. An Overview of Syntax Check Mode and Why it is Important, Thomas Billings
30. SAS Statements Used Anywhere, Karin LaPann [Duplicate Libnames]
31. SAS® and Open Source: Two Integrated Worlds, Jesse Luebbert and Radhikha Myneni
32. Copy That! Using SAS® to Create Directories and Duplicate Files, Nicole Ciaccia
34. Proper Housekeeping – Developing the Perfect “Maid” to Clean your SAS® Environment, Chuck Bininger
35. Solving SAS Performance Problems: Our Methodology, Jim Kuell
36. Using SAS® Job Monitor to Analyze SAS Data Integration Studio Job Performance, Jeff Dyson
37. Checking and Tracking SAS Programs Using SAS Software, Keith Gregg, Yefim Gershteyn
38. Slim Down Your Data, Mickael Borne
SAS Indexes
1. Frequently Asked Questions about SAS® Indexes, Billy Clifford
2. Creating and Exploiting SAS Indexes, Michael Raithel
3. A Faster Index for Sorted SAS® Datasets, Mark Keintz
4. Using SAS Indexes with Large Databases, Alex Vinokurov, Lawrence Helbers
5. CRAFTING YOUR OWN INDEX: WHY, WHEN, HOW, Paul M. Dorfman, Lessia S. Shajenko
6. Add a Little Magic to Your Joins, Kirk Paul Lafler
7. Power Indexing: A Guide to Using Indexes Effectively in Nashville Releases, Diane Olson [IDXWHERE]
SAS® Savvy Class:Maximizing Productivity and Efficiency using Proc SQL