PROC SQL    

 e-Guide

Mind Map 

 Getting Started

Training Videos: 

Proc SQL e-Guide I, Proc SQL e-Guide I

 Best of Both Worlds - Proc SQL and Data Step Joins

Ready to Become Really Productive using Proc SQL?
 SQL Tutorial
  Maximizing Productivity and Efficiency using Proc SQL ($120)

Smarter Proc SQL searches - It is the only SAS® Procedure that most closely resembles the DATA Step. Learn how to create simple queries to complex joins from these top SAS® papers or by access the Proc SQL Summary Sheet.  

See also DATA Step and MergeSAS Macro Programming, Dictionary tables and SAS Program Efficiency, Large Dataset and System Options.  See SAS® blog on adding group descriptive statistics.  See also Lab Data blog for examples of complex clinical trials data analysis.   

Basic Structure of SELECT statement:

SELECT name, sex FROM sashelp.class WHERE sex = 'F'
GROUP BY sex HAVING weight > avg(weight) ORDER BY name;

Basic Structure of JOINs: Worksheet

CREATE table mytable as SELECT class.name, students.sex FROM sashelp.class as class, mylib.students as students WHERE class.name = students.name and class.sex = students.sex;

CREATE table mytable as SELECT class.name, students.sex FROM sashelp.class as class LEFT JOIN mylib.students as students ON class.name = students.name and class.sex = students.sex;

  ______________________________________________

Proc SQL


______________________________________________

Proc SQL features by clauses 

 

 CALCULATED

(Required before new variable)


SUMMARY FUNCTIONS 


SUBQUERIES

X 


Detail/Summary Level 

 

5. SELECT

 

 X (For CASE clause, specify as SELECT CASE CALCULATED NEW_VAR WHEN ...)

 X 

(Can re-merge fixed value by group variable to calculate differences in fixed and variable values)

 

 

 1. FROM

or JOIN

   

 X

(Dynamic SELECT clause of KEY and outer SELECT variables)

 

2. WHERE

or ON

 

X

 X

(When joining tables, first detail subset then apply summary function in second PROC SQL)

 

X

 Along with joins by key variables, any additional conditons with 'AND' will subset at the detail or summary level.  Order is first detail level subset and then summary level subset.

 3. GROUP BY

 

X


 X

   

 

4. HAVING

 

 X

 X

(When joining tables, can summary subset to exclude records post join)

 

X

 Second Summary Subset
 6. ORDER BY

 X

 X

   
 
Horizontal Summary Functions (Across multiple variables, Detail level subset)

Vertical Group Summary Functions (Across records for 1 variable, Summary level subset)

SELECT SEX, MAX(WEIGHT, HEIGHT) FROM SASHELP.CLASS WHERE AGE > 10 ORDER BY SEX;

 
SELECT SEX, COUNT(NAME) FROM SASHELP.CLASS GROUP BY SEX HAVING COUNT(NAME) > 2;

 ORDER BY, WHERE

GROUP BY, HAVING 

______________________________________________

______________________________________________


______________________________________________

           

______________________________________________

Steps for Building Datasets with Proc SQL

______________________________________________


______________________________________________

The Ultimate Proc SQL Example

______________________________________________

Complex and Fuzzy Join with Repeated Table, Summary Variables and Conditons

Not easily possible with DATA Step: 1. _DDAY dataset is referenced twice with different conditions.  2. DATEPART functions are applied in ON conditions.  3. COALESCE() function is applied to assure non-missing variable. 4. Summary functions are applied in SELECT statement to assure on record by GROUP BY variables SUBJECT, PKDTF and PKREFID.  5. CALCULATED is applied in GROUP BY clause since SUBJECT was created in SELECT clause.  6. Can JOIN by variables from any of the four datasets.  7. Can have flexibility in JOINs such as FULL, LEFT and RIGHT.

CREATE table qc_tpt as
SELECT unique coalesce(a.subject, b.subjid, substr(c.usubjid, 17, 4))
 as subject, b.pkdtf, a.drawdt, a.timepoint, a.pk_code, b.pkrefid,
 max(c.ddayf) as nddayf format=datetime16., (b.pkdtf - calculated
 nddayf)/(60*60) as hrdiff format=4.1,
 max(d.ddayf) as nddayf3 format=datetime16., (b.pkdtf - calculated
 nddayf3)/(60*60) as hrdiff3 format=4.1
FROM _pk2 as a
FULL JOIN _pkcrf as b on a.subject=b.subjid and b.pkrefid=a.pk_code
FULL JOIN _dday as c on b.subject=substr(c.usubjid, 17, 4)
 and ( (a.timepoint ^= 'Predose' and c.ddayf <= b.pkdtf) or
 (a.timepoint = 'Predose' and (datepart(c.ddayf) = datepart(b.pkdtf))
 or (c.ddayf <= b.pkdtf) ) )
FULL JOIN _dday as d on c.subject=substr(d.usubjid, 17, 4)
 and (d.ddayf <= b.pkdtf)
GROUP BY calculated subject, b.pkdtf, b.pkrefid;  

Fuzzy Join with EQT to link related dates

CREATE TABLE fpgms as
SELECT UNIQUE scan(a.pgname, 1) as spgname, (coalesce(d.pgname,
f.pgname, g.pgname)) as pgname,
max(d.dslabel) as dslabel,
max(d.spname) as spname, max(d.spgdt) as spgdt format=date9.,
max(a.pdttm) as spgdtm format=datetime16., max(e.pdttm) as lpgdtm
format=datetime16.,
max(d.vpname) as vpname, max(d.vpgdt) as vpgdt format=date9.,
max(b.pdttm) as vpgdtm format=datetime16.,
max(c.pdttm) as vfrdtm format=datetime16.,
max(f.pdttm) as rtfdtm format=datetime16.,
max(g.pdttm) as dpgdtm format=datetime16.
FROM spgms as a
FULL JOIN vpgms as b on strip(scan(a.pgname, 1)) EQT strip(scan(b.pgname, 1))
FULL JOIN vfrms as c on strip(scan(b.pgname, 1)) EQT strip(scan(c.pgname, 1))
FULL JOIN pindex as d on strip(scan(c.pgname, 1)) EQT strip(scan(d.pgname, 1))
FULL JOIN lpgms as e on strip(scan(d.pgname, 1)) EQT strip(scan(e.pgname, 1))
FULL JOIN rtfs as f on strip(scan(e.pgname, 1)) EQT strip(scan(f.pgname, 1))
FULL JOIN dpgms as g on strip(scan(f.pgname,1)) EQT strip(scan(g.pgname, 1))
GROUP BY calculated pgname
;

   * Drop non-targets that already have a target visit match;
   CREATE new_vwm3nt (where=(xtra = .)) as
   SELECT a.*, b.t_vn as xtra FROM new_vwm3nta as a LEFT JOIN new_vwm3t as b
   on a.subjid=b.subjid and a.lbtestid=b.lbtestid and a.t_vn=b.t_vn;

   * Remerge group counts - <group_variable>n or other group descriptive stats;    * Select all source variables;
   * Left join to keep all records;
   * Subquery - count and link by name, group by sex;
   create table class2 as
   select  a.*, b.sexn from sashelp.class as a
   left join
   (select name, count(name) as sexn from sashelp.class  where sex > ' ' group by sex
   ) as b on a.name=b.name;

 See SAS blog - Something for nothing?  Adding group descriptive statistics. 

______________________________________________ 

Many-to-Many Join SAS Example - Note that LEFT JOIN is applied to keep all AE records, dataset WHERE option can be applied to AE_MED final dataset if needed. For difficult cases, consider transposing one dataset and then concateniating variables into one variable to remove multiple key variables and then merge the final dataset instead of the original multiple key variable records.

**** MERGE MEDICATIONS WITH ADVERSE EVENTS;
proc sql;
   create table ae_meds as
   select a.subject_id, a.ae_start, a.ae_stop, a.adverse_event,
          c.cm_start, c.cm_stop, c.conmed from
   aes as a  left join  conmeds as c
   on (a.subject_id = c.subject_id) and
      ( (a.ae_start <= c.cm_start <= a.ae_stop) or
        (a.ae_start <= c.cm_stop <= a.ae_stop) or
       ((c.cm_start < a.ae_start) and (a.ae_stop < c.cm_stop)) );
quit;

Generally requires AE starting and AE ending date variables and is applied to combine AEs with ConMeds or PK with Dosing.

General Steps: 1. Merge all records with each other. 

2. Condition keeps any overlap of ConMeds visit dates in-between starting and ending AE visit dates based on three rules: a. Starting ConMeds between AE start and AE stop visit dates, b. Stopping ConMeds between AE start and AE stop visit dates, c. Starting ConMeds before AE start and stopping ConMeds after AE stop visit dates. 

* Code to standardize list of values with sequence numbers starting from 1;

proc sql;
 create table _seq_n as
   select unique sex
   from sashelp.class  where sex > ' '
   group by sex;
quit;

* Create sequence variable;
data _seq_n;
 set _seq_n;
 seq_n + 1;
run;

* Attach sequence variable to dataset;
proc sql;
 create table class2 as
   select a.*, b.seq_n
   from sashelp.class  as a left join _seq_n as b on a.sex=b.sex;
quit;

______________________________________________

SAS® Institute Book Guide PROC SQL Examples

SAS® Reference PROC SQL Syntax

SAS® Reference Proc SQL Summary Functions - ExamplesSyntax

 

 Beginner SAS Programmer

Beginner Paper PROC SQL for DATA Step Die-hards, Christianna S. Williams

Animated Guide PROC SQL, Russ Lavery

HOW Paper Ready to Become Really Productive Using Proc SQL?

SQL Set Operators Presentation

An Intro to PROC SQL: Simple & Complex Queries Melissa Navarro

 

 Advanced SAS Programmer

Unique PROC SQL PROC SQL - Is it a Required Tool for Good SAS Programming?, Ian Whitlock

The Power of PROC SQL, Amanda Reilly [Presentation] [Paper]

Subqueries in - WHERE, HAVING, FROM, SELECT, SET, and JOINS

Subquery Paper 1 The Many Users of SQL Subqueries, Tasha Chapman

Subquery Paper 2 Working with Subquery in the SQL Procedure, Lei Zhang and Danbo Yi [Correlated Subquery]

Subquery Paper 3  SQL SUBQUERIES: Usage in Clinical Programming, Pavan Vemuri

Subquery Paper 4 An In-Line View to a SQL, Darryl Putnam

Validation Paper Clinical Trial Data Validation: Using SAS PROC SQL Effectively

SAS® Paper Dear Miss SASAnswers: A Guide to Efficient PROC SQL Coding

Database Accessing Using The SAS System

Compare Efficiency with DATA Step

 

 Macro SAS Programmer

HOW Macro Paper How To Use Proc SQL select into for List Processing, Ronald Fehd [Manage Lists]

 

SAS Slides by SAS Techies

Fun  Puzzle  Answer

Search for Proc SQL Papers by Kirk Lafler

______________________________________________

1. A Hands-On Tour Inside the World of PROC SQL, Kirk Lafler

2. Undocumented and Hard-to-find SQL Features, Kirk Lafler

3. Kirk’s Ten Best PROC SQL Tips and Techniques, Kirk Lafler

4. Ten Great Reasons to Learn SAS Software's SQL Procedure, Kirk Lafler

5. Introduction to Using Proc SQL, Thomas J. Winn

6. Merging Tables in DATA Step vs. PROC SQL: Convenience and Efficiency Issues, Gajanan Bhat, Raj Suligavi

7. Proc SQL – A Primer for SAS® Programmers, Jimmy DeFoor [Having, IN operator]

8. Top Ten Reasons to Use PROC SQL, Weiming Hu

9. Using the Magical Keyword INTO: in PROC SQL, Thiru Satchi

10. What would I do without PROC SQL and the Macro Language, Jeff Abolafia

11. Application of Some Advanced PROC SQL, Features in Clinical Trial Programming, Changhong Shi, Sylvianne B. Roberge

12. Simplifying NDA Programming with Proc SQL, Aileen L. Yam

13.DATA Step vs. PROC SQL: Issues with common variables when combining data sets, Curt Edmonds, Sunil Gupta

14. PROC SQL: From SELECT to Pass-Through SQL, Christopher Schacherer, Michelle Detry

15. PROC SQL: A Powerful tool to Impove Your Data Quality, Keh-Dong Shiang  [COUNT(), NMISS(), MISSING(), UNIQUE(), Validation Rules, Macro Loop]


16. Get SAS®sy with PROC SQL, Amie Bissonett

17. Top 10 Most Powerful Functions for PROC SQL, Chao Huang, Yu Fu [COUNT(), NMISS(), MISSING(), UNIQUE(), IFN, IFC]

18. Getting More Out of “INTO” in PROC SQL: An Example for Creating Macro
Variables, Mary-Elizabeth Eddlestone

19. A Visual Introduction to SQL Joins, Kirk Paul Lafler

20. SQL or not SQL When Performing Many-to-Many Merge – A Realistic Example, Wenjie Wang, Jade Huang

21. Using the Power of SAS SQL, Jessica Wang

22. Principles of Writing Readable SQL, Kenneth W. Borowiak

23. Introduction to Proc SQL, Katie Minten Ronk [Proc SQL options, Remerge, Percentages]

24. Combining Summary Level Data with Individual Records, Frank Ivis

25. The SQL Procedure: Tricks and Techniques for Efficiency and Simplicity, Sreekanth Middela, Suhas Sanjee [MONOTONIC()]

26. Helpful Undocumented Features in SAS®, Wei Cheng [MONOTONIC(), EQT]

27. Best Practices: Subset Without Getting Upset, Mary Rosenbloom, Kirk Lafler

28. SAS Data Views: A Virtual View of Data, John C. Boling

29. Solving Business Problems with the SQL Procedure, Kirk Lafler, Sunil Gupta

30. Frame Your View of Data with the SQL Procedure, Kirk Lafler

31. Diving into SAS Software with the SQL Procedure, Kirk Lafler

32. Five Little Known, But Highly Valuable and Widely Usable, PROC SQL Programming Techniques, Kirk Lafler

33. Nifty Uses of SQL Reflexive Join and Sub-query in SAS®, Cynthia Trinidad

34. A Comparison Between SQL and Data Step and Some SAS® Procedures, LaTonya Murphy, Lan Tran

35. Top 10 SQL Tricks in SAS®, Chao Huang [MEDIAN, HAVING SUM(CASE WHEN]

36. The SQL Procedure: When and How to Use It?, Ying Feng

37. Improving Maintenance and Performance of SQL queries, Bas van Bakel, Rick Pagie [Index]

38. Leveraging SQL Return Codes When Querying Relational Databases, John Bentley [SQLOBS]

39. Taking Advantage of Missing Values in Proc SQL, Ya Huang

40. PROC SQL for SQL Die-hards, Barbara Ross, Mary Jessica Bennett [||, CATS()]

41. Matching SAS® Data Sets with PROC SQL: If at First You Don’t Succeed, Match, Match Again, Imelda Go

42. Mastering Data Summarization with PROC SQL Christianna Williams

43. PROC SQL for PROC SUMMARY Stalwarts, Christianna Williams

44. SQL Step by Step: An advanced tutorial for business users, Judy Loren, Gregory Barnes Nelson

45. Efficient SQL for Pharma… and Other Industries, Chris Olinger [Advance]

46. MERGING vs. JOINING: Comparing the DATA Step with SQL, Malachy Foley [Cartesian Product]

47. When PROC SQL Is Overwhelmed, How to Use the SyncJoin Algorithm to Match Millions of Records Fast, Houliang Li

48. Proc SQL Tips and Techniques - How to get the most out of your queries, Kevin McGowan, Brian Spruell

49. Ditch the Data Memo: Using Macro Variables and Outer Union Corresponding in PROC SQL to Create Data Set Summary Tables, Andrea Shane [Append table rows]

50. DATA Step vs. PROC SQL: What’s a neophyte to do?, Craig Dickstein, Ray Pass [Data Entry]

51. Yes, Proc SQL is Great, But I Love Data Step Programming, What's a SAS User To Do?, Mira J. Shapiro

52. Queries, Joins, and WHERE Clauses, Oh My!! Demystifying PROC SQL, Christianna Williams [Advanced Joins, Subquery]

53. SQL SET OPERATORS: SO HANDY VENN YOU NEED THEM, Howard Schreier [OUTER UNION, UNION, INTERSECT, and EXCEPT]

54. Intermediate PROC SQL, Thomas Winn [Correlated Subquery]

55. Advanced Programming Techniques with PROC SQL, Kirk Paul Lafler [First., Last.]

56. Using Data Set Options in PROC SQL, Kenneth Borowiak

57. Relational Database Schemes and SAS Software SQL Solutions, Sigurd W. Hermansen

58. Update, Insert, and Carry-Forward Operations in Database Tables Using SAS® Enterprise Guide, Thomas E. Billings, Sreenivas Mullagiri [Integrity Constraints]

59. PROC SQL: Tips and Translations for Data Step Users, Susan Marcella, Gail Jorgensen

60. Labor-saving SQL Constructs, Brian Fairfield-Carter, David Carr

61. Cool SQL Tricks Russ Lavery

62. Fun with Address Matching: Use of the COMPGED Function and the SQL Procedure, Bianca Salas, Alexandra Varga, and Elizabeth Shuster

63. Fuzzy Matching Programming Techniques Using SAS® Software, Stephen Sloan, Kirk Paul Lafler

64. Getting Your Random Sample in Proc SQL, Richard Severino

65. Confessions of a PROC SQL Instructor, Charu Shankar

66. Fun with PROC SQL, Darryl Putnam [having count(sales)>1) ]

67. PROC SQL: Make it a monster using these powerful functions and options, Arun Raj Vidhyadharan, Sunil Mohan Jairath [IFN, IFC, Colon Modifier] 

68. Using Lookup Tables to Match Data, Wenyu Hu, Fang Chen, Liping Zhang

69. Table Lookup Techniques: From the Basics to the Innovative, Art Carpenter

70. Look Up Not Down: Advanced Table Lookups in Base SAS, Jay Iyengar, Josh Horstman

71. Table Lookups: From IF-THEN to Key-Indexing, Art Carpenter

72. Step-by-Step SQL Procedure, Charu Shankar [Boolean Conditions in Summary Functions]

73. Teaching a New Dog Old Tricks - Using the EXCEPT Operator in PROC SQL and Generation Data Sets to Produce a Comparison Report, Stanley Fogleman 

74. Five Secrets of The SQL Goddess, Charu Shankar

75. Time Since Last Dose: Anatomy of a SQL Query, Derek Morgan

76. SQL 101 Course Notes [PharmaSUG]

77. PROC SQL - GET "INTO:" IT!, Kelly Schlessman

78. Let SQL Write SQL Scripts for You – Counts Report, Christine Teng


______________________________________________

Everything you need to know about subsetting data using SQL Sub-Queries
 
Curious to learn more about how you can use SQL sub-queries? You're in the right place. SQL queries are the basic and essential aspects of RDBMS CRUD (create, read, update, delete). But when your program or database grows, the need for a sharp-witted query to retrieve data becomes a must-have.

What Is a Subquery in SQL?
 
A subquery selects rows from one table based on values in another table. SQL subqueries are also called nested queries or inner queries, while the SQL statement containing the subquery is usually referred to as an outer query. Depending on the clause that contains it, a subquery can return a single value or multiple values. Subqueries are most often used in WHERE and HAVING clauses.
 
Sub-Queries with Where Clause
One of the most basic structures of a sub-query in SQL is within the Where clause. It's the simplest way to define what you are searching for. The select statement returns values as per the sub-query condition(s) and uses it as a parameter for the main query.
proc sql number feedback;
               select *
               from sashelp.class
               where Age > (select avg(Age) from sashelp.class);
quit;
 
The following PROC SQL query contains a subquery in the HAVING clause:
proc sql number;
               select Name, Sex, avg(Age) as AvgAge
               from sashelp.class
               group by Sex
               having calculated AvgAge > (select avg(Age) from sashelp.class);
quit;
 
The subquery shown above is a single-value subquery. It returns a single value, the average age from the table sashelp.class to the outer query. A subquery can return values for multiple rows but only for a single column. The table that a subquery reference can be either the same as or different from the table that is referenced by the outer query. In this PROC SQL query shown above, the subquery selects data from the same table as the outer query.
 
The two types of subqueries
Noncorrelated: A noncorrelated subquery is a self-contained subquery that executes independently of the outer query. The simplest type of subquery is a noncorrelated subquery that returns a single value.
Correlated: A dependent subquery is one that requires one or more values to be passed to it by the outer query
 
Both noncorrelated and correlated subqueries can return either single or multiple values to the outer
query.
 
Subsetting Data by Using Noncorrelated Subqueries
EXAMPLE: USING SINGLE-VALUE NONCORRELATED SUBQUERIES
The following PROC SQL query is the same query that is used previously.
proc sql;
               select Name, Sex, avg(Age) as AvgAge
               from sashelp.class
               group by Sex
               having calculated AvgAge > (select avg(Age) from sashelp.class);
quit;
PROC SQL always evaluates a noncorrelated subquery before the outer query. 

______________________________________________ 

Connect to key SAS® books and references for more information

The Essential PROC SQL Handbook for SAS Users by Katherine Prairie

PROC SQL by Example: Using SQL within SAS by Howard Schreier

WHERE expressions 1 2

SAS Functions

SAS® Savvy Class: Maximizing Productivity and Efficiency using Proc SQL

Powered by Wild Apricot Membership Software