Smarter Proc SQL searches - It is the only SAS® Procedure that most closely resembles the DATA Step. Learn how to create simple queries to complex joins from these top SAS® papers or by access the Proc SQL Summary Sheet.
See also DATA Step and Merge, SAS Macro Programming, Dictionary tables and SAS Program Efficiency, Large Dataset and System Options. See SAS® blog on adding group descriptive statistics. See also Lab Data blog for examples of complex clinical trials data analysis.
Basic Structure of SELECT statement:
SELECT name, sex FROM sashelp.class WHERE sex = 'F'
GROUP BY sex HAVING weight > avg(weight) ORDER BY name;
Basic Structure of JOINs: Worksheet
CREATE table mytable as SELECT class.name, students.sex FROM sashelp.class as class, mylib.students as students WHERE class.name = students.name and class.sex = students.sex;
CREATE table mytable as SELECT class.name, students.sex FROM sashelp.class as class LEFT JOIN mylib.students as students ON class.name = students.name and class.sex = students.sex;
Proc SQL features by clauses
CALCULATED (Required before new variable) |
SUMMARY FUNCTIONS |
SUBQUERIES |
Detail/Summary Level |
|
5. SELECT |
X (For CASE clause, specify as SELECT CASE CALCULATED NEW_VAR WHEN ...) |
X (Can re-merge fixed value by group variable to calculate differences in fixed and variable values) |
X |
|
1. FROM or JOIN |
X |
|||
2. WHERE or ON |
X |
X (When joining tables, first detail subset then apply summary function in second PROC SQL) |
X |
Along with joins by key variables, any additional conditons with 'AND' will subset at the detail or summary level. Order is first detail level subset and then summary level subset. |
3. GROUP BY |
X |
X |
||
4. HAVING |
X |
X (When joining tables, can summary subset to exclude records post join) |
X |
Second Summary Subset |
6. ORDER BY |
X |
X |
Horizontal Summary Functions (Across multiple variables, Detail level subset) |
Vertical Group Summary Functions (Across records for 1 variable, Summary level subset) |
---|---|
SELECT SEX, MAX(WEIGHT, HEIGHT) FROM SASHELP.CLASS WHERE AGE > 10 ORDER BY SEX; |
SELECT SEX, COUNT(NAME) FROM SASHELP.CLASS GROUP BY SEX HAVING COUNT(NAME) > 2; |
ORDER BY, WHERE |
GROUP BY, HAVING |
Steps for Building Datasets with Proc SQL
The Ultimate Proc SQL Example
Complex and Fuzzy Join with Repeated Table, Summary Variables and Conditons
Not easily possible with DATA Step: 1. _DDAY dataset is referenced twice with different conditions. 2. DATEPART functions are applied in ON conditions. 3. COALESCE() function is applied to assure non-missing variable. 4. Summary functions are applied in SELECT statement to assure on record by GROUP BY variables SUBJECT, PKDTF and PKREFID. 5. CALCULATED is applied in GROUP BY clause since SUBJECT was created in SELECT clause. 6. Can JOIN by variables from any of the four datasets. 7. Can have flexibility in JOINs such as FULL, LEFT and RIGHT.
CREATE table qc_tpt as
SELECT unique coalesce(a.subject, b.subjid, substr(c.usubjid, 17, 4))
as subject, b.pkdtf, a.drawdt, a.timepoint, a.pk_code, b.pkrefid,
max(c.ddayf) as nddayf format=datetime16., (b.pkdtf - calculated
nddayf)/(60*60) as hrdiff format=4.1,
max(d.ddayf) as nddayf3 format=datetime16., (b.pkdtf - calculated
nddayf3)/(60*60) as hrdiff3 format=4.1
FROM _pk2 as a
FULL JOIN _pkcrf as b on a.subject=b.subjid and b.pkrefid=a.pk_code
FULL JOIN _dday as c on b.subject=substr(c.usubjid, 17, 4)
and ( (a.timepoint ^= 'Predose' and c.ddayf <= b.pkdtf) or
(a.timepoint = 'Predose' and (datepart(c.ddayf) = datepart(b.pkdtf))
or (c.ddayf <= b.pkdtf) ) )
FULL JOIN _dday as d on c.subject=substr(d.usubjid, 17, 4)
and (d.ddayf <= b.pkdtf)
GROUP BY calculated subject, b.pkdtf, b.pkrefid;
Fuzzy Join with EQT to link related dates
CREATE TABLE fpgms as
SELECT UNIQUE scan(a.pgname, 1) as spgname, (coalesce(d.pgname,
f.pgname, g.pgname)) as pgname,
max(d.dslabel) as dslabel,
max(d.spname) as spname, max(d.spgdt) as spgdt format=date9.,
max(a.pdttm) as spgdtm format=datetime16., max(e.pdttm) as lpgdtm
format=datetime16.,
max(d.vpname) as vpname, max(d.vpgdt) as vpgdt format=date9.,
max(b.pdttm) as vpgdtm format=datetime16.,
max(c.pdttm) as vfrdtm format=datetime16.,
max(f.pdttm) as rtfdtm format=datetime16.,
max(g.pdttm) as dpgdtm format=datetime16.
FROM spgms as a
FULL JOIN vpgms as b on strip(scan(a.pgname, 1)) EQT strip(scan(b.pgname, 1))
FULL JOIN vfrms as c on strip(scan(b.pgname, 1)) EQT strip(scan(c.pgname, 1))
FULL JOIN pindex as d on strip(scan(c.pgname, 1)) EQT strip(scan(d.pgname, 1))
FULL JOIN lpgms as e on strip(scan(d.pgname, 1)) EQT strip(scan(e.pgname, 1))
FULL JOIN rtfs as f on strip(scan(e.pgname, 1)) EQT strip(scan(f.pgname, 1))
FULL JOIN dpgms as g on strip(scan(f.pgname,1)) EQT strip(scan(g.pgname, 1))
GROUP BY calculated pgname
;
* Drop non-targets that already have a target visit match;
CREATE new_vwm3nt (where=(xtra = .)) as
SELECT a.*, b.t_vn as xtra FROM new_vwm3nta as a LEFT JOIN new_vwm3t as b
on a.subjid=b.subjid and a.lbtestid=b.lbtestid and a.t_vn=b.t_vn;
* Remerge group counts - <group_variable>n or other group descriptive stats; * Select all source variables;
* Left join to keep all records;
* Subquery - count and link by name, group by sex;
create table class2 as
select a.*, b.sexn from sashelp.class as a
left join
(select name, count(name) as sexn from sashelp.class where sex > ' ' group by sex
) as b on a.name=b.name;
See SAS blog - Something for nothing? Adding group descriptive statistics.
Many-to-Many Join SAS Example - Note that LEFT JOIN is applied to keep all AE records, dataset WHERE option can be applied to AE_MED final dataset if needed. For difficult cases, consider transposing one dataset and then concateniating variables into one variable to remove multiple key variables and then merge the final dataset instead of the original multiple key variable records.
**** MERGE MEDICATIONS WITH ADVERSE EVENTS;
proc sql;
create table ae_meds as
select a.subject_id, a.ae_start, a.ae_stop, a.adverse_event,
c.cm_start, c.cm_stop, c.conmed from
aes as a left join conmeds as c
on (a.subject_id = c.subject_id) and
( (a.ae_start <= c.cm_start <= a.ae_stop) or
(a.ae_start <= c.cm_stop <= a.ae_stop) or
((c.cm_start < a.ae_start) and (a.ae_stop < c.cm_stop)) );
quit;
Generally requires AE starting and AE ending date variables and is applied to combine AEs with ConMeds or PK with Dosing.
General Steps: 1. Merge all records with each other.
2. Condition keeps any overlap of ConMeds visit dates in-between starting and ending AE visit dates based on three rules: a. Starting ConMeds between AE start and AE stop visit dates, b. Stopping ConMeds between AE start and AE stop visit dates, c. Starting ConMeds before AE start and stopping ConMeds after AE stop visit dates.
* Code to standardize list of values with sequence numbers starting from 1;
proc sql;
create table _seq_n as
select unique sex
from sashelp.class where sex > ' '
group by sex;
quit;
* Create sequence variable;
data _seq_n;
set _seq_n;
seq_n + 1;
run;
* Attach sequence variable to dataset;
proc sql;
create table class2 as
select a.*, b.seq_n
from sashelp.class as a left join _seq_n as b on a.sex=b.sex;
quit;
SAS® Institute Book Guide PROC SQL Examples
SAS® Reference PROC SQL Syntax
SAS® Reference Proc SQL Summary Functions - Examples, Syntax
Beginner SAS Programmer |
Beginner Paper PROC SQL for DATA Step Die-hards, Christianna S. Williams
Animated Guide PROC SQL, Russ Lavery
HOW Paper Ready to Become Really Productive Using Proc SQL?
SQL Set Operators Presentation
An Intro to PROC SQL: Simple & Complex Queries Melissa Navarro
Advanced SAS Programmer |
Unique PROC SQL PROC SQL - Is it a Required Tool for Good SAS Programming?, Ian Whitlock
The Power of PROC SQL, Amanda Reilly [Presentation] [Paper]
Subqueries in - WHERE, HAVING, FROM, SELECT, SET, and JOINS
Subquery Paper 1 The Many Users of SQL Subqueries, Tasha Chapman
Subquery Paper 2 Working with Subquery in the SQL Procedure, Lei Zhang and Danbo Yi [Correlated Subquery]
Subquery Paper 3 SQL SUBQUERIES: Usage in Clinical Programming, Pavan Vemuri
Subquery Paper 4 An In-Line View to a SQL, Darryl Putnam
Validation Paper Clinical Trial Data Validation: Using SAS PROC SQL Effectively
SAS® Paper Dear Miss SASAnswers: A Guide to Efficient PROC SQL Coding
Database Accessing Using The SAS System
Compare Efficiency with DATA Step
Macro SAS Programmer |
HOW Macro Paper How To Use Proc SQL select into for List Processing, Ronald Fehd [Manage Lists]
Search for Proc SQL Papers by Kirk Lafler
1. A Hands-On Tour Inside the World of PROC SQL, Kirk Lafler
2. Undocumented and Hard-to-find SQL Features, Kirk Lafler
3. Kirk’s Ten Best PROC SQL Tips and Techniques, Kirk Lafler
4. Ten Great Reasons to Learn SAS Software's SQL Procedure, Kirk Lafler
5. Introduction to Using Proc SQL, Thomas J. Winn
7. Proc SQL – A Primer for SAS® Programmers, Jimmy DeFoor [Having, IN operator]
8. Top Ten Reasons to Use PROC SQL, Weiming Hu
9. Using the Magical Keyword INTO: in PROC SQL, Thiru Satchi
10. What would I do without PROC SQL and the Macro Language, Jeff Abolafia
12. Simplifying NDA Programming with Proc SQL, Aileen L. Yam
14. PROC SQL: From SELECT to Pass-Through SQL, Christopher Schacherer, Michelle Detry
15. PROC SQL: A Powerful tool to Impove Your Data Quality, Keh-Dong Shiang [COUNT(), NMISS(), MISSING(), UNIQUE(), Validation Rules, Macro Loop]
16. Get SAS®sy with PROC SQL, Amie Bissonett
17. Top 10 Most Powerful Functions for PROC SQL, Chao Huang, Yu Fu [COUNT(), NMISS(), MISSING(), UNIQUE(), IFN, IFC]
19. A Visual Introduction to SQL Joins, Kirk Paul Lafler
20. SQL or not SQL When Performing Many-to-Many Merge – A Realistic Example, Wenjie Wang, Jade Huang
21. Using the Power of SAS SQL, Jessica Wang
22. Principles of Writing Readable SQL, Kenneth W. Borowiak
23. Introduction to Proc SQL, Katie Minten Ronk [Proc SQL options, Remerge, Percentages]
24. Combining Summary Level Data with Individual Records, Frank Ivis
25. The SQL Procedure: Tricks and Techniques for Efficiency and Simplicity, Sreekanth Middela, Suhas Sanjee [MONOTONIC()]
26. Helpful Undocumented Features in SAS®, Wei Cheng [MONOTONIC(), EQT]
27. Best Practices: Subset Without Getting Upset, Mary Rosenbloom, Kirk Lafler
28. SAS Data Views: A Virtual View of Data, John C. Boling
29. Solving Business Problems with the SQL Procedure, Kirk Lafler, Sunil Gupta
30. Frame Your View of Data with the SQL Procedure, Kirk Lafler
31. Diving into SAS Software with the SQL Procedure, Kirk Lafler
33. Nifty Uses of SQL Reflexive Join and Sub-query in SAS®, Cynthia Trinidad
34. A Comparison Between SQL and Data Step and Some SAS® Procedures, LaTonya Murphy, Lan Tran
35. Top 10 SQL Tricks in SAS®, Chao Huang [MEDIAN, HAVING SUM(CASE WHEN]
36. The SQL Procedure: When and How to Use It?, Ying Feng
37. Improving Maintenance and Performance of SQL queries, Bas van Bakel, Rick Pagie [Index]
38. Leveraging SQL Return Codes When Querying Relational Databases, John Bentley [SQLOBS]
39. Taking Advantage of Missing Values in Proc SQL, Ya Huang
40. PROC SQL for SQL Die-hards, Barbara Ross, Mary Jessica Bennett [||, CATS()]
41. Matching SAS® Data Sets with PROC SQL: If at First You Don’t Succeed, Match, Match Again, Imelda Go
42. Mastering Data Summarization with PROC SQL Christianna Williams
43. PROC SQL for PROC SUMMARY Stalwarts, Christianna Williams
44. SQL Step by Step: An advanced tutorial for business users, Judy Loren, Gregory Barnes Nelson
45. Efficient SQL for Pharma… and Other Industries, Chris Olinger [Advance]
46. MERGING vs. JOINING: Comparing the DATA Step with SQL, Malachy Foley [Cartesian Product]
48. Proc SQL Tips and Techniques - How to get the most out of your queries, Kevin McGowan, Brian Spruell
49. Ditch the Data Memo: Using Macro Variables and Outer Union Corresponding in PROC SQL to Create Data Set Summary Tables, Andrea Shane [Append table rows]
50. DATA Step vs. PROC SQL: What’s a neophyte to do?, Craig Dickstein, Ray Pass [Data Entry]
51. Yes, Proc SQL is Great, But I Love Data Step Programming, What's a SAS User To Do?, Mira J. Shapiro
52. Queries, Joins, and WHERE Clauses, Oh My!! Demystifying PROC SQL, Christianna Williams [Advanced Joins, Subquery]
53. SQL SET OPERATORS: SO HANDY VENN YOU NEED THEM, Howard Schreier [OUTER UNION, UNION, INTERSECT, and EXCEPT]
54. Intermediate PROC SQL, Thomas Winn [Correlated Subquery]
55. Advanced Programming Techniques with PROC SQL, Kirk Paul Lafler [First., Last.]
56. Using Data Set Options in PROC SQL, Kenneth Borowiak
57. Relational Database Schemes and SAS Software SQL Solutions, Sigurd W. Hermansen
58. Update, Insert, and Carry-Forward Operations in Database Tables Using SAS® Enterprise Guide, Thomas E. Billings, Sreenivas Mullagiri [Integrity Constraints]
59. PROC SQL: Tips and Translations for Data Step Users, Susan Marcella, Gail Jorgensen
60. Labor-saving SQL Constructs, Brian Fairfield-Carter, David Carr
61. Cool SQL Tricks Russ Lavery
63. Fuzzy Matching Programming Techniques Using SAS® Software, Stephen Sloan, Kirk Paul Lafler
64. Getting Your Random Sample in Proc SQL, Richard Severino
65. Confessions of a PROC SQL Instructor, Charu Shankar
66. Fun with PROC SQL, Darryl Putnam [having count(sales)>1) ]
67. PROC SQL: Make it a monster using these powerful functions and options, Arun Raj Vidhyadharan, Sunil Mohan Jairath [IFN, IFC, Colon Modifier]
68. Using Lookup Tables to Match Data, Wenyu Hu, Fang Chen, Liping Zhang
69. Table Lookup Techniques: From the Basics to the Innovative, Art Carpenter
70. Look Up Not Down: Advanced Table Lookups in Base SAS, Jay Iyengar, Josh Horstman
71. Table Lookups: From IF-THEN to Key-Indexing, Art Carpenter
72. Step-by-Step SQL Procedure, Charu Shankar [Boolean Conditions in Summary Functions]
74. Five Secrets of The SQL Goddess, Charu Shankar
75. Time Since Last Dose: Anatomy of a SQL Query, Derek Morgan
76. SQL 101 Course Notes [PharmaSUG]
77. PROC SQL - GET "INTO:" IT!, Kelly Schlessman
78. Let SQL Write SQL Scripts for You – Counts Report, Christine Teng
79. Something for Nothing? Adding Group Descriptive Statistics Using PROC SQL
Subqueries - Presentation, Sunil Gupta
80. An Introduction to SQL in SAS, Pete Lund[Subqueries]
Everything you need to know about subsetting data using SQL Sub-Queries
Curious to learn more about how you can use SQL sub-queries? You're in the right place. SQL queries are the basic and essential aspects of RDBMS CRUD (create, read, update, delete). But when your program or database grows, the need for a sharp-witted query to retrieve data becomes a must-have.
What Is a Subquery in SQL?
A subquery selects rows from one table based on values in another table. SQL subqueries are also called nested queries or inner queries, while the SQL statement containing the subquery is usually referred to as an outer query. Depending on the clause that contains it, a subquery can return a single value or multiple values. Subqueries are most often used in WHERE and HAVING clauses.
Sub-Queries with Where Clause
One of the most basic structures of a sub-query in SQL is within the Where clause. It's the simplest way to define what you are searching for. The select statement returns values as per the sub-query condition(s) and uses it as a parameter for the main query.
proc sql number feedback;
select *
from sashelp.class
where Age > (select avg(Age) from sashelp.class);
quit;
The following PROC SQL query contains a subquery in the HAVING clause:
proc sql number;
select Name, Sex, avg(Age) as AvgAge
from sashelp.class
group by Sex
having calculated AvgAge > (select avg(Age) from sashelp.class);
quit;
The subquery shown above is a single-value subquery. It returns a single value, the average age from the table sashelp.class to the outer query. A subquery can return values for multiple rows but only for a single column. The table that a subquery reference can be either the same as or different from the table that is referenced by the outer query. In this PROC SQL query shown above, the subquery selects data from the same table as the outer query.
The two types of subqueries
Noncorrelated: A noncorrelated subquery is a self-contained subquery that executes independently of the outer query. The simplest type of subquery is a noncorrelated subquery that returns a single value.
Correlated: A dependent subquery is one that requires one or more values to be passed to it by the outer query
Both noncorrelated and correlated subqueries can return either single or multiple values to the outer
query.
Subsetting Data by Using Noncorrelated Subqueries
EXAMPLE: USING SINGLE-VALUE NONCORRELATED SUBQUERIES
The following PROC SQL query is the same query that is used previously.
proc sql;
select Name, Sex, avg(Age) as AvgAge
from sashelp.class
group by Sex
having calculated AvgAge > (select avg(Age) from sashelp.class);
quit;
PROC SQL always evaluates a noncorrelated subquery before the outer query.
Connect to key SAS® books and references for more information
The Essential PROC SQL Handbook for SAS Users by Katherine Prairie
PROC SQL by Example: Using SQL within SAS by Howard Schreier
SAS® Savvy Class: Maximizing Productivity and Efficiency using Proc SQL