New Clinical/Statistical SAS Programmer  

Both pharma and medical device SAS programmers can benefit from this page.  The process chart and topics below are useful to new Clinical SAS® Programmers. Other topics within the pharmaceutical section, such as CDISC, are more advanced.  See also CDISC 101 Mapping Training videos, Statistical Analysis and New to SAS® Programming.  See also SAS training session.




______________________________________________

 

SOPs: SAS Requirements, Testing and Validation   MindMap

A. Study Objectives 

1. Understand structure of clinical trials and how data is collected over time.

2. Understand how raw clinical data from case report forms are stored in SAS datasets.

3. Understand how to clean clinical data.

4. Know how to create descriptive statistics.

5. Know how to apply statistical modeling on clinical data.

CDISC for Therapeutic areas   CDISC Glossary

 


B. Pharmaceutical Terms (NCI Glossary)

Baseline   SAP   Table Lookup   Visit Windows   See Drug Exposure in Clinical Data Reporting

Double-Blinded Clinical Study: Neither the patient nor the site knows which drug is taken.  Most clinical studies are double-blinded with the sponsor blinded until the study is unblinded.

Triple-Blinded Clinical Study: Neither the patient, the site nor the sponsor knows which drug is taken.  Few clinical studies are classified as triple-blinded.

Active vs Control: Active treatment is the drug that is being tested.  Control is the placebo or standard method of treatment.


Dosing - Fixed (ex. 500 mg) or variable (by body surface area (BSA), ex. 5 mg/m2, or by weight, ex. 5 mg/kg)

Cycles - ex. number of days dosed per cycle, start and stop dates per cycle, dose per cycle = (stop date - start date + 1) * exdose

Visit - ex. cycles per visit

Single/Multiple Dose Studies: Fixed, Different or Escalation Doses in phases or cycles

Baseline: For all other parameters, the baseline measurement will be the pre-dose value collected on Day 1 or, if that is not available, the last value collected before Day 1.  Baseline is not Day 0 since study day 0 is not valid.

Change: Change from baseline at a particular post-baseline time point will be computed as the value at the post-baseline time point minus the baseline value.  For example, base = 10, post-baseline = 15, change = 5 and percent change is 50%, computed as ((15 - 10) / 10) * 100.  See lab data.
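A minimal Python sketch of the baseline and change rules above (on a study this logic would normally sit in a SAS DATA step). The function names and the record layout of (collection date, value, pre-dose flag) are assumptions made for illustration only.

```python
from datetime import date

def pick_baseline(records, day1):
    """Return the baseline value from (collection_date, value, is_predose) tuples.

    Prefers the pre-dose value collected on Day 1; otherwise falls back to
    the last value collected before Day 1. Returns None if neither exists.
    """
    predose_day1 = [r for r in records if r[0] == day1 and r[2]]
    if predose_day1:
        return predose_day1[-1][1]
    before = [r for r in records if r[0] < day1]
    if before:
        return max(before, key=lambda r: r[0])[1]
    return None

def change_from_baseline(base, post):
    """Return (change, percent change); percent change is None when base is 0."""
    change = post - base
    pct = (change / base) * 100 if base != 0 else None
    return change, pct

# Hypothetical lab records for one subject whose first dose is on 10JAN2013.
labs = [(date(2013, 1, 3), 9, False), (date(2013, 1, 8), 10, False)]
base = pick_baseline(labs, date(2013, 1, 10))
result = change_from_baseline(base, 15)
```

No Day 1 pre-dose value exists here, so the last value before Day 1 (10) is used as baseline, which reproduces the example of change = 5 and percent change = 50.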

Duration of Exposure (days/weeks): Duration of exposure (weeks) will be computed as the date of the last
dose of study drug minus the date of the first dose of study drug, plus 1 day (that is, the study day associated with the date of the last dose of study drug) divided by 7 days per week.
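The exposure formula can be sketched in Python as follows. The function name is hypothetical; the arithmetic is the one stated above (last dose date minus first dose date, plus 1, divided by 7 for weeks).

```python
from datetime import date

def exposure_duration(first_dose, last_dose):
    """Return (days, weeks) of exposure, counting both first and last dose days."""
    days = (last_dose - first_dose).days + 1
    return days, days / 7

# Hypothetical subject dosed from 01JAN2013 through 14JAN2013.
days, weeks = exposure_duration(date(2013, 1, 1), date(2013, 1, 14))
```

The "+ 1" makes a subject dosed on a single day count as 1 day of exposure rather than 0.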

Cumulative dose (mg): Sum of all doses given during the study treatment exposure across all cycles.  Total dose for a dosing record is calculated as TDOSE=(DAENDT - DASTDT + 1) * DLEVEL;

Average daily dose (mg): Dose taken per day, average if more than one dose, across all cycles.

Dose Intensity (mg per unit of time, ex. mg/week): Dose intensity is a function of dose and frequency of administration.  It is defined as the amount of drug delivered per unit of time.

Relative Dose intensity - (actual cumulative dose / planned cumulative dose)*100
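A short Python sketch of cumulative dose and relative dose intensity, using the TDOSE formula and the ratio given above. The record layout (start date, end date, daily dose level) and the planned cumulative dose of 5000 mg are assumptions for illustration.

```python
from datetime import date

def record_dose(start, end, dose_level):
    """Total dose for one dosing record: (end - start + 1) days times dose level."""
    return ((end - start).days + 1) * dose_level

def cumulative_dose(records):
    """Sum the total dose over all dosing records (all cycles)."""
    return sum(record_dose(start, end, level) for start, end, level in records)

def relative_dose_intensity(actual, planned):
    """(actual cumulative dose / planned cumulative dose) * 100."""
    return actual / planned * 100

# Hypothetical subject: cycle 1 at 500 mg/day, cycle 2 reduced to 250 mg/day.
dosing = [
    (date(2013, 1, 1), date(2013, 1, 5), 500),
    (date(2013, 1, 29), date(2013, 2, 2), 250),
]
actual = cumulative_dose(dosing)
rdi = relative_dose_intensity(actual, planned=5000)
```

Each cycle covers 5 dosing days, so the actual cumulative dose is 2500 + 1250 = 3750 mg and the relative dose intensity is 75%.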

Dose reduction: A reduction is defined as a decrease in dose from the protocol planned dose or a decrease from the previous non-zero dose.

Dose interruption: An interruption is defined as a 0 mg/0 tablets dose given on one or more days.
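The reduction and interruption definitions can be applied to a sequence of daily doses as sketched below in Python. The function name and the choice to flag every day at a reduced dose (rather than only the first) are assumptions; a protocol or SAP may count these differently.

```python
def classify_doses(daily_doses, planned_dose):
    """Flag each daily dose as 'interruption', 'reduction' or '' (no flag).

    interruption: a dose of 0 on that day
    reduction:    a non-zero dose below the planned dose, or below the
                  previous non-zero dose
    """
    flags = []
    prev_nonzero = None
    for dose in daily_doses:
        if dose == 0:
            flags.append("interruption")
        elif dose < planned_dose or (prev_nonzero is not None and dose < prev_nonzero):
            flags.append("reduction")
        else:
            flags.append("")
        if dose > 0:
            prev_nonzero = dose
    return flags

# Hypothetical daily doses (mg) against a planned dose of 500 mg.
flags = classify_doses([500, 500, 0, 250, 250, 500], planned_dose=500)
```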

Endpoints - Key study measurements.  In oncology, for example: overall survival or death, progression free survival (PFS), disease free survival (DFS), objective response rate (ORR), time to tumor progression (TTP), and quality of life (QOL), which is an example of a patient reported outcome (PRO).

Data Cutoff followed by Snapshot/Data Lock Date - In general, the data cutoff date defines the last visit date to be used in an analysis.  The snapshot date is generally about 1 or 2 weeks after the data cutoff date to allow time for data entry and data queries.  Once a snapshot is taken, no more data is entered and all datasets are locked.  In the snapshot datasets, if visit start dates exist after the data cutoff date, those visits or records should be excluded from the snapshot.  End dates after the cutoff date may be set to missing.  Generally, censoring rules should also be applied so that any events after the cutoff date are censored.   See SAS paper for details on subjects on or off study.  

Data Checks may include safety population, patient count, selected or maximum visit and date records.

Case study: A sponsor decides to define the data cutoff as 29APR2013, which is one day prior to the first subject receiving 'rollover' treatment (i.e. a switch from treatment B to A).  As such, all RAW data sets will be filtered to include information obtained from all clinical study visits up to and including 29APR2013.  Sponsors should then identify the specific date variable used for each SDTM domain as documentation.  The snapshot date may be 07MAY2013, which allows time to collect data up to 29APR2013.

29APR2013 - Data Cutoff (last visit date)

07MAY2013 - Snapshot date (1 week later to collect the data)
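A Python sketch of applying the 29APR2013 cutoff from the case study: records that start after the cutoff are dropped, and end dates after the cutoff are set to missing. The record layout and the function name are hypothetical; which date variable drives the filter would be documented per dataset.

```python
from datetime import date

CUTOFF = date(2013, 4, 29)  # data cutoff date from the case study

def apply_cutoff(records, cutoff):
    """Keep records starting on or before the cutoff; blank out later end dates."""
    kept = []
    for rec in records:
        if rec["start"] > cutoff:
            continue  # visit or record starts after the cutoff: exclude
        if rec["end"] is not None and rec["end"] > cutoff:
            rec = {**rec, "end": None}  # copy with end date set to missing
        kept.append(rec)
    return kept

# Hypothetical records for one subject.
visits = [
    {"id": 1, "start": date(2013, 4, 20), "end": date(2013, 4, 22)},
    {"id": 2, "start": date(2013, 4, 28), "end": date(2013, 5, 3)},
    {"id": 3, "start": date(2013, 4, 30), "end": date(2013, 5, 1)},
]
snapshot = apply_cutoff(visits, CUTOFF)
```

Record 1 is kept unchanged, record 2 is kept with a missing end date, and record 3 is excluded. The input list is not modified.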

FDA: CBER - Center for Biologics Evaluation and Research, CDER - Center for Drug Evaluation and Research 

See blog  See nih diagram

Understanding Clinical Trials Video

Basics of Clinical Data Management


(Early) Phase I - small studies (20-80), evaluate optimum dose and minimum tolerability on generally healthy subjects

(Early) Phase II - larger studies (hundreds), evaluate efficacy dose levels and side effects

(Late) Phase III - multi-center studies, compare with standard treatment

(Late) Phase IV - post-market studies

Study Day: Day 1 will be defined as the first date on which study drug was administered.
Positive study days will be counted forward from Day 1. Day -1 will be the date immediately
preceding Day 1, and negative study days will be counted backward from Day -1. Day 0 is not a valid value.
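The study day rule (no Day 0) can be sketched in Python as below. The function name is hypothetical; Day 1 is the first date on which study drug was administered.

```python
from datetime import date

def study_day(event_date, day1):
    """Return the study day of event_date relative to Day 1, skipping Day 0."""
    delta = (event_date - day1).days
    # On or after Day 1: add 1 so that Day 1 itself is study day 1.
    # Before Day 1: the plain difference already gives -1 for the prior day.
    return delta + 1 if delta >= 0 else delta

day1 = date(2013, 1, 10)  # hypothetical first dose date
```

For example, study_day(date(2013, 1, 10), day1) is 1 and study_day(date(2013, 1, 9), day1) is -1, so 0 is never returned.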

Visit Windows: The benefit of using visit windows is to ensure only one visit per visit window period, based on start and end dates for periods, the visit date and other key dates such as conditional chemo, leukapheresis, treatment date, etc.

Measurements will be associated with a visit for summarizing according to the
study day associated with the date on which the information was collected and within a visit window period based on protocol. Target dates and the acceptable range of study days for each visit should be documented. If multiple visits occur within a visit window, the visit occurring closest to the target day will be selected for summarizing. If there is a tie, the earliest visit will be chosen.
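A Python sketch of the visit window selection rule: within each window, pick the record closest to the target day, and on a tie pick the earliest. The window definitions (target day, lowest day, highest day) are made-up values for illustration; real windows come from the protocol or SAP.

```python
# Hypothetical visit windows: visit name -> (target day, low day, high day)
WINDOWS = {"Week 1": (8, 5, 11), "Week 2": (15, 12, 18)}

def select_visit_records(measurements, windows):
    """Pick one (study_day, value) record per visit window.

    The record closest to the target day wins; ties go to the earlier study day.
    Windows with no record are left out of the result.
    """
    selected = {}
    for visit, (target, low, high) in windows.items():
        in_window = [m for m in measurements if low <= m[0] <= high]
        if in_window:
            selected[visit] = min(in_window, key=lambda m: (abs(m[0] - target), m[0]))
    return selected

# Days 6 and 10 are both 2 days from the Week 1 target of day 8.
selected = select_visit_records([(6, 1.0), (10, 2.0), (14, 3.0)], WINDOWS)
```

Sorting on the pair (distance to target, study day) implements both the closest-visit rule and the earliest-visit tie-break in one step.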

NCI-CTCAE (National Cancer Institute - Common Terminology Criteria for Adverse Events)

DSUR (Development Safety Update Report)

DMC (Data Monitoring Committee)




Outliers - Observations that are quite different from other observations, such as extremely low or high values or a value far from the model line.  Outliers can result in a non-symmetrical distribution curve, such as right or left skewed. Both the size of the difference and the number of extreme values are evaluated.  Causes include valid natural variation, measurement errors, incorrect selection of a sample (entry criteria were not met) or data entry errors.


Biomarker - a characteristic that is objectively measured and evaluated as an indicator of normal biological processes, pathogenic processes or pharmacological responses to a therapeutic intervention.  For example, a general health check may include assessment of vital signs such as blood pressure and heart rate, and lab measurements such as cholesterol, triglycerides and fasting glucose levels. Body measurements such as weight, body mass index (BMI), and waist-to-hip ratio are routinely used for assessing conditions such as obesity and metabolic disorders.  In cancer research, genetic studies are valuable because genetic abnormalities so often underlie the development of cancer. Certain DNA or RNA markers may therefore help in the detection and treatment of specific cancers. Changes in these biomarker values indicate the decrease or increase of cancer cells.  Biomarker data is similar to PK data where small changes are measured within a short period.  See SAS paper. See background article

Immunology - Immunology is the branch of biomedical science that deals with the response of an organism to antigenic challenge and its recognition of what is self and what is not. It deals with the defense mechanisms including all physical, chemical and biological properties of the organism that help it to combat its susceptibility to foreign organisms, material, etc. See info.

Leukapheresis - the selective removal of leukocytes from withdrawn blood, which is then retransfused into the donor.

1. Treatment has no response: refractory - describes a condition that does not get better with treatment such as the test drug or chemo. Your doctor may also say your cancer is resistant.

2. Treatment had some response but cancer came back: relapse - (of a sick or injured person) to deteriorate after a period of improvement.

3. Treatment response cured cancer: remission - all signs of your cancer are gone.

Complete remission (complete response) - In oncology, the disappearance of all signs and symptoms of disease (e.g., cancer, multiple sclerosis), with normalization of all biochemical and radiologic parameters, as well as a negative repeat biopsy (pathologic remission).

Non-Hodgkin lymphoma (also called non-Hodgkin’s lymphoma or NHL) - cancer that develops in the lymphatic system from cells called lymphocytes, which are a type of white blood cell that helps the body fight infections.  NHL can develop in many parts of the body, including the lymph nodes, bone marrow, spleen, thymus and digestive tract.  T-cells specifically destroy unhealthy cells.

Chemotherapy (often abbreviated to chemo and sometimes CTX or CTx) is a category of cancer treatment that uses chemical substances, especially one or more anti-cancer drugs (chemotherapeutic agents) that are given as part of a standardized chemotherapy regimen.

The spread of cancer from one part of the body to another is called metastasis.


PhUSE Oncology Site  CAR-T

Biomarker Use by CDISC Standards [Presentation]

What are Biomarkers

Transforming Biomarker Data into an SDTM based Dataset, Kiran Cherukur

Implementation of SDTM Pharmacogenomics/Genetics Domains on Genetic Variation Data, Linghui Zhang

Biomarker as essential part of clinical development, Renuka Chinthapally [Presentation]

Path to a Metadata-driven standards environment - collection to submission [poster]

Relapsed and Refractory (2)   What do lymph nodes do?

Analysis of Oncology Studies for Programmers and Statisticians, Kevin Lee [Video]


C. Sample Clinical Trials Study 

Timeline: Start study, End study, SDTMs/ADaMs, Database lock and TLGs  

SAP - Statistical Analysis Plan, Template, PhUSE

CRF - Case Report Forms, Randomized Controlled Trial

Example Data

  Use a picture, face or story to convey a mission to improve lives.








D. Three Types of Data Collected, See CDM

Set up Clinical Data Management System

1) One record per patient, ex. demog

2) Measurements during protocol visits, may need to sort and subset to get one record per patient such as first dose date or lab baseline flag, ex. vitals, labs, ex, pk/pd, etc.

3) Measurements any time during the study, ex. adverse events, con meds
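For the second type of data, getting one record per patient usually means sorting and keeping the first record per subject. A minimal Python sketch for first dose date, assuming exposure records are (subject id, dose date) pairs; the names are hypothetical.

```python
from datetime import date

def first_dose_dates(ex_records):
    """Return {subject_id: earliest dose date} from (subject_id, dose_date) pairs."""
    first = {}
    # Sorting by subject then date means the first record seen per subject
    # is that subject's earliest dose.
    for subject, dose_date in sorted(ex_records):
        first.setdefault(subject, dose_date)
    return first

# Hypothetical exposure records, several per subject.
ex = [
    ("002", date(2013, 1, 12)),
    ("001", date(2013, 1, 9)),
    ("001", date(2013, 1, 2)),
    ("002", date(2013, 1, 5)),
]
first = first_dose_dates(ex)
```

This mirrors the common sort-then-keep-first pattern used to subset visit-level data down to one record per patient.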

 
Baseline / Screening

 Collect: Demo, History and Con Meds

 Goal: Assure comparable groups

Study

 Collect: Drug Admin, Efficacy Measurement, Labs, Adverse Events and Con Meds

 Goal: Compare groups - a. Calculate differences in time to 30% decrease in sum of all lesions, b. Counts and percentages of adverse events

Followup

 Collect: Status (Dead, LOT, Completed Treatment/Study)

 Goal: Track completion of each patient

 




E. Three Types of Joins, See DATA Step, Proc SQL

Relationship Model (webgraphviz.com)

Getting an Overview of SAS® Data in Three Steps, Yu Fu, Shirmeen Virji, Miriam McGaugh

Automatically create diagrams showing the structure and performance of your SAS code, Philip Mason



1) One to one record, ex. demog with first dose date from ex

2) One to many records, ex. demog with vitals

3) Many to many records using Proc SQL, may be required if one visit date is used as reference to another visit date by visit name, ex. adverse events with con meds
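The three join types differ only in how many records share a key on each side. The Python sketch below uses one generic inner join on subject id to show the row counts each case produces; the dataset contents and variable names are made up, and in SAS the first two cases would typically be a DATA step merge and the third a Proc SQL join.

```python
def inner_join(left, right, key):
    """Pair every left record with every right record sharing the same key value."""
    index = {}
    for rec in right:
        index.setdefault(rec[key], []).append(rec)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]

# Hypothetical datasets keyed by subject id.
demog = [{"usubjid": "001", "age": 54}, {"usubjid": "002", "age": 61}]
first_dose = [{"usubjid": "001", "trtsdt": "2013-01-02"},
              {"usubjid": "002", "trtsdt": "2013-01-05"}]
vitals = [{"usubjid": "001", "visit": "Week 1", "sysbp": 120},
          {"usubjid": "001", "visit": "Week 2", "sysbp": 118},
          {"usubjid": "002", "visit": "Week 1", "sysbp": 131}]
ae = [{"usubjid": "001", "aeterm": "Nausea"},
      {"usubjid": "001", "aeterm": "Headache"},
      {"usubjid": "002", "aeterm": "Rash"}]
cm = [{"usubjid": "001", "cmtrt": "Ondansetron"},
      {"usubjid": "001", "cmtrt": "Ibuprofen"},
      {"usubjid": "002", "cmtrt": "Cetirizine"}]

one_to_one = inner_join(demog, first_dose, "usubjid")   # 1 row per subject
one_to_many = inner_join(demog, vitals, "usubjid")      # 1 row per vitals record
many_to_many = inner_join(ae, cm, "usubjid")            # every AE x con med pair per subject
```

Subject 001 has 2 adverse events and 2 con meds, so the many-to-many join yields 4 rows for that subject; further conditions (such as date comparisons) would normally be added to keep only the meaningful pairs.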

SDTM Specification   ADaM Specification

TLGs Table Shells - Tables, Lists and Graphs     QC Checklist

 

F. Two Types of Analysis, See Statistical Analysis for collection of macros and ODS Statistical Graphics

Introduction to Stats Course Notes with SAS Datasets

 

 








 

1) Efficacy - based on the primary and secondary endpoints, ex. change in lab measurements from baseline, survival rates

2) Safety - based on adverse events and subject disposition

Waterfall Charts 


I. Procs for working with Categorical Data:

Descriptives:

1) Proc Freq (numeric or character variables)

 Single variable: oneway tabulation

 Two or more variables: crosstabs

2) Basic Statistical Tests for categorical data:

 One variable (with 2 or more levels)

 Proc Freq (binomial test for two-level variable)

 Proc Freq (chi-square goodness of fit test)

 Two variables (each with 2 or more levels), independent groups

 Proc Freq (chi-square test of equal proportions, or chi-square test of independence)

 Two paired variables (square tables, e.g., 2x2, 3x3, etc)

 Proc Freq (McNemar test of symmetry)

3) Graphs for categorical data:

 Proc Sgplot (bar charts)

 Proc Sgplot (compare means, i.e., sample proportions, across categories)

4) Modeling (outcome variable is categorical):

 Proc Logistic: Logistic regression models for binary or ordinal outcome variables

 Proc Genmod: Generalized linear models for count, binary, or other outcome variables (exponential family of distributions); predictors may be nominal, ordinal, or continuous.

 Proc Glimmix: Generalized linear mixed models for count or binary outcome variable, including random effects, or correlation matrix for longitudinal or clustered data (exponential family); predictors may be nominal, ordinal, or continuous.


II. Procs for working with Continuous data:

1) Descriptives: Proc Means, Proc Univariate

2) Basic statistical tests:

 One Sample

 Proc Univariate (one-sample t-test, nonparametric tests)

 Proc ttest (one-sample t-test)

 Two Independent Samples

 Proc ttest (independent samples t-test)

 Proc Npar1way (Wilcoxon non-parametric analog of t-test)

 Paired Data (correlated data)

 Proc ttest (paired t-test)

 Three or More Independent Samples

 Proc GLM (oneway analysis of variance (ANOVA))

 Proc Npar1way (Kruskal-Wallis non-parametric analog of oneway ANOVA)

3) Modeling:

 Proc Reg: Linear regression models for continuous outcome variable, continuous, ordinal or binary predictors (prior creation of dummy variables required for categorical predictors with more than 2 levels, interactions must be created prior to running model)

 Proc GLM: Linear models for continuous outcome variable, predictors may be nominal, ordinal, or continuous.

Proc Mixed: Linear mixed models for continuous dependent variable, longitudinal or clustered data; predictors may be nominal, ordinal, or continuous.

 Proc Nlin: Nonlinear models for different types of dependent variables.

 Proc Nlmixed: Nonlinear mixed models


III. Graphing:

 Proc Univariate (histograms, qqplots) for one-sample data

 Proc Sgplot (histograms)

 Proc Sgplot (boxplots for continuous variables for each level of a categorical variable)

 Proc Sgplot (barcharts, showing mean and standard deviation or standard error of mean)

 Proc Sgplot (bivariate scatter plots, regression plots) for two related variables

 Proc Sgscatter (scatterplot matrix)


G. Five Types of Tables, See Clinical Data Reporting

1) Data Listing, ex. demog characteristics

2) Counts and Frequency of categorical data, ex. proportion of patients with adverse events

3) Summary Table of continuous data, ex. descriptive statistics (mean, sd, min, max) by visit 

4) Statistical Analysis Table to model data, ex. descriptive statistics by visit with p-value, survival analysis using Kaplan-Meier (Proc LIFETEST) 

5) Graphs, ex. lab scatter plot

6) 7 Types of Regression Techniques you should know!
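Table types 2 and 3 above reduce to counts with percentages and to descriptive statistics by visit. A minimal Python sketch using only the standard library; the data values and function names are hypothetical, and in SAS these would come from Proc Freq and Proc Means.

```python
from statistics import mean, stdev

def describe(values):
    """Descriptive statistics for one group: n, mean, sd, min, max."""
    return {
        "n": len(values),
        "mean": mean(values),
        "sd": stdev(values) if len(values) > 1 else None,  # sample SD needs n >= 2
        "min": min(values),
        "max": max(values),
    }

def count_pct(n_with_event, n_total):
    """Count and percentage (one decimal) of patients with an event."""
    return n_with_event, round(100 * n_with_event / n_total, 1)

# Hypothetical lab values by visit and adverse event counts.
labs_by_visit = {"Baseline": [10, 12, 14], "Week 4": [11, 15]}
summary = {visit: describe(vals) for visit, vals in labs_by_visit.items()}
ae_row = count_pct(3, 12)
```

Here 3 of 12 patients with an adverse event gives 25.0%, and the Baseline values have mean 12 and standard deviation 2.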

______________________________________________

Introduction to CDISC

1. CDISC Introduction Presentation

2. Introduction - Introduction to the CDISC Standards, Sandra Minjoe

3. CDISC: Why SAS® Programmers Need to Know, Victor Sun


4. Toward a Comprehensive CDISC Submission Data Standard

5. A Relational Understanding of SDTM Tables, John R. Gerlach, Glenn O’Brien [HOW]

Clinical Programmer Duties

Responsible for building programs to create SAS datasets from the clinical database, external data sources, and other sources while following the clinical study's protocol or statistical plans.

  Primary responsibilities:
  1. Build SAS datasets from clinical database.
  2. Develop SAS macros, templates and utilities for data cleaning and reporting.
  3. Utilize SDTM guidelines to build datasets.
  4. Communicate with an internal team to create deliverables for pharmaceutical and biotechnology clients.
  5. Implement analyses specified in the protocol or the Statistical Analysis Plan (SAP) while working with the project statistician.
  6. Ensure CRF meets the guidelines of the protocol and check for consistency and adequacy.
  7. Write SAS programs to generate tables, listings, and figures and analysis datasets.
  8. Review CRF annotations and data specifications.
  9. Work in tandem with Biostatistics and Data Management members on various clinical projects.
  10. Identify and edit checks per the data validation plan or data management plan.
  11. Produce study management reports using SAS.
  12. Validate the programmed analysis datasets, tables, listings and figures.
  13. Perform analyses defined in the statistical analysis plan.
  14. Prepare clinical and statistical summary reports.
  15. Communicate with programming and statistics leads.
  16. Utilize SAS programming skills within protocol team and perform all programming required for clinical trial analysis and reporting.
  17. Perform quality control on final reports.
  18. Develop SAS coding and table templates for preparing, processing and analyzing clinical data.
  19. Establish monitoring of data transfers for ongoing trials to identify study conduct or data quality issues.

General SAS Papers


1. Clinical Trials Terminology for SAS Programmers, Sy Truong

 

2. SAS® PROGRAMMER TO CLINICAL SAS PROGRAMMER, Gayatri Karkera, Neha Mohan [Phase I, II, III, IV, Endpoints]

 

3. Success As a Pharmaceutical Statistical Programmer, Sandra Minjoe, Mario Widel

 

4. THE ROLE OF SAS PROGRAMMERS IN CLINICAL TRIAL DATA ANALYSIS, Ming Wang

 

5. SAS® Programming for the Pharmaceutical Industry, Brian C. Shilling, Carol Matthews

 

6. The 5 Most Important Clinical SAS Programming Validation Steps, Brian Shilling

 

7. Pharmaceutical Programming: From CRFs to Tables, Listings and Graphs, a process overview with real world examples, Mark Penniston, Shia Thomas

 

8. Training Statistical Programmers on SAP Review Skills, Sascha Ahrweiler

 

9. Intro to Longitudinal Data: A Grad Student “How-To” Paper, Elisa Priest, Ashley Collinsworth

 

10. Longitudinal Data Techniques: Looking Across Observations, Ronald Cody

 

11. Statistics for Clinical Trial SAS Programmers 1: paired t-test, Kevin Lee

12. Clinical Trial Reporting Using SAS/GRAPH® SG Procedures, Susan Schwartz

 

13. Oncology Trials 101 - The Basics and Then Some, Dave Polus


14. Pharma Company Questions and Answers, J.J. Hantsch

 

15. Careers in Biostatistics and Clinical SAS® Programming An Overview for the Uninitiated, Justina Flavin  [Roles, Responsibilities]


16. Statistical Programming for Dummies


17. Talking Past Each Other? How to Communicate with Medical Writers When Preparing Clinical Research Manuscripts for Journal Submission, Scott Thompson, Stephanie Thompson


18. Managing the Evolution of SAS® Programming, Carey Smoak


19. Empowering SAS® Programmers: The Role of the Manager, Carey Smoak


20. Skills for SAS® programmers in Epidemiology, Philip Holland


21. A Short Introduction to Longitudinal and Repeated Measures Data Analyses, Leanne Goldstein


22. The Baker's Dozen: What Every Biostatistician Needs to Know, AnnMaria De Mars


23. Expediting Access to Critical Pathology Data, Leanne Goldstein, Rebecca Ottesen, Julie Kilburn, Joyce Niland [Metastasis]


24. Good Programming Practice [GPP] in SAS® & Clinical Trials, Srinivas Vanam, Manvitha Yennam, Phaneendhar Vanam [Programming Style]


25. GCP101, Good Clinical Practices OR “Why we do What we do the Way we do it“, Elaine Dempsey


26. CLINICAL PROGRAMMING FOR NOVICE, Ramesh Ayyappath


27. Making of a Stat Programming Project Manager, Manjusha Gode, Ajay Sathe [Presentation] [Work-Life balance: Making it a reality]


28. A Programmer’s Guide to Statistical Procedures, Jim Edgington


29. Industry Standard Good Programming Practice for Clinical Trials (Using SAS), Mark Foxwell


30. SAS® CLASSES FOR NON-PROGRAMMERS IN THE CLINICAL DATA MANAGEMENT WORLD, Susan Hale


31. Good Programming Practices at Every Level, Maria Dalton


32. The Anatomy of Clinical Trials Data: A Beginner’s Guide, Venky Chakravarthy


33. Stretching Data Training Methods: A Case Study in Expanding SDTM Skills, Richard Addy


34. The Super Genius Guide to Generating Dummy Data, Brian Varney


35. Producing a Format Library and Test Data for Case Report Forms using a Data Define Table

36. CRO, TLF, SOP? OMG!: A Beginner’s Guide to the Clinical Research Organization, Mandy Bowen, Otis Evans, Stephen Terry

37. THE ROLE OF SAS PROGRAMMERS IN CLINICAL TRIAL DATA ANALYSIS, Ming Wang

38. Statistics: The Fourth Dimension of a “Statistical Programmer”, Gauri Khatu, Vibhavari Inamda

39. Successfully On-Boarding SAS® Analysts, Aaron Augustine

40. Starting a New SAS Project with Effectiveness and Success, Flora Liu

41. ONBOARDING A JUNIOR STATISTICAL PROGRAMMER IN 10 WEEKS MY STORY FROM UNIVERSITY TO FIRST ANALYSIS DELIVERY, Pieter Coppens 

42. Statistical Programming for Dummies [Presentation]

 

Longitudinal Data and SAS: A Programmer's Guide book SAS examples download

 

SAS Programming in the Pharmaceutical Industry book
