* Design-based and model-based analysis of complex survey data;
* PART 1: Descriptives and simple tests;
* SAS data set OHC (Occupational Health Care Survey)
  Clustered (hierarchical, multilevel) data
  Complex sampling design: stratified one-stage and two-stage cluster sampling
  In the analysis phase the data are treated as a one-stage cluster sampling
  design with workplaces (establishments) as the sample clusters.
  This simplifies the calculations and is the default in the SAS,
  SPSS and Mplus procedures.

  Features of the data set:
    H = 5 strata (industry type and size of workplace)
    m = 250 sample clusters (establishments/workplaces)
    n = 7841 persons
    p = 12 variables
  The data are real survey data that have been anonymized and cleaned
  for pedagogical purposes (no missing data, weights are constant).

  SPSS use: a CSPLAN file (sample plan data set) will be created
  in the PC session
* Methods
  (1) Design-based procedures - accounting for clustering effects
      SAS SURVEY design-based procedures
        Descriptives: SURVEYMEANS
        Test of independence: SURVEYFREQ
        Logistic regression: SURVEYLOGISTIC
      SPSS Complex Samples module, design-based procedures
        CSPLAN - Complex samples plan
        DESCRIPTIVES (CSDESCRIPTIVES) - Means, proportions etc.
        CROSSTABS (CSTABULATE) - Frequency tables and tests of independence
        CSLOGISTIC - Logistic regression
      Mplus (COMPLEX, TWOLEVEL)
        Logistic regression
  (2) Model-based procedures, hierarchical (multilevel) analysis
      SAS Logistic regression
        GENMOD (GEE/Exchangeable estimation)
        GLIMMIX (Generalized linear mixed modelling)
      SPSS
        GENERALIZED LINEAR MODELS - Generalized estimating equations GEE
        MIXED MODELS - Generalized linear mixed models
  NOTE: See also VLISS Training Key #298;
* SAS code will be worked out in PC session;
* We will also use SPSS in computation;
options nocenter;
* Access to the SAS data library, using either of:
  - the "New library" button
  - the libname statement;
*libname a "Z:\Documents\My SAS Files\9.3\Social Statistics Course 2013";
libname a "I:\Root\USB\HY\Social Statistics Course\Course 2013\SAS Data";
data ohc;
set a.ohc;
run;
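
* Illustrative sketch, not part of the course material: one way to check
  the design counts H, m and n quoted in the header. It runs on a small
  made-up data set, because the OHC variable names are not shown here.
  TOY_DESIGN, STRATUM and WORKPLACE are hypothetical names, and workplace
  identifiers are assumed to be unique across strata;
data toy_design;
  input stratum workplace @@;   * several persons per input line;
  datalines;
1 101  1 101  1 102  1 102  1 102
2 201  2 201  2 202  2 203  2 203
;
run;

proc sql;
  select count(distinct stratum)   as H,   /* number of strata  */
         count(distinct workplace) as m,   /* number of clusters */
         count(*)                  as n    /* number of persons  */
  from toy_design;
quit;
* For the real data, replace TOY_DESIGN with OHC and use the stratum and
  cluster variable names listed by PROC CONTENTS;
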
* see HELP proc contents;
proc contents data=ohc varnum;
title1 "/*write title*/ ";
title2 "/*write subtitle*/ ";
run;
* see HELP proc surveymeans;
proc surveymeans data=ohc nobs mean;
title1 "/*write title*/";
title2 "/*write subtitle*/";
var /*variable list*/;
domain /*subgroup analysis*/;
strata /*stratum variable*/ ;
cluster /*cluster variable*/;
run;
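
* Illustrative sketch only, not the course solution: the SURVEYMEANS
  template above filled in for a small made-up data set. TOY_MEANS and the
  variables STRATUM, WORKPLACE, SEX and SCORE are hypothetical names, not
  OHC variables. No WEIGHT statement is used, on the assumption that the
  weights are constant as stated in the header;
data toy_means;
  input stratum workplace sex score @@;   * four persons per input line;
  datalines;
1 11 1 12  1 11 2 15  1 11 1 11  1 11 2 14
1 12 1 18  1 12 2 17  1 12 1 16  1 12 2 19
1 13 1 13  1 13 2 12  1 13 1 15  1 13 2 16
2 21 1 22  2 21 2 25  2 21 1 21  2 21 2 24
2 22 1 20  2 22 2 23  2 22 1 19  2 22 2 26
2 23 1 27  2 23 2 24  2 23 1 25  2 23 2 22
;
run;

proc surveymeans data=toy_means nobs mean stderr;
  title1 "Toy example: design-based means";
  title2 "Hypothetical data: 2 strata, 6 workplaces";
  var score;           * analysis variable;
  domain sex;          * subgroup (domain) estimates;
  strata stratum;      * stratum variable;
  cluster workplace;   * cluster (primary sampling unit) variable;
run;
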
* Let us carry out the same analysis using SPSS;
* see HELP proc surveyfreq;
proc surveyfreq data=ohc;
title1 "/*title*/";
title2 "/*subtitle*/";
tables /*row and column variables*/ / chisq cl;
strata /*stratum variable*/ ;
cluster /*cluster variable*/;
run;
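
* Illustrative sketch only, not the course solution: the SURVEYFREQ
  template above filled in for a small made-up data set. TOY_FREQ and the
  variables STRATUM, WORKPLACE, SEX and OUTCOME are hypothetical names, not
  OHC variables. CHISQ requests the Rao-Scott chi-square test of
  independence and CL requests confidence limits for the percentages;
data toy_freq;
  input stratum workplace sex outcome @@;   * four persons per input line;
  datalines;
1 11 1 0  1 11 2 1  1 11 1 0  1 11 2 0
1 12 1 1  1 12 2 1  1 12 1 0  1 12 2 1
1 13 1 0  1 13 2 0  1 13 1 1  1 13 2 1
2 21 1 1  2 21 2 1  2 21 1 0  2 21 2 0
2 22 1 0  2 22 2 1  2 22 1 1  2 22 2 0
2 23 1 0  2 23 2 1  2 23 1 0  2 23 2 1
;
run;

proc surveyfreq data=toy_freq;
  title1 "Toy example: design-based test of independence";
  title2 "Hypothetical data: 2 strata, 6 workplaces";
  tables sex*outcome / chisq cl;   * row variable * column variable;
  strata stratum;                  * stratum variable;
  cluster workplace;               * cluster variable;
run;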
