Some syntax for the Maclean's data:
A) A basic recode
missing values Q2 (9).
recode q2 (1=1) (2=2) (5=3) (3=4) (4=5).
value labels q2 1 "much more optimistic" 2 "a little more" 3 "unchanged" 4 "a little less" 5 "a lot less optimistic".
frequencies variables=Q2 /statistics=All.
B) Recoding into a new variable
recode q2 (1=1) (2=2) (5=3) (3=4) (4=5) into q2r.
value labels q2r 1 "much more optimistic" 2 "a little more" 3 "unchanged" 4 "a little less" 5 "a lot less optimistic".
frequencies variables=q2r /statistics=All.
C) A simple Crosstabulation
crosstabs tables = q6 by hhi_1 / cells = column count / statistics = phi.
C1) Merging the Income responses into a new variable
missing values hhi_2 hhi_3 (99).
if (hhi_2 =1) inc =1.
if (hhi_2 =2) inc =2.
if (hhi_2 =3) inc =3.
if (hhi_2 =4) inc =4.
if (hhi_2 =5) inc =5.
if (hhi_2 =6) inc =6.
if (hhi_2 =7) inc =7.
if (hhi_2 =8) inc =8.
if (hhi_2 =9) inc =9.
if (hhi_2 =10) inc =10.
if (hhi_3 =1) inc =11.
if (hhi_3 =2) inc =12.
if (hhi_3 =3) inc =13.
if (hhi_3 =4) inc =14.
if (hhi_3 =5) inc =15.
if (hhi_3 =6) inc =16.
if (hhi_3 =7) inc =17.
if (hhi_3 =8) inc =18.
if (hhi_3 =9) inc =19.
if (hhi_3 =10) inc =20.
if (hhi_3 =11) inc =21.
if (hhi_3 =12) inc =22.
if (hhi_3 =13) inc =23.
frequencies variables = inc.
C2) Recoding Income into a New Variable with Fewer Categories
recode inc (1 thru 5 = 1) (6 thru 10 =2) (11 thru 14 =3) (15 thru 23 = 4) into incat.
value labels incat 1 "very low" 2 "low" 3 "hi" 4 "very hi".
frequencies variables = incat.
C3) Crosstabulating with the New Categorical Variable
crosstabs tables = q6 by incat / cells = column count / statistics = ctau.
D) Entering Crosstabulated Data (From Toronto Star Story on Race and Crime)
Example 1: Released at the Scene of Arrest data
data list free / released race count.
begin data.
1 1 1458
1 2 4880
2 1 902
2 2 1499
end data.
variable labels released 'released on arrest' race 'race'.
value labels released 1 'yes' 2 'no'.
value labels race 1 'black' 2 'white'.
weight by count.
crosstabs tables = released by race /cells = column count /statistics = phi.
Example 2: Held for bail hearing data
data list free / bail race count.
begin data.
1 1 139
1 2 109
2 1 763
2 2 1390
end data.
variable labels bail 'bail hearing' race 'race'.
value labels bail 1 'held' 2 'not'.
value labels race 1 'black' 2 'white'.
weight by count.
crosstabs tables = bail by race /cells = column count /statistics = phi.
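As a cross-check outside SPSS, the column percentages and phi for a 2x2 table can be worked out by hand. The following is a minimal Python sketch, not part of the original handout; the function name is hypothetical, and it uses the standard 2x2 formula phi = (ad - bc) / sqrt(product of the row and column totals) with the counts from Example 1. The sign of phi depends on how the categories are coded, so compare the magnitude with the SPSS output.

```python
import math

def column_percents_and_phi(table):
    """table[row][col] holds the weighted counts of a 2x2 crosstab."""
    (a, b), (c, d) = table
    col_totals = (a + c, b + d)
    row_totals = (a + b, c + d)
    # Column percentages, as requested by /cells = column.
    col_pct = [
        [100 * a / col_totals[0], 100 * b / col_totals[1]],
        [100 * c / col_totals[0], 100 * d / col_totals[1]],
    ]
    # Phi coefficient for a 2x2 table.
    phi = (a * d - b * c) / math.sqrt(
        row_totals[0] * row_totals[1] * col_totals[0] * col_totals[1]
    )
    return col_pct, phi

# Rows: released yes / no; columns: black / white (counts from Example 1).
released = [[1458, 4880], [902, 1499]]
col_pct, phi = column_percents_and_phi(released)
print([[round(p, 1) for p in row] for row in col_pct], round(phi, 3))
```

The same function can be applied to the Example 2 counts by passing [[139, 109], [763, 1390]].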
Some syntax for the CRIC data:
A) Combining several versions of a question into a single variable
if (q14_1 = 1) or (q14_2=1) or (q14_3=1) chart=1.
if (q14_1 = 2) or (q14_2=2) or (q14_3=2) chart=2.
value labels chart 1 'yes' 2 'no'.
frequencies variables = chart.
Some further syntax for webstats using the CES1997 data set
Frequencies of the variable used in this example:
CPSB9 Satisfaction>Way Democracy WorksInCanada
                                                 Valid      Cum
Value Label         Value  Frequency  Percent  Percent  Percent
Very Satisfied          1        441     11.2     11.2     11.2
Fairly Satisfied        3       1770     44.8     44.8     56.0
Not Very                5       1124     28.5     28.5     84.5
NotAtAll                7        461     11.7     11.7     96.1
D.K.                    8        135      3.4      3.4     99.5
Refused                 9         18       .5       .5    100.0
                             -------  -------  -------
Total                           3949    100.0    100.0
Recode and missing values:
Let’s say you need the variable to have only two categories before you can do a crosstabulation, and you want to remove the missing values, like “Don’t know” and “Refused”. Then, you type:
Missing values CPSB9 (8,9).
Recode CPSB9 (1,3=1)(5,7=2).
Value labels CPSB9 1 'Satisfied'
2 'Not satisfied'.
Frequencies CPSB9
/stats=median mode.
*These commands tell the program to set aside values 8 and 9 (“Don’t know” and “Refused”), which are not useful for the analysis. The recode then collapses categories 1 and 3 into category 1, “Satisfied”, and categories 5 and 7 into category 2, “Not satisfied”. Finally, the Frequencies command runs the frequencies of the transformed CPSB9 so that you can see what it looks like and confirm that you made no mistakes (always double-check). The “/stats=median mode” subcommand asks the program to report the median and the mode for this variable.
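To see what the frequencies should look like after the recode, the arithmetic can be checked by hand from the CPSB9 frequency table. This is a rough Python sketch, not part of the original handout; the dictionary names are hypothetical, and the counts are copied from the table.

```python
from collections import Counter

# Frequencies of CPSB9 from the table: value -> count.
cpsb9_counts = {1: 441, 3: 1770, 5: 1124, 7: 461, 8: 135, 9: 18}

MISSING = {8, 9}                    # missing values CPSB9 (8,9).
RECODE = {1: 1, 3: 1, 5: 2, 7: 2}   # recode CPSB9 (1,3=1)(5,7=2).
LABELS = {1: "Satisfied", 2: "Not satisfied"}

recoded = Counter()
for value, count in cpsb9_counts.items():
    if value in MISSING:
        continue  # excluded from the valid cases
    recoded[RECODE[value]] += count

valid_n = sum(recoded.values())
for code in sorted(recoded):
    pct = round(100 * recoded[code] / valid_n, 1)
    print(LABELS[code], recoded[code], pct)
```

If the SPSS output shows different valid counts for the two categories, the missing values or the recode were probably not applied as intended.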
To merge two variables using the “if” command:
missing values cpsm16a (98, 99).
missing values cpsm16 (998, 999).
numeric income.
if (cpsm16a=1) or (cpsm16 le 20) income = 1.
if (cpsm16a=2) or ((cpsm16 gt 20) and (cpsm16 le 30)) income = 2.
if (cpsm16a=3) or ((cpsm16 gt 30) and (cpsm16 le 40)) income = 3.
if (cpsm16a=4) or ((cpsm16 gt 40) and (cpsm16 le 50)) income = 4.
if (cpsm16a=5) or ((cpsm16 gt 50) and (cpsm16 le 60)) income = 5.
if (cpsm16a=6) or ((cpsm16 gt 60) and (cpsm16 le 70)) income = 6.
if (cpsm16a=7) or ((cpsm16 gt 70) and (cpsm16 le 80)) income = 7.
if (cpsm16a=8) or ((cpsm16 gt 80) and (cpsm16 le 90)) income = 8.
if (cpsm16a=9) or ((cpsm16 gt 90) and (cpsm16 le 100)) income = 9.
if (cpsm16a=10) or (cpsm16 gt 100) income = 10.
*These command lines merge questions cpsm16 and cpsm16a into one variable, named “income”, with 10 categories. The initial problem is that cpsm16 is an interval-level variable (respondents were asked their exact income in thousands) while cpsm16a is measured at the ordinal level using categories of income, and each covers only part of the total sample. This operation is necessary in order to have all of the respondents’ income in one variable.
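The logic of the ten “if” lines can be summarized as a single rule. The Python sketch below is illustrative only and not part of the original handout: the function name is hypothetical, None stands in for the SPSS missing values, and it assumes each respondent answered only one of the two questions (in SPSS, if both were answered and disagreed, the later “if” line would win).

```python
def income_category(cpsm16a=None, cpsm16=None):
    """Return the 1-10 income category, or None if neither question was answered.

    cpsm16a: category answer (1-10); cpsm16: exact income in thousands.
    None stands in for the declared missing values (98/99 and 998/999).
    """
    if cpsm16a is not None:
        return cpsm16a
    if cpsm16 is None:
        return None
    # Upper bounds of categories 1 to 9, matching the "le" comparisons.
    upper_bounds = [20, 30, 40, 50, 60, 70, 80, 90, 100]
    for category, bound in enumerate(upper_bounds, start=1):
        if cpsm16 <= bound:
            return category
    return 10  # more than 100 thousand

print(income_category(cpsm16=25), income_category(cpsm16a=7))
```

Note that a respondent reporting exactly 30 falls in category 2, because each band includes its upper bound.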
The compute command:
In this example, a new variable, “age”, is computed from “cpsage”. It gives the age of the respondents rather than the year of birth recorded in “cpsage”.
missing values cpsage (9999).
compute age=1997-cpsage.
*By entering this line, you obtain the age of the respondents. For example, if one is born in 1970, then his/her age is equal to 1997 (year of the survey) minus 1970, thus 27 years old. After this, you can recode the new variable “age” as you wish, keeping in mind that the values have changed.
B) Basic statistical tests:
To do a crosstabulation with gender as the independent and CPSB9 (still transformed) as the dependent:
Missing values CPSB9 (8,9).
Recode CPSB9 (1,3=1)(5,7=2).
Value labels CPSB9 1 'Satisfied'
2 'Not satisfied'.
crosstabs
tables=CPSB9 by CPSRGEN
/cells=column count
/statistics=PHI.
*The computer will run a crosstabulation of CPSB9 by CPSRGEN, including the column percentages and a measure of association, phi.
To do a t-test with CPSRGEN (gender) and the previously recoded CPSB9:
Missing values CPSB9 (8,9).
Recode CPSB9 (1,3=1)(5,7=2).
Value labels CPSB9 1 'Satisfied'
2 'Not satisfied'.
t-test
groups=CPSRGEN (1,5)
/variables=CPSB9.
*The numbers in parentheses on the t-test line are the two values of the independent, or group, variable (in this case gender) that define the groups to be compared. You do this to see if men (1) and women (5) have significantly different views about the way democracy is working in Canada.
C: In-class example of a reliability run
WEIGHT BY cpsnwgt1.
missing values cpsj9 (8,9).
recode cpsj9 (1=0) (3=1) (5=.5).
value labels cpsj9 0 "Better Off" .5 "AboutTheSame" 1 "Worse Off".
missing values cpsj10 (8,9).
recode cpsj10 (1=1) (3=0) (5=.5).
value labels cpsj10 0 "Less" .5 "AboutTheSame" 1 "More".
missing values pesf6 (998, 999).
compute rating = (pesf6/100).
recode mbse8 (1=0) (8=.5) (2=1).
value labels mbse8 0 "try=well off" .5 "not sure" 1 " hard overcome".
compute has= (mbsc9a/7).
compute should =( mbsc9b/7).
RELIABILITY
/VARIABLES= cpsj9 cpsj10 rating mbse8 has should
/FORMAT=LABELS
/SCALE (first) = cpsj9 cpsj10 rating mbse8 has should
/MODEL=ALPHA
/STATISTICS=DESCRIPTIVE SCALE CORR
/SUMMARY=TOTAL MEANS CORR .
RELIABILITY
/VARIABLES= cpsj9 cpsj10 rating mbse8 has should
/FORMAT=LABELS
/SCALE (second) =cpsj9 cpsj10 rating mbse8 should
/MODEL=ALPHA
/STATISTICS=DESCRIPTIVE SCALE CORR
/SUMMARY=TOTAL MEANS CORR .
COMPUTE aborig = (cpsj9 + cpsj10 + rating + mbse8 + should).
CROSSTABS TABLES = aborig by cpsrgen
/CELLS= column count
/STATISTICS = ctau.
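For readers who want to see what /MODEL=ALPHA reports, Cronbach's alpha can be written out directly. This is a minimal Python sketch, not part of the original handout: the function name is hypothetical, the toy scores are invented for illustration (they are NOT CES data), and it ignores the weighting and missing-value handling that the RELIABILITY runs above apply.

```python
from statistics import variance

def cronbach_alpha(items):
    """items: list of equal-length lists, one list of scores per item."""
    k = len(items)
    # Scale score for each respondent (sum across the items).
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(scores) for scores in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))

# Hypothetical toy scores (NOT from the CES data): 3 items, 5 respondents.
toy = [
    [0, 0.5, 1, 1, 0],
    [0, 0.5, 1, 0.5, 0],
    [0.5, 0.5, 1, 1, 0],
]
alpha = cronbach_alpha(toy)
print(round(alpha, 3))
```

Dropping an item that correlates weakly with the others (as the second run does with “has”) changes both the item variances and the variance of the total, which is why alpha is recomputed for the shorter scale.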
D: In-class examples of Significance Testing
FREQUENCIES VARIABLES=q31 q32 q33 q34.
missing values q31 q32 q33 q34 (99).
RELIABILITY /VARIABLES=q31 q32 q33 q34 /SCALE(ALPHA)=ALL /MODEL=ALPHA /SUMMARY=TOTAL .
*recoding so ranges from 0 to 1 with 1 = judicial supremacy (judsup)*.
recode q31 q32 q33 q34 (1=0) (2=1).
compute judsup = mean(q31, q32, q33, q34).
freq var = judsup.
*t-test comparing means on judicial supremacy by gender*.
t-test groups= gender (1,2) /variables=judsup.
*one way anova comparing means on judicial supremacy by religiosity*.
missing values q44 (99).
*Using Webstats specify Ranges = scheffe*.
ONEWAY judsup by q44 (1,5) /RANGES = scheffe /STATISTICS=all.
*Using Windows specify /POSTHOC = scheffe*.
*Windows can also plot the means*.
ONEWAY judsup BY q44 /PLOT MEANS /POSTHOC = SCHEFFE.
freq var = age.
missing values age (888).
recode age ( low thru 36 =1) (37 thru 52 =2) (53 thru hi = 3) into ager.
value labels ager 1 'young' 2 'middle' 3 'older'.
freq var = ager.
*recoding the dependent variable into 4 categories*.
recode judsup (low thru .25 = 0) (.33 thru .67 = .25) (.75 = .75) (1 = 1) into judsup4.
freq var = judsup4.
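The cut points .33 and .67 in the recode of judsup make sense once you list the values the mean can take. The SPSS MEAN function averages only the arguments with valid values, so a respondent who skipped one of the four items gets a mean in thirds. The Python sketch below is illustrative only and not part of the original handout; the function name is hypothetical and None stands in for a missing answer.

```python
from itertools import product

def mean_valid(values):
    """Mean of the non-missing values; None if every value is missing."""
    valid = [v for v in values if v is not None]
    return sum(valid) / len(valid) if valid else None

# Every combination of four items coded 0, 1, or missing (None).
possible = sorted(
    {
        round(mean_valid(combo), 2)
        for combo in product((0, 1, None), repeat=4)
        if mean_valid(combo) is not None
    }
)
print(possible)
```

Running a frequency on judsup before recoding, as the syntax above does, shows which of these values actually occur in the data.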
crosstabs tables = judsup4 by ager, gender, q44 / cells =column count / statistics = ctau chisq.
*further recoding the dependent variable into two categories*.
recode judsup4 (low thru .25 = 0) (.75 thru hi =1) into judsup2.
value labels judsup2 0 'less supportive' 1 'more supportive'.
recode q44 (1, 2, 3 = 3) (4=2) (5=1) into relig.
value labels relig 1 'low' 2 'med' 3 'hi'.
crosstabs tables = judsup2 by ager, gender, relig / cells =column count / statistics = ctau chisq.
Another example of Significance Testing with CES 1997 using income (per section A above) as dependent.
T-TEST GROUPS=cpsrgen (1,5)
/VARIABLES=income.
ONEWAY income by cpsm3(1,11)
/RANGES = snk
/STATISTICS=all.
RECODE income (1,2=1) (3,4,5=2) (6 thru 10=3).
VALUE LABELS income 1 'low' 2 'med' 3 'high'.
RECODE cpsm3 (1 thru 4=1) (5,6=2) (7,8=3) (9 thru 11=4) into educ.
VALUE LABELS educ 1 'verylow' 2 'low' 3 'med' 4 'high'.
CROSSTABS TABLES = income by cpsrgen, educ
/CELLS = column count expected
/STATISTICS = chisq.
E. In-class examples of Multiple Regression
WEIGHT BY cpsnwgt1.
missing values cpsm16a (98, 99).
missing values cpsm16 (998, 999).
numeric income.
if (cpsm16a=1) or (cpsm16 le 20) income = 1.
if (cpsm16a=2) or ((cpsm16 gt 20) and (cpsm16 le 30)) income = 2.
if (cpsm16a=3) or ((cpsm16 gt 30) and (cpsm16 le 40)) income = 3.
if (cpsm16a=4) or ((cpsm16 gt 40) and (cpsm16 le 50)) income = 4.
if (cpsm16a=5) or ((cpsm16 gt 50) and (cpsm16 le 60)) income = 5.
if (cpsm16a=6) or ((cpsm16 gt 60) and (cpsm16 le 70)) income = 6.
if (cpsm16a=7) or ((cpsm16 gt 70) and (cpsm16 le 80)) income = 7.
if (cpsm16a=8) or ((cpsm16 gt 80) and (cpsm16 le 90)) income = 8.
if (cpsm16a=9) or ((cpsm16 gt 90) and (cpsm16 le 100)) income = 9.
if (cpsm16a=10) or (cpsm16 gt 100) income = 10.
missing values cpsm3 (98, 99).
missing values cpsage (9997, 9999).
missing values pinporr (0, 19, 20).
recode cpsrgen (1=0) (5=1) into female.
value labels female 0 'male' 1 'female'.
recode cpslang (1=0) (2=1) into french.
value labels french 0 'english' 1 'french'.
REGRESSION /STATISTICS ANOVA COEFF OUTS R TOL /CRITERIA=PIN(.05) POUT(.10) /NOORIGIN /DEPENDENT income /METHOD=ENTER cpsage cpsm3 french female pinporr.
REGRESSION /STATISTICS ANOVA COEFF OUTS R TOL /CRITERIA=PIN(.05) POUT(.10) /NOORIGIN /DEPENDENT income /METHOD=STEPWISE cpsage cpsm3 french female pinporr.
*recode education into categories*.
recode cpsm3 (1 thru 4= 1) (5 thru 8=2) (9 thru 11 =3) into educ.
recode income (1,2=1) (3 thru 6 =2) (7 thru 10 = 3).
value labels income educ 1 'low' 2 'med' 3 'hi'.
*recode age into categories and code older as highest*.
recode cpsage (lowest thru 1948 = 3) (1949 thru 1961 = 2) (1962 thru highest = 1) into older.
value labels older 3 ' older' 2 'middle' 1 'younger'.
REGRESSION /STATISTICS ANOVA COEFF OUTS R TOL /CRITERIA=PIN(.05) POUT(.10) /NOORIGIN /DEPENDENT income /METHOD=ENTER older educ french female pinporr.
*compute interaction term for gender and age*.
compute femage = (female*older).
REGRESSION /STATISTICS ANOVA COEFF OUTS R TOL /CRITERIA=PIN(.05) POUT(.10) /NOORIGIN /DEPENDENT income /METHOD=ENTER older educ french female pinporr femage.
*create dummy variables for region*.
recode province (10 thru 13=1) (24=2) (35=3) (46, 47 48 =4) (59=5) (60, 61=6) into region.
value labels region 1 'atlantic' 2 'queb' 3 'ont' 4 'midwest' 5 'bc' 6 'north'.
recode region (1=1) (2 thru 6=0) into Atl.
recode region (2=1) (1,3 thru 6=0) into Queb.
recode region (3=1) (1,2,4,5,6=0) into Ont.
recode region (4=1) (1,2,3,5,6=0) into Prairie.
recode region (5=1) (1,2,3,4,6=0) into BC.
recode region (6=1) (1 thru 5=0) into North.
REGRESSION /STATISTICS ANOVA COEFF OUTS R TOL /CRITERIA=PIN(.05) POUT(.10) /NOORIGIN /DEPENDENT income /METHOD=ENTER older educ french female pinporr Atl, Queb, Prairie, BC, North.
REGRESSION /STATISTICS ANOVA COEFF OUTS R TOL /CRITERIA=PIN(.05) POUT(.10) /NOORIGIN /DEPENDENT income /METHOD=STEPWISE older educ french female pinporr Atl, Queb, Prairie, BC, North.
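Note that the two regressions above enter Atl, Queb, Prairie, BC and North but leave out Ont, so Ontario serves as the reference category: each region coefficient is a difference from Ontario. The Python sketch below illustrates the dummy coding; it is not part of the original handout, the names are hypothetical, and it assumes the province codes are the whole numbers listed in the recode.

```python
# Province code -> region number, following the recode of province into region.
PROVINCE_TO_REGION = {
    10: 1, 11: 1, 12: 1, 13: 1,   # atlantic
    24: 2,                        # queb
    35: 3,                        # ont
    46: 4, 47: 4, 48: 4,          # midwest (dummy named Prairie)
    59: 5,                        # bc
    60: 6, 61: 6,                 # north
}

# Dummies entered in the regression; Ont (region 3) is omitted as the reference.
DUMMIES = {"Atl": 1, "Queb": 2, "Prairie": 4, "BC": 5, "North": 6}

def region_dummies(province):
    """Return the 0/1 dummy values for one respondent's province code."""
    region = PROVINCE_TO_REGION[province]
    return {name: int(region == code) for name, code in DUMMIES.items()}

print(region_dummies(24))  # a Quebec respondent
print(region_dummies(35))  # an Ontario respondent: every dummy is 0
```

Entering all six dummies at once would make them perfectly collinear with the constant, which is why one category has to be left out.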
F. An example of many of the same techniques using the IRPP data
weight by newwt600.
missing values q18 q19 q20 q21 q22 q23 q26 q28 (8,9).
recode prov ('NF' = 0) ('PE' = 1) ('NS' =2) ('NB' =3) ('QC' =4) ('ON' =5) ('MB' =6) ('SK' = 7) ('AB' = 8) ('BC' = 9) ('NT'=10) ('YT' =11) into numprov.
recode numprov (0,1,2,3 =1) (4=2) (5=3) (6,7,8,10,11 = 4) (9=5) into region.
value labels region 1 'Atlantic' 2 'Quebec' 3 'Ontario' 4 'Prairie' 5 'BC'.
recode region (1=1) (2,3,4,5=0) into Atl.
recode region (2=1) (1,3,4,5=0) into Queb.
recode region (4=1) (1,2,3,5=0) into Prairie.
recode region (5=1) (1,2,3,4=0) into BC.
recode pid (1=1) (2=2) (3=3) (4=4) (5=5) (7 thru hi =6).
Value labels pid 1 "Liberal" 2 "Reform" 3 "Progressive Conservative" 4 "NDP" 5 "Bloc Quebecois" 6 "Other".
recode pid (2=1) (else=0) into Reform.
recode pid (3=1) (else=0) into PC.
recode pid (4=1) (else=0) into NDP.
recode pid (5=1) (else=0) into BQ.
recode pid (6=1) (else=0) into Other.
recode q18 (1=1) (2=.75) (3=.5) (4= .25) (5=0) into satis.
recode q19 (1=0) (2=.25) (3= .5) (4=.75) (5=1) into doaway.
recode q20 (1=0) (2=.25) (3= .5) (4=.75) (5=1) into reduce.
recode q21 (1=1) (2=.75) (3=.5) (4= .25) (5=0) into trust.
compute court = (satis + doaway + reduce + trust).
recode court (low thru 2.0 =1) (2.25 thru 2.75=2) (3 thru hi =3) into catcourt.
value labels catcourt 1 'dislike' 2 'middle' 3 'like'.
recode q26 (1=1) (5=0) into vriend.
recode q28 (5,7=1) (1,3=0) into feeney.
value labels vriend feeney 1 "agree" 0 "disagree".
numeric sepref.
if (q23 =1) & (q24 = 1) sepref = 1.
if (q23 =1) & (q24 = 5) sepref = .5.
if (q23 =5) & (q24 = 1) sepref = .5.
if (q23 =5) & (q24 = 5) sepref = 0.
value labels sepref 1 'both' .5 'one' 0 'neither'.
compute cases= (vriend+feeney+sepref).
freq var = cases.
recode q17 (1=1) (2=.5) (4,5=0) into aware.
crosstabs tables = court by cases/court by cases by aware / cells = column count /statistics = ctau chisq.
recode cases (lo thru 1.5 = 0) (else =1) into catcases.
freq var= court.
crosstabs tables = catcourt by catcases/ catcourt by catcases by aware / cells = column count /statistics = ctau chisq.
compute inter=(aware*cases).
REGRESSION /DEPENDENT court /METHOD=ENTER aware cases /METHOD=ENTER aware cases inter.
In-Class Example of Correlation Using CES97
WEIGHT BY cpsnwgt1.
CORRELATIONS /VARIABLES=cpsage mbsj1 pesage /PRINT=TWOTAIL NOSIG /MISSING=PAIRWISE .
GRAPH /SCATTERPLOT(MATRIX)=cpsage pesage mbsj1 /MISSING=LISTWISE .
Missing values cpsage pesage mbsj1 (9997, 9999).
CORRELATIONS /VARIABLES=cpsage mbsj1 pesage /PRINT=TWOTAIL NOSIG /MISSING=PAIRWISE .
GRAPH /SCATTERPLOT= pesage with cpsage /MISSING= LISTWISE.
GRAPH /SCATTERPLOT(MATRIX)=cpsage pesage mbsj1 /MISSING=LISTWISE .
GRAPH /SCATTERPLOT(XYZ)=pesage WITH cpsage WITH mbsj1 /MISSING=LISTWISE .
Missing values pesf6 (998, 999).
CORRELATIONS /VARIABLES=cpsa1 to cpsm19 with pesf6.
Experimental Syntax with CRIC data
numeric chartgay.
if (q14_1 =1) or (q14_2 =1) or (q14_3 =1) chartgay =1.
if (q14_1 =2) or (q14_2 =2) or (q14_3 =2) chartgay =2.
variable label chartgay 'Charter should protect gays'.
value labels chartgay 1 'support' 2 'not support'.
numeric cond.
if (q14_1 =1) or (q14_1 =2) cond =1.
if (q14_2 =1) or (q14_2 =2) cond =2.
if (q14_3 =1) or (q14_3 =2) cond =3.
variable label cond 'experimental condition of gay question'.
value labels cond 1 'prohibit' 2 'equalrts' 3 'court'.
crosstabs tables =chartgay by cond / cells = column count /statistics = chisq.
missing values q14_1 q14_2 q14_3 (88, 99).
frequencies variables = q14_1 q14_2 q14_3.
Experimental Syntax with IRPP data (NB: UNIANOVA may not run on Webstats)
weight by newwt.
recode q8a q8b (8=9).
numeric native.
if (q8a =1) or (q8b =1) native =0.
if (q8a =5) or (q8b=5) native =1.
if (q8a =9) or (q8b=9) native= 9.
value labels native 0 'Should be treated just like any other Canadian' 1 ' Unique rights should be preserved' 9 'na' .
missing values native (9).
RECODE ran2 (1 thru 499=1) (500 thru 1000=0) into ran2x.
VALUE LABELS ran2x 1 'constitution' 0 'no mention'.
recode q17 (1=1) (else =0).
value labels q17 1 'veryaware of SC' 0 'notveryaware'.
oneway q17 by ran2x (0,1).
CROSSTABS /TABLES= native by ran2x /CELLS= COUNT column /statistics= corr phi chisq.
anova variables = native by ran2x (0,1) q17 (0,1).
UNIANOVA native BY ran2x q17 /RANDOM = ran2x q17 /METHOD = SSTYPE(3) /INTERCEPT = INCLUDE /PLOT = PROFILE( ran2x*q17 ) /EMMEANS = TABLES(OVERALL) /EMMEANS = TABLES(ran2x) /EMMEANS = TABLES(q17) /EMMEANS = TABLES(ran2x*q17) /CRITERIA = ALPHA(.05) /DESIGN = ran2x q17 ran2x*q17 .
CROSSTABS /TABLES= native by ran2x by q17 /CELLS= COUNT column /statistics= corr phi chisq.
An Example of Statistical Control with the Macleans data set.
CROSSTABS /TABLES=q6 BY p1000f1 /FORMAT= AVALUE TABLES /STATISTIC=CHISQ PHI CTAU /CELLS= COUNT COLUMN .
*trichotomize age*.
recode age (1 thru 5 =1) (6 thru 8 =2) (9 thru 11 =3) into ager3.
value labels ager3 1 'young' 2 'middle' 3 'older'.
Freq var = ager3.
CROSSTABS /TABLES=q6 BY p1000f1 BY AGER3 /FORMAT= AVALUE TABLES /STATISTIC=CHISQ PHI CTAU /CELLS= COUNT COLUMN .
*recode age into 4 cats*.
recode age (1 thru 4 =1) (5 thru 7 =2) (8,9 =3) (10,11 =4) into ager4.
value labels ager4 1 'under35' 2 '35 to 49' 3 '50-60' 4 '60+'.
freq var = ager4.
CROSSTABS /TABLES=q6 BY p1000f1 BY AGER4 /FORMAT= AVALUE TABLES /STATISTIC=CHISQ PHI CTAU /CELLS= COUNT COLUMN .
Syntax for computing rural-urban variable using Macleans data
Unfortunately, the relevant variable in CRIC (q46) is not available.
numeric nrururb.
string rururb (a1).
compute rururb = substr (p926f5, 2, 1).
recode rururb ('0' = 0) ('1','2','3','4','5','6','7','8','9'=1) into urban.
value labels urban 0 'rural' 1 'urban'.
frequencies variables = rururb urban.
(To use this with the CES data, use pcode rather than p926f5.)
numeric nrururb.
string rururb (a1).
compute rururb = substr (pcode, 2, 1).
recode rururb ('0' = 0) ('1','2','3','4','5','6','7','8','9'=1) into urban.
value labels urban 0 'rural' 1 'urban'.
frequencies variables = rururb urban.
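The rural-urban rule in the syntax above comes down to the second character of the postal code: '0' is coded rural and '1' through '9' urban. A minimal Python sketch of the same rule follows; it is not part of the original handout, the function name is hypothetical, and the postal codes in the example are made up for illustration.

```python
def urban_flag(postal_code):
    """Return 0 (rural), 1 (urban), or None when the code cannot be classified."""
    if len(postal_code) < 2:
        return None
    # SPSS substr(x, 2, 1) counts from 1; Python indexing counts from 0.
    second = postal_code[1]
    if second == "0":
        return 0
    if second in "123456789":
        return 1
    return None  # any other character is left missing, as in the recode

# Made-up postal codes, for illustration only.
print(urban_flag("K0A1B2"), urban_flag("M5V2T6"))
```

As in the SPSS recode, a code whose second character is not a digit is left unclassified rather than forced into either category.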