Substantive Example
Does a person's religious background relate to their sexual behavior?
Or, more to the point, was Billy Joel right when he sang "Catholic
girls start much too late"?
We can actually study this question "scientifically"
using the NLSY data from one or more of the homework assignments.
Here is an annotated log file for the analysis, including
data cleaning, summary statistics, and estimates.
Uses of the results (hypothesis tests, predictions, and
conclusions) come later. (Sound familiar??)
. keep relig fage* sex
. * to simplify matters, look only at women
. drop if sex==1
(6403 observations deleted)
. * missing values of relig are coded as negative numbers
. drop if relig<0
(20 observations deleted)
. * religion is coded into 10 categories, will collapse to 4
. tab relig
IN WHAT|
RELIGION WAS|
R RAISED -| Freq. Percent Cum.
------------+-----------------------------------
0 | 221 3.53 3.53
1 | 261 4.17 7.70
2 | 1796 28.68 36.37
3 | 95 1.52 37.89
4 | 320 5.11 43.00
5 | 523 8.35 51.35
6 | 167 2.67 54.02
7 | 2159 34.47 88.49
8 | 58 0.93 89.41
9 | 663 10.59 100.00
------------+-----------------------------------
Total | 6263 100.00
. gen none = cond(relig==0,1,0)
. gen prot = cond(relig<7&relig>0,1,0)
. gen cath = cond(relig==7,1,0)
. * Apologies for lumping everyone else into one group
. gen oth = cond(relig>7,1,0)
. * Hey, why not have a recoded religion variable (4 values)?
. gen relig2 = none + 2*prot + 3*cath + 4*oth
. tab relig2
relig2| Freq. Percent Cum.
------------+-----------------------------------
1 | 221 3.53 3.53
2 | 3162 50.49 54.02
3 | 2159 34.47 88.49
4 | 721 11.51 100.00
------------+-----------------------------------
Total | 6263 100.00
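The recode can be cross-checked outside Stata. The following is a minimal Python sketch. It assumes relig takes the integer codes 0-9 shown in the tab relig output. The function names `dummies` and `relig2_code` are hypothetical. The sketch re-tallies the tab relig frequencies into the four groups. It also shows that exactly one indicator equals 1 for each code, which is why the weighted sum always yields a single value from 1 to 4.

```python
# Hypothetical sketch mirroring the Stata gen/cond() lines:
# 0 -> none, 1-6 -> prot, 7 -> cath, 8-9 -> oth.

def dummies(relig: int) -> dict:
    """Return the four 0/1 indicators for one relig code."""
    return {
        "none": int(relig == 0),
        "prot": int(0 < relig < 7),
        "cath": int(relig == 7),
        "oth": int(relig > 7),
    }

def relig2_code(relig: int) -> int:
    """Same formula as: gen relig2 = none + 2*prot + 3*cath + 4*oth."""
    d = dummies(relig)
    return d["none"] + 2 * d["prot"] + 3 * d["cath"] + 4 * d["oth"]

# Frequencies copied from the tab relig output
relig_freq = {0: 221, 1: 261, 2: 1796, 3: 95, 4: 320,
              5: 523, 6: 167, 7: 2159, 8: 58, 9: 663}

collapsed = {}
for code, count in relig_freq.items():
    key = relig2_code(code)
    collapsed[key] = collapsed.get(key, 0) + count

print(collapsed)  # {1: 221, 2: 3162, 3: 2159, 4: 721}
```

The collapsed counts match the tab relig2 table.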
. * Hey, who the heck is raised a 2? We need labels!!
. label define religions 1 "none" 2 "protest" 3 "catholic" 4 "non_chrst"
. label values relig2 religions
. tab relig2
relig2| Freq. Percent Cum.
------------+-----------------------------------
none | 221 3.53 3.53
protest | 3162 50.49 54.02
catholic | 2159 34.47 88.49
non_chrs | 721 11.51 100.00
------------+-----------------------------------
Total | 6263 100.00
. descr fage*
3. fage83 byte 8.0g F - AGE @FIRST SEXUAL INTERCOUR
4. fage84 byte 8.0g F - AGE FIRST SEXUAL INTERCOURS
5. fage85 byte 8.0g F AGE 1ST HAD SEXUAL INTERCOURS
. * Each year the person is first asked whether they have ever had
. * sexual intercourse. If YES for the first time, the person is then
. * asked their age at first intercourse. If NO, or if YES was already
. * reported in an earlier year, fage is coded as negative.
. gen byte age = fage83 if fage83>0
(1259 missing values generated)
. * this adds people who said yes for the first time in 1984
. replace age = fage84 if fage84>0
(2522 real changes made)
. * now 1985
. replace age = fage85 if fage85>0
(245 real changes made)
. * The variable age is now the age of first intercourse of
. * all people who have had intercourse by 1985
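The three-wave logic can be sketched as a small function. This is a minimal Python sketch under two assumptions: each fage value is an integer, and every skip or non-response code is non-positive, as the comments above describe. `first_sex_age` and the sample records are hypothetical. As in the Stata `replace` sequence, a positive value in a later wave overwrites an earlier one, and `None` stands in for Stata's missing value.

```python
from typing import Optional

def first_sex_age(fage83: int, fage84: int, fage85: int) -> Optional[int]:
    """Hypothetical helper mimicking:
         gen byte age = fage83 if fage83>0
         replace age = fage84 if fage84>0
         replace age = fage85 if fage85>0
    """
    age: Optional[int] = None  # stands in for Stata's missing value "."
    for value in (fage83, fage84, fage85):
        if value > 0:  # non-positive codes leave age unchanged
            age = value
    return age

# Hypothetical records; the negative numbers are placeholder skip codes
print(first_sex_age(16, -4, -4))  # 16: reported in 1983
print(first_sex_age(-4, 18, -4))  # 18: first reported in 1984
print(first_sex_age(-4, -4, -4))  # None: no intercourse reported by 1985
```

One Stata-specific difference: Stata treats its system missing value `.` as larger than any number, so `fage84>0` is true when fage84 is `.`. The log therefore works as intended only if missing answers are stored as negative codes, which is what the comments above say.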
. tab age
age| Freq. Percent Cum.
------------+-----------------------------------
2 | 1 0.02 0.02
3 | 1 0.02 0.04
8 | 3 0.05 0.09
9 | 4 0.07 0.16
10 | 11 0.20 0.36
11 | 11 0.20 0.55
12 | 29 0.52 1.07
13 | 95 1.70 2.77
14 | 236 4.22 6.99
15 | 521 9.31 16.30
16 | 1020 18.23 34.54
17 | 1059 18.93 53.47
18 | 1134 20.27 73.74
19 | 620 11.08 84.82
20 | 370 6.61 91.44
21 | 231 4.13 95.57
22 | 124 2.22 97.78
23 | 62 1.11 98.89
24 | 32 0.57 99.46
25 | 18 0.32 99.79
26 | 9 0.16 99.95
27 | 3 0.05 100.00
------------+-----------------------------------
Total | 5594 100.00
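The frequency table alone is enough to recover two numbers that regress reports: the number of observations, and the total sum of squares of age around its mean. Here is a short Python check, with the counts copied from the tab age output. The variable names are only illustrative.

```python
# Age at first intercourse -> number of women, copied from tab age
freq = {2: 1, 3: 1, 8: 3, 9: 4, 10: 11, 11: 11, 12: 29, 13: 95,
        14: 236, 15: 521, 16: 1020, 17: 1059, 18: 1134, 19: 620,
        20: 370, 21: 231, 22: 124, 23: 62, 24: 32, 25: 18, 26: 9, 27: 3}

n = sum(freq.values())
mean = sum(age * count for age, count in freq.items()) / n
# Total SS = sum over women of (age - mean)^2, grouped by age value
tss = sum(count * (age - mean) ** 2 for age, count in freq.items())

print(n)               # 5594
print(round(mean, 4))  # 17.4206
print(round(tss, 4))   # 28059.2594
```

Both n and the total sum of squares agree with the "Number of obs" and "Total" SS entries in the regression output.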
. regress age none cath oth
Source | SS df MS Number of obs = 5594
---------+------------------------------ F( 3, 5590) = 31.10
Model | 460.623661 3 153.54122 Prob > F = 0.0000
Residual | 27598.6357 5590 4.93714414 R-squared = 0.0164
---------+------------------------------ Adj R-squared = 0.0159
Total | 28059.2594 5593 5.0168531 Root MSE = 2.222
------------------------------------------------------------------------------
age | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
none | -.4101829 .160575 -2.554 0.011 -.7249723 -.0953935
cath | .566983 .0659714 8.594 0.000 .4376534 .6963127
oth | .3732208 .0980449 3.807 0.000 .1810148 .5654268
_cons | 17.2053 .0412396 417.204 0.000 17.12446 17.28615
------------------------------------------------------------------------------
. * a significant t value indicates that mean age at first intercourse
. * for that religion differs significantly from that of Protestant
. * Christians, the omitted category
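The only regressors here are religion dummies. Each coefficient is therefore the difference in mean age between that group and the omitted group (Protestants), and _cons is the Protestant mean. The following is a small Python sketch of that arithmetic, using the estimates printed above. The dictionary names are just illustrative.

```python
cons = 17.2053  # _cons: mean age for the omitted group, Protestants
coef = {"none": -.4101829, "cath": .566983, "oth": .3732208}
se = {"none": .160575, "cath": .0659714, "oth": .0980449}

# Group mean = base (Protestant) mean + estimated difference
means = {"prot": cons, **{g: cons + b for g, b in coef.items()}}
# t statistic = coefficient / standard error
t = {g: coef[g] / se[g] for g in coef}

print({g: round(m, 4) for g, m in means.items()})
# {'prot': 17.2053, 'none': 16.7951, 'cath': 17.7723, 'oth': 17.5785}
print({g: round(v, 3) for g, v in t.items()})
# {'none': -2.554, 'cath': 8.594, 'oth': 3.807}
```

The t ratios reproduce the t column of the regression table.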
. * To see the estimated covariance matrix of beta_hat, Var-hat(beta_hat),
. * use the matrix function get(VCE) (newer Stata versions also store it as e(V))
. matrix eVbhat = get(VCE)
. matrix l eVbhat
symmetric eVbhat[4,4]
none cath oth _cons
none .02578433
cath .0017007 .00435223
oth .0017007 .0017007 .00961279
_cons -.0017007 -.0017007 -.0017007 .0017007
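The VCE has a simple pattern because the regressors are group dummies. Let s^2 be the residual mean square and n_g the number of women in group g. Then:

- Var(_cons) = s^2/n_prot
- Var(b_g) = s^2(1/n_g + 1/n_prot)
- Cov(b_g, _cons) = -s^2/n_prot
- Cov(b_g, b_h) = s^2/n_prot

This is why every off-diagonal entry is plus or minus .0017007. As a sketch, those formulas can be inverted in Python to back out the group sizes behind the regression. These counts are derived here and are not printed anywhere in the log.

```python
s2 = 4.93714414      # Residual MS from the regress output
var_cons = .0017007  # VCE diagonal entry for _cons
var_b = {"none": .02578433, "cath": .00435223, "oth": .00961279}

# Var(_cons) = s2 / n_prot
n = {"prot": s2 / var_cons}
# Var(b_g) = s2 * (1/n_g + 1/n_prot)  =>  n_g = s2 / (Var(b_g) - Var(_cons))
for g, v in var_b.items():
    n[g] = s2 / (v - var_cons)

# The printed VCE is rounded, so round the implied counts to integers
counts = {g: round(x) for g, x in n.items()}
print(counts)                # {'prot': 2903, 'none': 205, 'cath': 1862, 'oth': 624}
print(sum(counts.values()))  # 5594
```

The implied group sizes add up to the 5594 observations used in the regression.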
. * Why don't we include a dummy variable for every religion?
. regress age none cath oth prot
Source | SS df MS Number of obs = 5594
---------+------------------------------ F( 3, 5590) = 31.10
Model | 460.623661 3 153.54122 Prob > F = 0.0000
Residual | 27598.6357 5590 4.93714414 R-squared = 0.0164
---------+------------------------------ Adj R-squared = 0.0159
Total | 28059.2594 5593 5.0168531 Root MSE = 2.222
------------------------------------------------------------------------------
age | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
none | -.7834037 .1788735 -4.380 0.000 -1.134065 -.4327422
cath | .1937622 .1027795 1.885 0.059 -.0077254 .3952499
oth | (dropped)
prot | -.3732208 .0980449 -3.807 0.000 -.5654268 -.1810148
_cons | 17.57853 .0889499 197.623 0.000 17.40415 17.7529
------------------------------------------------------------------------------
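. * Answer: because none + prot + cath + oth = 1 for every woman, the four
. * dummies are perfectly collinear with the constant, so X'X is not invertible!!
. * Stata drops one dummy (here oth), which just makes oth the base category:
. * e.g., the prot coefficient, -.3732208, is minus the oth coefficient above.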
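Here is a toy illustration of this dummy-variable trap, in Python with NumPy. The six-observation sample is made up purely for illustration. With a constant plus all four religion dummies, X'X is 5x5 but has rank 4. Leaving out one dummy gives a 4x4 X'X with full rank.

```python
import numpy as np

# Hypothetical toy sample using relig2-style codes (1=none, 2=prot, 3=cath, 4=oth)
relig2 = np.array([1, 2, 2, 3, 3, 4])
D = np.column_stack([(relig2 == k).astype(float) for k in (1, 2, 3, 4)])
const = np.ones((len(relig2), 1))

X_all = np.hstack([const, D])                  # _cons + none, prot, cath, oth
X_base = np.hstack([const, D[:, [0, 2, 3]]])   # _cons + none, cath, oth (prot omitted)

# The four dummies always add up to the constant column ...
print(np.allclose(D.sum(axis=1), 1.0))           # True
# ... so X_all @ (1, -1, -1, -1, -1) = 0 and X'X has a zero eigenvalue
print(np.linalg.matrix_rank(X_all.T @ X_all))    # 4, but X'X is 5x5: singular
print(np.linalg.matrix_rank(X_base.T @ X_base))  # 4 = number of columns: invertible
```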