POL51
December 4, 2024
The confidence interval
Are we sure it’s not zero?
Making inferences from data
Say we wanted to know the relationship between number of kids and age:
| | Number of kids |
|---|---|
| (Intercept) | 0.153 |
| | (0.086) |
| age | 0.035 *** |
| | (0.002) |
| nobs | 2849 |

*** p < 0.001; ** p < 0.01; * p < 0.05.
For every additional year of age, a person is estimated to have 0.035 more kids, on average
How uncertain should we be? Bootstrapping to the rescue:
Bootstrap | Coefficient | Estimate |
---|---|---|
1 | age | 0.0351 |
2 | age | 0.0310 |
3 | age | 0.0344 |
4 | age | 0.0365 |
5 | age | 0.0336 |
6 | age | 0.0335 |
7 | age | 0.0342 |
8 | age | 0.0357 |
9 | age | 0.0353 |
10 | age | 0.0342 |
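A minimal sketch of the bootstrap idea behind the table above: resample the rows with replacement, refit the slope, repeat. The data here are synthetic stand-ins (not the survey data from the slides), and the helper names `ols_slope` and `bootstrap_slopes` are made up for illustration.

```python
import random
import statistics

def ols_slope(xs, ys):
    """Slope of the least-squares line of ys on xs."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

def bootstrap_slopes(xs, ys, n_boot, rng):
    """Refit the slope on n_boot resamples drawn with replacement."""
    n = len(xs)
    slopes = []
    for _ in range(n_boot):
        # Draw n row indices with replacement: one bootstrap sample.
        idx = [rng.randrange(n) for _ in range(n)]
        slopes.append(ols_slope([xs[i] for i in idx], [ys[i] for i in idx]))
    return slopes

# Synthetic stand-in data (an assumption for illustration only).
rng = random.Random(51)
ages = [rng.randint(18, 89) for _ in range(500)]
kids = [max(0, round(0.15 + 0.035 * a + rng.gauss(0, 1.5))) for a in ages]

slopes = bootstrap_slopes(ages, kids, n_boot=200, rng=rng)
for i, s in enumerate(slopes[:5], start=1):
    print(i, "age", round(s, 4))
```

Each printed row is one bootstrap estimate of the age coefficient; the spread across rows is the sampling variability we are trying to measure.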
How much might our estimate from lm(childs ~ age) vary? Look at the standard error:
mean | standard_error |
---|---|
0.0345 | 0.00164 |
You might report the estimate as 0.0345, plus or minus 2 standard errors (roughly 0.0312 to 0.0378)
Notice! This is what the numbers in parentheses mean in the regression table: they are the standard error of the coefficient estimate
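As a rough illustration, the standard error is just the standard deviation of the bootstrap estimates. The sketch below uses only the ten estimates listed in the bootstrap table, so its numbers will not exactly match the reported 0.0345 and 0.00164, which presumably come from many more draws (an assumption; the slides do not say how many).

```python
import statistics

# The ten bootstrap estimates of the age coefficient listed in the table.
estimates = [0.0351, 0.0310, 0.0344, 0.0365, 0.0336,
             0.0335, 0.0342, 0.0357, 0.0353, 0.0342]

boot_mean = statistics.fmean(estimates)
# Standard error = sample standard deviation of the bootstrap estimates.
standard_error = statistics.stdev(estimates)

lower = boot_mean - 2 * standard_error
upper = boot_mean + 2 * standard_error
print(f"mean = {boot_mean:.4f}, standard error = {standard_error:.5f}")
print(f"mean +/- 2 SE: {lower:.4f} to {upper:.4f}")
```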
We can also look at the 95% confidence interval – we are 95% “confident” the effect of age on the number of children a person has is between…
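One common way to get a 95% confidence interval from bootstrap estimates is the percentile method: sort the estimates and keep the middle 95%. The sketch below assumes that method (the slides do not say which one is used) and uses simulated draws centered on the reported mean and standard error as a stand-in for real bootstrap output. `percentile_interval` is a hypothetical helper name.

```python
import random

def percentile_interval(estimates, level=0.95):
    """Return (lower, upper) bounds covering the middle `level` share of estimates."""
    ordered = sorted(estimates)
    tail = (1 - level) / 2
    last = len(ordered) - 1
    lo_idx = round(tail * last)
    hi_idx = round((1 - tail) * last)
    return ordered[lo_idx], ordered[hi_idx]

# Simulated stand-in for 1,000 bootstrap estimates (an assumption, not real output).
rng = random.Random(51)
draws = [rng.gauss(0.0345, 0.00164) for _ in range(1000)]

ci_lower, ci_upper = percentile_interval(draws, level=0.95)
print(f"95% CI: {ci_lower:.4f} to {ci_upper:.4f}")
```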
The last (and most controversial) way to quantify uncertainty is statistical significance
We want to make a binary decision: are we confident enough in this estimate to say that it is significant?
Or are we too uncertain, given sampling variability?
Is the result significant, or not significant?
Imagine we run an experiment on TV ads, estimate the effect of the treatment on voter turnout, and compute the 95% confidence interval for that effect
The experiment went well, so we are confident in a causal sense, but there is still uncertainty from sampling
Three scenarios for the results: same effect size, different 95% confidence intervals
Small range: great! Effect is precise; ads are worth it!
Big range: OK-ish! Effect could be tiny, or huge; are the ads worth it? At least we know they help (positive)
Crosses zero: awful! Ads could work (+), they could do nothing (0), or they could be counterproductive (-)
When the 95% CI crosses zero, we are so uncertain that we cannot tell whether the effect is positive, zero, or negative
Researchers worry so much about this that it is conventional to report whether the 95% CI of an effect estimate crosses zero
When a 95% CI for an estimate doesn’t cross zero, we say that the estimate is statistically significant
If the 95% CI crosses zero, the estimate is not statistically significant
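The rule can be sketched in a few lines. The three intervals below are hypothetical numbers chosen to mimic the three TV-ad scenarios: the same point estimate (0.05) with different 95% confidence intervals.

```python
def crosses_zero(lower, upper):
    """True if zero lies inside the interval [lower, upper]."""
    return lower <= 0 <= upper

def is_significant(lower, upper):
    """Statistically significant means the interval does NOT cross zero."""
    return not crosses_zero(lower, upper)

# Hypothetical 95% CIs, all centered on an effect of 0.05.
scenarios = {
    "small range": (0.04, 0.06),
    "big range": (0.005, 0.095),
    "crosses zero": (-0.02, 0.12),
}

results = {name: is_significant(lo, hi) for name, (lo, hi) in scenarios.items()}
for name, significant in results.items():
    label = "significant" if significant else "not significant"
    print(f"{name}: {label}")
```

Note that the first two scenarios get the same label even though their intervals differ a lot in width.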
Statistical significance is a blunt instrument: our first two studies are quite different, but we would say they are both statistically significant
You could have an estimate with a 95% CI that barely escapes crossing zero, and call that statistically significant
And another estimate with a 95% CI that barely crosses zero, and call that not statistically significant
That’s pretty arbitrary!
And it all hinges on the size of the confidence interval
If we made a 98% CI, or a 95.1% CI, we might reach different conclusions about which estimates are and aren’t significant
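To see how fragile the cutoff is, here is a sketch with a made-up estimate and standard error. It assumes the interval is built with a normal approximation (estimate plus or minus a z multiplier times the standard error), which is one common construction but not necessarily the one used elsewhere in these slides.

```python
from statistics import NormalDist

def normal_ci(estimate, standard_error, level):
    """Normal-approximation confidence interval at the given level."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return estimate - z * standard_error, estimate + z * standard_error

def significant_at(estimate, standard_error, level):
    """True if the interval at this level does not cross zero."""
    lower, upper = normal_ci(estimate, standard_error, level)
    return not (lower <= 0 <= upper)

# Hypothetical estimate sitting right at the edge of the 95% cutoff.
estimate, standard_error = 1.965, 1.0

verdicts = {level: significant_at(estimate, standard_error, level)
            for level in (0.95, 0.951, 0.98)}
for level, verdict in verdicts.items():
    print(f"{level:.1%} CI: {'significant' if verdict else 'not significant'}")
```

The same estimate flips from significant to not significant when the confidence level moves from 95% to 95.1%.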
The 95% CI is a convention; where does it come from?
Fisher (1925), who came up with it, says:
It is convenient to take this point [95% CI] as a limit in judging whether [an effect] is to be considered significant or not. (Fisher 1925)
But he adds that other options are plausible:
If one in twenty [95% CI] does not seem high enough odds, we may, if we prefer it, draw the line at one in fifty [98% CI]… or one in a hundred [99% CI]…Personally, the writer prefers…[95% CI] (Fisher 1926)
So arbitrary! Why do it?
Sometimes we have to make the call:
This is a huge topic of debate in social science; there are other proposals we can’t cover:
The stars (*) in regression output tell you whether an estimate’s confidence interval crosses zero and at what level of confidence
This is done with the p-value (which we don’t cover), the mirror image of the confidence interval
| | Number of kids |
|---|---|
| (Intercept) | 0.153 |
| | (0.086) |
| age | 0.035 *** |
| | (0.002) |
| nobs | 2849 |

*** p < 0.001; ** p < 0.01; * p < 0.05.
(*) p < .05 = the 95% confidence interval does not cross zero
(**) p < .01 = the 99% confidence interval does not cross zero
(***) p < .001 = the 99.9% confidence interval does not cross zero
The estimate is statistically significant at the…
(*) p < .05 = the 95% confidence level
(**) p < .01 = the 99% confidence level
(***) p < .001 = the 99.9% confidence level
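For the curious, here is a sketch of how stars could be assigned from an estimate and its standard error. It assumes a two-sided p-value from a normal approximation; regression software typically uses a t distribution instead, which is nearly identical with 2,849 observations. The function names are made up for illustration.

```python
import math

def two_sided_p(estimate, standard_error):
    """Two-sided p-value under a normal approximation (an assumption)."""
    z = abs(estimate / standard_error)
    return math.erfc(z / math.sqrt(2))

def stars(estimate, standard_error):
    """Map the p-value to the star convention in the table footnote."""
    p = two_sided_p(estimate, standard_error)
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""

# Estimates and standard errors from the regression table.
age_stars = stars(0.035, 0.002)
intercept_stars = stars(0.153, 0.086)
print("age:", repr(age_stars))
print("(Intercept):", repr(intercept_stars))
```

Under this approximation the age coefficient earns three stars and the intercept earns none, matching the table.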