Controls

POL51

Juan Tellez

UC Davis

November 14, 2024

Plan for today

Controlling for confounds

Intuition

Limitations

Where are we so far?

Want to estimate the effect of X on Y

Elemental confounds get in our way

DAGs to model causal process

figure out which variables to control for and which to avoid

Do waffles cause divorce?

Remember, we have strong reason to believe the South is confounding the relationship between Waffle Houses and divorce rates:

Solution

We need to control (for) the South (just like Lincoln)

It has a bad influence on divorce, waffle house locations (and the integrity of the union)

But how do we control (for) the South? And what does that even mean?

We’ve already done it

One way to adjust/control for backdoor paths is with multiple regression:

In general: \(Y = \alpha + \beta_1X_1 + \beta_2X_2 + \dots\)

In this case: \(Y = \alpha + \beta_1Waffles + \beta_2South\)

In multiple regression, the coefficients (\(\beta_i\)) mean something different: they describe the relationship between \(X_1\) and Y, after adjusting for \(X_2, X_3, X_4\), etc.

What does it mean to control?

\(Y = \alpha + \beta_1Waffles + \beta_2South\)

Three ways of thinking about \(\color{red}{\beta_1}\) here:

  • The relationship between Waffles and Divorce, controlling for the South

  • The relationship between Waffles and Divorce that cannot be explained by the South

  • The relationship between Waffles and Divorce, comparing among similar states (South vs. South, North vs. North)

Does this actually work?

Only way to know for sure is with made-up data, where we know the effects ex ante:

fake = tibble(south = sample(c(0, 1), size = 50, replace = TRUE), 
              waffle = rnorm(n = 50, mean = 20, sd = 4) + 10 * south,
              divorce = rnorm(n = 50, mean = 20, sd = 2) + 8 * south) 
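The simulation code in these slides assumes a few packages are already loaded; here is a minimal setup sketch (the packages follow from the functions used below, and the seed value is arbitrary):

library(tidyverse)   # tibble(), mutate(), filter(), %>%
library(broom)       # tidy()
library(huxtable)    # huxreg()

set.seed(111)        # arbitrary seed so the fake data is reproducible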

What do we know?

fake = tibble(south = sample(c(0, 1), size = 50, replace = TRUE), 
              waffle = rnorm(n = 50, mean = 20, sd = 4) + 10*south,
              divorce = rnorm(n = 50, mean = 20, sd = 2) + 8*south) 

We know that waffles have 0 effect on divorce

We know that the South has an effect of 10 on waffles

We know that the South has an effect of 8 on divorce

Controlling for the South

Fit a naive model without controls:

naive_waffles = lm(divorce ~ waffle, data = fake)
tidy(naive_waffles)
term          estimate   std.error   statistic   p.value
(Intercept)   10.3       2           5.15        4.82e-06
waffle        0.529      0.0796      6.64        2.62e-08

Our estimate is confounded: it should be zero (or very close to it)

Controlling for the South

Fit a better model, controlling for the South:

control_waffles = lm(divorce ~ waffle + south, data = fake)
tidy(control_waffles)
term          estimate   std.error   statistic   p.value
(Intercept)   20.8       1.73        12          6.58e-16
waffle        -0.0397    0.0819      -0.485      0.63
south         7.59       0.87        8.73        2.15e-11

Our estimate is closer to the truth: quite close to zero

Display the results

We can display the results in a regression table, using the huxreg() function from the huxtable package:

huxreg(naive_waffles,  control_waffles)
               (1)            (2)
(Intercept)    10.319 ***     20.808 ***
               (2.003)        (1.735)
waffle         0.529 ***      -0.040
               (0.080)        (0.082)
south                         7.595 ***
                              (0.870)
N              50             50
R2             0.479          0.801
logLik         -124.527       -100.445
AIC            255.054        208.891
*** p < 0.001; ** p < 0.01; * p < 0.05.

Regression tables

  • Regression tables are the standard way to compare models side-by-side
  • Coefficient estimates, size of sample, and other info (later in course)
               (1)            (2)
(Intercept)    10.319 ***     20.808 ***
               (2.003)        (1.735)
waffle         0.529 ***      -0.040
               (0.080)        (0.082)
south                         7.595 ***
                              (0.870)
N              50             50
R2             0.479          0.801
logLik         -124.527       -100.445
AIC            255.054        208.891
*** p < 0.001; ** p < 0.01; * p < 0.05.

Comparison

  • Model 1 has no controls: just the relationship between Waffle Houses and Divorce

  • Model 2 controls/adjusts for: the state being in the South

  • the effect of Waffle Houses on Divorce changes with controls

  • Model 2 estimate is smaller, closer to zero

               Naive model    Control South
(Intercept)    10.319 ***     20.808 ***
               (2.003)        (1.735)
waffle         0.529 ***      -0.040
               (0.080)        (0.082)
south                         7.595 ***
                              (0.870)
nobs           50             50
*** p < 0.001; ** p < 0.01; * p < 0.05.

Interpretation

  • No controls: every additional Waffle House = 0.5 more divorces per capita

  • With controls: after adjusting for the South, every additional Waffle House = 0.04 fewer divorces per capita

               Naive model    Control South
(Intercept)    10.319 ***     20.808 ***
               (2.003)        (1.735)
waffle         0.529 ***      -0.040
               (0.080)        (0.082)
south                         7.595 ***
                              (0.870)
nobs           50             50
*** p < 0.001; ** p < 0.01; * p < 0.05.

Comparing states in the same part of the country (South vs. not South), every additional Waffle House = 0.04 fewer divorces per capita

Another example: 🚽 and 💰

How much does having an additional bathroom boost a house’s value?

price bedrooms bathrooms sqft_living waterfront
899000 4 2 2580 FALSE
435000 2 1 1260 FALSE
657000 4 2 2180 FALSE
590000 3 4 1970 FALSE
605000 3 2 2010 FALSE
528000 2 1 840 TRUE
315000 3 2 2500 FALSE
739900 5 2 3290 FALSE

Another example: 🚽 and 💰

A huge effect:

no_controls = lm(price ~ bathrooms, data = house_prices)
huxreg("No controls" = no_controls)
               No controls
(Intercept)    10708.309
               (6210.669)
bathrooms      250326.516 ***
               (2759.528)
N              21613
R2             0.276
logLik         -304117.741
AIC            608241.481
*** p < 0.001; ** p < 0.01; * p < 0.05.

The problem

We are comparing houses with more and fewer bathrooms. But houses with more bathrooms tend to be larger! So house size is confounding the relationship between 🚽 and 💰

What happens if we control for how large a house is?

controls = lm(price ~ bathrooms + sqft_living, data = house_prices)
huxreg("No controls" = no_controls, "Controls" = controls)
               No controls       Controls
(Intercept)    10708.309         -39456.614 ***
               (6210.669)        (5223.129)
bathrooms      250326.516 ***    -5164.600
               (2759.528)        (3519.452)
sqft_living                      283.892 ***
                                 (2.951)
N              21613             21613
R2             0.276             0.493
logLik         -304117.741       -300266.206
AIC            608241.481        600540.413
*** p < 0.001; ** p < 0.01; * p < 0.05.

What happens if we control for how large a house is?

               No controls       Controls
(Intercept)    10708.309         -39456.614 ***
               (6210.669)        (5223.129)
bathrooms      250326.516 ***    -5164.600
               (2759.528)        (3519.452)
sqft_living                      283.892 ***
                                 (2.951)
nobs           21613             21613
*** p < 0.001; ** p < 0.01; * p < 0.05.

adjusting for size, additional bathrooms have a much smaller (even negative!) relationship to price

What’s going on?

In our made-up world, if we control for the South we can recover the unconfounded estimate of Divorce ~ Waffles

But what’s lm() doing under-the-hood that makes this possible?

What’s going on?

  • lm() is estimating \(South \rightarrow Divorce\) and \(South \rightarrow Waffles\)

  • it is then subtracting out or removing the effect of South on Divorce and Waffles

  • what’s left is the relationship between Waffles and Divorce, adjusting for the influence of the South on each (see the sketch after this list)
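A minimal sketch of that intuition, using the fake data from above (this illustrates the logic rather than lm()'s actual internals):

# remove the part of waffles, and of divorce, that the South explains
waffle_resid  = residuals(lm(waffle ~ south, data = fake))
divorce_resid = residuals(lm(divorce ~ south, data = fake))

# the slope between the leftovers matches the adjusted waffle
# coefficient from lm(divorce ~ waffle + south)
lm(divorce_resid ~ waffle_resid)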

Visualizing controlling for the South

This is the confounded relationship between waffles and divorce (zoomed out)

Add the south

We can see what we already know: states in the South tend to have more divorce, and more waffles

Effect of south on divorce

\(South \rightarrow Divorce = 8\): how much higher, on average, divorce is in the South than in the North

Remove effect of South on divorce

Regression subtracts out the effect of the South on divorce

Next: effect of South on waffles

\(South \rightarrow Waffles = 10\): how many more Waffle Houses, on average, there are in the South than in the North

Subtract out the effect of south on waffles

Regression subtracts out the effect of the South on waffles

What’s left over?

The true effect of waffles on divorce \(\approx\) 0

The other confounds

The perplexing pipe

Remember, with a perplexing pipe, controlling for Z blocks the effect of X on Y:

Simulation

Let’s make up some data to show this: every unit of foreign aid increases corruption by 8; every unit of corruption increases the number of protests by 4

fake_pipe = tibble(aid = rnorm(n = 200, mean = 10), 
                   corruption = rnorm(n = 200, mean = 10) + 8 * aid, 
                   protest = rnorm(n = 200, mean = 10) + 4 * corruption)

What is the true effect of aid on protest? Tricky since the effect runs through corruption

For every unit of aid, corruption increases by 8; and for every unit of corruption, protest increases by 4…

The effect of aid on protest is \(4 \times 8 = 32\)

The data

aid corruption protest
10.89 96.18 394.09
10.71 96.07 396.25
9.43 82.24 339.30
13.60 119.00 486.65
9.35 83.47 340.71
9.34 83.46 343.73

Bad controls

Remember, with a pipe, controlling for Z (corruption) is a bad idea

Let’s fit two models, where one makes the mistake of controlling for corruption

right_model = lm(protest ~ aid, data = fake_pipe)
bad_control = lm(protest ~ aid + corruption, data = fake_pipe)

Bad controls

  • Notice how the model that mistakenly controls for Z tells you that X basically has no effect on Y (wrong)
  • The model that doesn’t control for Z is closer to the truth
               Correct model    Bad control
(Intercept)    43.805 ***       8.595 ***
               (2.957)          (0.966)
aid            32.583 ***       -0.530
               (0.293)          (0.602)
corruption                      4.075 ***
                                (0.074)
nobs           200              200
*** p < 0.001; ** p < 0.01; * p < 0.05.

The exploding collider

Remember, with an exploding collider, controlling for M creates strange correlations between X and Y:

Simulation

Let’s make up some data to show this:

fake_collider = tibble(x = rnorm(n = 100, mean = 10), 
                       y = rnorm(n = 100, mean = 10),
                       m = rnorm(n = 100, mean = 10) + 8 * x + 4 * y)
  • X has an effect of 8 on M

  • Y has an effect of 4 on M

  • X has no effect on Y

The data

x y m
9.765385 9.714265 127.0328
12.082301 10.857936 149.0461
7.940696 8.910833 107.3517
9.894526 10.810412 132.0944
9.943018 10.676781 132.3096
11.200857 10.711525 141.5535

Bad controls

What’s the true effect of X on Y? It’s zero

Remember, with a collider, controlling for M is a bad idea

Let’s fit two models, where one makes the mistake of controlling for M

right_model = lm(y ~ x, data = fake_collider)
collided_model = lm(y ~ x + m, data = fake_collider)

Bad controls

  • Notice how the model that mistakenly controls for M tells you that X has a strong, negative effect on Y (wrong)
  • The model that doesn’t control for M is closer to the truth
               Correct model    Collided!
(Intercept)    9.235 ***        -2.513 ***
               (0.824)          (0.345)
x              0.088            -1.969 ***
               (0.083)          (0.054)
m                               0.248 ***
                                (0.006)
nobs           100              100
*** p < 0.001; ** p < 0.01; * p < 0.05.

Colliding as sample selection

Most of the time when we see a collider, it’s because we’re looking at a weird sample of the population we’re interested in

Examples: the non-relationship between height and scoring, among NBA players; the (alleged) negative correlation between how surprising and reliable findings are, among published research

Hiring at Google

Imagine Google wants to hire the best of the best, and they have two criteria: interpersonal skills, and technical skills

Say Google can measure how socially and technically skilled someone is (0-100)

fake_google = tibble(social_skills = rnorm(n = 200, mean = 50, sd = 10), 
                     tech_skills = rnorm(n = 200, mean = 50, sd = 10))

The two are causally unrelated: one does not affect the other; improving someone’s social skills would not hurt their technical skills

The data

social_skills tech_skills
68.75 62.41
46.25 36.08
42.20 64.85
50.82 39.91
48.28 66.39
38.20 53.77
54.25 49.62
40.50 36.41

Simulate the hiring process

Now imagine that they add up the two skills to see a person’s overall quality:

fake_google = fake_google %>% 
  mutate(total_score = social_skills + tech_skills)
social_skills   tech_skills   total_score
42              53.9          95.9
52.2            45.9          98.1
44.1            22.1          66.2
45.8            58.9          105
49.6            57.6          107
46.3            52.7          99
50.9            50.9          102
32.8            56.5          89.3
… (192 more rows)

Simulating the hiring process

Now imagine that Google only hires people who are in the top 15% of quality (in this case that’s 112.8 or higher)
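A sketch of that hiring rule in code (hypothetical, not shown in the original slides): flag anyone at or above the 85th percentile of total_score as hired.

fake_google = fake_google %>% 
  mutate(hired = if_else(total_score >= quantile(total_score, 0.85), "yes", "no"))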

social_skills   tech_skills   total_score   hired
42              53.9          95.9          no
52.2            45.9          98.1          no
44.1            22.1          66.2          no
45.8            58.9          105           no
49.6            57.6          107           no
46.3            52.7          99            no
50.9            50.9          102           no
32.8            56.5          89.3          no
33.9            41            74.9          no
59              37.6          96.6          no
59.4            43.7          103           no
60.8            53.3          114           yes
48.9            76.6          125           yes
… (187 more rows)

General population

No relationship between social and technical skills among all job candidates

Collided!

If we only look at Google workers we see a trade-off between social and technical skills:
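One way to see this with models (a sketch, assuming the hired variable created above): fit the same regression on the full applicant pool and on the hired subset only.

everyone   = lm(tech_skills ~ social_skills, data = fake_google)
hired_only = lm(tech_skills ~ social_skills, data = filter(fake_google, hired == "yes"))

# full pool: slope near zero; hired subset: negative slope (the collider at work)
huxreg("Everyone" = everyone, "Hired only" = hired_only)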

Limitations

It’s cool that we can control for a confound, or avoid colliders/pipes and get back the truth

But there are big limitations we must keep in mind when evaluating research:

  • We need to know what to control for (confident in our DAG)
  • We need to have data on the controls (e.g., data on Z)
  • We need our data to measure the variable well (e.g., # of homicides a good proxy for crime?)

Stuff that’s hard to measure

Ability is a likely fork for the effect of Education on Earnings; but how do you measure ability?
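A hedged simulation sketch of the problem (variable names and effect sizes are made up, in the style of the other fake data): if ability raises both education and earnings but is never measured, the naive model is confounded and there is nothing to control for.

fake_wages = tibble(ability   = rnorm(n = 500, mean = 50, sd = 10),   # unobserved in real data
                    education = rnorm(n = 500, mean = 12, sd = 2) + 0.1 * ability,
                    earnings  = rnorm(n = 500, mean = 20, sd = 5) + 2 * education + 0.5 * ability)

lm(earnings ~ education, data = fake_wages)             # biased upward: absorbs ability's effect
lm(earnings ~ education + ability, data = fake_wages)   # close to the truth, but only if ability is measured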

🚨 Your turn: pipes, colliders 🚨

Using the templates in the class script:

  • Make a realistic pipe scenario

  • Use models to show that everything goes wrong when you mistakenly control for the pipe

  • Make a realistic fork scenario

  • Use models to show that everything goes wrong when you fail to control for the fork
