POL51
November 13, 2024
Why DAG?
Identifying effects
ggdag()
We want to identify the effect of X (waffles) on Y (divorce)
We can use our model to identify that effect, BUT:
We also know that lurking variables might make things go awry (the South)
We know that the DAG on the left will produce the spurious correlation on the right
Regardless of whether or not waffles cause divorce
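A quick simulation makes the fork concrete (a Python sketch, since the slides only name the variables; `south`, `waffles`, and `divorce` are stand-ins with made-up coefficients):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000

# Lurking variable: is this observation "in the South"?
south = rng.binomial(1, 0.3, n)

# South -> waffles and South -> divorce, but NO waffles -> divorce arrow
waffles = 2.0 * south + rng.normal(0, 1, n)
divorce = 1.5 * south + rng.normal(0, 1, n)

# A spurious correlation appears even though waffles have no causal effect
r = np.corrcoef(waffles, divorce)[0, 1]
print(f"corr(waffles, divorce) = {r:.2f}")  # clearly positive

# Within each level of South, the correlation (nearly) vanishes
r_south = np.corrcoef(waffles[south == 1], divorce[south == 1])[0, 1]
r_north = np.corrcoef(waffles[south == 0], divorce[south == 0])[0, 1]
print(f"within South: {r_south:.2f}, outside South: {r_north:.2f}")
```

Holding the fork (the South) constant is exactly what "controlling for" it means: the spurious correlation disappears within each group.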
Controlling for the wrong thing can close a perplexing pipe – erasing part or all of the effect that X has on Y
Or open up an exploding collider – creating a spurious correlation between X and Y
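Both failure modes can be simulated directly (a toy sketch; the variable names and coefficients are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Pipe: X -> M -> Y. Controlling for the mediator M erases X's effect.
x = rng.normal(0, 1, n)
m = x + rng.normal(0, 0.5, n)
y = m + rng.normal(0, 0.5, n)

total = np.linalg.lstsq(np.column_stack([np.ones(n), x]), y, rcond=None)[0][1]
blocked = np.linalg.lstsq(np.column_stack([np.ones(n), x, m]), y, rcond=None)[0][1]
print(total, blocked)  # total ~ 1, blocked ~ 0

# Collider: X -> C <- Y. X and Y are unrelated until we condition on C.
x2 = rng.normal(0, 1, n)
y2 = rng.normal(0, 1, n)
c = x2 + y2 + rng.normal(0, 0.5, n)

r_all = np.corrcoef(x2, y2)[0, 1]
r_sel = np.corrcoef(x2[c > 1], y2[c > 1])[0, 1]
print(r_all, r_sel)  # ~ 0 unconditionally, negative once we select on C
```

Selecting on `c > 1` is one way of "conditioning" on the collider: among high-C cases, a high X implies Y was probably low, manufacturing a negative correlation out of nothing.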
Are the police more likely to use deadly force against people of color?
Black Americans are 3.23 times more likely than white Americans to be killed by police (Schwartz and Jahn, 2020)
Yet there are big debates about how exactly to estimate this bias (and the extent to which it exists)
Fryer (2019) finds that Blacks and Hispanics are 50% more likely to be stopped by police, but that conditional on being stopped by the police, there are no racial differences in officer-involved shootings
Fryer used extensive controls about the nature of the interaction, time of day, and hundreds of factors that I’ve captured with Confounds
Fryer shows that once you account for the indirect effect, the direct effect is basically not there – once the police have stopped someone, they do not use deadly force more often against Minorities than Whites
But what if police are more likely to stop people they believe are “suspicious” AND use force against people they find “suspicious”? THEN conditioning on the stop is equivalent to conditioning on a collider
We’d like to know if Minorities are killed more than Whites in police interactions once they are stopped
But controlling for being stopped creates collider bias
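A toy simulation shows how this goes wrong (purely illustrative: `suspicion` is an invented, unobserved variable, and the model deliberately has NO direct race → force effect):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

minority = rng.binomial(1, 0.3, n)
# Unobserved "suspicion" as perceived by the officer
suspicion = rng.normal(0, 1, n)

# Stops depend on BOTH race and suspicion -> being stopped is a collider
stopped = (0.8 * minority + suspicion + rng.normal(0, 1, n)) > 1.0

# True model: force depends ONLY on suspicion, never on race
force = (suspicion + rng.normal(0, 1, n))[stopped] > 1.5
minority_s = minority[stopped]

# Conditioning on the stop: among the stopped, minorities were stopped
# at LOWER suspicion on average, so they show LESS force -- a spurious
# "negative" race effect created entirely by collider bias
p_min = force[minority_s == 1].mean()
p_wht = force[minority_s == 0].mean()
print(p_min, p_wht)  # p_min < p_wht
```

Even with zero true race effect on force, the stopped-only comparison is distorted; if there were a real positive effect, this bias could mask it entirely.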
Super tough to estimate the effect of race ➡️ police abuse with observational data!
We have to be careful and slow
Think carefully about what the DAG probably looks like
Use the DAG to figure out what we need to control
(and what must be left alone)
Next time: how to actually control for stuff
DAGs can also help us see why experiments “work”:
| Person | Shown an ad? | Democrats thermometer |
| --- | --- | --- |
| 1 | Yes | 58.3 |
| 2 | No | 12.05 |
| 3 | Yes | 57.82 |
| 4 | No | 90.98 |
| 5 | No | 94.64 |
Experiments seem simple…
But the outcome can be very complex …
And yet we can still identify the effect because nothing causes you to receive the experimental treatment; it is random!
Say the ad experiment was implemented on TikTok, and younger people are more likely to use TikTok than older people
This means Age is now a fork: it affects both who sees the ad and how they feel about Democrats
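Comparing the two designs in a simulation (a hedged sketch: the effect size of 5 points and the age coefficients are invented):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100_000
age = rng.uniform(18, 80, n)
true_effect = 5.0

# Randomized assignment: NOTHING causes treatment, so age is balanced
treat_rand = rng.binomial(1, 0.5, n)
therm_rand = 40 + 0.5 * age + true_effect * treat_rand + rng.normal(0, 10, n)
est_rand = therm_rand[treat_rand == 1].mean() - therm_rand[treat_rand == 0].mean()

# TikTok rollout: younger people more likely to see the ad -> Age is a fork
p_treat = np.clip(1.2 - 0.015 * age, 0, 1)
treat_tt = rng.binomial(1, p_treat)
therm_tt = 40 + 0.5 * age + true_effect * treat_tt + rng.normal(0, 10, n)
est_tt = therm_tt[treat_tt == 1].mean() - therm_tt[treat_tt == 0].mean()

print(est_rand)  # close to the true effect of 5
print(est_tt)    # badly biased: the treated group is younger
```

The simple difference in means is unbiased only when assignment is random; once Age opens a fork, the same estimator goes haywire.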
Judea Pearl’s back-door criterion ties this all together
Confounding is caused by the existence of an open backdoor path from X to Y
A backdoor path is a non-causal path from X to Y
We need to close backdoors and keep front doors open
A backdoor path can involve a chain of variables – like the fork, but with more steps
Here we have a backdoor path between X and Y that runs through a, b, and m
We can identify X \(\rightarrow\) Y by controlling for any variable in the backdoor path to break the chain: m, a, or b
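A simulation of this exact DAG (a sketch with made-up coefficients; the true effect of X on Y is set to 2):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

# Backdoor chain: X <- a -> b -> m -> Y, plus the causal arrow X -> Y
a = rng.normal(0, 1, n)
b = a + rng.normal(0, 0.5, n)
m = b + rng.normal(0, 0.5, n)

x = a + rng.normal(0, 1, n)
tau = 2.0                          # the true causal effect of X on Y
y = tau * x + m + rng.normal(0, 1, n)

def slope_of_x(controls):
    """OLS coefficient on x, controlling for the given variables."""
    Z = np.column_stack([np.ones(n), x] + controls)
    return np.linalg.lstsq(Z, y, rcond=None)[0][1]

print(slope_of_x([]))              # biased: picks up the open backdoor
for ctrl in (m, a, b):
    print(slope_of_x([ctrl]))      # each ~ 2.0: any one link closes it
```

Controlling for any single variable on the backdoor path breaks the chain; you do not need all of them, and (per the earlier slides) you must NOT control for colliders or post-treatment pipes along the way.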