In-class example
Here’s the code we’ll be using in class. Download it and store it with the rest of your materials for this course. If clicking the link doesn’t start the download, right-click it and select “Save link as…”.

class: make up a relationship
We use `rnorm` to simulate data. It takes three arguments: the number of draws, the mean, and the standard deviation:

```r
rnorm(n = 5, mean = 10, sd = 2)  # five random draws; the values change on every run
```

```
[1] 12.597806 9.037005 8.582153 10.629605 9.079209
```
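Because `rnorm` draws at random, you will get different numbers each time you run it. Here is a minimal sketch, assuming you want reproducible draws. `set.seed` fixes the random number generator. With many draws, the sample mean and standard deviation land close to the `mean` and `sd` arguments. The seed value and the name `draws` are arbitrary choices for illustration.

```r
# Fix the random number generator so the draws are reproducible
set.seed(123)

# Take many draws from a normal distribution with mean 10 and sd 2
draws <- rnorm(n = 100000, mean = 10, sd = 2)

# With this many draws, both summaries should sit very close to the arguments
mean(draws)
sd(draws)
```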

Next we make up some fake election data in which a party’s funding depends on its vote share:

```r
library(tidyverse)  # for tibble() and ggplot()
library(broom)      # for tidy()

fake_election = tibble(party_share = rnorm(n = 500, mean = 50, sd = 5),
                       funding = rnorm(n = 500, mean = 20000, sd = 4000) + 2000 * party_share)
fake_election
```

```
# A tibble: 500 × 2
party_share funding
<dbl> <dbl>
1 53.0 123590.
2 57.0 136940.
3 52.3 123661.
4 43.4 104649.
5 46.2 117465.
6 46.1 115563.
7 46.7 117569.
8 41.2 106682.
9 48.9 113562.
10 52.4 125225.
# ℹ 490 more rows
```
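Written out, the code above generates each observation $i$ as follows. The intercept of 20,000 comes from the mean of the `rnorm` draw for `funding`, and the noise term is that draw's deviation from its mean:

$$
\begin{aligned}
\text{party\_share}_i &\sim \mathcal{N}(50,\ 5^2) \\
\text{funding}_i &= 20000 + 2000 \cdot \text{party\_share}_i + u_i, \qquad u_i \sim \mathcal{N}(0,\ 4000^2)
\end{aligned}
$$

So the true slope is 2000 and the true intercept is 20000. These are the values a regression of `funding` on `party_share` should approximately recover.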

We can plot it:

```r
ggplot(fake_election, aes(x = party_share, y = funding)) +
  geom_point() + geom_smooth(method = "lm")
```

Notice that we set the causal effect to 2,000 dollars per percentage point of the vote won. Using OLS, we can estimate it and get pretty close:

```r
mod = lm(funding ~ party_share, data = fake_election)
tidy(mod)
```

```
# A tibble: 2 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 19344. 1815. 10.7 5.05e- 24
2 party_share 2016. 36.1 55.8 1.73e-216
```
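With a single regressor, the OLS slope equals the sample covariance of x and y divided by the sample variance of x. The intercept is $\bar{y} - \hat\beta\,\bar{x}$. Here is a minimal sketch that checks this against `lm`. It rebuilds a seeded copy of the fake election data with base R's `data.frame` rather than reusing the tibble, so it needs no extra packages. The seed is arbitrary.

```r
set.seed(42)  # arbitrary seed so the block runs on its own

# Rebuild the fake election data: true slope 2000, true intercept 20000
party_share <- rnorm(n = 500, mean = 50, sd = 5)
funding <- rnorm(n = 500, mean = 20000, sd = 4000) + 2000 * party_share
fake_election <- data.frame(party_share, funding)

# OLS slope by hand: sample covariance divided by sample variance
slope <- cov(fake_election$party_share, fake_election$funding) / var(fake_election$party_share)
# OLS intercept: the fitted line passes through the point of means
intercept <- mean(fake_election$funding) - slope * mean(fake_election$party_share)

# Compare with lm(); both rows should agree up to floating-point error
mod <- lm(funding ~ party_share, data = fake_election)
rbind(by_hand = c(intercept, slope), lm = unname(coef(mod)))
```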

It’s close but not perfect because there is “noise” in our data: every value was randomly generated!
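One way to see that the gap between 2016 and 2000 is just noise is to repeat the simulation many times and look at the spread of the estimates. Here is a minimal sketch, assuming 1,000 replications is enough for illustration. The helper `simulate_slope` is a hypothetical name, not part of any package.

```r
set.seed(2024)  # arbitrary seed for reproducibility

# Hypothetical helper: simulate one fake election and return the OLS slope
simulate_slope <- function(n = 500) {
  party_share <- rnorm(n = n, mean = 50, sd = 5)
  funding <- rnorm(n = n, mean = 20000, sd = 4000) + 2000 * party_share
  unname(coef(lm(funding ~ party_share))["party_share"])
}

# Repeat the simulation 1,000 times
estimates <- replicate(1000, simulate_slope())

# Individual estimates scatter around 2000, and their average lands very close to it
mean(estimates)
sd(estimates)
```

The standard deviation of the estimates should come out near 36. Theory gives 4000 / (√500 × 5) ≈ 35.8, which is in line with the `std.error` column above.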

class: confounds
Here we want a third variable, “the south”, to confound the relationship between the number of Waffle Houses in a state and that state’s divorce rate:

```r
# south raises both waffle and divorce; waffle itself has no effect on divorce
fake = tibble(south = sample(c(0, 1), size = 50, replace = TRUE),
              waffle = rnorm(n = 50, mean = 20, sd = 4) + 10 * south,
              divorce = rnorm(n = 50, mean = 20, sd = 2) + 8 * south)
fake
```

```
# A tibble: 50 × 3
south waffle divorce
<dbl> <dbl> <dbl>
1 0 27.3 18.1
2 1 27.2 27.6
3 1 32.8 30.9
4 0 19.3 16.5
5 1 25.3 28.6
6 0 18.6 21.2
7 0 17.2 17.3
8 0 21.1 21.0
9 1 31.7 28.1
10 0 15.0 20.9
# ℹ 40 more rows
```
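In this setup `waffle` has no effect on `divorce`. Both depend only on `south`. Write $S$ for `south`, $W$ for `waffle`, and $D$ for `divorce`. Assume what the code implies: $S$ is 0 or 1 with equal probability, so $\operatorname{Var}(S) = 0.25$, and the noise terms are independent. Under those assumptions, we can work out the slope that a regression of divorce on waffle alone should converge to:

$$
\begin{aligned}
W &= 20 + 10S + e_W, \qquad e_W \sim \mathcal{N}(0,\ 4^2) \\
D &= 20 + 8S + e_D, \qquad e_D \sim \mathcal{N}(0,\ 2^2) \\
\beta_{\text{naive}} &= \frac{\operatorname{Cov}(W, D)}{\operatorname{Var}(W)}
= \frac{10 \cdot 8 \cdot \operatorname{Var}(S)}{10^2 \operatorname{Var}(S) + 4^2}
= \frac{80 \cdot 0.25}{100 \cdot 0.25 + 16} = \frac{20}{41} \approx 0.49
\end{aligned}
$$

So even though the true effect of waffle on divorce is zero, the naive regression should return a slope near 0.49. With only 50 states, the estimate will bounce around this value from run to run.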

We can plot it:

```r
ggplot(fake, aes(x = waffle, y = divorce)) +
  geom_point() + geom_smooth(method = "lm")
```

Regressing divorce on waffle alone retrieves the confounded estimate:

```r
lm(divorce ~ waffle, data = fake) |> broom::tidy()
```

```
# A tibble: 2 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 11.1 1.84 6.05 0.000000208
2 waffle 0.515 0.0716 7.20 0.00000000368
```
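Because we built the data, we know `south` is the confounder. Here is a minimal sketch, assuming we simply add `south` as a control in the regression. The coefficient on `waffle` should then fall to roughly zero, which is its true value in the simulation. The block regenerates a seeded copy of the data in base R so it runs on its own. The seed is arbitrary.

```r
set.seed(7)  # arbitrary seed so the block runs on its own

# Rebuild the confounded data in base R: south drives both waffle and divorce
south <- sample(c(0, 1), size = 50, replace = TRUE)
waffle <- rnorm(n = 50, mean = 20, sd = 4) + 10 * south
divorce <- rnorm(n = 50, mean = 20, sd = 2) + 8 * south
fake <- data.frame(south, waffle, divorce)

# Naive model: picks up the confounded relationship
naive <- lm(divorce ~ waffle, data = fake)

# Adjusted model: holding south fixed removes its influence on the waffle slope
adjusted <- lm(divorce ~ waffle + south, data = fake)

coef(naive)["waffle"]     # pulled away from zero by the confound
coef(adjusted)["waffle"]  # close to the true effect of zero
```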