Data visualization II

POL51

Juan F. Tellez

University of California, Davis

December 5, 2023

Plan for today

  • A graph for every season
  • The five(-ish) graphs
  • Making graphs pretty or ugly

A graph for every season

  • There are many graphs out there

  • Each one works best in a specific context

  • Each one combines different aesthetics and geometries

Key plotting questions

  • What am I trying to show?

    • (a distribution, a relationship, a comparison, an amount)
  • What kind of variables do I have?

    • (continuous, discrete, something in between)
  • What aesthetics and geometries do I need for this plot?

    • (x-axis, y-axis, color, size, shape, etc.)

What kind of variable do I have?

General Social Survey
id age degree race num_kids
1 47 Bachelor White 3
2 61 High School White 0
3 72 Bachelor White 2
4 43 High School White 4
5 55 Graduate White 2

Continuous variables take on lots of values (GDP, population, income, age, etc.)

Discrete or categorical variables take on a few values, often “qualitative” (yes/no, 1/0, race, etc.)
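
One quick way to tell which kind of variable you have is to look at its class and count its distinct values. A minimal sketch, assuming a hand-typed copy of the five rows shown above (the name gss_sample is made up for illustration):

```r
# Hypothetical five-row copy of the survey table shown on the slide
gss_sample <- data.frame(
  id       = 1:5,
  age      = c(47, 61, 72, 43, 55),
  degree   = c("Bachelor", "High School", "Bachelor", "High School", "Graduate"),
  race     = rep("White", 5),
  num_kids = c(3, 0, 2, 4, 2)
)

# Storage type of each column (numeric vs. character)
col_classes <- sapply(gss_sample, class)

# Number of distinct values per column: few values suggests a discrete variable
col_distinct <- sapply(gss_sample, function(col) length(unique(col)))

print(col_classes)
print(col_distinct)
```

A numeric column with many distinct values (age) behaves as continuous; a column with only a few labels (degree) is discrete. A count like num_kids sits somewhere in between.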

Graph 1 - the scatterplot

The scatterplot visualizes the relationship between two continuous variables

Shows every point in the data, reveals trends and outliers

The grammar of scatterplots

The data
gdpPercap lifeExp
974.58 43.83
5937.03 76.42
6223.37 72.30
4797.23 42.73
12779.38 75.32
Mapping the data
Data Aesthetic Geometry
gdpPercap x geom_point()
lifeExp y geom_point()


ggplot(gap_07, aes(x = gdpPercap, y = lifeExp)) +
  geom_point()

Typically, we put the cause on the x-axis and the effect on the y-axis
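
The slides do not show where gap_07 comes from. A likely construction, assuming it is the 2007 slice of the gapminder data from the gapminder package:

```r
library(dplyr)
library(ggplot2)
library(gapminder)

# Assumption: gap_07 keeps only the rows observed in 2007
gap_07 <- filter(gapminder, year == 2007)

# Same scatterplot as above: one point per country
p_scatter <- ggplot(gap_07, aes(x = gdpPercap, y = lifeExp)) +
  geom_point()

p_scatter
```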

Scatterplots are for continuous variables

Putting continent on an axis of a scatterplot is uninformative because continent is discrete (i.e., a category)
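
A sketch of the kind of plot this warns about, assuming gap_07 is the 2007 slice of gapminder and continent is mapped to the x-axis. The points pile up in one vertical strip per continent, so no trend is visible:

```r
library(ggplot2)
library(gapminder)

# Assumption: gap_07 is the 2007 slice of the gapminder data
gap_07 <- subset(gapminder, year == 2007)

# continent is a factor with five levels, so x takes only five positions
p_discrete <- ggplot(gap_07, aes(x = continent, y = lifeExp)) +
  geom_point()

p_discrete
```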

Graph 2: the time series

The time series uses a line to show you how a variable (y-axis) moves over time (x-axis)

The grammar of time series

The data
year avg_yrs
1952 49.06
1957 51.51
1962 53.61
1967 55.68
1972 57.65
Mapping the data
Data Aesthetic Geometry
year x geom_line()
avg_yrs y geom_line()


Notice the new geometry, geom_line()

The time series

ggplot(life_yr, aes(x = year, y = avg_yrs)) +
  geom_line()
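
The life_yr data is not built on the slides. One plausible construction, assuming avg_yrs is the average life expectancy across all gapminder countries in each year:

```r
library(dplyr)
library(ggplot2)
library(gapminder)

# Assumption: one row per year, averaging lifeExp over all countries
life_yr <- gapminder %>%
  group_by(year) %>%
  summarise(avg_yrs = mean(lifeExp))

p_line <- ggplot(life_yr, aes(x = year, y = avg_yrs)) +
  geom_line()

p_line
```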

🚨 Your turn: 💸Recession💸 🚨

  • Look at the economics dataset from ggplot2 (loaded with the tidyverse package)

  • Type and run economics to see the data

  • Type and run ?economics to read about the data

  • Make a time series of unemployment over time

  • Can you identify the recessions?


Multiple time series

Sometimes we observe multiple units over time; how can we visualize these?

country continent year lifeExp pop gdpPercap
Bolivia Americas 1977 50.023 5079716 3548.098
Egypt Africa 1997 67.217 66134291 4173.182
United States Americas 1977 73.380 220239000 24072.632
United States Americas 1997 76.810 272911760 35767.433
Germany Europe 1982 73.800 78335266 22031.533

Start from scratch

ggplot(data = gapminder_sample) 

Add aesthetics

ggplot(data = gapminder_sample, aes(x = year, y = lifeExp)) 

Add geometry: 🤢

ggplot(data = gapminder_sample, aes(x = year, y = lifeExp)) +
  geom_line()

Using color to separate lines

ggplot(data = gapminder_sample, aes(x = year, y = lifeExp, color = country)) + 
  geom_line()

These are useful for comparing trends across units (countries, places, people, etc)
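
A self-contained sketch of the steps above. The contents of gapminder_sample are an assumption: here it holds the four countries that appear in the sample rows. Without color (or group), geom_line() joins every point into one zig-zag line; mapping country to color or group draws one line per country.

```r
library(dplyr)
library(ggplot2)
library(gapminder)

# Assumption: the sample contains these four countries from gapminder
gapminder_sample <- gapminder %>%
  filter(country %in% c("Bolivia", "Egypt", "United States", "Germany"))

# One colored line per country
p_color <- ggplot(gapminder_sample, aes(x = year, y = lifeExp, color = country)) +
  geom_line()

# Alternative: separate lines without color, using the group aesthetic
p_group <- ggplot(gapminder_sample, aes(x = year, y = lifeExp, group = country)) +
  geom_line()

p_color
```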

Graph 3: the histogram

A histogram shows you how a continuous variable is distributed

Interpreting histograms

The grammar of histograms

The data
lifeExp
43.83
76.42
72.30
42.73
75.32
Mapping the data
Data Aesthetic Geometry
lifeExp x geom_histogram()


Notice the new geometry, geom_histogram(); and that a histogram only uses the x-axis!

The histogram

ggplot(gap_07, aes(x = lifeExp)) + geom_histogram()
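
By default geom_histogram() splits the data into 30 bins and prints a message suggesting you pick a better value. A short sketch of the two ways to control this, assuming gap_07 is the 2007 slice of gapminder:

```r
library(ggplot2)
library(gapminder)

# Assumption: gap_07 is the 2007 slice of the gapminder data
gap_07 <- subset(gapminder, year == 2007)

# Option 1: choose the number of bins
p_bins <- ggplot(gap_07, aes(x = lifeExp)) +
  geom_histogram(bins = 15)

# Option 2: choose the width of each bin (here, 5 years of life expectancy)
p_width <- ggplot(gap_07, aes(x = lifeExp)) +
  geom_histogram(binwidth = 5)

p_bins
```

Fewer, wider bins smooth the shape; more, narrower bins show more detail but also more noise.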

🚨 Your turn: organs 🫁🧠 🚨

In some countries, when you die it is assumed you want to donate your organs

  • To not donate, you have to opt out

In other countries, when you die it is assumed you do not want to donate your organs

  • To donate, you have to opt in

Sample from organdata
country donors opt
Ireland 18.2 In
Denmark 14.0 In
Germany 13.9 In
Germany 12.8 In
Spain NA Out

🚨 Your turn: organs 🫁🧠 🚨

Using the organdata dataset:

  1. Make a histogram of countries’ organ donation rates (donors)

  2. Then set the fill aesthetic to opt, whether donors have to opt in or opt out of donating. How does the graph change?


Graph 4: the barplot

Barplots place a category (place, country, person, etc.) on one axis and a quantity (amount, average, median, etc.) on the other

Useful for making comparisons, highlighting differences

The grammar of barplots

The data
marital tv
No answer 2.56
Never married 3.11
Separated 3.55
Divorced 3.09
Widowed 3.91
Mapping the data
Data Aesthetic Geometry
tv x geom_col()
marital y geom_col()


Note

You could switch the x and y mapping around, but I think categories look better on the y-axis

The barplot

ggplot(tv, aes(y = marital, x = tv)) + 
  geom_col()
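
The slides do not say how the tv data was made. A guess, clearly an assumption: average daily TV hours by marital status, computed from the gss_cat survey data that ships with the forcats package.

```r
library(dplyr)
library(ggplot2)
library(forcats)

# Assumption: tv is the mean of tvhours within each marital status in gss_cat
tv <- gss_cat %>%
  group_by(marital) %>%
  summarise(tv = mean(tvhours, na.rm = TRUE))  # drop missing answers

# Category on the y-axis, quantity on the x-axis
p_bar <- ggplot(tv, aes(y = marital, x = tv)) +
  geom_col()

p_bar
```

geom_col() draws bars whose length is a value already in the data, so the summarising step has to happen before plotting.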

Graph 5: the boxplot

Boxplots compare distributions of continuous variables across groups

Compare distributions: the boxplot

Boxplots contain a lot of info 🥵:

  • bold line is the median observation
  • box is the middle 50% of observations
  • thin lines show you min and max value, except…
  • the dots, which are outlier observations
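
To see where each piece of a boxplot comes from, the numbers can be computed by hand. A sketch for one group (life expectancy in Asia, chosen only as an example), using the 1.5 × IQR rule that geom_boxplot() applies by default:

```r
library(gapminder)

# Example group: all life expectancy observations for Asia
x <- gapminder$lifeExp[gapminder$continent == "Asia"]

# Bold line (median) and box edges (25th and 75th percentiles)
q <- unname(quantile(x, c(0.25, 0.5, 0.75)))
box_low  <- q[1]
med      <- q[2]
box_high <- q[3]

# Points farther than 1.5 box-lengths from the box count as outliers
iqr <- box_high - box_low
lower_fence <- box_low - 1.5 * iqr
upper_fence <- box_high + 1.5 * iqr

# Thin lines reach the most extreme observations that are not outliers
whisker_low  <- min(x[x >= lower_fence])
whisker_high <- max(x[x <= upper_fence])

# The dots
outliers <- x[x < lower_fence | x > upper_fence]
```

When there are no outliers, the thin lines end exactly at the minimum and maximum of the group.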

The grammar of boxplots

The data
continent lifeExp
Asia 43.83
Europe 76.42
Africa 72.30
Africa 42.73
Americas 75.32
Mapping the data
Data Aesthetic Geometry
continent y geom_boxplot()
lifeExp x geom_boxplot()


Note

You could switch the x and y mapping around, but I think categories look better on the y-axis

The boxplot

ggplot(gapminder, aes(y = continent, x = lifeExp)) + geom_boxplot()

The five(-ish) graphs

Graph        aes()                             geom_        Purpose
Scatterplot  x = cause, y = effect             point()      Relationships
Time series  x = date, y = variable            line()       Trends
Histogram    x = cont. variable                histogram()  Distributions
Barplot      y = category, x = quantity        col()        Compare amounts
Boxplot      y = category, x = cont. variable  boxplot()    Compare distributions

Know how and when to use which!

Making better graphs

  • We’ve barely scratched the surface; there are many more aesthetics, geometries, and layers in ggplot()

  • Here are some of my favorite ones

  • And some ideas for making graphs better

Showing “movement” using panels

We can use panels to show movement of a variable across time, space, etc.

Using facet_wrap

ggplot(gapminder, aes(x = lifeExp)) + geom_histogram()

Using facet_wrap

ggplot(gapminder, aes(x = lifeExp)) + geom_histogram() + 
  facet_wrap(vars(year)) 

Note

Make sure the faceting variable is wrapped in vars()!

Make aesthetics static

ggplot(gap_07, aes(x = gdpPercap, y = lifeExp)) + 
  geom_point(size = 4, color = "orange", shape = 2) 

Take your aesthetics out of aes() and put them in the geom_*() call to make them static
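
A side-by-side sketch of the difference, assuming gap_07 is the 2007 slice of gapminder. Inside aes(), color is tied to a variable and varies with the data; outside aes(), it is one fixed value for every point.

```r
library(ggplot2)
library(gapminder)

# Assumption: gap_07 is the 2007 slice of the gapminder data
gap_07 <- subset(gapminder, year == 2007)

# Mapped: color depends on the data (one color per continent, with a legend)
p_mapped <- ggplot(gap_07, aes(x = gdpPercap, y = lifeExp, color = continent)) +
  geom_point()

# Static: every point is orange, no legend
p_static <- ggplot(gap_07, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(color = "orange")

p_static
```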

Ridge plots (better than grouped histograms)

library(ggridges)
ggplot(gapminder, aes(y = continent, x = lifeExp)) + geom_density_ridges() 

Ease visual comparison + kinda looks like the Joy Division album

Beeswarm plots (alternative boxplots)

library(ggbeeswarm)
ggplot(gapminder, aes(y = continent, x = lifeExp)) +  geom_quasirandom() 

Beeswarm plots tell us something boxplots don’t: the number of observations by group; used recently by the NYT

Use different color and fill scales

ggplot(gapminder, aes(x = lifeExp, fill = continent)) + geom_histogram() + 
  scale_fill_brewer(palette = "Blues") 

scale_fill_brewer() for fill, scale_color_brewer() for color

My favorite scale (right now)

ggplot(gapminder, aes(x = lifeExp, fill = continent)) + geom_histogram() + 
  scale_fill_viridis_d(option = "magma") 

scale_fill_viridis_d() for discrete variables, scale_fill_viridis_c() for continuous
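
For a continuous variable, the _c version of the scale applies. A sketch using the color counterpart, scale_color_viridis_c(), with population mapped to point color; gap_07 is assumed to be the 2007 slice of gapminder:

```r
library(ggplot2)
library(gapminder)

# Assumption: gap_07 is the 2007 slice of the gapminder data
gap_07 <- subset(gapminder, year == 2007)

# pop is numeric, so it needs a continuous (_c) scale rather than a discrete (_d) one
p_viridis <- ggplot(gap_07, aes(x = gdpPercap, y = lifeExp, color = pop)) +
  geom_point() +
  scale_color_viridis_c(option = "magma")

p_viridis
```

Using a _d scale on a numeric variable like pop produces an error, which is a handy reminder to check what kind of variable you have.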

Many other themes

theme_spongeBob() from tvthemes package, many more online

🔥 Coding challenge 🔥

Right now you probably can’t make a nice graph, so:

  1. Make the ugliest graph you can

  2. Post it on Slack

  3. Winner will get a shockingly small amount of extra credit
