POL51
University of California, Davis
September 30, 2024
There are many graphs out there
Each one works best in a specific context
Each one combines different aesthetics and geometries
What am I trying to show?
What kind of variables do I have?
What aesthetics and geometries do I need for this plot?
id | age | degree | race | num_kids |
---|---|---|---|---|
1 | 47 | Bachelor | White | 3 |
2 | 61 | High School | White | 0 |
3 | 72 | Bachelor | White | 2 |
4 | 43 | High School | White | 4 |
5 | 55 | Graduate | White | 2 |
Continuous variables take on lots of values (GDP, population, income, age, etc.)
Discrete or categorical variables take on only a few values, often “qualitative” (yes/no, 1/0, race, etc.)
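A quick way to check what kind of variables you have is to look at how R stores them. This is a minimal sketch that retypes the five rows from the table above; the name survey is made up for illustration.

```r
# Rebuild the five rows shown in the table (name `survey` is hypothetical)
survey <- data.frame(
  id = 1:5,
  age = c(47, 61, 72, 43, 55),
  degree = c("Bachelor", "High School", "Bachelor", "High School", "Graduate"),
  race = rep("White", 5),
  num_kids = c(3, 0, 2, 4, 2),
  stringsAsFactors = FALSE
)

# str() lists each column with its type: numeric columns vs. character columns
str(survey)

# A categorical variable takes only a few distinct values
table(survey$degree)
```

Here age is stored as a number, while degree is text with only three distinct values.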
The scatterplot visualizes the relationship between two continuous variables
Shows every point in the data, reveals trends and outliers
gdpPercap | lifeExp |
---|---|
974.58 | 43.83 |
5937.03 | 76.42 |
6223.37 | 72.30 |
4797.23 | 42.73 |
12779.38 | 75.32 |
Data | Aesthetic | Geometry |
---|---|---|
gdpPercap | x | geom_point() |
lifeExp | y | geom_point() |
A scatterplot with continent on one axis is uninformative because continent is discrete (i.e., a category)
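The scatterplot mapping in the table above can be sketched like this. It uses only the five rows shown, so the plot is tiny; the name gap_sample is made up for illustration.

```r
library(ggplot2)

# Five rows copied from the table above (name `gap_sample` is hypothetical)
gap_sample <- data.frame(
  gdpPercap = c(974.58, 5937.03, 6223.37, 4797.23, 12779.38),
  lifeExp = c(43.83, 76.42, 72.30, 42.73, 75.32)
)

# gdpPercap goes on x, lifeExp goes on y, and geom_point() draws one point per row
p_scatter <- ggplot(gap_sample, aes(x = gdpPercap, y = lifeExp)) +
  geom_point()

p_scatter
```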
The time series uses a line to show you how a variable (y-axis) moves over time (x-axis)
year | avg_yrs |
---|---|
1952 | 49.06 |
1957 | 51.51 |
1962 | 53.61 |
1967 | 55.68 |
1972 | 57.65 |
Data | Aesthetic | Geometry |
---|---|---|
year | x | geom_line() |
avg_yrs | y | geom_line() |
Notice the new geometry, geom_line()
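A minimal sketch of the time series mapping, again using only the five rows from the table; the name life_by_year is made up for illustration.

```r
library(ggplot2)

# Five rows copied from the table above (name `life_by_year` is hypothetical)
life_by_year <- data.frame(
  year = c(1952, 1957, 1962, 1967, 1972),
  avg_yrs = c(49.06, 51.51, 53.61, 55.68, 57.65)
)

# Time goes on x, the variable we track goes on y, and geom_line() connects the points
p_line <- ggplot(life_by_year, aes(x = year, y = avg_yrs)) +
  geom_line()

p_line
```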
Look at the economics dataset from the tidyverse package
Type and run economics to see the data
Type and run ?economics to read about the data
Make a time series of unemployment over time
Can you identify the recessions?
Sometimes we observe multiple units over time; how can we visualize these?
country | continent | year | lifeExp | pop | gdpPercap |
---|---|---|---|---|---|
Bolivia | Americas | 1977 | 50.023 | 5079716 | 3548.098 |
Egypt | Africa | 1997 | 67.217 | 66134291 | 4173.182 |
United States | Americas | 1977 | 73.380 | 220239000 | 24072.632 |
United States | Americas | 1997 | 76.810 | 272911760 | 35767.433 |
Germany | Europe | 1982 | 73.800 | 78335266 | 22031.533 |
Use color to separate lines
These are useful for comparing trends across units (countries, places, people, etc.)
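One way to draw a separate line per unit is to map color to the unit. This sketch assumes the data shown above come from the gapminder package (the slides do not name the source), and the three countries are picked arbitrarily from the table.

```r
library(ggplot2)
library(gapminder)  # assumption: the table above matches this package's `gapminder` data

# Keep three countries so the lines stay readable (arbitrary choice)
three <- subset(gapminder, country %in% c("Bolivia", "Egypt", "United States"))

# color = country gives each country its own line and a legend entry
p_lines <- ggplot(three, aes(x = year, y = lifeExp, color = country)) +
  geom_line()

p_lines
```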
A histogram shows you how a continuous variable is distributed
lifeExp |
---|
43.83 |
76.42 |
72.30 |
42.73 |
75.32 |
Data | Aesthetic | Geometry |
---|---|---|
lifeExp | x | geom_histogram() |
Notice the new geometry, geom_histogram(), and that a histogram only uses the x-axis!
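A minimal histogram sketch. It assumes the lifeExp values above come from the gapminder package, and bins = 30 is an arbitrary choice, not something the slides specify.

```r
library(ggplot2)
library(gapminder)  # assumption: lifeExp above comes from this package

# Only x is mapped; geom_histogram() counts how many rows fall in each bin
p_hist <- ggplot(gapminder, aes(x = lifeExp)) +
  geom_histogram(bins = 30)  # bins = 30 is an arbitrary choice

p_hist
```

The y-axis (count) is computed by the geometry, which is why it never shows up in aes().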
In some countries, when you die it is assumed you want to donate your organs
To not donate, you have to opt out
In other countries, when you die it is assumed you do not want to donate your organs
To donate, you have to opt in
country | donors | opt |
---|---|---|
Ireland | 18.2 | In |
Denmark | 14.0 | In |
Germany | 13.9 | In |
Germany | 12.8 | In |
Spain | NA | Out |
Using the organdata dataset:
Make a histogram of countries’ organ donation rates (donors)
Then set the fill aesthetic to opt, whether donors have to opt in or opt out of donating. How does the graph change?
Barplots place a category (place, country, person, etc) on one axis and a quantity (amount, average, median, etc.) on another
Useful for making comparisons, highlighting differences
marital | tv |
---|---|
No answer | 2.56 |
Never married | 3.11 |
Separated | 3.55 |
Divorced | 3.09 |
Widowed | 3.91 |
Data | Aesthetic | Geometry |
---|---|---|
tv | x | geom_col() |
marital | y | geom_col() |
Note
You could switch the x and y mapping around, but I think categories look better on the y-axis
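A minimal sketch of the barplot mapping, using only the five rows from the table above; the name tv_hours is made up for illustration.

```r
library(ggplot2)

# Five rows copied from the table above (name `tv_hours` is hypothetical)
tv_hours <- data.frame(
  marital = c("No answer", "Never married", "Separated", "Divorced", "Widowed"),
  tv = c(2.56, 3.11, 3.55, 3.09, 3.91)
)

# Quantity on x, category on y; geom_col() draws one bar per row
p_bar <- ggplot(tv_hours, aes(x = tv, y = marital)) +
  geom_col()

p_bar
```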
Boxplots compare distributions of continuous variables across groups
Boxplots contain a lot of info 🥵:
continent | lifeExp |
---|---|
Asia | 43.83 |
Europe | 76.42 |
Africa | 72.30 |
Africa | 42.73 |
Americas | 75.32 |
Data | Aesthetic | Geometry |
---|---|---|
continent | y | geom_boxplot() |
lifeExp | x | geom_boxplot() |
Note
You could switch the x and y mapping around, but I think categories look better on the y-axis
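A minimal sketch of the boxplot mapping, assuming the continent and lifeExp columns above come from the gapminder package.

```r
library(ggplot2)
library(gapminder)  # assumption: the table above matches this package's data

# Continuous variable on x, category on y; one box per continent
p_box <- ggplot(gapminder, aes(x = lifeExp, y = continent)) +
  geom_boxplot()

p_box
```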
Graph | aes() | geom_ | Purpose |
---|---|---|---|
Scatterplot | x = cause, y = effect | point() | Relationships |
Time series | x = date, y = variable | line() | Trends |
Histogram | x = cont. variable | histogram() | Distributions |
Barplot | y = category, x = quantity | col() | Compare amounts |
Boxplot | y = category, x = cont. variable | boxplot() | Compare distributions |
Know how and when to use which!
We’ve barely scratched the surface; there are many more aesthetics, geometries, and layers in ggplot()
Here are some of my favorite ones
And some ideas for making graphs better
We can use panels to show movement of a variable across time, space, etc.
Note
Make sure the faceting variable is wrapped in vars()!
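A sketch of panels with facet_wrap(), assuming the gapminder package as the data source. Faceting by year is just one possible choice for showing change over time.

```r
library(ggplot2)
library(gapminder)  # assumption: same data as the earlier tables

# One scatterplot panel per year; note the faceting variable inside vars()
p_facet <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  facet_wrap(vars(year))

p_facet
```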
Take your aesthetics out of aes() and put them directly in the geom to make them static
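A sketch of a static aesthetic, assuming the gapminder package as the data source. The color and transparency values here are arbitrary.

```r
library(ggplot2)
library(gapminder)  # assumption: same data as the earlier tables

# color and alpha sit outside aes(), so every point gets the same fixed look
p_static <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(color = "red", alpha = 0.5)

p_static
```

Because color is not mapped to a variable, there is no legend: nothing in the data drives it.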
Ease visual comparison + kinda looks like the Joy Division album
Beeswarm plots tell us something boxplots don’t: the number of observations by group; used recently by the NYT
scale_fill_brewer() for fill, scale_color_brewer() for color
scale_fill_viridis_d() for discrete variables, scale_fill_viridis_c() for continuous
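A sketch of swapping color scales on the same plot, assuming the gapminder package as the data source. The "Set2" palette is an arbitrary pick.

```r
library(ggplot2)
library(gapminder)  # assumption: same data as the earlier tables

# Base plot: histogram of life expectancy, filled by continent (a discrete variable)
p_base <- ggplot(gapminder, aes(x = lifeExp, fill = continent)) +
  geom_histogram(bins = 30)

# ColorBrewer palette for the fill aesthetic ("Set2" is an arbitrary choice)
p_brewer <- p_base + scale_fill_brewer(palette = "Set2")

# Viridis palette; the _d suffix is the discrete version
p_viridis <- p_base + scale_fill_viridis_d()

p_brewer
p_viridis
```

If color were mapped instead of fill, the matching functions would be scale_color_brewer() and scale_color_viridis_d().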
theme_spongeBob() from the tvthemes package, many more online
Right now you probably can’t make a nice graph, so:
Make the ugliest graph you can
Post it on Slack
Winner will get a shockingly small amount of extra credit