Data visualization
In-class example
Here’s the code we’ll be using in class. Download it and store it with the rest of your materials for this course. If simply clicking doesn’t trigger download, you should right-click and select “save link as…”.
- Day one:
Five graphs
- Day two:
Day 2
Making graphs with ggplot
Here we will walk through how to make some of the basic graphs in R.
The code chunk below loads our libraries and prepares the data.
# load libraries -------------------------------------------------------------------------
library(tidyverse)
library(gapminder) # install this if you don't have it!
library(ggbeeswarm) # install this if you don't have it!
library(moderndive)
# clean data ------------------------------------------------------------------------
# subset data to focus on 2007
gap_07 =
gapminder %>%
filter(year == 2007)
# calculate average life span by year
life_yr =
gapminder %>%
select(year, lifeExp) %>%
group_by(year) %>%
summarise(avg_yrs = mean(lifeExp))
# calculate average life expectancy by continent
life_region =
gap_07 %>%
group_by(continent) %>%
summarise(avg_yrs = mean(lifeExp))
# calculate average life expectancy by continent-year
life_region_yr =
gapminder %>%
group_by(continent, year) %>%
summarise(avg_yrs = mean(lifeExp))
The scatterplot
The scatterplot puts two variables on the same graph so we can see how they move together:
# the fancy, final product
ggplot(gap_07, aes(x = gdpPercap, y = lifeExp,
color = continent, size = pop)) +
geom_point()
We can add labels using labs()
and a theme
using theme_bw()
:
ggplot(gap_07, aes(x = gdpPercap, y = lifeExp,
color = continent, size = pop)) +
geom_point() +
labs(x = "GDP per capita ($USD, inflation-adjusted)",
y = "Life expectancy (in years)",
title = "Wealth and Health Around the World",
subtitle = "Data from 2007. Source: gapminder package.") +
theme_bw()
Note that there are many more themes out there.
The time series
The time series shows you how a variable moves over time, using a line. For this one we will use the life_yr
data object we constructed above, which looks like this:
## look at the data
life_yr
## # A tibble: 12 × 2
## year avg_yrs
## <int> <dbl>
## 1 1952 49.1
## 2 1957 51.5
## 3 1962 53.6
## 4 1967 55.7
## 5 1972 57.6
## 6 1977 59.6
## 7 1982 61.5
## 8 1987 63.2
## 9 1992 64.2
## 10 1997 65.0
## 11 2002 65.7
## 12 2007 67.0
We make the plot below:
# make the plot
ggplot(life_yr, aes(x = year, y = avg_yrs)) +
geom_line() +
theme_bw()
Multiple time series
Sometimes we have data where we observe multiple units moving over time, like so:
life_region_yr
## # A tibble: 60 × 3
## # Groups: continent [5]
## continent year avg_yrs
## <fct> <int> <dbl>
## 1 Africa 1952 39.1
## 2 Africa 1957 41.3
## 3 Africa 1962 43.3
## 4 Africa 1967 45.3
## 5 Africa 1972 47.5
## 6 Africa 1977 49.6
## 7 Africa 1982 51.6
## 8 Africa 1987 53.3
## 9 Africa 1992 53.6
## 10 Africa 1997 53.6
## # … with 50 more rows
We can draw separate time series for each unit by either using the color
aesthetic to separate the lines:
ggplot(life_region_yr, aes(x = year, y = avg_yrs, color = continent)) +
geom_line() +
theme_bw()
Or the group
aesthetic:
ggplot(life_region_yr, aes(x = year, y = avg_yrs, group = continent)) +
geom_line() +
theme_bw()
Histogram with geom_histogram()
ggplot(gap_07, aes(x = lifeExp)) +
geom_histogram() +
theme_bw()
Grouped histogram
ggplot(gap_07, aes(x = lifeExp, fill = continent)) +
geom_histogram() +
theme_bw()
Barplot with geom_col()
ggplot(life_region_yr, aes(x = continent, y = avg_yrs)) +
geom_col()
The boxplot
ggplot(data = gapminder, aes(x = continent, y = lifeExp)) +
geom_boxplot()
Other layers and features
Multi-panel plots with facet_wrap()
ggplot(gapminder, aes(x = lifeExp, fill = continent)) +
geom_histogram() +
facet_wrap(vars(year)) +
theme_minimal()
Make aesthetics static within the geometries
ggplot(gap_07, aes(x = gdpPercap, y = lifeExp)) + theme_light() +
geom_point(size = 3, color = "orange", shape = 2, alpha = .5) #<<
Take your aesthetics out of aes()
and into geom()
to make them static
Show a trend-line using geom_smooth()
ggplot(evals, aes(x = age, y = score)) +
geom_point() + theme_bw() + labs(x = "Professor age", y = "Student evals") +
geom_smooth() #<<
Trend lines can reveal patterns in “clumpy” data
Show separate trend-lines
ggplot(evals, aes(x = age, y = score, color = gender)) + #<<
geom_point() + theme_bw() + labs(x = "Professor age", y = "Student evals") +
geom_smooth() #<<
Relationships can look different within groups
Use different color and fill scales
ggplot(gapminder, aes(x = lifeExp, fill = continent)) +
geom_histogram() +
scale_fill_brewer(palette = "Blues") #<<
fill_brewer()
for fill
, color_brewer
for color
Many other themes
library(tvthemes)
ggplot(gapminder, aes(x = lifeExp, fill = continent)) +
geom_histogram() + theme_spongeBob() + labs(title = "Horrible") +
scale_fill_spongeBob()
theme_spongeBob()
from tvthemes
package, many more online