POL51
University of California, Davis
September 30, 2024
Packages are where most of our functions and data live
Check out my guide
Or type this into the console and hit return/enter (note the quotation marks!):
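As a minimal sketch of what that looks like, assuming ggplot2 as a stand-in for whichever package you need (the package name here is an assumption, not taken from the slide):

```r
# Assumption: "ggplot2" stands in for the package you want to install
pkg <- "ggplot2"

# Install once per computer; the package name must be in quotation marks
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg)
}

# Load the package in every new R session before using its functions
library(ggplot2)
```

Installing happens once; loading with library() happens every time you start R.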
W. E. B. Du Bois (1868 - 1963)
American sociologist
historian
civil rights advocate
Data visualization specialist?
For better or worse, data carries weight
Visualizing data is an effective way to convince, argue, tell stories (and mislead)
Graphs, maps, diagrams and other visuals are everywhere
R
country | continent | year | lifeExp | pop | gdpPercap |
---|---|---|---|---|---|
Afghanistan | Asia | 1952 | 29 | 8425333 | 779 |
Afghanistan | Asia | 1957 | 30 | 9240934 | 821 |
Afghanistan | Asia | 1962 | 32 | 10267083 | 853 |
Afghanistan | Asia | 1967 | 34 | 11537966 | 836 |
Afghanistan | Asia | 1972 | 36 | 13079460 | 740 |
In a dataset, rows are observations
The data we observe for Afghanistan in the year 1952
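To see this in R, here is a small sketch that types in the five rows shown above by hand and pulls out a single observation. The object names afghanistan and first_obs are made up for this example, and the values are the rounded ones displayed on the slide.

```r
# The five rows from the slide, entered by hand (values as displayed)
afghanistan <- data.frame(
  country   = "Afghanistan",
  continent = "Asia",
  year      = c(1952, 1957, 1962, 1967, 1972),
  lifeExp   = c(29, 30, 32, 34, 36),
  pop       = c(8425333, 9240934, 10267083, 11537966, 13079460),
  gdpPercap = c(779, 821, 853, 836, 740)
)

# How many observations (rows) do we have?
nrow(afghanistan)

# The first row: everything we observe for Afghanistan in 1952
first_obs <- afghanistan[1, ]
first_obs
```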
id | age | degree | race | sex |
---|---|---|---|---|
1 | 47 | Bachelor | White | Male |
2 | 61 | High School | White | Male |
3 | 72 | Bachelor | White | Male |
4 | 43 | High School | White | Female |
5 | 55 | Graduate | White | Female |
In survey data, an observation is typically a person who took the survey (a respondent)
country | continent | year | lifeExp | pop | gdpPercap |
---|---|---|---|---|---|
Afghanistan | Asia | 1952 | 29 | 8425333 | 779 |
Afghanistan | Asia | 1957 | 30 | 9240934 | 821 |
Afghanistan | Asia | 1962 | 32 | 10267083 | 853 |
Afghanistan | Asia | 1967 | 34 | 11537966 | 836 |
Afghanistan | Asia | 1972 | 36 | 13079460 | 740 |
In a dataset, columns are variables
Life expectancy and GDP per capita are some of the variables in our data
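A quick sketch of how to look at variables in R, assuming the table comes from the gapminder package (the column names match, but the package is an assumption here):

```r
library(gapminder)  # assumption: provides the `gapminder` dataset

# The names of all variables (columns) in the data
names(gapminder)

# Pull out one variable with $ and look at its first few values
head(gapminder$lifeExp)
head(gapminder$gdpPercap)
```

The names printed by names() are the ones you will type later when building a plot.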
Graphs have an internal logic, or grammar that connects data to visuals
Data = variables in a dataset
Aesthetic = visual property of a graph (position, shape, color, etc.)
Geometry = representation of an aesthetic (point, line, text, etc.)
Data | Aesthetic | Geometry |
---|---|---|
GDP per capita | Position (x-axis) | Point |
Life expectancy | Position (y-axis) | Point |
Continent | Color | Point |
Population | Size | Point |
Take the data,
map it onto an aesthetic,
and visualize it with a geometry
Data | aes() | geom_ |
---|---|---|
gdpPercap | x | geom_point() |
lifeExp | y | geom_point() |
continent | color | geom_point() |
pop | size | geom_point() |
Use the variable names exactly as they appear in the data, and spell the aes() arguments and geom_ functions exactly as R expects
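The mapping table translates almost line by line into code. A minimal sketch, assuming the data is the gapminder object from the gapminder package (an assumption; the object name p is also just for illustration):

```r
library(ggplot2)
library(gapminder)  # assumption: source of the `gapminder` data

# Take the data, map variables onto aesthetics, draw them as points
p <- ggplot(data = gapminder,
            aes(x = gdpPercap,       # GDP per capita -> x position
                y = lifeExp,         # life expectancy -> y position
                color = continent,   # continent -> color
                size = pop)) +       # population -> point size
  geom_point()

p  # print the plot
```

Each row of the table becomes one argument inside aes(), and the geometry column becomes the geom_point() layer.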
ggplot(): our first function 😢
ggplot(): specify the data
aes(): map variables to aesthetics
+: add layers
labs(): add titles and labels
Notice that text is placed within quotation marks!
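A short sketch of labs() in use, again assuming the gapminder package; the title and label wording below is made up for illustration:

```r
library(ggplot2)
library(gapminder)  # assumption: source of the `gapminder` data

p <- ggplot(gapminder,
            aes(x = gdpPercap, y = lifeExp, color = continent, size = pop)) +
  geom_point() +
  labs(
    title = "Wealth and health",  # hypothetical title; text goes in quotes
    x     = "GDP per capita",
    y     = "Life expectancy",
    color = "Continent",          # legend title for the color aesthetic
    size  = "Population"          # legend title for the size aesthetic
  )

p
```

Variable names inside aes() have no quotation marks because they refer to columns; label text inside labs() does because it is plain text.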
There are many more themes; here are a few
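As a sketch, a theme is added like any other layer. The four themes below all ship with ggplot2; which ones the slide showed is not known, so this selection is an assumption, as is the use of the gapminder package:

```r
library(ggplot2)
library(gapminder)  # assumption: source of the `gapminder` data

# One base plot, reused with different themes
base_plot <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point()

p_bw      <- base_plot + theme_bw()       # white background, thin border
p_minimal <- base_plot + theme_minimal()  # no border, light grid lines
p_classic <- base_plot + theme_classic()  # axis lines, no grid
p_dark    <- base_plot + theme_dark()     # dark panel background

p_bw  # print one of them; swap in another name to compare
```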
Tell ggplot() the data we want to plot
Map all variables onto aesthetics within aes()
Add layers like geom_point() and theme_bw() using +
Add labels to each point by mapping country names onto the label aesthetic within aes()
Add a geom_text() layer to your plot to plot the names
Data | Aesthetic | Geometry |
---|---|---|
gdpPercap | x | geom_point() |
lifeExp | y | geom_point() |
continent | color | geom_point() |
pop | size | geom_point() |
country | label | geom_text() |
Take your data, map it onto an aesthetic, represent with a geometry
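One possible solution, sketched under two assumptions: the data comes from the gapminder package, and only a single year (2007) is kept so the labels stay readable. The object names gap_2007 and p are made up for this example.

```r
library(ggplot2)
library(gapminder)  # assumption: source of the `gapminder` data

# Assumption: keep one year so each country appears (and is labeled) once
gap_2007 <- subset(gapminder, year == 2007)

p <- ggplot(gap_2007,
            aes(x = gdpPercap, y = lifeExp,
                color = continent, size = pop,
                label = country)) +  # country names -> label aesthetic
  geom_point() +
  geom_text() +   # draws the label; it also inherits color and size
  theme_bw()

p
```

Because every mapping sits inside ggplot(), both geom_point() and geom_text() use it, so the text is colored and sized like the points.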
year | winner | win_party | ec_pct | popular_pct | two_term |
---|---|---|---|---|---|
1824 | John Quincy Adams | D.-R. | 0.32 | 0.31 | FALSE |
1828 | Andrew Jackson | Dem. | 0.68 | 0.56 | TRUE |
1832 | Andrew Jackson | Dem. | 0.77 | 0.55 | TRUE |
1836 | Martin Van Buren | Dem. | 0.58 | 0.51 | FALSE |
Make a plot of presidential election results using the elections_historic dataset
% of popular vote (x-axis, popular_pct) and % of electoral college vote (y-axis, ec_pct)
Map the winner’s party to the color aesthetic, whether or not the president served two terms to shape, and add labels to each point (use winner_label)
Data | Aesthetic | Geometry |
---|---|---|
popular_pct | x | geom_point() |
ec_pct | y | geom_point() |
win_party | color | geom_point() |
two_term | shape | geom_point() |
winner_label | label | geom_text() |
Take your data, map it onto an aesthetic, represent with a geometry
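One possible solution as a sketch. It assumes elections_historic is loaded from the socviz package (an assumption; your course materials may provide it another way), and the label wording in labs() is made up for illustration.

```r
library(ggplot2)
library(socviz)  # assumption: provides `elections_historic`

p <- ggplot(elections_historic,
            aes(x = popular_pct,        # share of the popular vote
                y = ec_pct,             # share of the Electoral College vote
                color = win_party,      # winner's party -> color
                shape = two_term,       # served two terms? -> point shape
                label = winner_label)) +
  geom_point() +
  geom_text() +
  labs(x = "Share of popular vote",
       y = "Share of Electoral College vote",
       color = "Winner's party",
       shape = "Served two terms")

p
```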