POL51
September 30, 2024
Why visualize data?
Telling The Truth™️ with data
The grammar of graphics (our first graph)
W. E. B. Du Bois (1868 - 1963)
American sociologist
historian
civil rights advocate
Data visualization specialist?
Data carries weight in our society
Visualizing data is an effective way to convey information, convince, and argue
Visualization can be used to tell The Truth™️ (or not)
What’s not true here?
Selectively presenting data is one way of not telling The Truth™️
Averages (left) are useful, but can be misleading
Raw data (right) can be more informative
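A tiny worked example of why averages can mislead. The numbers below are made up for illustration (they are not the data behind the slide's figure): two groups share the same average even though the raw values look nothing alike.

```r
# Hypothetical values, invented only to illustrate the point
group_a <- c(48, 49, 50, 51, 52)   # tightly clustered
group_b <- c(10, 30, 50, 70, 90)   # widely spread

mean(group_a)   # 50
mean(group_b)   # 50

# The averages match, but the raw data tell different stories
range(group_a)  # 48 52
range(group_b)  # 10 90
```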
“graphs that don’t go to zero are a thought crime” (Fox, 2014)
Is this necessarily true, though?
Counterpoint: both of these graphs contain useful information
But is zooming out useful here? Is “temperature at which dinosaurs went extinct” valid context for us now?
There’s no one-size-fits-all answer
All visuals highlight some aspects of the data, and obscure others
But some visuals are more truthful than others; beware!
country | continent | year | lifeExp | pop | gdpPercap |
---|---|---|---|---|---|
Afghanistan | Asia | 1952 | 29 | 8425333 | 779 |
Albania | Europe | 1952 | 55 | 1282697 | 1601 |
Algeria | Africa | 1952 | 43 | 9279525 | 2449 |
Angola | Africa | 1952 | 30 | 4232095 | 3521 |
Argentina | Americas | 1952 | 62 | 17876956 | 5911 |
Data on life expectancy, GDP per capita, and population for countries around the world
In a dataset, rows are observations
The data we observe for Afghanistan in the year 1952
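A minimal sketch of what "rows are observations" looks like in R. The data frame name `gap_small` is hypothetical; it simply retypes the five rows shown in the table so the example runs on its own.

```r
# gap_small is a made-up name: the five rows from the table, typed by hand
gap_small <- data.frame(
  country   = c("Afghanistan", "Albania", "Algeria", "Angola", "Argentina"),
  continent = c("Asia", "Europe", "Africa", "Africa", "Americas"),
  year      = c(1952, 1952, 1952, 1952, 1952),
  lifeExp   = c(29, 55, 43, 30, 62),
  pop       = c(8425333, 1282697, 9279525, 4232095, 17876956),
  gdpPercap = c(779, 1601, 2449, 3521, 5911)
)

nrow(gap_small)  # number of observations: 5

# One row = one observation: Afghanistan in 1952
gap_small[gap_small$country == "Afghanistan" & gap_small$year == 1952, ]
```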
id | age | degree | race | sex |
---|---|---|---|---|
1 | 47 | Bachelor | White | Male |
2 | 61 | High School | White | Male |
3 | 72 | Bachelor | White | Male |
4 | 43 | High School | White | Female |
5 | 55 | Graduate | White | Female |
In survey data, an observation is typically a person who took the survey (a respondent)
In a dataset, columns are variables
Life expectancy and GDP per capita are some of the variables in our data
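A small sketch of "columns are variables", again using a hand-typed stand-in (the name `gap_small` is hypothetical and only three of the columns are retyped here).

```r
# gap_small is a made-up name: three columns from the table, typed by hand
gap_small <- data.frame(
  country   = c("Afghanistan", "Albania", "Algeria", "Angola", "Argentina"),
  lifeExp   = c(29, 55, 43, 30, 62),
  gdpPercap = c(779, 1601, 2449, 3521, 5911)
)

names(gap_small)         # the variables: "country" "lifeExp" "gdpPercap"

# $ pulls out one variable (one column) as a vector
gap_small$lifeExp        # 29 55 43 30 62
mean(gap_small$lifeExp)  # 43.8
```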
Graphs have an internal logic, or grammar, that connects data to visuals
Data = variables in a dataset
Aesthetic = visual property of a graph (position, shape, color, etc.)
Geometry = representation of an aesthetic (point, line, text, etc.)
Data | Aesthetic | Geometry |
---|---|---|
GDP per capita | Position (x-axis) | Point |
Life expectancy | Position (y-axis) | Point |
Continent | Color | Point |
Population | Size | Point |
Take the data,
map it onto an aesthetic,
and visualize it with a geometry
Data | aes() | geom_ |
---|---|---|
gdpPercap | x | geom_point() |
lifeExp | y | geom_point() |
continent | color | geom_point() |
pop | size | geom_point() |
Use the variable names exactly as they appear in the data, and map them onto the exact aesthetic and function names used in R
ggplot(): our first function 😢
ggplot(): specify the data
Our data is named gap_07 (the Gapminder dataset for the year 2007)
aes(): map variables to aesthetics
Note: aes() goes within ggplot()
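A minimal sketch of these first two steps. How `gap_07` is built is an assumption: here it is taken from the `gapminder` R package and filtered to 2007, which may not be how the course file was actually prepared.

```r
library(ggplot2)
library(gapminder)  # assumption: the gapminder package supplies the data

# Assumed construction of gap_07: keep only the rows for 2007
gap_07 <- subset(gapminder, year == 2007)

# Data goes first; aes() sits inside ggplot() and maps variables to the axes
p <- ggplot(data = gap_07, aes(x = gdpPercap, y = lifeExp))
p  # axes appear, but no points yet because no geometry has been added
```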
labs()
Notice that text is placed within quotation marks!
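A sketch of `labs()` in use. As before, building `gap_07` from the `gapminder` package is an assumption, and the label wording is only an example.

```r
library(ggplot2)
library(gapminder)  # assumption: the gapminder package supplies the data

gap_07 <- subset(gapminder, year == 2007)  # assumed construction of gap_07

p <- ggplot(data = gap_07, aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  labs(
    x = "GDP per capita",       # text goes inside quotation marks
    y = "Life expectancy",
    title = "Wealth and health across countries, 2007"  # example wording
  )
p
```

Variable names such as `gdpPercap` are written without quotes because they refer to columns; label text is written with quotes because it is literal text.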
Tell ggplot() the data we want to plot
Map all variables onto aesthetics within aes()
Add layers like geom_point() and labs() using +
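The three steps above can be sketched as one plot that uses all four mappings from the Data | aes() | geom_ table. Building `gap_07` from the `gapminder` package is an assumption, and the label text is only an example.

```r
library(ggplot2)
library(gapminder)  # assumption: the gapminder package supplies the data

gap_07 <- subset(gapminder, year == 2007)  # assumed construction of gap_07

p <- ggplot(
  data = gap_07,                     # step 1: the data
  aes(x = gdpPercap, y = lifeExp,    # step 2: variables -> aesthetics
      color = continent, size = pop)
) +
  geom_point() +                     # step 3: layers, joined with +
  labs(x = "GDP per capita", y = "Life expectancy",
       color = "Continent", size = "Population")
p
```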
Add labels to each point by mapping country onto the label aesthetic within aes()
Add a text geometry layer to your plot to plot the names
Data | Aesthetic | Geometry |
---|---|---|
gdpPercap | x | geom_point() |
lifeExp | y | geom_point() |
continent | color | geom_point() |
pop | size | geom_point() |
country | label | geom_text() |
Take your data, map it onto an aesthetic, represent with a geometry
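One possible way to write the plot described in this table, offered as a sketch rather than the official solution. Building `gap_07` from the `gapminder` package is an assumption.

```r
library(ggplot2)
library(gapminder)  # assumption: the gapminder package supplies the data

gap_07 <- subset(gapminder, year == 2007)  # assumed construction of gap_07

p <- ggplot(
  data = gap_07,
  aes(x = gdpPercap, y = lifeExp,
      color = continent, size = pop,
      label = country)               # new mapping: country -> label
) +
  geom_point() +
  geom_text()                        # new layer: draws each country's name
p
```

Because every mapping sits inside `ggplot()`, the text layer inherits color and size too, so country names are colored by continent and scaled by population.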