mapcan
to create
choropleth mapsThe mapcan()
function returns a data frame with
geographic data that can be used in ggplot2
.
Visualizing spatial data in R often involves working with large,
specialized shape files that require a fair amount of conversion and
manipulation before they are ready for use in ggplot
.
mapcan
has done most of the heavy lifting, providing
flexible, ggplot
-ready geographic data.
At the most basic level, mapcan()
requires two
arguments: boundaries
and type
.
boundaries
: set boundaries = province
for geographic data at the province level,
boundaries = census
for data at the census division level,
or boundaries = ridings
for geographic data at the federal
riding level.
type
: to produce geographic data for a standard
choropleth map, set type = standard
. For population
cartogram data (for maps that alter the geography based on the
population size at the province or census division level), set
type = cartogram
. For tile grid maps data at the federal
riding level, set type = bins
. Note: while
type = cartogram
will provide data
(i.e. coordinates) for use in tile grid maps,
mapcan::riding_binplot()
is a convenient function for
creating actual tile grid maps ggplot
.
By default, mapcan()
will provide geographic data for
the entire country. You may wish to either exclude the territories from
your map or create a map of only one province:
province
: to produce geographic data for only one
province (or territory), provide the province
argument with
a provincial alpha code (options are NL
, PE
,
NS
, NB
, QC
, ON
,
MB
, SK
, AB
, BC
,
YT
, NT
, and NU
). For example,
setting province = BC
will return geographic data only for
British Columbia.
territories
: set territories = FALSE
to
exclude the territorities.
mapcan(boundaries = province,
type = standard) %>%
head()
#> long lat order hole piece pr_sgc_code group pr_english
#> 1 2278204 1261783 1 FALSE 1 10 10.1 Newfoundland and Labrador
#> 2 2294272 1251745 2 FALSE 1 10 10.1 Newfoundland and Labrador
#> 3 2300435 1250534 3 FALSE 1 10 10.1 Newfoundland and Labrador
#> 4 2306656 1243299 4 FALSE 1 10 10.1 Newfoundland and Labrador
#> 5 2315548 1241630 5 FALSE 1 10 10.1 Newfoundland and Labrador
#> 6 2313319 1237013 6 FALSE 1 10 10.1 Newfoundland and Labrador
#> pr_french pr_alpha
#> 1 Terre-Neuve-et-Labrador NL
#> 2 Terre-Neuve-et-Labrador NL
#> 3 Terre-Neuve-et-Labrador NL
#> 4 Terre-Neuve-et-Labrador NL
#> 5 Terre-Neuve-et-Labrador NL
#> 6 Terre-Neuve-et-Labrador NL
mapcan()
gives us a data frame with necessary components
(longitude, latitude, order, hole, piece, and group) for use with
geom_polygon()
in the ggplot
package. It also
provides four different variables to describe the province that make it
easy to merge this data with provincial data of your choice.
ggplot
To create a plot with data from mapcan()
, the following
aesthetic mappings are required: x = long
(longitude),
y = lat
(latitude), group = group
(this tells
geom_polygon()
how to group observations—in this case,
provinces). Let’s initialize the plot:
pr_map <- mapcan(boundaries = province,
type = standard) %>%
ggplot(aes(x = long, y = lat, group = group))
pr_map
This doesn’t tell us much. We need to add a geom
to
visualize the map.
geom_polygon
to plot the coordinatesTo visualize the geographic data with ggplot
, use
geom_polygon()
. It is important to also specify
coord_fixed()
—this fixes the relationship between longitude
(the x-axis) and latitude (the y-axis):
You will notice that the axis text has no substantive significance. You
can remove it, along with the axis ticks and background grid using
theme_mapcan
function, a ggplot
theme that is
part of the mapcan
package:
pr_map +
theme_mapcan() +
## Add a title
ggtitle("Map of Canada with Provincial/Territorial Boundaries")
Though beautiful, this map is not very informative (unless you are unfamiliar with the shape of Canada). Let’s add some province-level data.
It is relatively straightforward to merge your own province-level
statistics into the geographic data that mapcan()
provides.
To illustrate, we will work with the province_pop_annual
data frame that is included in the mapcan
package. This
dataset provides annual provincial/territorial population estimates
dating back to 1971. Let’s use the most recent population data, from
2017:
pop_2017 <- mapcan::province_pop_annual %>%
filter(year == 2017)
head(pop_2017)
#> year province population
#> 1 2017 Canada 36708083
#> 2 2017 Newfoundland and Labrador 528817
#> 3 2017 Prince Edward Island 152021
#> 4 2017 Nova Scotia 953869
#> 5 2017 New Brunswick 759655
#> 6 2017 Quebec 8394034
The next step is to attach these numbers to every point on the
polygons of the provinces. To do this, we first create the required
geographic with mapcan()
, then we use
inner_join()
from the dplyr
package to merge
in the pop_2017
data:
pr_geographic <- mapcan(boundaries = province,
type = standard)
pr_geographic <- inner_join(pr_geographic,
pop_2017,
by = c("pr_english" = "province"))
To colour the provinces according to their population size, set the
population variable as a fill
aesthetic. Because population
is a continuous variable (and because I don’t like the default colours),
I will use scale_fill_viridis_c()
colour scale to colour
the map.
pr_geographic %>%
ggplot(aes(x = long, y = lat, group = group, fill = population)) +
geom_polygon() +
coord_fixed() +
theme_mapcan() +
scale_fill_viridis_c(name = "Population") +
ggtitle("Canadian Population by Province")
### Creating a choropleth map with federal riding boundaries
To create a map with federal riding boundaries, we specify
boundaries = ridings
. For the sake of illustration, let’s
also look at just one province: British Columbia.
bc_ridings <- mapcan(boundaries = ridings,
type = standard,
province = BC)
head(bc_ridings)
#> long lat order hole piece riding_code group
#> 11743 -1917601 428903.3 11743 FALSE 1 59001 59001.1
#> 11744 -1917311 426212.4 11744 FALSE 1 59001 59001.1
#> 11745 -1922364 417586.0 11745 FALSE 1 59001 59001.1
#> 11746 -1924519 416810.5 11746 FALSE 1 59001 59001.1
#> 11747 -1942173 424886.9 11747 FALSE 1 59001 59001.1
#> 11748 -1939550 430489.9 11748 FALSE 1 59001 59001.1
#> riding_name_english riding_name_french pr_sgc_code pr_english
#> 11743 Abbotsford Abbotsford 59 British Columbia
#> 11744 Abbotsford Abbotsford 59 British Columbia
#> 11745 Abbotsford Abbotsford 59 British Columbia
#> 11746 Abbotsford Abbotsford 59 British Columbia
#> 11747 Abbotsford Abbotsford 59 British Columbia
#> 11748 Abbotsford Abbotsford 59 British Columbia
#> pr_french pr_alpha centroid_long centroid_lat
#> 11743 Colombie-Britannique BC -1929376 423992.3
#> 11744 Colombie-Britannique BC NA NA
#> 11745 Colombie-Britannique BC NA NA
#> 11746 Colombie-Britannique BC NA NA
#> 11747 Colombie-Britannique BC NA NA
#> 11748 Colombie-Britannique BC NA NA
ggplot(bc_ridings, aes(x = long, y = lat, group = group)) +
geom_polygon() +
coord_fixed() +
theme_mapcan() +
ggtitle("British Columbia \nFederal Electoral Ridings")
Like with province-level statistics above, we can also merge our own
riding-level statistics into the riding-level geographic data that
mapcan()
has produced. We will work with the
federal_election_results
data frame that is included in the
mapcan
package. This dataset provides federal election
results for all elections dating back to 1997. We will use the results
of 2015 federal election to colour the ridings in British Columbia.
Note: At the moment, mapcan()
only provides
geographic data for the electoral boundaries (2013 Representation Order)
of the 2015 federal election.
bc_results <- mapcan::federal_election_results %>%
# Restrict data to include just 2015 election results from BC
filter(election_year == 2015 & pr_alpha == "BC")
head(bc_results)
#> riding_name_english riding_name_french
#> 1 Abbotsford Abbotsford
#> 2 Burnaby North--Seymour Burnaby North--Seymour
#> 3 Burnaby South Burnaby South
#> 4 Cariboo--Prince George Cariboo--Prince George
#> 5 Central Okanagan--Similkameen--Nicola Central Okanagan--Similkameen--Nicola
#> 6 Chilliwack--Hope Chilliwack--Hope
#> riding_code population voter_turnout candidate election_year
#> 1 59001 96819 69.7 Fast, Ed 2015
#> 2 59002 100632 70.3 Beech, Terry 2015
#> 3 59003 105037 60.8 Stewart, Kennedy 2015
#> 4 59004 108252 67.8 Doherty, Todd 2015
#> 5 59005 104398 71.0 Albas, Dan 2015
#> 6 59006 92734 69.7 Strahl, Mark 2015
#> party pr_alpha pr_french pr_sgc_code pr_english
#> 1 Conservative BC Colombie-Britannique 59 British Columbia
#> 2 Liberal BC Colombie-Britannique 59 British Columbia
#> 3 NDP BC Colombie-Britannique 59 British Columbia
#> 4 Conservative BC Colombie-Britannique 59 British Columbia
#> 5 Conservative BC Colombie-Britannique 59 British Columbia
#> 6 Conservative BC Colombie-Britannique 59 British Columbia
Next, we merge the two data frames (i.e. the geographic data and the election results data):
To colour the ridings according the winning party of the 2015
election, set the party
variable as a fill
aesthetic:
bc_riding_map <- bc_ridings %>%
ggplot(aes(x = long, y = lat, group = group, fill = party)) +
geom_polygon() +
coord_fixed() +
theme_mapcan() +
ggtitle("British Columbia \n2015 Federal Electoral Results")
The colours are not ideal. We can easily provide our own custom
colours that correspond to the colours associated with the different
parties with ggplot
’s scale_fill_manual()
:
bc_riding_map +
scale_fill_manual(name = "Winning party",
values = c("blue", "springgreen3", "red", "Orange"))