Using mapcan to create choropleth maps

Description

The mapcan() function returns a data frame with geographic data that can be used in ggplot2.

Overview

Visualizing spatial data in R often involves working with large, specialized shape files that require a fair amount of conversion and manipulation before they are ready for use in ggplot. mapcan has done most of the heavy lifting, providing flexible, ggplot-ready geographic data.

Arguments

At the most basic level, mapcan() requires two arguments: boundaries and type.

  • boundaries: set boundaries = province for geographic data at the province level, boundaries = census for data at the census division level, or boundaries = ridings for geographic data at the federal riding level.

  • type: to produce geographic data for a standard choropleth map, set type = standard. For population cartogram data (for maps that alter the geography based on the population size at the province or census division level), set type = cartogram. For tile grid maps data at the federal riding level, set type = bins. Note: while type = cartogram will provide data (i.e. coordinates) for use in tile grid maps, mapcan::riding_binplot() is a convenient function for creating actual tile grid maps ggplot.

By default, mapcan() will provide geographic data for the entire country. You may wish to either exclude the territories from your map or create a map of only one province:

  • province: to produce geographic data for only one province (or territory), provide the province argument with a provincial alpha code (options are NL, PE, NS, NB, QC, ON, MB, SK, AB, BC, YT, NT, and NU). For example, setting province = BC will return geographic data only for British Columbia.

  • territories: set territories = FALSE to exclude the territorities.

Examples

Creating a choropleth map with provincial boundaries

mapcan(boundaries = province,
       type = standard) %>%
  head()
#>      long     lat order  hole piece pr_sgc_code group                pr_english
#> 1 2278204 1261783     1 FALSE     1          10  10.1 Newfoundland and Labrador
#> 2 2294272 1251745     2 FALSE     1          10  10.1 Newfoundland and Labrador
#> 3 2300435 1250534     3 FALSE     1          10  10.1 Newfoundland and Labrador
#> 4 2306656 1243299     4 FALSE     1          10  10.1 Newfoundland and Labrador
#> 5 2315548 1241630     5 FALSE     1          10  10.1 Newfoundland and Labrador
#> 6 2313319 1237013     6 FALSE     1          10  10.1 Newfoundland and Labrador
#>                 pr_french pr_alpha
#> 1 Terre-Neuve-et-Labrador       NL
#> 2 Terre-Neuve-et-Labrador       NL
#> 3 Terre-Neuve-et-Labrador       NL
#> 4 Terre-Neuve-et-Labrador       NL
#> 5 Terre-Neuve-et-Labrador       NL
#> 6 Terre-Neuve-et-Labrador       NL

mapcan() gives us a data frame with necessary components (longitude, latitude, order, hole, piece, and group) for use with geom_polygon() in the ggplot package. It also provides four different variables to describe the province that make it easy to merge this data with provincial data of your choice.

Basic ingredients for ggplot

To create a plot with data from mapcan(), the following aesthetic mappings are required: x = long (longitude), y = lat (latitude), group = group (this tells geom_polygon() how to group observations—in this case, provinces). Let’s initialize the plot:

pr_map <- mapcan(boundaries = province,
       type = standard) %>%
  ggplot(aes(x = long, y = lat, group = group))
pr_map

This doesn’t tell us much. We need to add a geom to visualize the map.

Using geom_polygon to plot the coordinates

To visualize the geographic data with ggplot, use geom_polygon(). It is important to also specify coord_fixed()—this fixes the relationship between longitude (the x-axis) and latitude (the y-axis):

pr_map <- pr_map +
  geom_polygon() +
  coord_fixed()
pr_map

You will notice that the axis text has no substantive significance. You can remove it, along with the axis ticks and background grid using theme_mapcan function, a ggplot theme that is part of the mapcan package:

pr_map +
  theme_mapcan() +
  ## Add a title
  ggtitle("Map of Canada with Provincial/Territorial Boundaries")

Though beautiful, this map is not very informative (unless you are unfamiliar with the shape of Canada). Let’s add some province-level data.

Incorporate province-level statistics

It is relatively straightforward to merge your own province-level statistics into the geographic data that mapcan() provides. To illustrate, we will work with the province_pop_annual data frame that is included in the mapcan package. This dataset provides annual provincial/territorial population estimates dating back to 1971. Let’s use the most recent population data, from 2017:

pop_2017 <- mapcan::province_pop_annual %>%
  filter(year == 2017)

head(pop_2017)
#>   year                  province population
#> 1 2017                    Canada   36708083
#> 2 2017 Newfoundland and Labrador     528817
#> 3 2017      Prince Edward Island     152021
#> 4 2017               Nova Scotia     953869
#> 5 2017             New Brunswick     759655
#> 6 2017                    Quebec    8394034

The next step is to attach these numbers to every point on the polygons of the provinces. To do this, we first create the required geographic with mapcan(), then we use inner_join() from the dplyr package to merge in the pop_2017 data:

pr_geographic <- mapcan(boundaries = province,
       type = standard)


pr_geographic <- inner_join(pr_geographic, 
           pop_2017, 
           by = c("pr_english" = "province"))

To colour the provinces according to their population size, set the population variable as a fill aesthetic. Because population is a continuous variable (and because I don’t like the default colours), I will use scale_fill_viridis_c() colour scale to colour the map.

pr_geographic %>%
  ggplot(aes(x = long, y = lat, group = group, fill = population)) +
  geom_polygon() +
  coord_fixed() +
  theme_mapcan() +
  scale_fill_viridis_c(name = "Population") +
  ggtitle("Canadian Population by Province")

### Creating a choropleth map with federal riding boundaries

Generate geographic data with riding boundaries

To create a map with federal riding boundaries, we specify boundaries = ridings. For the sake of illustration, let’s also look at just one province: British Columbia.

bc_ridings <- mapcan(boundaries = ridings,
       type = standard,
       province = BC)

head(bc_ridings)
#>           long      lat order  hole piece riding_code   group
#> 11743 -1917601 428903.3 11743 FALSE     1       59001 59001.1
#> 11744 -1917311 426212.4 11744 FALSE     1       59001 59001.1
#> 11745 -1922364 417586.0 11745 FALSE     1       59001 59001.1
#> 11746 -1924519 416810.5 11746 FALSE     1       59001 59001.1
#> 11747 -1942173 424886.9 11747 FALSE     1       59001 59001.1
#> 11748 -1939550 430489.9 11748 FALSE     1       59001 59001.1
#>       riding_name_english riding_name_french pr_sgc_code       pr_english
#> 11743          Abbotsford         Abbotsford          59 British Columbia
#> 11744          Abbotsford         Abbotsford          59 British Columbia
#> 11745          Abbotsford         Abbotsford          59 British Columbia
#> 11746          Abbotsford         Abbotsford          59 British Columbia
#> 11747          Abbotsford         Abbotsford          59 British Columbia
#> 11748          Abbotsford         Abbotsford          59 British Columbia
#>                  pr_french pr_alpha centroid_long centroid_lat
#> 11743 Colombie-Britannique       BC      -1929376     423992.3
#> 11744 Colombie-Britannique       BC            NA           NA
#> 11745 Colombie-Britannique       BC            NA           NA
#> 11746 Colombie-Britannique       BC            NA           NA
#> 11747 Colombie-Britannique       BC            NA           NA
#> 11748 Colombie-Britannique       BC            NA           NA

Plot geographic data with riding boundaries

ggplot(bc_ridings, aes(x = long, y = lat, group = group)) +
  geom_polygon() +
  coord_fixed() +
  theme_mapcan() +
  ggtitle("British Columbia \nFederal Electoral Ridings")

Incorporate riding-level statistics

Like with province-level statistics above, we can also merge our own riding-level statistics into the riding-level geographic data that mapcan() has produced. We will work with the federal_election_results data frame that is included in the mapcan package. This dataset provides federal election results for all elections dating back to 1997. We will use the results of 2015 federal election to colour the ridings in British Columbia.

Note: At the moment, mapcan() only provides geographic data for the electoral boundaries (2013 Representation Order) of the 2015 federal election.

bc_results <- mapcan::federal_election_results %>%
  # Restrict data to include just 2015 election results from BC
  filter(election_year == 2015 & pr_alpha == "BC")

head(bc_results)
#>                     riding_name_english                    riding_name_french
#> 1                            Abbotsford                            Abbotsford
#> 2                Burnaby North--Seymour                Burnaby North--Seymour
#> 3                         Burnaby South                         Burnaby South
#> 4                Cariboo--Prince George                Cariboo--Prince George
#> 5 Central Okanagan--Similkameen--Nicola Central Okanagan--Similkameen--Nicola
#> 6                      Chilliwack--Hope                      Chilliwack--Hope
#>   riding_code population voter_turnout        candidate election_year
#> 1       59001      96819          69.7         Fast, Ed          2015
#> 2       59002     100632          70.3     Beech, Terry          2015
#> 3       59003     105037          60.8 Stewart, Kennedy          2015
#> 4       59004     108252          67.8    Doherty, Todd          2015
#> 5       59005     104398          71.0       Albas, Dan          2015
#> 6       59006      92734          69.7     Strahl, Mark          2015
#>          party pr_alpha            pr_french pr_sgc_code       pr_english
#> 1 Conservative       BC Colombie-Britannique          59 British Columbia
#> 2      Liberal       BC Colombie-Britannique          59 British Columbia
#> 3          NDP       BC Colombie-Britannique          59 British Columbia
#> 4 Conservative       BC Colombie-Britannique          59 British Columbia
#> 5 Conservative       BC Colombie-Britannique          59 British Columbia
#> 6 Conservative       BC Colombie-Britannique          59 British Columbia

Next, we merge the two data frames (i.e. the geographic data and the election results data):

bc_ridings <- inner_join(bc_results, bc_ridings, by = "riding_code")

To colour the ridings according the winning party of the 2015 election, set the party variable as a fill aesthetic:

bc_riding_map <- bc_ridings %>%
  ggplot(aes(x = long, y = lat, group = group, fill = party)) +
  geom_polygon() +
  coord_fixed() +
  theme_mapcan() +
  ggtitle("British Columbia \n2015 Federal Electoral Results")

The colours are not ideal. We can easily provide our own custom colours that correspond to the colours associated with the different parties with ggplot’s scale_fill_manual():

bc_riding_map +
  scale_fill_manual(name = "Winning party",
                    values = c("blue", "springgreen3", "red", "Orange"))