In this guide you will acquire the skills needed to process and present spatial data in R. The objectives of the guide are as follows

  1. Understand how spatial data are processed in R
  2. Learn basic data wrangling operations on spatial data
  3. Learn how to make a map in R

This lab guide follows closely and supplements the material presented in Chapters 1, 2.1, 2.2, 8 and 9 in the textbook Geocomputation with R (GWR) and class Handout 5.


Assignment 5 is due by 10:00 am, February 14th on Canvas. See here for assignment guidelines. You must submit an .Rmd file and its associated .html file. Name the files: yourLastName_firstInitial_asgn05. For example: brazil_n_asgn05.

Open up a R Markdown file


Download the Lab template into an appropriate folder on your hard drive (preferably, a folder named ‘Lab 5’), open it in R Studio, and type and run your code there. The template is also located on Canvas under Files. Change the title (“Lab 5”) and insert your name and date. Don’t change anything else inside the YAML (the stuff at the top in between the ---). Also keep the grey chunk after the YAML. For a rundown on the use of R Markdown, see the assignment guidelines

Installing and loading packages


You’ll need to install the following packages in R. You only need to do this once, so if you’ve already installed these packages, skip the code. Also, don’t put these install.packages() commands in your R Markdown document. Copy and paste the code in the R Console. We’ll talk about what functions these packages provide as they come up in the guide.

install.packages("sf")
install.packages("tigris")
install.packages("tmap")
install.packages("RColorBrewer")

You’ll need to load the following packages using library(). Unlike installing, you will always need to load packages whenever you start a new R session.

library(tidyverse)
library(tidycensus)
library(flextable)
library(sf)
library(tigris)
library(tmap)
library(RColorBrewer)

Spatial data in R


The main package we will use for handling spatial data in R is the tidy friendly sf package. sf stands for simple features. What is a feature? A feature is thought of as a thing, or an object in the real world, such as a building or a tree. A county can be a feature. As can a city and a neighborhood. Features have a geometry describing where on Earth the features are located, and they have attributes, which describe other properties. Think back to Lab 3 - we were working with counties. The difference between what we were doing then and what we will be doing in this lab is that counties in Lab 3 had attributes (e.g. percent Hispanic, total population), but they did not have geometries. As such, we could not put them on a map because we didn’t have their specific geographic coordinates. This is what separates nonspatial and spatial data in R.

Bringing in spatial data


sf is the specific type of data object that deals with spatial information in R. Think back to Lab 1 when we discussed the various ways R stores data - sf is just another way. But please note that spatial data themselves outside of R can take on many different formats. We’ll be primarily working with shapefiles in this class. Shapefiles are not the only type of spatial data, but they are the most commonly used. Let’s be clear here: sf objects are R specific and shapefiles are a general format of spatial data. This is like tibbles are R specific and csv files are a general format of non spatial data.

We will be primarily working with census geographic data in this lab and pretty much all future labs. If you need a reminder of the Census geographies, go back to Handout 3. There are two major packages for bringing in Census shapefiles into R: tidycensus and tigris.

tidycensus


In Lab 3, we worked with the tidycensus package and the Census API to bring in Census data into R. Fortunately, we can use the same commands to bring in Census geographic data. First, load in your Census API key. If you have not already installed your Census API key, add the install = TRUE argument in census_api_key().

census_api_key("YOUR API KEY GOES HERE")

Then use the get_acs() command to bring in California tract-level median household income, total foreign-born population, and total population from the 5-year 2018-2022 American Community Survey (ACS). Remember that “E” at the end of the variable indicates “Estimate” and “M” indicates margin of errors.

ca.tracts <- get_acs(geography = "tract", 
              year = 2022,
              variables = c(medincome = "B19013_001", 
                            fb = "B05012_003", totp = "B05012_001"), 
              state = "CA",
              survey = "acs5",
              output = "wide",
              geometry = TRUE)

The only difference between the code above and what we used in Lab 3 is we have one additional argument added to the get_acs() command: geometry = TRUE. This command tells R to bring in the spatial features associated with the geography you specified in the command, in our case California tracts. You can further narrow your geographic scope to the county level by typing in county = as an argument. For example, to get just Sacramento county tracts, you would type in county = "Sacramento". Type in ca.tracts to see what we’ve got.

ca.tracts
## Simple feature collection with 9129 features and 8 fields (with 20 geometries empty)
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -124.4096 ymin: 32.53444 xmax: -114.1312 ymax: 42.00948
## Geodetic CRS:  NAD83
## First 10 features:
##          GEOID                                               NAME medincomeE
## 1  06077005127 Census Tract 51.27; San Joaquin County; California     102440
## 2  06077003406 Census Tract 34.06; San Joaquin County; California      38497
## 3  06077004402 Census Tract 44.02; San Joaquin County; California      89167
## 4  06077001700    Census Tract 17; San Joaquin County; California      51083
## 5  06077000401  Census Tract 4.01; San Joaquin County; California      57500
## 6  06077003404 Census Tract 34.04; San Joaquin County; California      53567
## 7  06001423000      Census Tract 4230; Alameda County; California      86917
## 8  06001450200      Census Tract 4502; Alameda County; California     107111
## 9  06001422200      Census Tract 4222; Alameda County; California     105750
## 10 06001406201   Census Tract 4062.01; Alameda County; California      50417
##    medincomeM  fbE fbM totpE totpM                       geometry
## 1       17341 2243 399  7580  1068 MULTIPOLYGON (((-121.2871 3...
## 2        4680  988 300  3768   698 MULTIPOLYGON (((-121.309 38...
## 3        7193 1714 508  5903   937 MULTIPOLYGON (((-121.2734 3...
## 4       10253 1195 424  4245   843 MULTIPOLYGON (((-121.2664 3...
## 5       20110  469 204  2856   338 MULTIPOLYGON (((-121.3133 3...
## 6        9885 1622 449  7182  1024 MULTIPOLYGON (((-121.2951 3...
## 7       20998  755 251  4549   851 MULTIPOLYGON (((-122.282 37...
## 8       23797 1925 385  5272   670 MULTIPOLYGON (((-121.9224 3...
## 9       29978  548 154  3280   612 MULTIPOLYGON (((-122.2942 3...
## 10      19883 2459 513  5072   747 MULTIPOLYGON (((-122.2343 3...


The object looks much like a basic tibble, but with a few differences.

  • You’ll find that the description of the object now indicates that it is a simple feature collection with 8 fields (attributes or columns of data). There are 9129 features, in this case, census tracts.
  • The Geometry Type indicates that the spatial data are in MULTIPOLYGON form (as opposed to points or lines, the other basic vector data forms, which were discussed in Handout 5).
  • The Bounding Box indicates the spatial extent of the features (from left to right, for example, California tracts go from a longitude of -124.4096 to -114.1312).
  • Geodetic CRS is related to the coordinate reference system, which we’ll touch on later in the quarter.
  • The final difference is that the data frame contains the column geometry. This geometry is what makes this data frame spatial. Remember that a tibble is a data frame. Hence, an sf object is basically a tibble, or has tibble like qualities. This means that we can use nearly all of the functions we’ve learned in the past three labs on sf objects. Hooray for consistency!

We can find out the object’s class

class(ca.tracts)
## [1] "sf"         "data.frame"

Here we find that the object is an sf data frame.

tigris package


Another package that allows us to bring in census geographic boundaries is tigris. Here is a list of all the geographies you can download through this package. Let’s bring in the boundaries for Sacramento city. Remember from Handout 3 that cities are designated as places by the Census. Use the places() function to get all places in California.

pl <- places(state = "CA", cb = TRUE, year=2022)


The cb = TRUE argument tells R to download a generalized cartographic boundary file, which drastically reduces the size of the data (compare the file size when you don’t include cb = TRUE). For example, it eliminates all areas that are strictly covered by water (e.g. lakes). The argument year=2022 tells R to bring in the boundaries for that year (census geographies can change from year to year). When using the multi-year ACS, best to use the end year of the period. In the get_acs() command above we used year=2022, so also use year=2022 in the places() command. Note that unlike the tidycensus package, tigris does not allow you to attach attribute data (e.g. Hispanic, total population, etc.) to geometric features.

Take a glimpse of pl

glimpse(pl)
## Rows: 1,611
## Columns: 13
## $ STATEFP    <chr> "06", "06", "06", "06", "06", "06", "06", "06", "06", "06",…
## $ PLACEFP    <chr> "55156", "22804", "02924", "16350", "43224", "31414", "5992…
## $ PLACENS    <chr> "02411359", "02410455", "02409738", "02410232", "02410875",…
## $ AFFGEOID   <chr> "1600000US0655156", "1600000US0622804", "1600000US0602924",…
## $ GEOID      <chr> "0655156", "0622804", "0602924", "0616350", "0643224", "063…
## $ NAME       <chr> "Palmdale", "Escondido", "Arvin", "Corona", "Los Alamitos",…
## $ NAMELSAD   <chr> "Palmdale city", "Escondido city", "Arvin city", "Corona ci…
## $ STUSPS     <chr> "CA", "CA", "CA", "CA", "CA", "CA", "CA", "CA", "CA", "CA",…
## $ STATE_NAME <chr> "California", "California", "California", "California", "Ca…
## $ LSAD       <chr> "25", "25", "25", "25", "25", "25", "25", "25", "25", "25",…
## $ ALAND      <dbl> 274705334, 96726218, 12482061, 103320711, 10380819, 3394103…
## $ AWATER     <dbl> 629569, 276445, 0, 53926, 161830, 10948, 3904619, 31054, 10…
## $ geometry   <MULTIPOLYGON [°]> MULTIPOLYGON (((-118.2877 3..., MULTIPOLYGON (…

We see a geometry column, which indicates we have a spatial data frame.

We can use filter() to keep Sacramento city. We will filter on the variable NAME to keep Sacramento.

sac.city <- filter(pl, NAME == "Sacramento")

The argument NAME == "Sacramento" tells R to keep cities with the exact city name “Sacramento”. Make sure we got what we wanted.

glimpse(sac.city)
## Rows: 1
## Columns: 13
## $ STATEFP    <chr> "06"
## $ PLACEFP    <chr> "64000"
## $ PLACENS    <chr> "02411751"
## $ AFFGEOID   <chr> "1600000US0664000"
## $ GEOID      <chr> "0664000"
## $ NAME       <chr> "Sacramento"
## $ NAMELSAD   <chr> "Sacramento city"
## $ STUSPS     <chr> "CA"
## $ STATE_NAME <chr> "California"
## $ LSAD       <chr> "25"
## $ ALAND      <dbl> 255491817
## $ AWATER     <dbl> 5413141
## $ geometry   <MULTIPOLYGON [°]> MULTIPOLYGON (((-121.5601 3...

Let’s use use the function counties() to bring in 2022 county boundaries.

cnty <- counties(state = "CA", cb = TRUE, year=2022)

Take a look at the data to make sure it contains what we want.

glimpse(cnty)
## Rows: 58
## Columns: 13
## $ STATEFP    <chr> "06", "06", "06", "06", "06", "06", "06", "06", "06", "06",…
## $ COUNTYFP   <chr> "037", "097", "001", "045", "015", "055", "063", "101", "02…
## $ COUNTYNS   <chr> "00277283", "01657246", "01675839", "00277287", "01682074",…
## $ AFFGEOID   <chr> "0500000US06037", "0500000US06097", "0500000US06001", "0500…
## $ GEOID      <chr> "06037", "06097", "06001", "06045", "06015", "06055", "0606…
## $ NAME       <chr> "Los Angeles", "Sonoma", "Alameda", "Mendocino", "Del Norte…
## $ NAMELSAD   <chr> "Los Angeles County", "Sonoma County", "Alameda County", "M…
## $ STUSPS     <chr> "CA", "CA", "CA", "CA", "CA", "CA", "CA", "CA", "CA", "CA",…
## $ STATE_NAME <chr> "California", "California", "California", "California", "Ca…
## $ LSAD       <chr> "06", "06", "06", "06", "06", "06", "06", "06", "06", "06",…
## $ ALAND      <dbl> 10515988166, 4080091725, 1910010507, 9082586851, 2606049835…
## $ AWATER     <dbl> 1785003207, 497303747, 216909647, 961786019, 578810831, 104…
## $ geometry   <MULTIPOLYGON [°]> MULTIPOLYGON (((-118.6044 3..., MULTIPOLYGON (…

To get Sacramento county, we use the filter() function. Similar to pl, we will filter on the variable NAME to keep Sacramento.

sac.county <- filter(cnty, NAME == "Sacramento")

If you want to get tract boundaries from tigris use the function tracts(). Note that in both tracts() and get_acs(), you can get tracts within a specific county within CA by specifying the county = argument. For example, tracts(state = "CA", county = "Sacramento", cb = TRUE, year=2022) will yield census tracts in Sacramento county.

Guess what? You earned another badge! Yipee!!

Reading from your hard drive


Directly reading spatial files using an API is great, but doesn’t exist for many spatial data sources. You’ll often have to download a spatial data set, save it onto your hard drive and read it into R. The function for reading spatial files from your hard drive as sf objects is st_read().

Let’s bring in two shapefiles I created that contains (1) median housing values for census tracts in Sacramento county from the 2018-2022 ACS and (2) Sacramento county parks. I zipped up the files and uploaded it onto Github. Make sure your current working directory is pointed to the appropriate folder on your hard drive (use setwd()).

download.file(url = "https://raw.githubusercontent.com/crd150/data/master/lab5files.zip", destfile = "lab5files.zip")


If you are having problems with the above code, I also uploaded the zip file on Canvas in the Week 5 Lab folder. To manually unzip zipped files on a Mac, check here. For Windows, check here.

After unzipping the folder, you should see SacramentoCountyTracts, Parks and californiatractsrace files in your current working directory. Note that the shapefile is actually not a single file but is represented by multiple files. For SacramentoCountyTracts, you should see four files named SacramentoCountyTracts with shp, dbf, prj, and shx extensions. These files are all connected to one another, so don’t manually alter these files. Only open and alter these files in R. Moreover, if you want to remove a shapefile from your hard drive, delete all the associated files not just one. For Parks, you will see five associated files. We will bring in the non-spatial data file californiatractsrace.csv a little later.

Make sure your working directory is now pointed to the lab5files folder. Then bring in the Sacramento County tract shapefile using the function st_read(), which is from the sf package. You’ll need to add the .shp extension so that the function knows it’s reading in a shapefile.

sac.county.tracts <- st_read("SacramentoCountyTracts.shp", stringsAsFactors = FALSE)

The argument stringsAsFactors = FALSE tells R to keep any variables that look like a character as a character and not a factor, which we won’t use much, if at all, in this class.

Take a look

glimpse(sac.county.tracts )
## Rows: 363
## Columns: 4
## $ GEOID    <chr> "06067009105", "06067008905", "06067009110", "06067007209", "…
## $ NAME     <chr> "Census Tract 91.05; Sacramento County; California", "Census …
## $ medhval  <dbl> 332700, 371500, 161900, 379300, 419500, 468400, 582700, 50400…
## $ geometry <POLYGON [US_survey_foot]> POLYGON ((6738187 1967860, ..., POLYGON …

Has a geometry column, so we know it is spatial. Bring in the parks file.

parks <- st_read("Parks.shp", stringsAsFactors = FALSE)

Take a look

glimpse(parks)
## Rows: 741
## Columns: 43
## $ OBJECTID   <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, …
## $ PARK_ID    <int> 1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010,…
## $ LANDMARK   <chr> "ANCIL HOFFMAN GOLF COURSE", "HOGBACK ISLAND FISHING ACCESS…
## $ DESCRIPTIO <chr> "GOLF COURSE", "BOAT LAUNCH", "PARKWAY", "PARK", "PARK", "P…
## $ DISTRICT   <chr> NA, "SAC COUNTY", "SAC COUNTY", "COSUMNES CSD", "CITY OF SA…
## $ PARK       <chr> "Ancil Hoffman Golf Course", "Hogback Island Fishing Access…
## $ PARK_ADDRE <chr> "6700 TARSHES DR", "1500 GRAND ISLAND RD", "7929 LA RIVIERA…
## $ ZIP_CODE   <chr> "95608", "95690", "95864", "95757", "95833", "95670", "9582…
## $ AMPHITHEAT <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ PUBLIC_POO <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ LAP_SWIM   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SWIM_LESSO <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SPRAY_PARK <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ DOG_PARK   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ ART_CULTUR <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ DESTINATIO <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ DISC_GOLF  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ GOLF       <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ HORSEBACK_ <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ INDOOR_FAC <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ INDOOR_F_1 <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ INDOOR_F_2 <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ LIGHTED_TE <chr> NA, NA, NA, NA, NA, NA, NA, "Yes", NA, NA, NA, NA, NA, NA, …
## $ NATURE_CEN <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ PICNIC_REN <chr> NA, NA, NA, NA, "Yes", NA, NA, "Yes", NA, NA, "Yes", NA, "Y…
## $ SENIOR_PRO <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SKATE_BIKE <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SPORTS_COM <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ WALKING_TR <chr> NA, NA, "Yes", NA, NA, "Yes", NA, NA, "Yes", NA, NA, NA, NA…
## $ CYCLING_TR <chr> NA, NA, "Yes", NA, NA, "Yes", NA, NA, "Yes", NA, NA, NA, NA…
## $ RUNNING_TR <chr> NA, NA, "Yes", NA, NA, "Yes", NA, NA, "Yes", NA, NA, NA, NA…
## $ VOLUNTEER_ <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ YEAR_ROUND <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ YEAR_ROU_1 <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ ZOO        <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ UPDATED    <chr> NA, "8/2012", "12/3/12", NA, "12/2012", "12/3/12", NA, "12/…
## $ NOTES      <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ LONG_DISTR <chr> NA, "Sacramento County Regional Parks", "Sacramento County …
## $ PARK_DISTR <chr> NA, "22", "22", "17", "06", "22", NA, "06", "22", NA, "06",…
## $ ADDRESSID  <int> 553142, 310123, 207322, 302976, 538217, 199945, NA, 417668,…
## $ SHAPE_Leng <dbl> 11904.8115, 3977.6496, 12451.2700, 876.5463, 1308.4914, 511…
## $ SHAPE_Area <dbl> 7130139.52, 445005.74, 1556528.08, 43736.12, 102181.60, 118…
## $ geometry   <MULTIPOLYGON [US_survey_foot]> MULTIPOLYGON (((6759145 198..., M…

What form are these spatial data in?

Data Wrangling


There is a lot of stuff behind the curtain of how R handles spatial data as simple features, but the main takeaway is that sf objects are data frames. This means you can use many of the functions we’ve learned in the past couple labs to manipulate sf objects, and this includes our best buddy the pipe %>% operator. For example, let’s do the following data wrangling tasks on ca.tracts.

  1. Drop the margins of errors
  2. Rename the variables
  3. Calculate percent foreign born

We do all of this in one line of continuous code using the pipe operator %>%

ca.tracts <- ca.tracts %>%
            select(-medincomeM, -fbM, -totpM) %>%
            rename(medincome = medincomeE, fb = fbE, totp = totpE) %>%
            mutate(pfb = 100*(fb/totp))

Notice that we’ve already used all of the functions above for nonspatial data wrangling.

Another important operation is to join attribute data to an sf object. For example, let’s say you wanted to add 2018-2022 ACS tract level percent race/ethnicity, which is located in the californiatractsrace.csv file we downloaded earlier. Bring this file in using our familiar friend read_csv().

ca.race <- read_csv("californiatractsrace.csv")

Take a look at the dataset

glimpse(ca.race)
## Rows: 9,129
## Columns: 5
## $ GEOID    <chr> "06001400100", "06001400200", "06001400300", "06001400400", "…
## $ pnhwhite <dbl> 68.981340, 69.864928, 58.551344, 64.048621, 41.478855, 43.974…
## $ pnhblk   <dbl> 5.230957, 1.769912, 9.414487, 11.664329, 25.145606, 26.190476…
## $ pnhasn   <dbl> 14.591618, 11.690731, 11.300943, 8.719028, 9.369461, 6.365403…
## $ phisp    <dbl> 5.108596, 7.405682, 10.108560, 7.152875, 15.497594, 6.511176,…

Remember, were dealing with data frames here, so we can use left_join(), which we covered in Lab 3, to join the non spatial data frame ca.race to the spatial data frame sac.county.tracts.

sac.county.tracts <- sac.county.tracts %>%
  left_join(ca.race, by = "GEOID")

Take a peek

glimpse(sac.county.tracts)
## Rows: 363
## Columns: 8
## $ GEOID    <chr> "06067009105", "06067008905", "06067009110", "06067007209", "…
## $ NAME     <chr> "Census Tract 91.05; Sacramento County; California", "Census …
## $ medhval  <dbl> 332700, 371500, 161900, 379300, 419500, 468400, 582700, 50400…
## $ pnhwhite <dbl> 30.63192, 44.85925, 23.91304, 59.69185, 64.92731, 64.90752, 6…
## $ pnhblk   <dbl> 3.3916458, 11.1599122, 34.2995169, 1.5407555, 8.1856204, 1.23…
## $ pnhasn   <dbl> 15.530168, 7.925734, 4.685990, 3.354871, 4.003187, 7.262854, …
## $ phisp    <dbl> 41.163870, 30.664803, 32.946860, 27.460239, 14.419438, 13.965…
## $ geometry <POLYGON [US_survey_foot]> POLYGON ((6738187 1967860, ..., POLYGON …

Note that you cannot join two sf objects together using left_join(). Always use left_join() to join a regular non spatial data frame to a spatial object. For example, you will get an error if you try to join the ca.tracts to sac.county.tracts

sac.county.tracts <- sac.county.tracts %>%
  left_join(ca.tracts, by = "GEOID")
## Error: y should not have class sf; for spatial joins, use st_join

The error message tells you that you can use the function st_join() to join two spatial data objects. We’ll learn more about st_join() in the next lab.

You can also use the exploratory data analysis functions we learned about in Lab 4. For example, what is the correlation between median housing value and percent black?

sac.county.tracts %>%
  summarize(Correlation = cor(medhval, pnhblk, use = "complete.obs"))
## Simple feature collection with 1 feature and 1 field
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: 6601243 ymin: 1768598 xmax: 6840876 ymax: 2030363
## Projected CRS: NAD83 / California zone 2 (ftUS)
##   Correlation                       geometry
## 1  -0.3960333 POLYGON ((6608864 1788526, ...

We get the correlation, but it’s not quite as clean as when we calculated descriptive statistics in Lab 4. That’s because the resulting object after you do summarize() is still a spatial object. To coerce it into a non spatial object, use the function st_drop_geometry(), which unsurprisingly drops the object’s geometry. This way you can use flextable() to create presentation ready tables.

sac.county.tracts %>%
  summarize(Correlation = cor(medhval, pnhblk, use = "complete.obs")) %>%
  st_drop_geometry() %>%
  flextable() %>%
  colformat_double(digits = 2) 

Correlation

-0.40

The main takeaway: sf objects are data frames, so you can use many of the functions you’ve learned in the past couple of labs on these objects. Here’s a badge!

Saving shapefiles


To save an sf object to a file, use the function st_write() and specify at least two arguments, the sf object you want to save and a file name in quotes with the file extension. You’ll also need to specify delete_layer = TRUE which overwrites the existing file if it already exists in your current working directory. Make sure you’ve set your directory to the folder you want your file to be saved in. Type in getwd() to see your current directory and use setwd() to set the directory.

Let’s save sac.county.tracts as a shapefile named saccountytractslab5.shp.

st_write(sac.county.tracts, "saccountytractslab5.shp", delete_layer = TRUE)

Check your current working directory to see if saccountytractslab5.shp and its associated files were saved.

You can save your sf object in a number of different data formats other than shp. We won’t be concerned too much with these other formats in this class, but you can see a list of them here.

Mapping in R


Now that you’ve got your spatial data in and wrangled, the next natural step is to map something. There are several functions in R that can be used for mapping. We won’t go through all of them, but GWR outlines in Table 9.1 the range of mapping packages available in R. The package we’ll rely on in this class for mapping is tmap.

The foundation function for mapping in tmap is tm_shape(). You then build on tm_shape() by adding one or more elements, all taking on the form of tm_. Let’s make a choropleth map of median housing values.

tm_shape(sac.county.tracts) +
  tm_polygons(col = "medhval", style = "quantile")

You first put the dataset sac.county.tracts inside tm_shape(). Because you are plotting polygons, you use tm_polygons() next. If you are plotting points, you will use tm_dots(). If you are plotting lines, you will use tm_lines(). The argument col = "medhval" tells R to shade the tracts by the variable medhval. tmap allows users to specify the classification style with the style argument. The argument style = "quantile" tells R to break up the shading into quantiles, or equal groups of 5. Seven of the most useful classification styles are described in the bullet points below (taken from GWR):

  • style = pretty, the default setting, rounds breaks into whole numbers where possible and spaces them evenly
  • style = equal divides input values into bins of equal range, and is appropriate for variables with a uniform distribution (not recommended for variables with a skewed distribution as the resulting map may end-up having little color diversity)
  • style = quantile ensures the same number of observations fall into each category (with the potential down side that bin ranges can vary widely)
  • style = jenks identifies groups of similar values in the data and maximizes the differences between categories
  • style = cont (and order) present a large number of colors over continuous color field, and are particularly suited for continuous rasters (order can help visualize skewed distributions)
  • style = sd divides the values by standard deviations above and below the mean.
  • style = cat was designed to represent categorical values and assures that each category receives a unique color

The importance of choosing the appropriate classification scheme is discussed in Handout 5. You’ll get some practice trying out other classification schemes in this week’s assignment. Note that tmap is smart enough to detect the presence of missing values, and shades them gray and labels them on the legend.

You can overlay multiple features on one map. For example, we can add park polygons on top of county tracts, providing a visual association between parks and percent white. Here, we add another tm_shape() and tm_polygons() to the above code.

tm_shape(sac.county.tracts) +
  tm_polygons(col = "medhval", style = "quantile") +
  tm_shape(parks) +
    tm_polygons(col = "green")

Color Scheme


Don’t like the yellow/brown color scheme? We can change the color scheme using the argument palette = within tm_polygons(). The argument palette = defines the color ranges associated with the bins as determined by the style argument. Below we use the color scheme “Reds” using palette = "Reds".

tm_shape(sac.county.tracts) +
  tm_polygons(col = "medhval", style = "quantile", palette = "Reds") 

See Ch. 9.2.4 in GWR for a fuller discussion on color and other schemes you can specify.

In addition to the built-in palettes, customized color ranges can be created by specifying a vector with the desired colors as anchors. This will create a spectrum of colors in the map that range between the colors specified in the vector. For instance, if we used c(“red”, “blue”), the color spectrum would move from red to purple, then to blue, with in between shades. In our example:

tm_shape(sac.county.tracts) +
  tm_polygons(col = "medhval", style = "quantile",palette =  c("red","blue")) 

Not exactly a pretty picture. In order to capture a diverging scale, we insert “white” in between red and blue.

tm_shape(sac.county.tracts) +
  tm_polygons(col = "medhval", style = "quantile",palette = c("red","white", "blue")) 

A preferred approach to select a color palette is to chose one of the schemes contained in the RColorBrewer package. These are based on the research of cartographer Cynthia Brewer (see the colorbrewer2 web site for details). RColorBrewer makes a distinction between sequential scales (for a scale that goes from low to high), diverging scales (to highlight how values differ from a central tendency), and qualitative scales (for categorical variables). For each scale, a series of single hue and multi-hue scales are suggested. In the RColorBrewer package, these are referred to by a name (e.g., the “Reds” palette we used above is an example). The full list is contained in the RColorBrewer documentation.

There are two very useful commands in this package. One sets a color palette by specifying its name and the number of desired categories. The result is a character vector with the hex codes of the corresponding colors.

For example, we select a sequential color scheme going from blue to green, as BuGn, by means of the command brewer.pal, with the number of categories (6) and the scheme as arguments. The resulting vector contains the HEX codes for the colors.

pal <- brewer.pal(6,"BuGn")
pal
## [1] "#EDF8FB" "#CCECE6" "#99D8C9" "#66C2A4" "#2CA25F" "#006D2C"

Using this palette in our map yields the following result.

tm_shape(sac.county.tracts) +
  tm_polygons(col = "medhval", style = "quantile", palette="BuGn") 

The command display.brewer.pal() allows us to explore different color schemes before applying them to a map. For example:

display.brewer.pal(6,"BuGn")

Legend


There are many options to change the formatting of the legend. Often, the automatic title for the legend is not intuitive, since it is simply the variable name (in our case, medhval). This can be customized by setting the title argument in tm_polygons(). Let’s change the legend title to “Housing values”

tm_shape(sac.county.tracts) +
  tm_polygons(col = "medhval", style = "quantile",palette = "Reds", 
              title = "Housing values") 

Another important aspect of the legend is its positioning. This is handled through the tm_layout() function. This function has a vast number of options, as detailed in the documentation. Also check the help documentation for tm_layout() to see the complete list of settings and examples in Ch. 9.2.5 in GWR. There are also specialized subsets of layout functions, focused on specific aspects of the map, such as tm_legend(), tm_style() and tm_format(). We illustrate the positioning of the legend.

The default is to position the legend inside the map. Often, this default solution is appropriate, but sometimes further control is needed. The legend.position argument in the tm_layout() function moves the legend around the map, and it takes on a vector of two string variables that determine both the horizontal position (“left”, “right”, or “center”) and the vertical position (“top”, “bottom”, or “center”). The default is “right” and “bottom”. But, we can change it to, say, top right.

tm_shape(sac.county.tracts) +
  tm_polygons(col = "medhval", style = "quantile",palette = "Reds",
              title = "Housing values") +
  tm_layout(legend.position =  c("right", "top"))

Yuck. We can leave it at the bottom right. Or there is also the option to position the legend outside the frame of the map. This is accomplished by setting legend.outside to TRUE (the default is FALSE) in tm_layout().

tm_shape(sac.county.tracts) +
  tm_polygons(col = "medhval", style = "quantile",palette = "Reds", 
              title = "Housing values") +
  tm_layout(legend.outside = TRUE)

We can also customize the size of the legend, its alignment, font, etc. We refer to GWR for specifics.

Title


Another functionality of the tm_layout() function is to set a title for the map, and specify its position, size, etc. For example, we can set the title using main.title, and the size using main.title.size as in the example below. We made the font size a bit smaller (0.95) in order not to overwhelm the map.

tm_shape(sac.county.tracts) +
  tm_polygons(col = "medhval", style = "quantile",palette = "Reds",
              title = "Housing values") +
  tm_layout(main.title = "2018-22 Median Housing Values in Sacramento County",
              main.title.size = 0.95, legend.outside = TRUE)

You can change the title position using main.title.position. For example, we center the title by specifying “center”.

tm_shape(sac.county.tracts) +
  tm_polygons(col = "medhval", style = "quantile",palette = "Reds",
              title = "Housing values") +
  tm_layout(main.title = "2018-22 Median Housing Values in Sacramento County",
              main.title.size = 0.95, main.title.position="center", 
              legend.outside = TRUE)

Scale bar and arrow


We need to add the other key map elements described in Handout 5. Here is where we start adding more layout functions after tm_polygons() using the + operator. First, the scale bar, which you can add using the function tm_scale_bar()

tm_shape(sac.county.tracts, unit = "mi") +
  tm_polygons(col = "medhval", style = "quantile",palette = "Reds",
              title = "Housing values") +
  tm_layout(main.title = "2018-22 Median Housing Values in Sacramento County",
              main.title.size = 0.95, main.title.position="center", 
              legend.outside = TRUE) +
  tm_scale_bar(breaks = c(0, 5, 10), text.size  = 0.75, 
               position = c("right", "bottom")) 

The argument breaks within tm_scale_bar() tells R the distances to break up and end the bar. Make sure you use reasonable break points - the Sacramento county area is not, for example, 200 miles wide, so you should not use c(0,100,200) (try it and see what happens. You won’t like it). Note that the scale is in miles (were in America!). The default is in kilometers (the rest of the world!), but you can specify the units within tm_shape() using the argument unit. Here, we used unit = "mi" to designate distance in the scale bar measured in miles. The position = argument locates the scale bar on the bottom right of the map. The argument text.size = controls the size of the scale bar. We decrease the size by 25%.


The next element is the north arrow, which we can add using the function tm_compass(). You can control for the type, size and location of the arrow within this function. We place a 4-star arrow on the top left of the map.

tm_shape(sac.county.tracts, unit = "mi") +
  tm_polygons(col = "medhval", style = "quantile",palette = "Reds",
              title = "Housing values") +
  tm_layout(main.title = "2018-22 Median Housing Values in Sacramento County",
              main.title.size = 0.95, main.title.position="center", 
              legend.outside = TRUE) +
  tm_scale_bar(breaks = c(0, 5, 10), text.size  = 0.75, 
               position = c("right", "bottom"))  +
  tm_compass(type = "4star", position = c("left", "top")) 

Other features


We can make the map prettier by changing a variety of settings. We can eliminate the frame around the map using the argument frame = FALSE with tm_layout. We also add back the parks.

sacmhval.map <- tm_shape(sac.county.tracts, unit = "mi") +
  tm_polygons(col = "medhval", style = "quantile",palette = "Reds",
              title = "Housing values") +
  tm_layout(main.title = "2018-22 Median Housing Values in Sacramento County",
              main.title.size = 0.95, main.title.position="center", 
              legend.outside = TRUE, frame = FALSE, ) +
  tm_scale_bar(breaks = c(0, 5, 10), text.size  = 0.75, 
               position = c("right", "bottom"))  +
  tm_compass(type = "4star", position = c("left", "top")) +  
  tm_shape(parks) +
  tm_polygons(col = "green")

sacmhval.map

Notice that we stored the map into an object called sacmhval.map. R is an object-oriented language, so everything you make in R are objects that can be stored for future manipulation. This includes maps. You should see sacmhval.map in your Environment window. By storing the map, you can access it anytime during your current R session.

Multiple map objects can be arranged in a single ‘metaplot’ with tmap_arrange(). For example, let’s map housing values and percent Hispanic in Sacramento county. Let’s create the percent Hispanic map, and save it into an object named sacphisp.map.

sacphisp.map <- tm_shape(sac.county.tracts, unit = "mi") +
    tm_polygons(col = "phisp", style = "quantile",palette = "Reds",
                title = "Housing values") +
    tm_layout(main.title = "2018-22 Percent Hispanic in Sacramento County",
              main.title.size = 0.95, main.title.position="center", 
              legend.outside = TRUE, frame = FALSE, ) +
    tm_scale_bar(breaks = c(0, 5, 10), text.size  = 0.75, 
                 position = c("right", "bottom"))  +
    tm_compass(type = "4star", position = c("left", "top")) +  
    tm_shape(parks) +
    tm_polygons(col = "green")

Then use tmap_arrange() to map sacmhval.map and sacphisp.map side by side

tmap_arrange(sacmhval.map, sacphisp.map)

Check the full list of tm_ elements here.

Saving maps


You can save your maps a couple of ways.

  1. On the plotting screen where the map is shown, click on Export and save it as either an image or pdf file.
  2. Use the function tmap_save()

For option 2, we can save the map object sacmhval.map as such

tmap_save(sacmhval.map, "saccountyhval.png")

Specify the tmap object and a filename with an extension. It supports pdf, eps, svg, wmf, png, jpg, bmp and tiff. The default is png. Also make sure you’ve set your directory to the folder that you want your map to be saved in.

Interactive maps


So far we’ve created static maps. That is, maps that don’t “move”. But, we’re all likely used to Google or Bing maps - maps that we can move around and zoom into. You can make interactive maps in R using the package tmap.

To make your tmap object interactive, use the function tmap_mode(). Type in “view” inside this function.

tmap_mode("view")

Now that the interactive mode has been ‘turned on’, all maps produced with tm_shape() will launch. Let’s view our saved sacmhval.map interactively.

sacmhval.map

Click on above the map and a larger window should open up.

Besides interactivity, another important benefit of tmap_mode() is that it provides a basemap, which was discussed in Handout 5. The function of a basemap is to provide background detail necessary to orient the location of the map. In the static maps we produced earlier, Sacramento county was sort of floating in white space. As you can see in the interactive map above we’ve added geographic context to the surrounding area.

The default basemap in tmap_mode() is CartoDB.Positron. You can change the basemap through the tm_basemap() function. For example, let’s change the basemap to an OpenStreetMap.

sacmhval.map + tm_basemap("OpenStreetMap")

For a complete list of basemaps with previews, see here. There are a lot of cool ones, so please test them out.

You can save your interactive map using the same methods described for static maps. To switch back to plotting mode (noninteractive), type in

tmap_mode("plot")

You’ve completed your introduction to tmap. Whew! Badge? Yes, please, you earned it! Time to celebrate!

Assignment 5


Download and open the Assignment 5 R Markdown Script. The script can also be found on Canvas (Files - Week 5 - Assignment). Any response requiring a data analysis task must be supported by code you generate to produce your result. Just examining your various objects in the “Environment” section of R Studio is insufficient—you must use scripted commands. Submit the Rmd and its knitted html files on Canvas.


  1. CalEnviroScreen is a mapping tool that helps identify California communities that are most affected by many sources of pollution, and where people are often especially vulnerable to pollution’s effects. Download the CalEnviroScreen California tract-level shapefile from the the official CES website. The file is also located on Canvas in the Week 5 Assignment folder. Make sure to download it in an appropriate folder on your hard drive. Unzip the folder.
  1. Read the file into R and keep tracts in Yolo county. (1 point)
  2. Make a static presentation-ready map of the variable PM2_5 for Yolo county tracts, which is the level of pollution based on PM 2.5 levels (higher PM 2.5, more pollution), using quantile breaks. Summarize in your own words the geographic distribution of pollution burden in Yolo county based on the map. (2 points)
  3. Produce static presentation-ready maps of PM2_5 using three classifications other than quantile breaks. Describe if and how your general impression of where high and low pollution burden neighborhoods are located in the county change across the different classifications. (6 points)
  1. We are gradually receding from a global health crisis marked by profound human suffering. Analyzing the demographic, health and job characteristics of communities reveals that some places were disproportionately affected by this suffering. In particular, we found that communities with no or unstable internet access are at a disadvantage given that employment, medical care (telemedicine), education and other important services were and still are primarily conducted online at home. Let’s visually examine the association between internet access and race/ethnicity in Sacramento county census tracts.
  1. Using the Census API, bring in 2018-2022 total, non-Hispanic white, non-Hispanic black, non-Hispanic Asian, and Hispanic populations for census tracts in Sacramento county. Check Lab 3 for the appropriate variable IDs. Make sure to bring in spatial data by using geometry = TRUE. Also make sure to clean the data. Calculate the percent Asian, white, Black and Hispanic variables. (2 points)
  2. Bring into R the 2018-2022 percent of households without a cable, Fiber Optic, or DSL internet connection for Sacramento county census tracts from PolicyMap (from the PolicyMap site, click on Quality of Life, Computer and Internet Access, Household Internet Access, Without High Speed Internet Access, and No Cable, Fiber Optic or DSL). Make sure to clean the data. (1 point)
  3. Merge the percent of households without a broadband, cable, Fiber Optic, or DSL internet connection to the sf census tracts object you created in (a). (1 point)
  4. Create static presentation-ready maps of percent non-Hispanic black, non-Hispanic white, non-Hispanic Asian, and Hispanic using quantile breaks. (4 points)
  5. Create a static presentation-ready map of the percent of households without a broadband, cable, Fiber Optic, or DSL internet connection using quantile breaks. (1 point)
  6. Based on a visual comparison of the maps you created in (d) and (e), summarize in your own words the association between racial/ethnic composition and percent of households without a broadband, cable, Fiber Optic, or DSL internet connection. (2 points)

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Website created and maintained by Noli Brazil