In this guide you will acquire the skills needed to process and present spatial data in R. The objectives of the guide are as follows
This lab guide follows closely and supplements the material presented in Chapters 1, 2.1, 2.2, 8 and 9 in the textbook Geocomputation with R (GWR) and class Handout 5.
Assignment 5 is due by 10:00 am, February 14th on
Canvas. See here for
assignment guidelines. You must submit an .Rmd
file and its
associated .html
file. Name the files:
yourLastName_firstInitial_asgn05. For example: brazil_n_asgn05.
Download the Lab
template into an appropriate folder on your hard drive (preferably,
a folder named ‘Lab 5’), open it in R Studio, and type and run your code
there. The template is also located on Canvas under Files. Change the
title (“Lab 5”) and insert your name and date. Don’t change anything
else inside the YAML (the stuff at the top in between the
---
). Also keep the grey chunk after the YAML. For a
rundown on the use of R Markdown, see the assignment
guidelines
You’ll need to install the following packages in R. You only need to
do this once, so if you’ve already installed these packages, skip the
code. Also, don’t put these install.packages()
commands in
your R Markdown document. Copy and paste the code in the R Console.
We’ll talk about what functions these packages provide as they come up
in the guide.
install.packages("sf")
install.packages("tigris")
install.packages("tmap")
install.packages("RColorBrewer")
You’ll need to load the following packages using
library()
. Unlike installing, you will always need to load
packages whenever you start a new R session.
library(tidyverse)
library(tidycensus)
library(flextable)
library(sf)
library(tigris)
library(tmap)
library(RColorBrewer)
The main package we will use for handling spatial data in R is the tidy friendly sf package. sf stands for simple features. What is a feature? A feature is thought of as a thing, or an object in the real world, such as a building or a tree. A county can be a feature. As can a city and a neighborhood. Features have a geometry describing where on Earth the features are located, and they have attributes, which describe other properties. Think back to Lab 3 - we were working with counties. The difference between what we were doing then and what we will be doing in this lab is that counties in Lab 3 had attributes (e.g. percent Hispanic, total population), but they did not have geometries. As such, we could not put them on a map because we didn’t have their specific geographic coordinates. This is what separates nonspatial and spatial data in R.
sf is the specific type of data object that deals with spatial information in R. Think back to Lab 1 when we discussed the various ways R stores data - sf is just another way. But please note that spatial data themselves outside of R can take on many different formats. We’ll be primarily working with shapefiles in this class. Shapefiles are not the only type of spatial data, but they are the most commonly used. Let’s be clear here: sf objects are R specific and shapefiles are a general format of spatial data. This is like tibbles are R specific and csv files are a general format of non spatial data.
We will be primarily working with census geographic data in this lab and pretty much all future labs. If you need a reminder of the Census geographies, go back to Handout 3. There are two major packages for bringing in Census shapefiles into R: tidycensus and tigris.
In Lab
3, we worked with the tidycensus package and the
Census API to bring in Census data into R. Fortunately, we can use the
same commands to bring in Census geographic data. First, load in your
Census API key. If you have not already installed your Census API key,
add the install = TRUE
argument in
census_api_key()
.
census_api_key("YOUR API KEY GOES HERE")
Then use the get_acs()
command to bring in California
tract-level median household income, total foreign-born population, and
total population from the 5-year 2018-2022 American Community Survey
(ACS). Remember that “E” at the end of the variable indicates “Estimate”
and “M” indicates margin of errors.
ca.tracts <- get_acs(geography = "tract",
year = 2022,
variables = c(medincome = "B19013_001",
fb = "B05012_003", totp = "B05012_001"),
state = "CA",
survey = "acs5",
output = "wide",
geometry = TRUE)
The only difference between the code above and what we used in Lab 3
is we have one additional argument added to the get_acs()
command: geometry = TRUE
. This command tells R to bring in
the spatial features associated with the geography you specified in the
command, in our case California tracts. You can further narrow your
geographic scope to the county level by typing in county =
as an argument. For example, to get just Sacramento county tracts, you
would type in county = "Sacramento"
. Type in
ca.tracts to see what we’ve got.
ca.tracts
## Simple feature collection with 9129 features and 8 fields (with 20 geometries empty)
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -124.4096 ymin: 32.53444 xmax: -114.1312 ymax: 42.00948
## Geodetic CRS: NAD83
## First 10 features:
## GEOID NAME medincomeE
## 1 06077005127 Census Tract 51.27; San Joaquin County; California 102440
## 2 06077003406 Census Tract 34.06; San Joaquin County; California 38497
## 3 06077004402 Census Tract 44.02; San Joaquin County; California 89167
## 4 06077001700 Census Tract 17; San Joaquin County; California 51083
## 5 06077000401 Census Tract 4.01; San Joaquin County; California 57500
## 6 06077003404 Census Tract 34.04; San Joaquin County; California 53567
## 7 06001423000 Census Tract 4230; Alameda County; California 86917
## 8 06001450200 Census Tract 4502; Alameda County; California 107111
## 9 06001422200 Census Tract 4222; Alameda County; California 105750
## 10 06001406201 Census Tract 4062.01; Alameda County; California 50417
## medincomeM fbE fbM totpE totpM geometry
## 1 17341 2243 399 7580 1068 MULTIPOLYGON (((-121.2871 3...
## 2 4680 988 300 3768 698 MULTIPOLYGON (((-121.309 38...
## 3 7193 1714 508 5903 937 MULTIPOLYGON (((-121.2734 3...
## 4 10253 1195 424 4245 843 MULTIPOLYGON (((-121.2664 3...
## 5 20110 469 204 2856 338 MULTIPOLYGON (((-121.3133 3...
## 6 9885 1622 449 7182 1024 MULTIPOLYGON (((-121.2951 3...
## 7 20998 755 251 4549 851 MULTIPOLYGON (((-122.282 37...
## 8 23797 1925 385 5272 670 MULTIPOLYGON (((-121.9224 3...
## 9 29978 548 154 3280 612 MULTIPOLYGON (((-122.2942 3...
## 10 19883 2459 513 5072 747 MULTIPOLYGON (((-122.2343 3...
The object looks much like a basic tibble, but with a few differences.
MULTIPOLYGON
form (as opposed to points or lines, the other
basic vector data forms, which were discussed in Handout 5).We can find out the object’s class
class(ca.tracts)
## [1] "sf" "data.frame"
Here we find that the object is an sf data frame.
Another package that allows us to bring in census geographic
boundaries is tigris. Here
is a list of all the geographies you can download through this package.
Let’s bring in the boundaries for Sacramento city. Remember from Handout
3 that cities are designated as places by the Census. Use the
places()
function to get all places in California.
pl <- places(state = "CA", cb = TRUE, year=2022)
The cb = TRUE
argument tells R to download a generalized
cartographic boundary file, which drastically reduces the size of
the data (compare the file size when you don’t include
cb = TRUE
). For example, it eliminates all areas that are
strictly covered by water (e.g. lakes). The argument
year=2022
tells R to bring in the boundaries for that year
(census geographies can change from year to year). When using the
multi-year ACS, best to use the end year of the period. In the
get_acs()
command above we used year=2022
, so
also use year=2022
in the places()
command.
Note that unlike the tidycensus package,
tigris does not allow you to attach attribute data
(e.g. Hispanic, total population, etc.) to geometric features.
Take a glimpse of pl
glimpse(pl)
## Rows: 1,611
## Columns: 13
## $ STATEFP <chr> "06", "06", "06", "06", "06", "06", "06", "06", "06", "06",…
## $ PLACEFP <chr> "55156", "22804", "02924", "16350", "43224", "31414", "5992…
## $ PLACENS <chr> "02411359", "02410455", "02409738", "02410232", "02410875",…
## $ AFFGEOID <chr> "1600000US0655156", "1600000US0622804", "1600000US0602924",…
## $ GEOID <chr> "0655156", "0622804", "0602924", "0616350", "0643224", "063…
## $ NAME <chr> "Palmdale", "Escondido", "Arvin", "Corona", "Los Alamitos",…
## $ NAMELSAD <chr> "Palmdale city", "Escondido city", "Arvin city", "Corona ci…
## $ STUSPS <chr> "CA", "CA", "CA", "CA", "CA", "CA", "CA", "CA", "CA", "CA",…
## $ STATE_NAME <chr> "California", "California", "California", "California", "Ca…
## $ LSAD <chr> "25", "25", "25", "25", "25", "25", "25", "25", "25", "25",…
## $ ALAND <dbl> 274705334, 96726218, 12482061, 103320711, 10380819, 3394103…
## $ AWATER <dbl> 629569, 276445, 0, 53926, 161830, 10948, 3904619, 31054, 10…
## $ geometry <MULTIPOLYGON [°]> MULTIPOLYGON (((-118.2877 3..., MULTIPOLYGON (…
We see a geometry column, which indicates we have a spatial data frame.
We can use filter()
to keep Sacramento city. We will
filter on the variable NAME to keep Sacramento.
sac.city <- filter(pl, NAME == "Sacramento")
The argument NAME == "Sacramento"
tells R to keep cities
with the exact city name “Sacramento”. Make sure we got what we
wanted.
glimpse(sac.city)
## Rows: 1
## Columns: 13
## $ STATEFP <chr> "06"
## $ PLACEFP <chr> "64000"
## $ PLACENS <chr> "02411751"
## $ AFFGEOID <chr> "1600000US0664000"
## $ GEOID <chr> "0664000"
## $ NAME <chr> "Sacramento"
## $ NAMELSAD <chr> "Sacramento city"
## $ STUSPS <chr> "CA"
## $ STATE_NAME <chr> "California"
## $ LSAD <chr> "25"
## $ ALAND <dbl> 255491817
## $ AWATER <dbl> 5413141
## $ geometry <MULTIPOLYGON [°]> MULTIPOLYGON (((-121.5601 3...
Let’s use use the function counties()
to bring in 2022
county boundaries.
cnty <- counties(state = "CA", cb = TRUE, year=2022)
Take a look at the data to make sure it contains what we want.
glimpse(cnty)
## Rows: 58
## Columns: 13
## $ STATEFP <chr> "06", "06", "06", "06", "06", "06", "06", "06", "06", "06",…
## $ COUNTYFP <chr> "037", "097", "001", "045", "015", "055", "063", "101", "02…
## $ COUNTYNS <chr> "00277283", "01657246", "01675839", "00277287", "01682074",…
## $ AFFGEOID <chr> "0500000US06037", "0500000US06097", "0500000US06001", "0500…
## $ GEOID <chr> "06037", "06097", "06001", "06045", "06015", "06055", "0606…
## $ NAME <chr> "Los Angeles", "Sonoma", "Alameda", "Mendocino", "Del Norte…
## $ NAMELSAD <chr> "Los Angeles County", "Sonoma County", "Alameda County", "M…
## $ STUSPS <chr> "CA", "CA", "CA", "CA", "CA", "CA", "CA", "CA", "CA", "CA",…
## $ STATE_NAME <chr> "California", "California", "California", "California", "Ca…
## $ LSAD <chr> "06", "06", "06", "06", "06", "06", "06", "06", "06", "06",…
## $ ALAND <dbl> 10515988166, 4080091725, 1910010507, 9082586851, 2606049835…
## $ AWATER <dbl> 1785003207, 497303747, 216909647, 961786019, 578810831, 104…
## $ geometry <MULTIPOLYGON [°]> MULTIPOLYGON (((-118.6044 3..., MULTIPOLYGON (…
To get Sacramento county, we use the filter()
function.
Similar to pl, we will filter on the variable NAME to
keep Sacramento.
sac.county <- filter(cnty, NAME == "Sacramento")
If you want to get tract boundaries from tigris use the
function tracts()
. Note that in both tracts()
and get_acs()
, you can get tracts within a specific county
within CA by specifying the county =
argument. For example,
tracts(state = "CA", county = "Sacramento", cb = TRUE, year=2022)
will yield census tracts in Sacramento county.
Guess what? You earned another badge! Yipee!!
Directly reading spatial files using an API is great, but doesn’t
exist for many spatial data sources. You’ll often have to download a
spatial data set, save it onto your hard drive and read it into R. The
function for reading spatial files from your hard drive as
sf objects is st_read()
.
Let’s bring in two shapefiles I created that contains (1) median
housing values for census tracts in Sacramento county from the 2018-2022
ACS and (2) Sacramento county parks.
I zipped up the files and uploaded it onto Github. Make sure your
current working directory is pointed to the appropriate folder on your
hard drive (use setwd()
).
download.file(url = "https://raw.githubusercontent.com/crd150/data/master/lab5files.zip", destfile = "lab5files.zip")
If you are having problems with the above code, I also uploaded the zip file on Canvas in the Week 5 Lab folder. To manually unzip zipped files on a Mac, check here. For Windows, check here.
After unzipping the folder, you should see SacramentoCountyTracts, Parks and californiatractsrace files in your current working directory. Note that the shapefile is actually not a single file but is represented by multiple files. For SacramentoCountyTracts, you should see four files named SacramentoCountyTracts with shp, dbf, prj, and shx extensions. These files are all connected to one another, so don’t manually alter these files. Only open and alter these files in R. Moreover, if you want to remove a shapefile from your hard drive, delete all the associated files not just one. For Parks, you will see five associated files. We will bring in the non-spatial data file californiatractsrace.csv a little later.
Make sure your working directory is now pointed to the
lab5files folder. Then bring in the Sacramento County tract
shapefile using the function st_read()
, which is from the
sf package. You’ll need to add the .shp
extension so that the function knows it’s reading in a shapefile.
sac.county.tracts <- st_read("SacramentoCountyTracts.shp", stringsAsFactors = FALSE)
The argument stringsAsFactors = FALSE
tells R to keep
any variables that look like a character as a character and not a factor, which we won’t
use much, if at all, in this class.
Take a look
glimpse(sac.county.tracts )
## Rows: 363
## Columns: 4
## $ GEOID <chr> "06067009105", "06067008905", "06067009110", "06067007209", "…
## $ NAME <chr> "Census Tract 91.05; Sacramento County; California", "Census …
## $ medhval <dbl> 332700, 371500, 161900, 379300, 419500, 468400, 582700, 50400…
## $ geometry <POLYGON [US_survey_foot]> POLYGON ((6738187 1967860, ..., POLYGON …
Has a geometry column, so we know it is spatial. Bring in the parks file.
parks <- st_read("Parks.shp", stringsAsFactors = FALSE)
Take a look
glimpse(parks)
## Rows: 741
## Columns: 43
## $ OBJECTID <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, …
## $ PARK_ID <int> 1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010,…
## $ LANDMARK <chr> "ANCIL HOFFMAN GOLF COURSE", "HOGBACK ISLAND FISHING ACCESS…
## $ DESCRIPTIO <chr> "GOLF COURSE", "BOAT LAUNCH", "PARKWAY", "PARK", "PARK", "P…
## $ DISTRICT <chr> NA, "SAC COUNTY", "SAC COUNTY", "COSUMNES CSD", "CITY OF SA…
## $ PARK <chr> "Ancil Hoffman Golf Course", "Hogback Island Fishing Access…
## $ PARK_ADDRE <chr> "6700 TARSHES DR", "1500 GRAND ISLAND RD", "7929 LA RIVIERA…
## $ ZIP_CODE <chr> "95608", "95690", "95864", "95757", "95833", "95670", "9582…
## $ AMPHITHEAT <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ PUBLIC_POO <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ LAP_SWIM <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SWIM_LESSO <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SPRAY_PARK <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ DOG_PARK <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ ART_CULTUR <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ DESTINATIO <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ DISC_GOLF <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ GOLF <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ HORSEBACK_ <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ INDOOR_FAC <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ INDOOR_F_1 <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ INDOOR_F_2 <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ LIGHTED_TE <chr> NA, NA, NA, NA, NA, NA, NA, "Yes", NA, NA, NA, NA, NA, NA, …
## $ NATURE_CEN <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ PICNIC_REN <chr> NA, NA, NA, NA, "Yes", NA, NA, "Yes", NA, NA, "Yes", NA, "Y…
## $ SENIOR_PRO <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SKATE_BIKE <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SPORTS_COM <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ WALKING_TR <chr> NA, NA, "Yes", NA, NA, "Yes", NA, NA, "Yes", NA, NA, NA, NA…
## $ CYCLING_TR <chr> NA, NA, "Yes", NA, NA, "Yes", NA, NA, "Yes", NA, NA, NA, NA…
## $ RUNNING_TR <chr> NA, NA, "Yes", NA, NA, "Yes", NA, NA, "Yes", NA, NA, NA, NA…
## $ VOLUNTEER_ <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ YEAR_ROUND <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ YEAR_ROU_1 <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ ZOO <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ UPDATED <chr> NA, "8/2012", "12/3/12", NA, "12/2012", "12/3/12", NA, "12/…
## $ NOTES <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ LONG_DISTR <chr> NA, "Sacramento County Regional Parks", "Sacramento County …
## $ PARK_DISTR <chr> NA, "22", "22", "17", "06", "22", NA, "06", "22", NA, "06",…
## $ ADDRESSID <int> 553142, 310123, 207322, 302976, 538217, 199945, NA, 417668,…
## $ SHAPE_Leng <dbl> 11904.8115, 3977.6496, 12451.2700, 876.5463, 1308.4914, 511…
## $ SHAPE_Area <dbl> 7130139.52, 445005.74, 1556528.08, 43736.12, 102181.60, 118…
## $ geometry <MULTIPOLYGON [US_survey_foot]> MULTIPOLYGON (((6759145 198..., M…
What form are these spatial data in?
There is a lot of stuff behind the
curtain of how R handles spatial data as simple features, but the
main takeaway is that sf objects are data frames. This
means you can use many of the functions we’ve learned in the past couple
labs to manipulate sf objects, and this includes our
best buddy the pipe %>%
operator. For example, let’s do
the following data wrangling tasks on ca.tracts.
We do all of this in one line of continuous code using the pipe
operator %>%
ca.tracts <- ca.tracts %>%
select(-medincomeM, -fbM, -totpM) %>%
rename(medincome = medincomeE, fb = fbE, totp = totpE) %>%
mutate(pfb = 100*(fb/totp))
Notice that we’ve already used all of the functions above for nonspatial data wrangling.
Another important operation is to join attribute data to an
sf object. For example, let’s say you wanted to add
2018-2022 ACS tract level percent race/ethnicity, which is located in
the californiatractsrace.csv file we downloaded earlier. Bring
this file in using our familiar friend read_csv()
.
ca.race <- read_csv("californiatractsrace.csv")
Take a look at the dataset
glimpse(ca.race)
## Rows: 9,129
## Columns: 5
## $ GEOID <chr> "06001400100", "06001400200", "06001400300", "06001400400", "…
## $ pnhwhite <dbl> 68.981340, 69.864928, 58.551344, 64.048621, 41.478855, 43.974…
## $ pnhblk <dbl> 5.230957, 1.769912, 9.414487, 11.664329, 25.145606, 26.190476…
## $ pnhasn <dbl> 14.591618, 11.690731, 11.300943, 8.719028, 9.369461, 6.365403…
## $ phisp <dbl> 5.108596, 7.405682, 10.108560, 7.152875, 15.497594, 6.511176,…
Remember, were dealing with data frames here, so we can use
left_join()
, which we covered in Lab 3, to
join the non spatial data frame ca.race to the spatial data
frame sac.county.tracts.
sac.county.tracts <- sac.county.tracts %>%
left_join(ca.race, by = "GEOID")
Take a peek
glimpse(sac.county.tracts)
## Rows: 363
## Columns: 8
## $ GEOID <chr> "06067009105", "06067008905", "06067009110", "06067007209", "…
## $ NAME <chr> "Census Tract 91.05; Sacramento County; California", "Census …
## $ medhval <dbl> 332700, 371500, 161900, 379300, 419500, 468400, 582700, 50400…
## $ pnhwhite <dbl> 30.63192, 44.85925, 23.91304, 59.69185, 64.92731, 64.90752, 6…
## $ pnhblk <dbl> 3.3916458, 11.1599122, 34.2995169, 1.5407555, 8.1856204, 1.23…
## $ pnhasn <dbl> 15.530168, 7.925734, 4.685990, 3.354871, 4.003187, 7.262854, …
## $ phisp <dbl> 41.163870, 30.664803, 32.946860, 27.460239, 14.419438, 13.965…
## $ geometry <POLYGON [US_survey_foot]> POLYGON ((6738187 1967860, ..., POLYGON …
Note that you cannot join two sf objects together
using left_join()
. Always use left_join()
to
join a regular non spatial data frame to a spatial object. For example,
you will get an error if you try to join the ca.tracts to
sac.county.tracts
sac.county.tracts <- sac.county.tracts %>%
left_join(ca.tracts, by = "GEOID")
## Error: y should not have class sf; for spatial joins, use st_join
The error message tells you that you can use the function
st_join()
to join two spatial data objects. We’ll learn
more about st_join()
in the next lab.
You can also use the exploratory data analysis functions we learned about in Lab 4. For example, what is the correlation between median housing value and percent black?
sac.county.tracts %>%
summarize(Correlation = cor(medhval, pnhblk, use = "complete.obs"))
## Simple feature collection with 1 feature and 1 field
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: 6601243 ymin: 1768598 xmax: 6840876 ymax: 2030363
## Projected CRS: NAD83 / California zone 2 (ftUS)
## Correlation geometry
## 1 -0.3960333 POLYGON ((6608864 1788526, ...
We get the correlation, but it’s not quite as clean as when we
calculated descriptive statistics in Lab 4. That’s because the resulting
object after you do summarize()
is still a spatial object.
To coerce it into a non spatial object, use the function
st_drop_geometry()
, which unsurprisingly drops the object’s
geometry. This way you can use flextable()
to create
presentation ready tables.
sac.county.tracts %>%
summarize(Correlation = cor(medhval, pnhblk, use = "complete.obs")) %>%
st_drop_geometry() %>%
flextable() %>%
colformat_double(digits = 2)
Correlation |
---|
-0.40 |
The main takeaway: sf objects are data frames, so you can use many of the functions you’ve learned in the past couple of labs on these objects. Here’s a badge!
To save an sf object to a file, use the function
st_write()
and specify at least two arguments, the
sf object you want to save and a file name in quotes
with the file extension. You’ll also need to specify
delete_layer = TRUE
which overwrites the existing file if
it already exists in your current working directory. Make sure you’ve
set your directory to the folder you want your file to be saved in. Type
in getwd()
to see your current directory and use
setwd()
to set the directory.
Let’s save sac.county.tracts as a shapefile named saccountytractslab5.shp.
st_write(sac.county.tracts, "saccountytractslab5.shp", delete_layer = TRUE)
Check your current working directory to see if saccountytractslab5.shp and its associated files were saved.
You can save your sf object in a number of different
data formats other than shp
. We won’t be concerned too much
with these other formats in this class, but you can see a list of them
here.
Now that you’ve got your spatial data in and wrangled, the next natural step is to map something. There are several functions in R that can be used for mapping. We won’t go through all of them, but GWR outlines in Table 9.1 the range of mapping packages available in R. The package we’ll rely on in this class for mapping is tmap.
The foundation function for mapping in tmap is
tm_shape()
. You then build on tm_shape()
by
adding one or more elements, all taking on the form of tm_
.
Let’s make a choropleth map of median housing values.
tm_shape(sac.county.tracts) +
tm_polygons(col = "medhval", style = "quantile")
You first put the dataset sac.county.tracts inside
tm_shape()
. Because you are plotting polygons, you use
tm_polygons()
next. If you are plotting points, you will
use tm_dots()
. If you are plotting lines, you will use
tm_lines()
. The argument col = "medhval"
tells
R to shade the tracts by the variable medhval.
tmap allows users to specify the classification style
with the style
argument. The argument
style = "quantile"
tells R to break up the shading into
quantiles, or equal groups of 5. Seven of the most useful classification
styles are described in the bullet points below (taken from GWR):
style = pretty
, the default setting, rounds breaks into
whole numbers where possible and spaces them evenlystyle = equal
divides input values into bins of equal
range, and is appropriate for variables with a uniform distribution (not
recommended for variables with a skewed distribution as the resulting
map may end-up having little color diversity)style = quantile
ensures the same number of
observations fall into each category (with the potential down side that
bin ranges can vary widely)style = jenks
identifies groups of similar values in
the data and maximizes the differences between categoriesstyle = cont
(and order
) present a large
number of colors over continuous color field, and are particularly
suited for continuous rasters (order can help visualize skewed
distributions)style = sd
divides the values by standard deviations
above and below the mean.style = cat
was designed to represent categorical
values and assures that each category receives a unique colorThe importance of choosing the appropriate classification scheme is discussed in Handout 5. You’ll get some practice trying out other classification schemes in this week’s assignment. Note that tmap is smart enough to detect the presence of missing values, and shades them gray and labels them on the legend.
You can overlay multiple features on one map. For example, we can add
park polygons on top of county tracts, providing a visual association
between parks and percent white. Here, we add another
tm_shape()
and tm_polygons()
to the above
code.
tm_shape(sac.county.tracts) +
tm_polygons(col = "medhval", style = "quantile") +
tm_shape(parks) +
tm_polygons(col = "green")
Don’t like the yellow/brown color scheme? We can change the color
scheme using the argument palette =
within
tm_polygons()
. The argument palette =
defines
the color ranges associated with the bins as determined by the
style
argument. Below we use the color scheme “Reds” using
palette = "Reds"
.
tm_shape(sac.county.tracts) +
tm_polygons(col = "medhval", style = "quantile", palette = "Reds")
See Ch. 9.2.4 in GWR for a fuller discussion on color and other schemes you can specify.
In addition to the built-in palettes, customized color ranges can be
created by specifying a vector with the desired colors as anchors. This
will create a spectrum of colors in the map that range between the
colors specified in the vector. For instance, if we used
c(“red”, “blue”)
, the color spectrum would move from red to
purple, then to blue, with in between shades. In our example:
tm_shape(sac.county.tracts) +
tm_polygons(col = "medhval", style = "quantile",palette = c("red","blue"))
Not exactly a pretty picture. In order to capture a diverging scale, we insert “white” in between red and blue.
tm_shape(sac.county.tracts) +
tm_polygons(col = "medhval", style = "quantile",palette = c("red","white", "blue"))
A preferred approach to select a color palette is to chose one of the schemes contained in the RColorBrewer package. These are based on the research of cartographer Cynthia Brewer (see the colorbrewer2 web site for details). RColorBrewer makes a distinction between sequential scales (for a scale that goes from low to high), diverging scales (to highlight how values differ from a central tendency), and qualitative scales (for categorical variables). For each scale, a series of single hue and multi-hue scales are suggested. In the RColorBrewer package, these are referred to by a name (e.g., the “Reds” palette we used above is an example). The full list is contained in the RColorBrewer documentation.
There are two very useful commands in this package. One sets a color palette by specifying its name and the number of desired categories. The result is a character vector with the hex codes of the corresponding colors.
For example, we select a sequential color scheme going from blue to
green, as BuGn, by means of the command
brewer.pal
, with the number of categories (6) and the
scheme as arguments. The resulting vector contains the HEX codes for the
colors.
pal <- brewer.pal(6,"BuGn")
pal
## [1] "#EDF8FB" "#CCECE6" "#99D8C9" "#66C2A4" "#2CA25F" "#006D2C"
Using this palette in our map yields the following result.
tm_shape(sac.county.tracts) +
tm_polygons(col = "medhval", style = "quantile", palette="BuGn")
The command display.brewer.pal()
allows us to explore
different color schemes before applying them to a map. For example:
display.brewer.pal(6,"BuGn")
There are many options to change the formatting of the legend. Often,
the automatic title for the legend is not intuitive, since it is simply
the variable name (in our case, medhval). This can be
customized by setting the title
argument in
tm_polygons()
. Let’s change the legend title to “Housing
values”
tm_shape(sac.county.tracts) +
tm_polygons(col = "medhval", style = "quantile",palette = "Reds",
title = "Housing values")
Another important aspect of the legend is its positioning. This is
handled through the tm_layout()
function. This function has
a vast number of options, as detailed in the documentation.
Also check the help documentation for tm_layout()
to see
the complete list of settings and examples in Ch. 9.2.5 in GWR. There
are also specialized subsets of layout functions, focused on specific
aspects of the map, such as tm_legend()
,
tm_style()
and tm_format()
. We illustrate the
positioning of the legend.
The default is to position the legend inside the map. Often, this
default solution is appropriate, but sometimes further control is
needed. The legend.position
argument in the
tm_layout()
function moves the legend around the map, and
it takes on a vector of two string variables that determine both the
horizontal position (“left”, “right”, or “center”) and the vertical
position (“top”, “bottom”, or “center”). The default is “right” and
“bottom”. But, we can change it to, say, top right.
tm_shape(sac.county.tracts) +
tm_polygons(col = "medhval", style = "quantile",palette = "Reds",
title = "Housing values") +
tm_layout(legend.position = c("right", "top"))
Yuck. We can leave it at the bottom right. Or there is also the
option to position the legend outside the frame of the map. This is
accomplished by setting legend.outside
to TRUE (the default
is FALSE) in tm_layout()
.
tm_shape(sac.county.tracts) +
tm_polygons(col = "medhval", style = "quantile",palette = "Reds",
title = "Housing values") +
tm_layout(legend.outside = TRUE)
We can also customize the size of the legend, its alignment, font, etc. We refer to GWR for specifics.
Another functionality of the tm_layout()
function is to
set a title for the map, and specify its position, size, etc. For
example, we can set the title using main.title
, and the
size using main.title.size
as in the example below. We made
the font size a bit smaller (0.95) in order not to overwhelm the
map.
tm_shape(sac.county.tracts) +
tm_polygons(col = "medhval", style = "quantile",palette = "Reds",
title = "Housing values") +
tm_layout(main.title = "2018-22 Median Housing Values in Sacramento County",
main.title.size = 0.95, legend.outside = TRUE)
You can change the title position using
main.title.position
. For example, we center the title by
specifying “center”.
tm_shape(sac.county.tracts) +
tm_polygons(col = "medhval", style = "quantile",palette = "Reds",
title = "Housing values") +
tm_layout(main.title = "2018-22 Median Housing Values in Sacramento County",
main.title.size = 0.95, main.title.position="center",
legend.outside = TRUE)
We need to add the other key map elements described in Handout 5.
Here is where we start adding more layout functions after
tm_polygons()
using the +
operator. First, the
scale bar, which you can add using the function
tm_scale_bar()
tm_shape(sac.county.tracts, unit = "mi") +
tm_polygons(col = "medhval", style = "quantile",palette = "Reds",
title = "Housing values") +
tm_layout(main.title = "2018-22 Median Housing Values in Sacramento County",
main.title.size = 0.95, main.title.position="center",
legend.outside = TRUE) +
tm_scale_bar(breaks = c(0, 5, 10), text.size = 0.75,
position = c("right", "bottom"))
The argument breaks
within tm_scale_bar()
tells R the distances to break up and end the bar. Make sure you use
reasonable break points - the Sacramento county area is not, for
example, 200 miles wide, so you should not use c(0,100,200)
(try it and see what happens. You won’t like it). Note that the scale is
in miles (were in America!). The default is in kilometers (the rest of
the world!), but you can specify the units within
tm_shape()
using the argument unit
. Here, we
used unit = "mi"
to designate distance in the scale bar
measured in miles. The position =
argument locates the
scale bar on the bottom right of the map. The argument
text.size =
controls the size of the scale bar. We
decrease the size by 25%.
The next element is the north arrow, which we can add using the
function tm_compass()
. You can control for the type, size
and location of the arrow within this function. We place a 4-star arrow
on the top left of the map.
tm_shape(sac.county.tracts, unit = "mi") +
tm_polygons(col = "medhval", style = "quantile",palette = "Reds",
title = "Housing values") +
tm_layout(main.title = "2018-22 Median Housing Values in Sacramento County",
main.title.size = 0.95, main.title.position="center",
legend.outside = TRUE) +
tm_scale_bar(breaks = c(0, 5, 10), text.size = 0.75,
position = c("right", "bottom")) +
tm_compass(type = "4star", position = c("left", "top"))
We can make the map prettier by changing a variety of
settings. We can eliminate the frame around the map using the argument
frame = FALSE
with tm_layout
. We also add back
the parks.
sacmhval.map <- tm_shape(sac.county.tracts, unit = "mi") +
tm_polygons(col = "medhval", style = "quantile",palette = "Reds",
title = "Housing values") +
tm_layout(main.title = "2018-22 Median Housing Values in Sacramento County",
main.title.size = 0.95, main.title.position="center",
legend.outside = TRUE, frame = FALSE, ) +
tm_scale_bar(breaks = c(0, 5, 10), text.size = 0.75,
position = c("right", "bottom")) +
tm_compass(type = "4star", position = c("left", "top")) +
tm_shape(parks) +
tm_polygons(col = "green")
sacmhval.map
Notice that we stored the map into an object called sacmhval.map. R is an object-oriented language, so everything you make in R are objects that can be stored for future manipulation. This includes maps. You should see sacmhval.map in your Environment window. By storing the map, you can access it anytime during your current R session.
Multiple map objects can be arranged in a single ‘metaplot’ with
tmap_arrange()
. For example, let’s map housing values and
percent Hispanic in Sacramento county. Let’s create the percent Hispanic
map, and save it into an object named sacphisp.map.
sacphisp.map <- tm_shape(sac.county.tracts, unit = "mi") +
tm_polygons(col = "phisp", style = "quantile",palette = "Reds",
title = "Housing values") +
tm_layout(main.title = "2018-22 Percent Hispanic in Sacramento County",
main.title.size = 0.95, main.title.position="center",
legend.outside = TRUE, frame = FALSE, ) +
tm_scale_bar(breaks = c(0, 5, 10), text.size = 0.75,
position = c("right", "bottom")) +
tm_compass(type = "4star", position = c("left", "top")) +
tm_shape(parks) +
tm_polygons(col = "green")
Then use tmap_arrange()
to map sacmhval.map and
sacphisp.map side by side
tmap_arrange(sacmhval.map, sacphisp.map)
Check the full list of tm_
elements here.
You can save your maps a couple of ways.
tmap_save()
For option 2, we can save the map object sacmhval.map as such
tmap_save(sacmhval.map, "saccountyhval.png")
Specify the tmap object and a filename with an
extension. It supports pdf
, eps
,
svg
, wmf
, png
, jpg
,
bmp
and tiff
. The default is png
.
Also make sure you’ve set your directory to the folder that you want
your map to be saved in.
So far we’ve created static maps. That is, maps that don’t “move”. But, we’re all likely used to Google or Bing maps - maps that we can move around and zoom into. You can make interactive maps in R using the package tmap.
To make your tmap object interactive, use the
function tmap_mode()
. Type in “view” inside this
function.
tmap_mode("view")
Now that the interactive mode has been ‘turned on’, all maps produced
with tm_shape()
will launch. Let’s view our saved
sacmhval.map interactively.
sacmhval.map
Click on above the map and a larger window should open up.
Besides interactivity, another important benefit of
tmap_mode()
is that it provides a basemap, which was
discussed in Handout 5. The function of a basemap is to provide
background detail necessary to orient the location of the map. In the
static maps we produced earlier, Sacramento county was sort of floating
in white space. As you can see in the interactive map above we’ve added
geographic context to the surrounding area.
The default basemap in tmap_mode()
is CartoDB.Positron.
You can change the basemap through the tm_basemap()
function. For example, let’s change the basemap to an OpenStreetMap.
sacmhval.map + tm_basemap("OpenStreetMap")