Measures of segregation and other indices of place-based inequality have been fundamental to documenting and understanding the causes and consequences of residential patterns of racial separation. In this guide, you will learn how to calculate measures of neighborhood segregation using R.
To accomplish these objectives, you will be working with Census tract data for four of the largest cities in California: Fresno, San Diego, San Jose, and San Francisco.
This lab guide follows closely and supplements the material presented in Chapters 4.1 and 4.2 in the textbook Geocomputation with R (GWR) and Handout 7.
Assignment 7 is due by 10:00 am, February 28th on
Canvas. See here for
assignment guidelines. You must submit an .Rmd
file and its
associated .html
file. Name the files:
yourLastName_firstInitial_asgn07. For example: brazil_n_asgn07.
Download the Lab template into an appropriate folder on your hard drive (preferably a folder named 'Lab 7'), open it in RStudio, and type and run your code there. The template is also located on Canvas under Files. Change the title ("Lab 7") and insert your name and date. Don't change anything else inside the YAML (the stuff at the top in between the ---). Also keep the grey chunk after the YAML. For a rundown on the use of R Markdown, see the assignment guidelines.
We will not be using any new packages in this lab. You’ll need to
load the following packages. Unlike installing, you will always need to
load packages whenever you start a new R session. As such, you’ll always
need to use library()
in your R Markdown file.
library(sf)
library(tidyverse)
library(tidycensus)
library(tigris)
library(tmap)
library(rmapshaper)
library(flextable)
The following code uses the Census API to bring in demographic tract-level data for four of the most populated cities in California: San Diego, San Jose, San Francisco, and Fresno. We won’t go through each line of code in detail because we’ve covered all of these operations and functions in prior labs. We’ve embedded comments within the code that briefly explain what each chunk is doing. Go back to prior guides (or RDS/GWR) if you need further help.
# Bring in 2018-2022 census tract data using the Census API
ca.tracts <- get_acs(geography = "tract",
year = 2022,
variables = c(tpop = "B03002_001",
nhwhite = "B03002_003", nhblk = "B03002_004",
nhasn = "B03002_006", hisp = "B03002_012"),
state = "CA",
survey = "acs5",
output = "wide",
geometry = TRUE)
# Calculate, rename and keep essential vars.
ca.tracts <- ca.tracts %>%
mutate(pnhwhite = 100*(nhwhiteE/tpopE), pnhasn = 100*(nhasnE/tpopE),
pnhblk = 100*(nhblkE/tpopE), phisp = 100*(hispE/tpopE)) %>%
rename(nhwhite = nhwhiteE, nhasn = nhasnE, nhblk = nhblkE,
hisp = hispE, tpop = tpopE) %>%
select(GEOID,tpop, pnhwhite, pnhasn, pnhblk, phisp,
nhwhite, nhasn, nhblk, hisp)
# Bring in city boundaries
pl <- places(state = "CA", year = 2022, cb = TRUE)
# Keep four large cities in CA
large.cities <- filter(pl, NAME == "San Diego" |
NAME == "San Jose" | NAME == "San Francisco" |
NAME == "Fresno")
# Clip tracts in large cities
large.tracts <- ms_clip(target = ca.tracts,
clip = large.cities, remove_slivers = TRUE)
Make sure to take a look at the final outcome.
glimpse(large.tracts)
## Rows: 961
## Columns: 11
## $ GEOID <chr> "06019003804", "06019004405", "06019004801", "06019000501", "…
## $ tpop <dbl> 6865, 3575, 4442, 2990, 7903, 4435, 2922, 4565, 8409, 6178, 5…
## $ pnhwhite <dbl> 11.6970138, 66.7412587, 23.7055380, 7.3244147, 15.1841073, 5.…
## $ pnhasn <dbl> 26.365623, 6.237762, 2.678973, 4.347826, 22.611666, 41.984216…
## $ pnhblk <dbl> 8.8565186, 2.3776224, 8.5997299, 15.7525084, 6.6683538, 3.607…
## $ phisp <dbl> 49.395484, 22.349650, 60.423233, 70.936455, 49.867139, 47.170…
## $ nhasn <dbl> 1810, 223, 119, 130, 1787, 1862, 907, 843, 4229, 3790, 1785, …
## $ nhwhite <dbl> 803, 2386, 1053, 219, 1200, 230, 1336, 438, 2514, 276, 18, 12…
## $ nhblk <dbl> 608, 85, 382, 471, 527, 160, 67, 7, 405, 230, 41, 23, 0, 435,…
## $ hisp <dbl> 3391, 799, 2684, 2121, 3941, 2092, 409, 3115, 452, 1655, 3748…
## $ geometry <POLYGON [°]> POLYGON ((-119.8668 36.7863..., POLYGON ((-119.7795 3…
The object large.tracts contains the census tracts located in the four cities. When you view the dataset, you’ll notice that we don’t have any variable indicating which city each tract belongs to. We need the city identifier to calculate segregation for each city. The city GEOID and NAME are in the object large.cities, which we will need to append to each tract in the object large.tracts.
We do this by using the st_join()
function, which is a
part of the sf package. The function will join the
variables from large.cities to large.tracts based on
geographic location. That is, if a tract is located within a city, that
city’s values from large.cities will be appended to that
tract.
First, look at the variables already in large.tracts.
names(large.tracts)
## [1] "GEOID" "tpop" "pnhwhite" "pnhasn" "pnhblk" "phisp"
## [7] "nhasn" "nhwhite" "nhblk" "hisp" "geometry"
Then look at the variables in large.cities
names(large.cities)
## [1] "STATEFP" "PLACEFP" "PLACENS" "AFFGEOID" "GEOID"
## [6] "NAME" "NAMELSAD" "STUSPS" "STATE_NAME" "LSAD"
## [11] "ALAND" "AWATER" "geometry"
Then st_join()
large.tracts <- large.tracts %>%
st_join(large.cities)
This function joins the variables from large.cities to the object large.tracts.
names(large.tracts)
## [1] "GEOID.x" "tpop" "pnhwhite" "pnhasn" "pnhblk"
## [6] "phisp" "nhasn" "nhwhite" "nhblk" "hisp"
## [11] "STATEFP" "PLACEFP" "PLACENS" "AFFGEOID" "GEOID.y"
## [16] "NAME" "NAMELSAD" "STUSPS" "STATE_NAME" "LSAD"
## [21] "ALAND" "AWATER" "geometry"
Note that when the two files have the same variable names, R attaches .x and .y to the end of the variable names such as GEOID.x and GEOID.y, which represent the tract and city GEOIDs, respectively.
We don’t need all of these new variables, so let’s use
select()
to remove the variables we don’t need.
large.tracts <- large.tracts %>%
select(-(STATEFP:AFFGEOID), -(NAMELSAD:AWATER))
Make sure we’ve kept the variables we need
names(large.tracts)
## [1] "GEOID.x" "tpop" "pnhwhite" "pnhasn" "pnhblk" "phisp"
## [7] "nhasn" "nhwhite" "nhblk" "hisp" "GEOID.y" "NAME"
## [13] "geometry"
Before calculating segregation, you should map neighborhood racial/ethnic composition in order to gain a visual understanding of how race/ethnic groups are spatially distributed in your study region. For example, let’s map percent Hispanic in Fresno.
large.tracts %>%
filter(NAME == "Fresno") %>%
tm_shape(unit = "mi") +
tm_polygons(col = "phisp", style = "quantile",palette = "Reds",
border.alpha = 0, title = "") +
tm_scale_bar(breaks = c(0, 1, 2), text.size = 0.75, position = c("right", "bottom")) +
tm_compass(type = "4star", position = c("left", "top")) +
tm_layout(main.title = "Percent Hispanic in Fresno City Tracts",
main.title.size = 0.9, frame = FALSE)
How does this spatial distribution compare to percent non-Hispanic white?
large.tracts %>%
filter(NAME == "Fresno") %>%
tm_shape(unit = "mi") +
tm_polygons(col = "pnhwhite", style = "quantile",palette = "Reds",
border.alpha = 0, title = "") +
tm_scale_bar(breaks = c(0, 1, 2), text.size = 0.75, position = c("right", "bottom")) +
tm_compass(type = "4star", position = c("left", "top")) +
tm_layout(main.title = "Percent White in Fresno City Tracts",
main.title.size = 0.9,
frame = FALSE)
It looks like a North/South divide. Map the other two race/ethnic groups in Fresno and all the groups in the other three cities.
The most common measure of residential evenness is the Dissimilarity Index D. To calculate D, we’ll follow the Dissimilarity index formula on page 3 of Handout 7. We will calculate Black/White, Hispanic/White, and Asian/White Dissimilarity.
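Before computing D on the real data, it can help to see the formula at work on a hypothetical toy example. The counts below are made up for illustration and are not the lab data:

```r
# Toy example: D = 0.5 * sum over tracts of |t_im/T_m - t_ik/T_k|
nhblk   <- c(80, 20)   # Black count in each of two hypothetical tracts
nhwhite <- c(10, 90)   # White count in each tract

D <- 0.5 * sum(abs(nhblk / sum(nhblk) - nhwhite / sum(nhwhite)))
D
# 0.7
```

Here 80% of the Black population but only 10% of the white population live in tract 1, so the two distributions are very uneven and D comes out at 0.7.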
We already have the values \(t_{im}\) and \(t_{ik}\), the total populations of race/ethnic groups \(m\) and \(k\) in each census tract \(i\). But we don’t yet have the total population of groups \(m\) and \(k\) for each city, which are the values \(T_m\) and \(T_k\) in the formula. To calculate these values, we use the group_by() and mutate() functions.
large.tracts <- large.tracts %>%
group_by(NAME) %>%
mutate(nhwhitec = sum(nhwhite), nhasnc = sum(nhasn),
nhblkc = sum(nhblk), hispc = sum(hisp),
tpopc = sum(tpop)) %>%
ungroup()
We already covered group_by()
in Lab 4,
but as a reminder, the group_by()
function tells R that all
future functions on large.tracts will be grouped according to
the variable NAME, which is the city name. We use the
sum()
function within the mutate()
function to
sum up, for example, the non-Hispanic white population nhwhite
for each city. We name this variable nhwhitec. If you type in
View(large.tracts)
, you should find that the variable
nhwhitec provides the same value for all tracts within the same
city. We do this for all the other race/ethnic groups.
The function ungroup()
at the end of the code tells R to
stop the grouping. It’s always good practice to ungroup()
a
data set if you are saving it for future use (rather than using it as a
summary table as we’ve been doing so far in the class).
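If the group_by() + mutate() pattern is new to you, here is a minimal sketch with made-up data (not the lab data) showing that mutate(sum()) attaches the group total to every row instead of collapsing the data:

```r
library(dplyr)

# Hypothetical data: three tracts in two cities
toy <- tibble(city = c("A", "A", "B"), pop = c(10, 20, 5))

res <- toy %>%
  group_by(city) %>%
  mutate(popc = sum(pop)) %>%  # group total repeated on every row of the group
  ungroup()

res
```

Both city A rows get popc = 30 and the city B row gets popc = 5. A summarize() call would instead collapse the data to one row per city, which is why we use mutate() here: we need the city totals sitting alongside each tract.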
Now we can calculate the rest of the formula, breaking it down piece-by-piece like we did in the handout and in lecture.
large.tracts %>%
group_by(NAME) %>%
mutate(d.wb = abs(nhblk/nhblkc-nhwhite/nhwhitec),
d.wa = abs(nhasn/nhasnc-nhwhite/nhwhitec),
d.wh = abs(hisp/hispc-nhwhite/nhwhitec)) %>%
summarize(BWD = 0.5*sum(d.wb, na.rm=TRUE), AWD = 0.5*sum(d.wa, na.rm=TRUE),
HWD = 0.5*sum(d.wh, na.rm=TRUE)) %>%
ungroup()
## Simple feature collection with 4 features and 4 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -123.0139 ymin: 32.53486 xmax: -116.9057 ymax: 37.86334
## Geodetic CRS: NAD83
## # A tibble: 4 × 5
## NAME BWD AWD HWD geometry
## <chr> <dbl> <dbl> <dbl> <MULTIPOLYGON [°]>
## 1 Fresno 0.463 0.423 0.388 (((-119.8897 36.67738, -119.8895 36.68476, -1…
## 2 San Diego 0.555 0.483 0.509 (((-117.1706 32.7007, -117.1671 32.69915, -11…
## 3 San Francisco 0.546 0.404 0.402 (((-122.3885 37.7897, -122.3923 37.79389, -12…
## 4 San Jose 0.466 0.472 0.468 (((-121.8237 37.20721, -121.8184 37.20481, -1…
Let’s break the code down so we’re all on the same page.

- We use group_by() because we want to calculate Dissimilarity separately for each city, which is indicated by the variable NAME.
- We use mutate() to calculate the tract-level contributions to the index, i.e. the value \(\left|\frac{t_{im}}{T_m} - \frac{t_{ik}}{T_k}\right|\) in Equation 1 in Handout 7 for each neighborhood \(i\).
- We use summarize() to finish the rest of the job. Within summarize(), the function sum() performs the \(\sum\limits_{i}^{N}\) in Equation 1 in Handout 7, adding up the neighborhood-specific values \(\left|\frac{t_{im}}{T_m} - \frac{t_{ik}}{T_k}\right|\).

The resulting values provide the Dissimilarity indices for Black/White (BWD), Asian/White (AWD), and Hispanic/White (HWD). In all of these cases, we calculate segregation from white residents, but you can calculate segregation for any race/ethnicity combination (e.g. Black/Hispanic). Instead of just copying and pasting the chunk of code above into your console, make sure you understand what each line of code is doing. Not only will it help you become a more seasoned R coder, but it will also help you better understand the underlying math behind the Dissimilarity index.
The results table we got above is a little messy. Let’s clean it up by doing three things: (1) drop the geometry column using st_drop_geometry(), which is a part of the sf package, thus making the object no longer spatial; (2) use the flextable() function to make a nicely formatted table; and (3) save the resulting table in an object named dis.table.
dis.table <- large.tracts %>%
group_by(NAME) %>%
mutate(d.wb = abs(nhblk/nhblkc-nhwhite/nhwhitec),
d.wa = abs(nhasn/nhasnc-nhwhite/nhwhitec),
d.wh = abs(hisp/hispc-nhwhite/nhwhitec)) %>%
summarize(BWD = 0.5*sum(d.wb, na.rm=TRUE), AWD = 0.5*sum(d.wa, na.rm=TRUE),
HWD = 0.5*sum(d.wh, na.rm=TRUE)) %>%
ungroup() %>%
st_drop_geometry() %>%
flextable()
dis.table %>%
colformat_double(j = c("BWD", "AWD", "HWD"), digits = 3)
| NAME | BWD | AWD | HWD |
|---|---|---|---|
| Fresno | 0.463 | 0.423 | 0.388 |
| San Diego | 0.555 | 0.483 | 0.509 |
| San Francisco | 0.546 | 0.404 | 0.402 |
| San Jose | 0.466 | 0.472 | 0.468 |
Looks much better. The Dissimilarity index for Black/White in Fresno is 0.463. The interpretation of this value is that 46.3% of Black residents would need to move to a different neighborhood in order to achieve a uniform distribution of Black and white residents across neighborhoods in the city.
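As a sanity check on this interpretation, D ranges from 0 (the two groups are identically distributed across tracts) to 1 (complete segregation). A quick check with made-up counts, not the lab data:

```r
# Hedged sanity check of the Dissimilarity index bounds
d_index <- function(m, k) 0.5 * sum(abs(m / sum(m) - k / sum(k)))

d_index(c(100, 0), c(0, 100))  # complete segregation: the groups never share a tract
d_index(c(50, 50), c(30, 30))  # identical distributions across tracts
```

The first call returns 1 and the second returns 0, the two extremes of the index.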
The most common measure of exposure is the Interaction Index \(P^*\). Let’s calculate the exposure of Black (BWI), Asian (AWI), and Hispanic (HWI) residents to white residents using the formula on page 6 of Handout 7.
int.table <- large.tracts %>%
group_by(NAME) %>%
mutate(i.wb = (nhblk/nhblkc)*(nhwhite/tpop),
i.wa = (nhasn/nhasnc)*(nhwhite/tpop),
i.wh = (hisp/hispc)*(nhwhite/tpop)) %>%
summarize(BWI = sum(i.wb, na.rm=TRUE), AWI = sum(i.wa, na.rm=TRUE),
HWI = sum(i.wh, na.rm=TRUE)) %>%
ungroup() %>%
st_drop_geometry() %>%
flextable()
Look at the Interaction index equation in Handout 7. The
mutate()
function is creating the tract specific values
\(\frac{t_{im}}{T_m} *
\frac{t_{ik}}{t_i}\). We then turn to summarize()
to
perform the \(\sum\limits_{i}^{N}\).
Since the pipeline above already converted the results to a flextable object, we just print it, using colformat_double() to round the index columns to three decimal places.
int.table %>%
colformat_double(j = c("BWI", "AWI", "HWI"), digits = 3)
| NAME | BWI | AWI | HWI |
|---|---|---|---|
| Fresno | 0.208 | 0.238 | 0.213 |
| San Diego | 0.304 | 0.357 | 0.282 |
| San Francisco | 0.290 | 0.305 | 0.332 |
| San Jose | 0.251 | 0.205 | 0.204 |
The probability of a Black resident “interacting” with a white person in his or her neighborhood is about 20.8% in Fresno. We can also interpret this to mean that 21 of every 100 people a Black person meets in his or her neighborhood will be white. Remember that interaction is not symmetric. Calculate the interaction of white residents with Black residents in the other cities and see if there are major differences with the values we calculated above.
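The asymmetry is easy to verify on hypothetical toy data (made-up counts, not the lab data):

```r
# Two hypothetical tracts; compute P* in each direction
nhblk   <- c(90, 10)
nhwhite <- c(10, 190)
tpop    <- nhblk + nhwhite

bw <- sum((nhblk / sum(nhblk)) * (nhwhite / tpop))   # Black exposure to white
wb <- sum((nhwhite / sum(nhwhite)) * (nhblk / tpop)) # white exposure to Black

c(bw, wb)
# 0.1850 0.0925
```

The two directions differ (0.185 vs. 0.0925) because the groups have different sizes: exposure of a small group to a large group is generally higher than the reverse.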
The Dissimilarity and Interaction indices are city-level indices. In the handout, we covered one neighborhood-level measure: Location Quotient for Racial Residential Segregation (LQRSS), which captures neighborhood racial/ethnic concentration.
Let’s zoom into the City of Fresno and calculate the LQRSS for each
of its tracts. First, keep Fresno tracts from large.tracts
using the filter()
command and calculate the LQRSS for
blacks, Asians, Hispanics, and whites using equation (3) in this week’s
handout.
fresno.tracts <- large.tracts %>%
filter(NAME == "Fresno") %>%
mutate(blklq = (nhblk/tpop)/(nhblkc/tpopc),
asnlq = (nhasn/tpop)/(nhasnc/tpopc),
hisplq = (hisp/tpop)/(hispc/tpopc),
whitelq = (nhwhite/tpop)/(nhwhitec/tpopc))
The census tract with GEOID of 06019004217 has a black LQ of 3.96. In your own words, what does this value represent?
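To make the ratio-of-shares logic concrete, here is a hypothetical single-tract calculation with made-up counts (not the actual tract above):

```r
# LQ = (tract share of group) / (city share of group)
tract_blk <- 400;  tract_pop <- 1000   # tract is 40% Black
city_blk  <- 8000; city_pop  <- 80000  # city is 10% Black

lq <- (tract_blk / tract_pop) / (city_blk / city_pop)
lq
# 4
```

An LQ of 4 means the tract's Black population share is four times the citywide share; an LQ of 1 means the tract mirrors the city, and values below 1 indicate underrepresentation.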
You can visualize the distribution using a histogram (or boxplot). For example, a histogram of the black LQ looks like
fresno.tracts %>%
ggplot() +
geom_histogram(mapping = aes(x=blklq), na.rm=TRUE) +
xlab("Black Location Quotient")
The right-skewed distribution indicates substantial concentration of the Black population in a subset of Fresno neighborhoods. We can also map the LQRSS. Let’s use the viewing feature in tmap so we can zoom in and out and identify the GEOIDs of tracts with high or low Black location quotients.
tmap_mode("view")
tm_shape(fresno.tracts, unit = "mi") +
tm_polygons(col = "blklq", style = "quantile",palette = "Reds",
border.alpha = 0, title = "Black Location Quotient")
The map indicates that there are some neighborhoods in the city that have a percent black population that is as high as 4 times the overall percent black population in the city.
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Website created and maintained by Noli Brazil