SVI Calculation for Customized Boundaries with the findSVI Package

The findSVI package has a new function, find_svi_x(), which retrieves census data and calculates the SVI at a customized geographic level, based on that level’s relationship to a Census geographic level. Essentially, it is a wrapper for get_census_data() with exp = TRUE and get_svi_x().

Briefly, the important elements needed for the function are:

year: Year of interest (2012-2022).
geography: A Census geographic level that the customized level is based on.
state: (Optional) State of interest; defaults to the whole US.
xwalk: A crosswalk (relationship file) between a Census geography (column GEOID) and a customized geographic level (column GEOID2).
geometry: Set to TRUE to include spatial information; defaults to FALSE.
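
To make the expected xwalk format concrete, here is a minimal sketch that builds a toy crosswalk and runs a few basic checks on it. Note that check_xwalk() is a hypothetical helper written for this illustration (it is not part of findSVI), and the GEOID2 values "A" and "B" are made-up IDs for customized areas.

```r
# Hypothetical helper (not part of findSVI): basic sanity checks on a crosswalk.
check_xwalk <- function(xwalk) {
  stopifnot(
    is.data.frame(xwalk),
    all(c("GEOID", "GEOID2") %in% names(xwalk)),  # the two columns described above
    !anyNA(xwalk$GEOID),
    !anyNA(xwalk$GEOID2)
  )
  # Number of Census units that feed into each customized unit
  table(xwalk$GEOID2)
}

# Toy crosswalk: three counties mapped to two made-up customized areas
toy_xwalk <- data.frame(
  GEOID  = c("42001", "42003", "42005"),
  GEOID2 = c("A", "A", "B"),
  stringsAsFactors = FALSE
)

units_per_area <- check_xwalk(toy_xwalk)
print(units_per_area)
```

A check like this only confirms the column layout; whether the mapping itself is sensible still depends on how the relationship file was built.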

Similar to the approach described in the commute zone post, we use a pseudo-crosswalk to validate our calculation against the CDC SVI, taking county-level SVI in PA as an example, for the years in which CDC data are available.

devtools::install_github("heli-xu/findSVI@customized-boundaries")

library(dplyr)
library(findSVI)
library(ggplot2)
library(readr)
library(tidyr)

Pseudo-crosswalk

The pseudo-crosswalk is constructed by replicating the county IDs in PA, giving a county-to-county crosswalk (first 10 rows shown below):

xwalk %>% head(10)

GEOID	GEOID2
42001	42001
42003	42003
42005	42005
42007	42007
42009	42009
42011	42011
42013	42013
42015	42015
42017	42017
42019	42019
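
The step that creates xwalk is not shown in this post. One way it could be built is sketched below, under the assumption that PA county FIPS codes are the odd numbers from 42001 to 42133 (67 counties), which is consistent with the rows printed above. The IDs could equally be taken from the FIPS column of a CDC county file.

```r
# Assumption: PA county FIPS codes are the odd numbers 42001, 42003, ..., 42133.
county_ids <- as.character(seq(42001, 42133, by = 2))

# Pseudo-crosswalk: every county maps to itself
xwalk <- data.frame(
  GEOID  = county_ids,
  GEOID2 = county_ids,
  stringsAsFactors = FALSE
)

head(xwalk, 10)
```

Because every county maps to itself, find_svi_x() with this crosswalk should reproduce a county-level SVI, which is what makes the comparison with the CDC data meaningful.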

2022

cdc_cty_svi2022 <- read_csv("2022svi_pa_co_cdc.csv")

svi_x2022 <- find_svi_x(
  year = 2022,
  geography = "county",
  state = "PA",
  xwalk = xwalk
)

# Join CDC and findSVI percentile rankings (RPLs) by GEOID, then plot the
# overall RPLs against each other with their correlation coefficient.
cor_plot <- function(cdc_svi, svi) {

  join_RPL <- cdc_svi %>%
    select(
      GEOID = FIPS,
      cdc_RPL_themes = RPL_THEMES,
      cdc_RPL_theme1 = RPL_THEME1,
      cdc_RPL_theme2 = RPL_THEME2,
      cdc_RPL_theme3 = RPL_THEME3,
      cdc_RPL_theme4 = RPL_THEME4
    ) %>%
    mutate(GEOID = as.character(GEOID)) %>%  # match the character GEOID in findSVI output
    left_join(
      svi %>%
        select(GEOID, RPL_themes, RPL_theme1, RPL_theme2, RPL_theme3, RPL_theme4),
      by = "GEOID"
    ) %>%
    drop_na() %>%  # remove rows with missing values
    filter(if_all(where(is.numeric), ~ .x >= 0))  # drop rows coded -999 in CDC data

  coeff <- cor(join_RPL$cdc_RPL_themes, join_RPL$RPL_themes)

  p <- join_RPL %>%
    ggplot(aes(x = cdc_RPL_themes, y = RPL_themes)) +
    geom_point(color = "#004C54") +
    geom_abline(slope = 1, intercept = 0) +  # line of perfect agreement
    labs(
      subtitle = paste0(
        "Comparison of overall percentile rankings (RPLs), correlation coefficient = ",
        round(coeff, 3)
      ),
      y = "findSVI",
      x = "CDC"
    ) +
    theme(plot.title = element_text(size = 14))

  p
}

cor_plot(cdc_cty_svi2022, svi_x2022)+
  labs(
    title = "CDC vs. find_svi_x() CTY-level SVI for PA, 2022"
  )

Overall, the findSVI calculation correlates well with the CDC SVI. It’s worth noting that the find_svi_x() result, which is calculated from count estimates, shows slightly more variability than the result from find_svi(), which uses Census percentage variables directly. This is perhaps most apparent in 2022 among recent years, because in its latest update for 2022 the CDC SVI added more variables that use ACS percentages directly. The rationale is given in the documentation:

“For the 2022 database, we remapped some of our EP variables directly to ACS percentage variables. This change largely meant, when possible, we favored percentage variables from the ACS Data Profile (DP) and Subject (S) tables rather than calculating from Detailed (B) table count estimates. During our analysis we found the new variable mappings improved SVI processing through simpler calculations, greater transparency, and better accuracy. Furthermore, some variable changes allowed us to use ACS-calculated margins of error rather than deriving our own. These updates follow ACS recommendations noted in their guidance document U.S. Census Bureau, Understanding and Using American Community Survey Data: What All Data Users Need to Know, U.S. Government Publishing Office, Washington, DC, 2020.”

2020

cdc_cty_svi2020 <- read_csv("2020svi_pa_co_cdc.csv")

svi_x2020 <- find_svi_x(
  year = 2020,
  geography = "county",
  state = "PA",
  xwalk = xwalk
)

cor_plot(cdc_cty_svi2020, svi_x2020)+
  labs(
    title = "CDC vs. find_svi_x() CTY-level SVI for PA, 2020"
  )

Before the 2022 “remap”, results from retrieving ACS percentages directly and from calculating them from count estimates are more comparable, and the correlation with the CDC SVI is strong.
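
The plots report the correlation for the overall ranking only. To see whether a particular theme drives any disagreement, the same comparison could be run per theme. The sketch below assumes a joined data frame with the same column names as join_RPL inside cor_plot(). theme_cors() is a hypothetical helper, and the toy values are placeholders, not real SVI results.

```r
# Hypothetical helper: correlation between CDC and findSVI rankings,
# for the overall RPL and each of the four themes.
theme_cors <- function(joined) {
  suffixes <- c("themes", paste0("theme", 1:4))
  vapply(
    suffixes,
    function(s) {
      cor(joined[[paste0("cdc_RPL_", s)]], joined[[paste0("RPL_", s)]])
    },
    numeric(1)
  )
}

# Toy input with placeholder values (identical rankings in both sources)
toy <- data.frame(row = 1:10)
for (s in c("themes", paste0("theme", 1:4))) {
  toy[[paste0("cdc_RPL_", s)]] <- seq(0, 1, length.out = 10)
  toy[[paste0("RPL_", s)]] <- seq(0, 1, length.out = 10)
}

cors <- theme_cors(toy)
print(cors)
```

With real data, a theme whose coefficient is noticeably lower than the others would be the first place to look for variables affected by the 2022 remapping.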

2018

cdc_cty_svi2018 <- read_csv("2018svi_pa_co_cdc.csv")

svi_x2018 <- find_svi_x(
  year = 2018,
  geography = "county",
  state = "PA",
  xwalk = xwalk
)

cor_plot(cdc_cty_svi2018, svi_x2018)+
  labs(
    title = "CDC vs. find_svi_x() CTY-level SVI for PA, 2018"
  )

2016

cdc_cty_svi2016 <- read_csv("2016svi_pa_co_cdc.csv")

svi_x2016 <- find_svi_x(
  year = 2016,
  geography = "county",
  state = "PA",
  xwalk = xwalk
)

cor_plot(cdc_cty_svi2016, svi_x2016)+
  labs(
    title = "CDC vs. find_svi_x() CTY-level SVI for PA, 2016"
  )

2014

cdc_cty_svi2014 <- read_csv("2014svi_pa_co_cdc.csv")

svi_x2014 <- find_svi_x(
  year = 2014,
  geography = "county",
  state = "PA",
  xwalk = xwalk
)

cor_plot(cdc_cty_svi2014, svi_x2014)+
  labs(
    title = "CDC vs. find_svi_x() CTY-level SVI for PA, 2014"
  )