SVI Calculation for Customized Boundaries with findSVI package

Validating find_svi_x()

Author

Heli Xu

Published

January 24, 2027

We have a new function in the findSVI package, find_svi_x(), which retrieves census data and calculates the SVI at a customized geographic level, based on its relationship to a Census geographic level. Essentially, it is a wrapper for get_census_data() with exp = TRUE and get_svi_x().
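To make the wrapper relationship concrete, here is a rough sketch of the two steps chained by hand. The function name find_svi_x_sketch() is hypothetical, and the argument names passed to get_census_data() and get_svi_x() (other than exp = TRUE) are assumptions that should be checked against the package documentation; this is not the package's actual implementation.

```r
# Sketch only: the argument names below are assumptions, not confirmed by this post.
# Defining the function needs no API key; calling it needs findSVI and a Census API key.
find_svi_x_sketch <- function(year, geography, state, xwalk) {
  # Step 1: retrieve census data, with exp = TRUE as described above
  census_data <- findSVI::get_census_data(
    year = year,
    geography = geography,
    state = state,
    exp = TRUE
  )
  # Step 2: calculate SVI at the customized level using the crosswalk
  findSVI::get_svi_x(
    year = year,
    data = census_data,
    xwalk = xwalk
  )
}
```

If those assumptions hold, calling find_svi_x_sketch(2022, "county", "PA", xwalk) would be expected to mirror a find_svi_x() call with the same arguments.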

Briefly, the important elements needed for the function are:

Similar to the approach mentioned in the commute zone post, we use a pseudo-crosswalk to validate our calculation, taking county-level SVI in PA as an example and comparing it against the CDC SVI for the years in which it is available.

Code
devtools::install_github("heli-xu/findSVI@customized-boundaries")
Code
library(dplyr)
library(findSVI)
library(ggplot2)
library(readr)
library(tidyr)

Pseudo-crosswalk

The pseudo-crosswalk is constructed by replicating the county IDs in PA, giving a county-to-county crosswalk (first 10 rows shown below):

Code
xwalk %>% head(10)
GEOID GEOID2
42001 42001
42003 42003
42005 42005
42007 42007
42009 42009
42011 42011
42013 42013
42015 42015
42017 42017
42019 42019
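The post does not show how xwalk was built, so the following is only a minimal sketch of one way to replicate county IDs into a two-column crosswalk. The names pa_counties and xwalk_sketch are hypothetical, the IDs are the 10 shown above rather than the full PA county list, and storing them as character is an assumption.

```r
library(tibble)

# The 10 county GEOIDs shown above; in practice this vector would hold
# every PA county ID. Storing them as character is an assumption.
pa_counties <- c("42001", "42003", "42005", "42007", "42009",
                 "42011", "42013", "42015", "42017", "42019")

# Identity crosswalk: each county maps to itself, so the "customized"
# geography (GEOID2) is the county itself.
xwalk_sketch <- tibble(GEOID = pa_counties, GEOID2 = pa_counties)
```

Because every county maps to itself, an SVI calculated through this crosswalk should be comparable to the ordinary county-level SVI, which is what makes it useful for validation.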

2022

Code
cdc_cty_svi2022 <- read_csv("2022svi_pa_co_cdc.csv")

svi_x2022 <- find_svi_x(
  year = 2022,
  geography = "county",
  state = "PA",
  xwalk = xwalk
)
Code
# Compare CDC and findSVI percentile rankings (RPLs) for the same geographies
cor_plot <- function(cdc_svi, svi) {

  join_RPL <- cdc_svi %>%
    select(GEOID = FIPS,
      cdc_RPL_themes = RPL_THEMES,
      cdc_RPL_theme1 = RPL_THEME1,
      cdc_RPL_theme2 = RPL_THEME2,
      cdc_RPL_theme3 = RPL_THEME3,
      cdc_RPL_theme4 = RPL_THEME4) %>%
    mutate(GEOID = as.character(GEOID)) %>%  # convert to character for the join
    left_join(svi %>%
        select(GEOID,
          RPL_themes,
          RPL_theme1,
          RPL_theme2,
          RPL_theme3,
          RPL_theme4),
      by = "GEOID") %>%
    drop_na() %>%   # remove rows with NA
    filter(if_all(where(is.numeric), ~ .x >= 0))  # remove rows with -999 in cdc data

  # Correlation between the overall rankings
  coeff <- cor(join_RPL$cdc_RPL_themes, join_RPL$RPL_themes)

  plot <- join_RPL %>%
    ggplot(aes(x = cdc_RPL_themes, y = RPL_themes)) +
    geom_point(color = "#004C54") +
    geom_abline(slope = 1, intercept = 0) +  # reference line for perfect agreement
    labs(
      subtitle = paste0("Comparison of overall percentile rankings (RPLs), correlation coefficient = ", coeff),
      y = "findSVI",
      x = "CDC") +
    theme(plot.title = element_text(size = 14))

  return(plot)
}

cor_plot(cdc_cty_svi2022, svi_x2022)+
  labs(
    title = "CDC vs. find_svi_x() CTY-level SVI for PA, 2022"
  )
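cor_plot() reports the correlation for the overall ranking only, although the joined table also carries the theme rankings. As a minimal sketch of how the same cleaning steps and comparison could be applied to more than one ranking column, the example below uses small made-up tables (cdc_toy and svi_toy are hypothetical and are not real CDC or findSVI output) with only two of the five ranking columns.

```r
library(dplyr)
library(tidyr)

# Made-up values for illustration only
cdc_toy <- tibble(
  FIPS = c(42001, 42003, 42005, 42007, 42009),
  RPL_THEMES = c(0.10, 0.55, -999, 0.80, 0.35),  # -999 marks a missing value
  RPL_THEME1 = c(0.20, 0.50, -999, 0.90, 0.30)
)
svi_toy <- tibble(
  GEOID = c("42001", "42003", "42005", "42007", "42009"),
  RPL_themes = c(0.12, 0.50, 0.40, 0.85, NA),
  RPL_theme1 = c(0.18, 0.52, 0.45, 0.88, 0.33)
)

joined <- cdc_toy %>%
  select(GEOID = FIPS,
    cdc_RPL_themes = RPL_THEMES,
    cdc_RPL_theme1 = RPL_THEME1) %>%
  mutate(GEOID = as.character(GEOID)) %>%
  left_join(svi_toy, by = "GEOID") %>%
  drop_na() %>%                                  # drops the row with NA
  filter(if_all(where(is.numeric), ~ .x >= 0))   # drops the row with -999

# One correlation coefficient per ranking column
rpl_cols <- c("RPL_themes", "RPL_theme1")
cors <- sapply(rpl_cols, function(col) {
  cor(joined[[paste0("cdc_", col)]], joined[[col]])
})
```

Here the rows for 42005 (-999 in the CDC columns) and 42009 (NA in the findSVI column) are dropped before any correlation is computed, so cors is based on the three remaining counties.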

Overall, the findSVI calculation correlates well with the CDC SVI. It’s worth noting that the find_svi_x() result shows slightly more variability than that of find_svi(), which directly uses Census percentage variables, and among recent years this is perhaps most noticeable in 2022. This is because the CDC SVI mapped more variables directly to ACS percentages in its latest update for 2022; the rationale is addressed in the CDC SVI documentation:

“For the 2022 database, we remapped some of our EP variables directly to ACS percentage variables. This change largely meant, when possible, we favored percentage variables from the ACS Data Profile (DP) and Subject (S) tables rather than calculating from Detailed (B) table count estimates. During our analysis we found the new variable mappings improved SVI processing through simpler calculations, greater transparency, and better accuracy. Furthermore, some variable changes allowed us to use ACS-calculated margins of error rather than deriving our own. These updates follow ACS recommendations noted in their guidance document U.S. Census Bureau, Understanding and Using American Community Survey Data: What All Data Users Need to Know, U.S. Government Publishing Office, Washington, DC, 2020.”

2020

Code
cdc_cty_svi2020 <- read_csv("2020svi_pa_co_cdc.csv")

svi_x2020 <- find_svi_x(
  year = 2020,
  geography = "county",
  state = "PA",
  xwalk = xwalk
)
Code
cor_plot(cdc_cty_svi2020, svi_x2020)+
  labs(
    title = "CDC vs. find_svi_x() CTY-level SVI for PA, 2020"
  )

Before the 2022 “remap”, the results from retrieving ACS percentages directly and from calculating them from count estimates are more comparable, and both correlate strongly with the CDC SVI.

2018

Code
cdc_cty_svi2018 <- read_csv("2018svi_pa_co_cdc.csv")

svi_x2018 <- find_svi_x(
  year = 2018,
  geography = "county",
  state = "PA",
  xwalk = xwalk
)
Code
cor_plot(cdc_cty_svi2018, svi_x2018)+
  labs(
    title = "CDC vs. find_svi_x() CTY-level SVI for PA, 2018"
  )

2016

Code
cdc_cty_svi2016 <- read_csv("2016svi_pa_co_cdc.csv")

svi_x2016 <- find_svi_x(
  year = 2016,
  geography = "county",
  state = "PA",
  xwalk = xwalk
)
Code
cor_plot(cdc_cty_svi2016, svi_x2016)+
  labs(
    title = "CDC vs. find_svi_x() CTY-level SVI for PA, 2016"
  )

2014

Code
cdc_cty_svi2014 <- read_csv("2014svi_pa_co_cdc.csv")

svi_x2014 <- find_svi_x(
  year = 2014,
  geography = "county",
  state = "PA",
  xwalk = xwalk
)
Code
cor_plot(cdc_cty_svi2014, svi_x2014)+
  labs(
    title = "CDC vs. find_svi_x() CTY-level SVI for PA, 2014"
  )