devtools::install_github("heli-xu/findSVI@customized-boundaries")
Validating find_svi_x()
Heli Xu
January 24, 2027
We have a new function in the findSVI package, find_svi_x(), which retrieves census data and calculates the SVI at a customized geographic level, based on its relationship to a Census geographic level. Essentially, it's a wrapper for get_census_data() with exp = TRUE and get_svi_x().
Briefly, the important elements needed for the function are:
year: Year of interest (2012-2022).
geography: A Census geographic level that the customized level is based on.
state: State of interest (optional); defaults to the whole US.
xwalk: A crosswalk (relationship file) between a Census geography (column GEOID) and a customized geographic level (column GEOID2).
geometry: Set to TRUE to include spatial information; defaults to FALSE.
Similar to the approach described in the commute zone post, we use a pseudo-crosswalk to validate our calculation, taking county-level SVI in PA as an example, against the CDC SVI for the years in which CDC data are available.
The pseudo-crosswalk is constructed by replicating the county IDs in PA (a county-county crosswalk; first 10 rows shown below):
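A minimal sketch of how such a county-county crosswalk could be built is given here. The object names `pa_county_ids` and `cty_xwalk` are hypothetical, only three PA county FIPS codes are used for brevity, and the commented call to find_svi_x() is an assumption based solely on the arguments listed above.

```r
# Hypothetical subset of PA county FIPS codes (illustration only);
# a full pseudo-crosswalk would list every county in the state.
pa_county_ids <- c("42001", "42003", "42005")

# Pseudo-crosswalk: each county (GEOID) maps to itself (GEOID2),
# so the "customized" level is identical to the Census county level.
cty_xwalk <- data.frame(
  GEOID = pa_county_ids,
  GEOID2 = pa_county_ids,
  stringsAsFactors = FALSE
)

# Assumed usage, not run here (it needs the findSVI package and a Census API key):
# svi_x2022 <- find_svi_x(year = 2022, geography = "county",
#                         state = "PA", xwalk = cty_xwalk)
```

Because GEOID and GEOID2 are identical, the SVI aggregated to the "customized" level should be directly comparable with the county-level CDC SVI.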
library(dplyr)
library(tidyr)
library(ggplot2)

# Join CDC and findSVI percentile rankings by GEOID and plot the overall RPLs
cor_plot <- function(cdc_svi, svi) {
  join_RPL <- cdc_svi %>%
    select(GEOID = FIPS,
           cdc_RPL_themes = RPL_THEMES,
           cdc_RPL_theme1 = RPL_THEME1,
           cdc_RPL_theme2 = RPL_THEME2,
           cdc_RPL_theme3 = RPL_THEME3,
           cdc_RPL_theme4 = RPL_THEME4) %>%
    mutate(GEOID = as.character(GEOID)) %>% # match the character GEOID in svi
    left_join(svi %>%
                select(GEOID,
                       RPL_themes,
                       RPL_theme1,
                       RPL_theme2,
                       RPL_theme3,
                       RPL_theme4),
              by = "GEOID") %>%
    drop_na() %>% # remove rows with NA
    filter_all(all_vars(. >= 0)) # drop rows with -999 (missing) in CDC data

  # Pearson correlation between the overall percentile rankings
  coeff <- cor(join_RPL$cdc_RPL_themes, join_RPL$RPL_themes)

  plot <- join_RPL %>%
    ggplot(aes(x = cdc_RPL_themes, y = RPL_themes)) +
    geom_point(color = "#004C54") +
    geom_abline(slope = 1, intercept = 0) + # y = x reference line
    labs(
      subtitle = paste0("Comparison of overall percentile rankings (RPLs), correlation coefficient = ", coeff),
      y = "findSVI",
      x = "CDC") +
    theme(plot.title = element_text(size = 14))

  return(plot)
}

cor_plot(cdc_cty_svi2022, svi_x2022) +
  labs(
    title = "CDC vs. find_svi_x() CTY-level SVI for PA, 2022"
  )
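To make the cleaning step inside cor_plot() concrete, the sketch below applies the same logic in base R to a small made-up table: rows with NA or with the CDC missing-value code (-999) are dropped before the correlation is computed. The data frame `toy` and its values are invented for illustration and are not real SVI results.

```r
# Made-up percentile rankings (illustration only, not real SVI values)
toy <- data.frame(
  GEOID = c("42001", "42003", "42005", "42007", "42009"),
  cdc_RPL_themes = c(0.10, 0.50, -999, 0.90, 0.70),
  RPL_themes = c(0.12, 0.48, 0.30, NA, 0.73),
  stringsAsFactors = FALSE
)

# Keep rows that are complete and have non-negative rankings,
# mirroring drop_na() followed by the >= 0 filter in cor_plot()
keep <- complete.cases(toy) &
  toy$cdc_RPL_themes >= 0 &
  toy$RPL_themes >= 0
clean <- toy[keep, ]

# Pearson correlation on the remaining rows
coeff <- cor(clean$cdc_RPL_themes, clean$RPL_themes)
print(clean)
print(coeff)
```

Leaving the -999 rows in would distort the correlation badly, since a single sentinel value sits far outside the 0 to 1 range of the percentile rankings.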
Overall, the findSVI calculation correlates well with the CDC SVI. It's worth noting that the find_svi_x() result shows slightly more variability than that of find_svi(), which directly uses Census percentage variables, and this is perhaps especially so in 2022 among recent years. This is because the CDC SVI added more variables that directly use ACS percentages in the latest update for 2022; the rationale is addressed in the documentation:
“For the 2022 database, we remapped some of our EP variables directly to ACS percentage variables. This change largely meant, when possible, we favored percentage variables from the ACS Data Profile (DP) and Subject (S) tables rather than calculating from Detailed (B) table count estimates. During our analysis we found the new variable mappings improved SVI processing through simpler calculations, greater transparency, and better accuracy. Furthermore, some variable changes allowed us to use ACS-calculated margins of error rather than deriving our own. These updates follow ACS recommendations noted in their guidance document U.S. Census Bureau, Understanding and Using American Community Survey Data: What All Data Users Need to Know, U.S. Government Publishing Office, Washington, DC, 2020.”
Before the “remap” in 2022, results obtained by directly retrieving ACS percentages and by calculating them from count estimates are more comparable, and both correlate strongly with the CDC SVI.
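As a rough illustration of the two routes to a percentage (EP) variable, the sketch below computes one from count estimates and compares it with a directly retrieved percentage. All numbers and variable names are hypothetical, and the assumption that the published percentage is rounded to one decimal is only one possible source of small differences, not a statement about how either the CDC or findSVI handles any specific variable.

```r
# Hypothetical values for a single county (illustration only)
count_est <- 1234 # count estimate for the group of interest
total_est <- 9876 # count estimate for the relevant universe

# Route 1: percentage calculated from count estimates
ep_from_counts <- 100 * count_est / total_est

# Route 2: percentage retrieved directly, assumed here to be
# published with one decimal place
ep_direct <- 12.5

# The two routes agree closely but not exactly
diff_ep <- ep_from_counts - ep_direct
print(c(from_counts = ep_from_counts, direct = ep_direct, difference = diff_ep))
```

Even small gaps like this can change the order of geographies with similar values, which in turn shifts their percentile rankings slightly.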