Code
library(dplyr)
library(sf)
library(leaflet)
library(glue)
library(knitr)
Aggregating street-level data to neighborhood (ZAT) level
Heli Xu
February 28, 2024
To include more indicators at the ZAT level for our neighborhood built environment profiling, we’ll describe the process of aggregating some of street-level variables to the ZAT level.
Both the street-level and ZAT-level geometries are polygons, and the street-level ones are not always nested in the ZAT-level ones. This can pose a problem in spatial joining: while setting the criteria as one within another (st_contains
, st_within
, st_covers
) leaves many streets unmatched to any ZAT unit, setting the criteria as one has any point in common with another (st_intersects
) results in many streets matching to multiple ZAT units. For example, below is the map of randomly selected 100 streets with which multiple ZAT units are matched (purple is the border of street, blue is the border of ZAT).
calle_multi_sub_leaf <- readRDS("../../../../clean_data/calle2zat_aggr/calle_multi_sub_leaf.rds")
zat_sub_leaf <- readRDS("../../../../clean_data/calle2zat_aggr/zat_sub_leaf.rds")
leaflet() %>%
addTiles() %>%
addPolygons(data = calle_multi_sub_leaf, weight = 3, fillColor = 'purple', color = 'purple') %>%
addPolygons(data = zat_sub_leaf, weight = 3, fillColor = 'blue', color = 'blue')
As we can see from zooming in the map, many of the major roads are in fact the border of ZAT unit, and these streets will either be matched to multiple ZAT unit (as shown in the map) using the method of st_intersects
, or be matched to non of the ZAT units (missing values) using st_within
or other similar methods.
Of note, by st_within
, there are 14878 (~15%) street units with unmatched ZAT units (shown below), whereas by st_intersect
, there were 16 unmatched streets.
Potential approaches to address the caveat are:
Adding from the original data of the signals and other new features that are not found in ZAT.
Assigning street data to any ZAT that crosses them, but this could result in counting things multiple times.
Dividing the counts in street data by the number of ZATs, then allocate or distribute the divided value into each ZAT.
Here, we’ll use the third option, dividing and distributing.
We first create a column indicating the number of ZATs that each street is matched to (match_n
), append that column to the street data (calle_clean
), then divide the value for each variable of each street by match_n
. We’ll select two streets that we know are matched to multiple ZATs (CL1000 and CL4462), and show the values of a few variables before (top) and after (bottom) the dividing process.
calle_clean <- readRDS("calles/calle_clean.rds")
calle_zat_xwalk <- calle_zat2 %>% as.data.frame() %>%
select(-geometry) %>%
add_count(CodigoCL, name = "match_n")
calle_match_n <- calle_clean %>%
as.data.frame() %>%
select(-geometry) %>%
left_join(calle_zat_xwalk %>% select(-ZAT) %>% distinct(), by = "CodigoCL")
calle_n_divide <- calle_match_n %>%
select(-puente_vh, -Puente_PT) %>%
mutate(
across(-c(CodigoCL,area, match_n), ~.x/match_n, .names = "{.col}")
) # remember the "match_n"
checkCL <- c("CL1000", "CL4462")
calle_match_n %>% filter(CodigoCL %in% checkCL) %>%
select(CodigoCL, area, match_n, AVE_pendie, A_Calzada, arboles)
calle_n_divide %>% filter(CodigoCL %in% checkCL) %>%
select(CodigoCL, area, match_n, AVE_pendie, A_Calzada, arboles)
CodigoCL | area | match_n | AVE_pendie | A_Calzada | arboles |
---|---|---|---|---|---|
CL1000 | 2632.239 | 2 | 0.0221 | 1407.66 | 0 |
CL4462 | 49696.739 | 6 | 0.0478 | 15487.68 | 1146 |
CodigoCL | area | match_n | AVE_pendie | A_Calzada | arboles |
---|---|---|---|---|---|
CL1000 | 2632.239 | 2 | 0.0110500 | 703.83 | 0 |
CL4462 | 49696.739 | 6 | 0.0079667 | 2581.28 | 191 |
After this “adjusting” process, we use the resulting values for aggregation, either by taking the weighted mean or the sum.
Bridge counts (puente_vh
, puente_PT
) were excluded because they’re character values, and we can consider recoding them into 1 or 0. For aggregation, while the sum is taken for most of the street-level variables, area-weighted mean is taken for AVE_pendie
, P_Ancho_Cl
, sent_vial
, velcidad
.
calle2zat_df <- calle_zat_xwalk %>%
select(-match_n) %>%
left_join(calle_n_divide, by = "CodigoCL") %>%
group_by(ZAT) %>%
summarise(
across(c(AVE_pendie, P_Ancho_Cl, sent_vial, velcidad), ~weighted.mean(.x, w=area), .names = "{.col}"),
across(-c(CodigoCL, AVE_pendie, P_Ancho_Cl, sent_vial, velcidad), ~sum(.x), .names = "{.col}")
)
After aggregation, we can join the data with ZAT-level geometry (code not included here, refer to initial look post) to create a geo-referenced database for downstream modeling.
We still need to normalize the aggregated values by area, and perhaps take natural logs for some variables. But here, as an example, we’ll map the aggregated average road width (P_Ancho_Cl
) and area of sidewalks (A_andenes
) at the ZAT level.
calle2zat <- readRDS("../../../../clean_data/calle2zat_aggr/calle2zat.rds")
pal <- colorNumeric(
palette = "BrBG",
domain = calle2zat$P_Ancho_Cl
)
zat_label <- glue("ZAT{calle2zat$ZAT} Road width: {calle2zat$P_Ancho_Cl}")
calle2zat %>%
select(ZAT, P_Ancho_Cl) %>%
st_zm() %>%
st_transform(crs = st_crs("+proj=longlat +datum=WGS84")) %>%
leaflet() %>%
addProviderTiles(providers$CartoDB.Voyager) %>%
addPolygons(color = "white",
weight = 0.5,
smoothFactor = 0.5,
opacity = 1,
fillColor = ~pal(P_Ancho_Cl),
fillOpacity = 0.8,
highlightOptions = highlightOptions(
weight = 5,
color = "#666",
fillOpacity = 0.8,
bringToFront = TRUE),
label = zat_label,
labelOptions = labelOptions(
style = list(
"font-family" = "Fira Sans, sans-serif",
"font-size" = "1.2em"
))
)%>%
addLegend("bottomleft",
pal = pal,
values = ~P_Ancho_Cl,
title = "Average Road Width in Bogotá</br>(ZAT-level)",
opacity = 1)
pal2 <- colorNumeric(
palette = "YlGnBu",
domain = calle2zat$A_andenes
)
zat_label2 <- glue("ZAT{calle2zat$ZAT} Sidewalk area: {calle2zat$A_andenes}")
calle2zat %>%
select(ZAT, A_andenes) %>%
st_zm() %>%
st_transform(crs = st_crs("+proj=longlat +datum=WGS84")) %>%
leaflet() %>%
addProviderTiles(providers$CartoDB.Voyager) %>%
addPolygons(color = "grey",
weight = 0.3,
smoothFactor = 0.5,
opacity = 1,
fillColor = ~pal2(A_andenes),
fillOpacity = 0.8,
highlightOptions = highlightOptions(
weight = 5,
color = "#666",
fillOpacity = 0.8,
bringToFront = TRUE),
label = zat_label2,
labelOptions = labelOptions(
style = list(
"font-family" = "Fira Sans, sans-serif",
"font-size" = "1.2em"
))
)%>%
addLegend("bottomleft",
pal = pal2,
values = ~A_andenes,
title = "Sidewalk Areas in Bogotá</br>(ZAT-level)",
opacity = 1)