Expanding the Neighborhood-Level Database

Aggregating street-level data to neighborhood (ZAT) level

Author

Heli Xu

Published

February 28, 2024

To include more indicators at the ZAT level for our neighborhood built environment profiling, we’ll describe the process of aggregating some of street-level variables to the ZAT level.

Code
library(dplyr)
library(sf)
library(leaflet)
library(glue)
library(knitr)

Caveat in Street-ZAT Spatial Join

Both the street-level and ZAT-level geometries are polygons, and the street-level ones are not always nested in the ZAT-level ones. This can pose a problem in spatial joining: while setting the criteria as one within another (st_contains, st_within, st_covers) leaves many streets unmatched to any ZAT unit, setting the criteria as one has any point in common with another (st_intersects) results in many streets matching to multiple ZAT units. For example, below is the map of randomly selected 100 streets with which multiple ZAT units are matched (purple is the border of street, blue is the border of ZAT).

Code
calle_multi_sub_leaf <- readRDS("../../../../clean_data/calle2zat_aggr/calle_multi_sub_leaf.rds")
zat_sub_leaf <- readRDS("../../../../clean_data/calle2zat_aggr/zat_sub_leaf.rds")

leaflet() %>% 
  addTiles() %>% 
  addPolygons(data = calle_multi_sub_leaf, weight = 3, fillColor = 'purple', color = 'purple') %>% 
  addPolygons(data = zat_sub_leaf, weight = 3, fillColor = 'blue', color = 'blue') 

As we can see from zooming in the map, many of the major roads are in fact the border of ZAT unit, and these streets will either be matched to multiple ZAT unit (as shown in the map) using the method of st_intersects, or be matched to non of the ZAT units (missing values) using st_within or other similar methods.

Of note, by st_within, there are 14878 (~15%) street units with unmatched ZAT units (shown below), whereas by st_intersect, there were 16 unmatched streets.

Divide and Distribute

Potential approaches to address the caveat are:

  • Adding from the original data of the signals and other new features that are not found in ZAT.

  • Assigning street data to any ZAT that crosses them, but this could result in counting things multiple times.

  • Dividing the counts in street data by the number of ZATs, then allocate or distribute the divided value into each ZAT.

Here, we’ll use the third option, dividing and distributing.

We first create a column indicating the number of ZATs that each street is matched to (match_n), append that column to the street data (calle_clean), then divide the value for each variable of each street by match_n. We’ll select two streets that we know are matched to multiple ZATs (CL1000 and CL4462), and show the values of a few variables before (top) and after (bottom) the dividing process.

Code
calle_clean <- readRDS("calles/calle_clean.rds")

calle_zat_xwalk <- calle_zat2 %>% as.data.frame() %>%
  select(-geometry) %>% 
  add_count(CodigoCL, name = "match_n") 

calle_match_n <- calle_clean %>% 
  as.data.frame() %>% 
  select(-geometry) %>% 
  left_join(calle_zat_xwalk %>% select(-ZAT) %>% distinct(), by = "CodigoCL")

calle_n_divide <- calle_match_n %>% 
  select(-puente_vh, -Puente_PT) %>% 
  mutate(
    across(-c(CodigoCL,area, match_n), ~.x/match_n, .names = "{.col}")
    ) # remember the "match_n"

checkCL <- c("CL1000", "CL4462")

calle_match_n %>% filter(CodigoCL %in% checkCL) %>% 
  select(CodigoCL, area, match_n, AVE_pendie, A_Calzada, arboles)

calle_n_divide %>% filter(CodigoCL %in% checkCL) %>% 
  select(CodigoCL, area, match_n, AVE_pendie, A_Calzada, arboles)
CodigoCL area match_n AVE_pendie A_Calzada arboles
CL1000 2632.239 2 0.0221 1407.66 0
CL4462 49696.739 6 0.0478 15487.68 1146
CodigoCL area match_n AVE_pendie A_Calzada arboles
CL1000 2632.239 2 0.0110500 703.83 0
CL4462 49696.739 6 0.0079667 2581.28 191

After this “adjusting” process, we use the resulting values for aggregation, either by taking the weighted mean or the sum.

Note

Bridge counts (puente_vh, puente_PT) were excluded because they’re character values, and we can consider recoding them into 1 or 0. For aggregation, while the sum is taken for most of the street-level variables, area-weighted mean is taken for AVE_pendie, P_Ancho_Cl, sent_vial, velcidad.

Code
calle2zat_df <- calle_zat_xwalk %>% 
  select(-match_n) %>% 
  left_join(calle_n_divide, by = "CodigoCL") %>% 
  group_by(ZAT) %>% 
  summarise(
    across(c(AVE_pendie, P_Ancho_Cl, sent_vial, velcidad), ~weighted.mean(.x, w=area), .names = "{.col}"),
    across(-c(CodigoCL, AVE_pendie, P_Ancho_Cl, sent_vial, velcidad), ~sum(.x), .names = "{.col}")
  )

After aggregation, we can join the data with ZAT-level geometry (code not included here, refer to initial look post) to create a geo-referenced database for downstream modeling.

Visualize the Aggregated Data

We still need to normalize the aggregated values by area, and perhaps take natural logs for some variables. But here, as an example, we’ll map the aggregated average road width (P_Ancho_Cl) and area of sidewalks (A_andenes) at the ZAT level.

Average Road Width

Code
calle2zat <- readRDS("../../../../clean_data/calle2zat_aggr/calle2zat.rds")

pal <- colorNumeric(
  palette = "BrBG",
  domain = calle2zat$P_Ancho_Cl
)

zat_label <- glue("ZAT{calle2zat$ZAT} Road width: {calle2zat$P_Ancho_Cl}")

calle2zat %>% 
  select(ZAT, P_Ancho_Cl) %>% 
  st_zm() %>%
  st_transform(crs = st_crs("+proj=longlat +datum=WGS84")) %>%
  leaflet() %>%
  addProviderTiles(providers$CartoDB.Voyager)  %>%
  addPolygons(color = "white", 
              weight = 0.5,
              smoothFactor = 0.5,
              opacity = 1,
              fillColor = ~pal(P_Ancho_Cl),
              fillOpacity = 0.8,
              highlightOptions = highlightOptions(
                weight = 5,
                color = "#666",
                fillOpacity = 0.8,
                bringToFront = TRUE),
              label = zat_label,
              labelOptions = labelOptions(
                style = list(
                  "font-family" = "Fira Sans, sans-serif",
                  "font-size" = "1.2em"
                ))
              )%>% 
  addLegend("bottomleft",
            pal = pal,
            values = ~P_Ancho_Cl,
            title = "Average Road Width in Bogotá</br>(ZAT-level)",
            opacity = 1)

Sidewalk Areas

Code
pal2 <- colorNumeric(
  palette = "YlGnBu",
  domain = calle2zat$A_andenes
)

zat_label2 <- glue("ZAT{calle2zat$ZAT} Sidewalk area: {calle2zat$A_andenes}")

calle2zat %>% 
  select(ZAT, A_andenes) %>% 
  st_zm() %>%
  st_transform(crs = st_crs("+proj=longlat +datum=WGS84")) %>%
  leaflet() %>%
  addProviderTiles(providers$CartoDB.Voyager)  %>%
  addPolygons(color = "grey", 
              weight = 0.3,
              smoothFactor = 0.5,
              opacity = 1,
              fillColor = ~pal2(A_andenes),
              fillOpacity = 0.8,
              highlightOptions = highlightOptions(
                weight = 5,
                color = "#666",
                fillOpacity = 0.8,
                bringToFront = TRUE),
              label = zat_label2,
              labelOptions = labelOptions(
                style = list(
                  "font-family" = "Fira Sans, sans-serif",
                  "font-size" = "1.2em"
                ))
              )%>% 
  addLegend("bottomleft",
            pal = pal2,
            values = ~A_andenes,
            title = "Sidewalk Areas in Bogotá</br>(ZAT-level)",
            opacity = 1)