Exploratory Data Analysis of BEPIDL Data

Geo-referenced ZAT-level Indicators

Author

Heli Xu

Published

January 23, 2024

Building on the initial exploratory data analysis on variable distribution, I’d like to join the spatial information from ZAT folder with the variables from ZAT_INDICADORES.xlsx and visualize the geo-referenced data.

Files included

ZAT/

This folder contains boundary files for the ZAT units – a table of 1141 rows, 6 cols (each row being a ZAT unit); (dropping z dimension to create interactive maps).

Code
library(readxl)
library(dplyr)
library(tidyr)
library(ggplot2)
library(readr)
library(sf)
library(leaflet)

zat <- st_read("../../../../data/ZAT/ZAT_geo/ZAT.shp")
Reading layer `ZAT' from data source 
  `D:\LocalGitHub\BEPIDL\data\ZAT\ZAT_geo\ZAT.shp' using driver `ESRI Shapefile'
Simple feature collection with 1141 features and 5 fields
Geometry type: MULTIPOLYGON
Dimension:     XY, XYZ
Bounding box:  xmin: -74.88439 ymin: 3.70344 xmax: -73.05211 ymax: 5.83076
z_range:       zmin: 0 zmax: 0
Geodetic CRS:  MAGNA-SIRGAS
Code
zat %>%
  st_zm() %>% 
  leaflet() %>%
  addProviderTiles(providers$CartoDB.Voyager) %>%
  addPolygons()

ZAT_INDICADORES.xlsx

This is a data table of 919 rows, 18 cols (each row being a ZAT unit).

According to the codebook, the indicators in this table seem to be the “old” version. For all the ZAT included, there are no missing data, while some of the values are zeros. As seen from the row count, it contains fewer ZAT units than the shapefiles. (first 10 rows shown below)

Code
zat_data <- read_xlsx("../../../../data/ZAT/ZAT_INDICADORES.xlsx")
zat_data[1:10,]
# A tibble: 10 × 18
     ZAT `Area M2` areakm2 BUSTOPDENS LONGMV LRDENS BPRDRATE NUMINT INTDENS
   <dbl>     <dbl>   <dbl>      <dbl>  <dbl>  <dbl>    <dbl>  <dbl>   <dbl>
 1    83  1016244.   1.02       13.8  10852. 10679.   0         100   98.4 
 2   854   219816.   0.220      18.2   1391.  6330.   0          21   95.5 
 3    57   513564.   0.514      35.0   6935. 13504.   0          99  193.  
 4    20   217888.   0.218      18.4   3586. 16460.   0          40  184.  
 5    18   162708.   0.163      24.6   1083.  6655.   0           6   36.9 
 6    19   228574.   0.229       0        0      0    0           0    0   
 7    84   239887.   0.240       0      759.  3163.   0           5   20.8 
 8   559  1184382.   1.18        0      167.   141.   0           2    1.69
 9   561   807833.   0.808       3.71 10939. 13542.   0.0603    159  197.  
10   947   916149.   0.916       8.73 17013. 18570.   0.0677    216  236.  
# ℹ 9 more variables: NUMTTREES <dbl>, NUMSTTREES <dbl>, NUMBRIDGES <dbl>,
#   NUMTTFLIGH <dbl>, NUMPTFLIGH <dbl>, NUMRBP <dbl>, LONGRBP <dbl>,
#   NUMRT <dbl>, LONGRT <dbl>

Joining Geometry with Data

Here we’ll join the ZAT shapefiles with the ZAT_INDICADORES.xlsx by ZAT ID, keeping the ZAT units that we have data for, resulting in a table of 919 rows and 23 columns.

Code
georef_zat <- zat_data %>% 
  left_join(zat, by = "ZAT") %>% 
  st_as_sf()

Visualizing Indicators in Maps

We’ll use the georeferenced data to visualize the indicators in maps (dropping z dimension, datum set to WGS84). With loose categorizing the variables into 3 groups: public transits, road features and traffic lights, we’ll show interactive maps for one variable for each group along with distributions of all variables.

Public transit

Code
library(glue)
# set color palette for circles
pal <- colorNumeric(
  palette = c("orange","navy"),
  domain = georef_zat$BUSTOPDENS
)

zat_label <- glue("ZAT{georef_zat$ZAT} Bus stop density: {round(georef_zat$BUSTOPDENS,2)}/km2")

georef_zat %>%
  st_zm() %>%
  st_transform(crs = st_crs("+proj=longlat +datum=WGS84")) %>%
  leaflet() %>%
  addProviderTiles(providers$CartoDB.Voyager)  %>%
  addPolygons(color = "white", 
              weight = 0.5,
              smoothFactor = 0.5,
              opacity = 1,
              fillColor = ~pal(BUSTOPDENS),
              fillOpacity = 0.8,
              highlightOptions = highlightOptions(
                weight = 5,
                color = "#666",
                fillOpacity = 0.8,
                bringToFront = TRUE),
              label = zat_label,
              labelOptions = labelOptions(
                style = list(
                  "font-family" = "Fira Sans, sans-serif",
                  "font-size" = "1.2em"
                ))
              ) %>% 
  addLegend("bottomleft",
            pal = pal,
            values = ~BUSTOPDENS,
            title = "Bus Stop Density in Bogotá</br>(ZAT-level)",
            opacity = 1)

Code
zat_data_long <- zat_data %>% 
  select(-2, -3) %>% 
  pivot_longer(-ZAT, names_to = "indicator")

bus <- c('BUSTOPDENS', 'NUMRBP', 'LONGRBP', 'NUMRT', 'LONGRT')
onroad <- c('LONGMV', 'LRDENS', 'BPRDRATE', 'NUMINT', 'INTDENS')
tlight <- c('NUMTTFLIGH', 'NUMPTFLIGH')
byroad <- c('NUMTTREES', 'NUMSTTREES', 'NUMBRIDGES')
  
zat_data_long %>% 
  filter(indicator%in%all_of(bus)) %>% 
  ggplot(aes(x=value)) +
           geom_histogram(fill = "skyblue", color = "blue")+
           facet_wrap(~indicator, scales = "free") +
           theme_minimal() +
  labs(title = "Distribution of Indicators related to Bus/BRT")+
  theme(
    strip.background = element_rect(fill = "#dadada", color = "white"),
    panel.grid.minor = element_blank(),
    plot.title = element_text(size = 15, face = "bold")
  )

Road features

Code
pal2 <- colorNumeric(
  palette = c("orange","navy"),
  domain = georef_zat$INTDENS
)

zat_label2 <- glue("ZAT{georef_zat$ZAT} Intersection density: {round(georef_zat$INTDENS,2)}/km2")

georef_zat %>%
  st_zm() %>%
  st_transform(crs = st_crs("+proj=longlat +datum=WGS84")) %>%
  leaflet() %>%
  addProviderTiles(providers$CartoDB.Voyager)  %>%
  addPolygons(color = "white", 
              weight = 0.5,
              smoothFactor = 0.5,
              opacity = 1,
              fillColor = ~pal2(INTDENS),
              fillOpacity = 0.8,
              highlightOptions = highlightOptions(
                weight = 5,
                color = "#666",
                fillOpacity = 0.8,
                bringToFront = TRUE),
              label = zat_label2,
              labelOptions = labelOptions(
                style = list(
                  "font-family" = "Fira Sans, sans-serif",
                  "font-size" = "1.2em"
                ))
              ) %>% 
  addLegend("bottomleft",
            pal = pal2,
            values = ~INTDENS,
            title = "Intersection Density in Bogotá</br>(ZAT-level)",
            opacity = 1)
Code
zat_data_long %>% 
  filter(indicator%in%all_of(onroad)) %>% 
  ggplot(aes(x=value)) +
           geom_histogram(fill = "skyblue", color = "blue")+
           facet_wrap(~indicator, scales = "free") +
           theme_minimal()+
  labs(title = "Distribution of Indicators related to Features on the Road") +
  theme(
    strip.background = element_rect(fill = "#dadada", color = "white"),
    panel.grid.minor = element_blank(),
    plot.title = element_text(size = 15, face = "bold")
  )

Code
zat_data_long %>% 
  filter(indicator%in%all_of(byroad)) %>% 
  ggplot(aes(x=value)) +
           geom_histogram(fill = "skyblue", color = "blue", alpha = 0.7)+
           facet_wrap(~indicator, scales = "free") +
           theme_minimal()+
  labs(title = "Distribution of Indicators related to Features by the Road") +
  theme(
    strip.background = element_rect(fill = "#dadada", color = "white"),
    panel.grid.minor = element_blank(),
    plot.title = element_text(size = 15, face = "bold")
  )

Traffic lights

Code
pal3 <- colorNumeric(
  palette = c("orange","navy"),
  domain = georef_zat$NUMTTFLIGH
)

zat_label3 <- glue("ZAT{georef_zat$ZAT} Total traffic lights: {georef_zat$NUMTTFLIGH}")

georef_zat %>%
  st_zm() %>%
  st_transform(crs = st_crs("+proj=longlat +datum=WGS84")) %>%
  leaflet() %>%
  addProviderTiles(providers$CartoDB.Voyager)  %>%
  addPolygons(color = "white", 
              weight = 0.5,
              smoothFactor = 0.5,
              opacity = 1,
              fillColor = ~pal3(NUMTTFLIGH),
              fillOpacity = 0.8,
              highlightOptions = highlightOptions(
                weight = 5,
                color = "#666",
                fillOpacity = 0.8,
                bringToFront = TRUE),
              label = zat_label3,
              labelOptions = labelOptions(
                style = list(
                  "font-family" = "Fira Sans, sans-serif",
                  "font-size" = "1.2em"
                ))
              ) %>% 
  addLegend("bottomleft",
            pal = pal3,
            values = ~NUMTTFLIGH,
            title = "Total Traffic Lights in Bogotá</br>(ZAT-level)",
            opacity = 1)
Code
zat_data_long %>% 
  filter(indicator%in%all_of(tlight)) %>% 
  ggplot(aes(x=value)) +
           geom_histogram(fill = "skyblue", color = "blue", alpha = 0.7)+
           facet_wrap(~indicator, scales = "free") +
           theme_minimal()+
  labs(title = "Distribution of Indicators about Traffic Lights") +
  theme(
    strip.background = element_rect(fill = "#dadada", color = "white"),
    panel.grid.minor = element_blank(),
    plot.title = element_text(size = 15, face = "bold")
  )

Initial takeaway: some of the indicators seem to show similar spatial patterns, though there is a significant number of lower count values in the dataset which might make the visualization less clear.