Diagnostic Evaluation of an AI Model for Built Environment Features

Part 1: Comparing annotators’ reports, AI training data and predictions with GIS data

Author

Heli Xu

Published

April 17, 2024

This post is to document the process of the (preliminary) assessments of the reliability of the AI model for built environment (BE) features, by comparing different data sources of BE features at the street level. Here, we are testing the reliability of training data for the AI model, model-based predictions and annotators’ reports, using GIS data or as reference; In another post, we are also testing the GIS/training/prediction/reports data against Computer Assisted Neighborhood Visual Assessment System (CANVAS) data, which is considered a gold standard for BE data collection.

Note

In contrast to the AI testing and validation process, BE features data from different sources are aggregated to the street level for comparison.

Code
library(dplyr)
library(sf)
library(readr)
library(reactable)

1. GIS - Training data

The code related to this part (and the next part) of the work is stored in train_predict_gis_comparison.R, and linked here. Alex has a Stata code for GIS-prediction comparison (discussed in the next section), and the R scripts here are following a similar workflow and naming fashion.

Data cleaning

- Training data

Raw data file is located at Data/Bogota/Annotations/2024_02_12/annotations.csv. Using the latitude and longitude columns, we can construct the coordinates and join that to the street (calle) level (point to polygons, with st_within). On a side note, there are 343 unmatched data points that are outside of the calle polygons (shown below).

After spatially joining the training data to the street level, we can group by the street ID (CodigoCL) and sum up the counts for each street. For distinguishing variables about similar features between sources, we are adding “tr_” to all the variables from training data.

- GIS data

The raw data for GIS data is the Calle_datos/ shapefiles, or the STREET_LEVEL.xlsx with updated column names (that match the codebook).

Most of the cleaning is renaming the columns for easier understanding (following the styles by Alex’s Stata code), with an additional prefix of “gis_”.

- Join and Derive

Last, we are joining the training and GIS data by street IDs, and generate binary variables based on street-level counts (1 if counts >0, 0 if otherwise). Most of the binary columns will have a suffix of “_yn”, with the following exceptions (these are also binary variables): gis_sw, gis_bike_lane, gis_any_bus and gis_median. Notably, some binary variables may be used to derive additional binary variables. For example, tr_pedxwalk_yn is set as 1 if either tr_sign_crossing_yn or tr_crosswalk_yn is 1.

Total street count is 8132.

Reliability Metrics

Given that locations of data points vary from different sources, the aggregated street-level counts may not be compatible for comparison. Here we are only comparing the binary variables of equivalent/similar features, including traffic signs, traffic lights, crosswalk, stop signs, yield signs, school zone signs, sidewalks, bike lanes, bus lanes, medians, speed bumps, trees, bus stops, parked vehicles, parking lanes, and BRT stations. For quick reference, below shows the summary table of the reliability metrics, and the ROC curves.

Reliability metrics comparing training data vs GIS data

ROC curves of training vs GIS data

Overall, traffic lights, medians, sidewalks, trees, bus and BRT stops seem to have higher agreement between training and GIS data.

2. GIS - Prediction data

The code related to this part (and the previous part) of the work is stored in train_predict_gis_comparison.R, and linked here.

Data cleaning

Alex completed joining the prediction and GIS data (with SES level and road types) at the street level (Data/Bogota/Annotations/2024_02_12/predictions_st_mv.dbf), so we’ll use that to perform further processing:

  • rename the columns: prefix “an_” for prediction data, prefix “gis_” for GIS data.

  • derive the binary columns.

With NAs removed in street id, total street count is 7332.

Reliability Metrics

Reliability metrics comparing prediction data vs GIS data

ROC curves of prediction vs GIS data

Traffic lights, sidewalks, median, trees, bus and BRT stops remain to have relatively high agreement between prediction and GIS data. In addition, bike lanes also showed much higher agreement in prediction-GIS comparison than training-GIS comparison.

*Spatial Overlap between Training and Prediction

In the image below, prediction data is shown in orange, while training data is shown in blue. The slightly darker shade of blue indicates the streets that are present in both training and prediction data aggregated to street level (~1000 streets).

3. GIS - Annotators

The code related to this part of the work is stored in annotator_gis_comparison.R, and linked here.

Data cleaning

These are another set of data from human annotators stored in .json format (in Bogota/Annotations/Annotations_2023_04_23/ folder). Each json contains a list of lists/tables (as shown below for test.json), and the more important and relevant information for us are the coordinates in the image file names (in red rectangle), the categories of the annotation (orange underlined: IDs and names) and the actual annotations. We’ll match the category name to the annotations by category id, match the image coordinates to the annotations by image id, combine the three json files (test.json, train.json and eval.json) together, and use the coordinates to join the point data to street level.

Last, we’ll join GIS data by street IDs and rename/derive the columns.

  • Rename: prefix “js_” for annotators’ report, “gis_” for GIS data.

  • Binary variables.

Total street count is 9178.

Reliability Metrics

Reliability metrics comparing annotation data vs GIS data

ROC curves of annotators’ vs GIS data

Consistently, traffic lights, sidewalks, medians, bike lanes, trees, bus and BRT stops show better agreement between annotators’ data and GIS data.

In summary, compared with GIS data, several features have fair to substantial agreement across training, prediction and annotators’ data, including traffic lights, sidewalks, medians, trees and bus and BRT stops (potentially bike lanes).