Urbanization Index Comparisons with Global Human Settlement (GHS)¶
Products used: gm_s2_annual, wofs_ls_summary_alltime
Tags: band index; NDBI, band index; ENDISI, urban
Background¶
There are many urbanization indices, each with different characteristics and use cases. It is often useful to compare the performance of several indices over an area, so that you can determine which one works best there, based on their outputs and a “ground truth” urbanization dataset.
Description¶
This notebook uses several indices to classify land as “urban” and then compares those results with the Global Human Settlement (GHS) product, which shows the extent of built-up area (urban extent) through to 2014. The indices tested here are the Normalized Difference Built-Up Index (NDBI) and the Enhanced Normalized Difference Impervious Surface Index (ENDISI).
1. Load a geomedian image from the region of interest.
2. Mask water using the WOfS all-time summary.
3. Calculate the urban indices and show their histograms.
4. Select minimum and maximum threshold values for the indices to determine urban pixels.
5. Show the urbanization prediction images.
6. Load and show the “ground truth” data (GHS) for the year 2014.
7. Compare the urbanization predictions with the “ground truth” data visually and statistically.
The choice of threshold values can be informed by the histograms and comparing the urbanization prediction images with the “ground truth” urbanization data.
Getting started¶
To run this analysis, run all the cells in the notebook, starting with the “Load packages” cell.
Load packages¶
Import Python packages that are used for the analysis.
[1]:
%matplotlib inline
import datacube
import matplotlib.pyplot as plt
plt.rcParams.update({'font.size': 14})
import numpy as np
import xarray as xr
from collections import namedtuple
from datacube.utils.geometry import assign_crs
from skimage.morphology import remove_small_objects
from skimage.morphology import remove_small_holes
from matplotlib.patches import Patch
from odc.algo import xr_reproject
from deafrica_tools.plotting import display_map, rgb
from deafrica_tools.bandindices import calculate_indices
Set up a Dask cluster¶
Dask can be used to better manage memory use and conduct the analysis in parallel. For an introduction to using Dask with Digital Earth Africa, see the Dask notebook.
Note: We recommend opening the Dask processing window to view the different computations that are being executed; to do this, see the Dask dashboard in DE Africa section of the Dask notebook.
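To use Dask, set up a local computing cluster as described in the Dask notebook. Note that the dc.load calls in this notebook do not pass dask_chunks, so the data are loaded directly into memory rather than as Dask arrays.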
Connect to the datacube¶
Activate the datacube database, which provides functionality for loading and displaying stored Earth observation data.
[2]:
dc = datacube.Datacube(app="Urbanization_GHS_Comparison")
Analysis parameters¶
The following cell sets important parameters for the analysis. The parameters are:
lat: The central latitude to analyse (e.g. 10.338).
lon: The central longitude to analyse (e.g. -1.055).
lat_buffer: The number of degrees to load around the central latitude.
lon_buffer: The number of degrees to load around the central longitude.
time_range: The time range to analyse, in YYYY-MM-DD format (e.g. ('2016-01-01', '2016-12-31')). A single year, such as "2017", also works and is used below.
If running the notebook for the first time, keep the default settings below. The default area is Dakar, Senegal.
Select location¶
[3]:
# Define the area of interest
lat = 14.72
lon = -17.355
lat_buffer = 0.15
lon_buffer = 0.2
time_range = "2017"  # a single year string loads that year's annual geomedian
# Combine central lat,lon with buffer to get area of interest
lat_range = (lat - lat_buffer, lat + lat_buffer)
lon_range = (lon - lon_buffer, lon + lon_buffer)
View the selected location¶
The next cell will display the selected area on an interactive map. Feel free to zoom in and out to get a better understanding of the area you’ll be analysing. Clicking on any point of the map will reveal the latitude and longitude coordinates of that point.
[4]:
# The code below renders a map that can be used to view the region.
display_map(lon_range, lat_range)
Load the data¶
Below, we load the 2017 annual Sentinel-2 geomedian (gm_s2_annual) at 20 m resolution.
[5]:
# Create the 'query' dictionary object
query = {
"longitude": lon_range,
"latitude": lat_range,
"time": time_range,
"resolution": (-20, 20),
}
#load geomedian
ds = dc.load(product='gm_s2_annual',
measurements=["red", "green", "blue", "swir_1", "swir_2", "nir"],
**query
).squeeze()
Once the load is complete, examine the data by printing it in the next cell. The Dimensions attribute reveals the number of pixels in the x and y dimensions. Because .squeeze() dropped the single annual time step, time appears only as a scalar coordinate.
[6]:
ds
[6]:
<xarray.Dataset>
Dimensions:      (x: 1930, y: 1854)
Coordinates:
    time         datetime64[ns] 2017-07-02T11:59:59.999999
  * y            (y) float64 1.876e+06 1.876e+06 ... 1.839e+06 1.839e+06
  * x            (x) float64 -1.694e+06 -1.694e+06 ... -1.655e+06 -1.655e+06
    spatial_ref  int32 6933
Data variables:
    red          (y, x) uint16 438 443 435 455 471 474 ... 542 546 551 549 548
    green        (y, x) uint16 502 515 500 520 538 542 ... 652 659 667 666 664
    blue         (y, x) uint16 526 536 525 544 566 577 ... 590 605 603 601 602
    swir_1       (y, x) uint16 346 346 345 359 372 377 ... 441 445 447 448 445
    swir_2       (y, x) uint16 296 296 298 314 323 330 ... 387 388 387 384 383
    nir          (y, x) uint16 389 398 385 401 416 424 ... 499 503 502 506 506
Attributes:
    crs:           EPSG:6933
    grid_mapping:  spatial_ref
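To relate these dimensions to ground distance, multiply each pixel count by the 20 m resolution requested in the query. The helpers below are hypothetical, for illustration only. For the default Dakar area this gives roughly 38.6 km (x) by 37.1 km (y), about 1,431 km², consistent with the x coordinates in the output above running from about -1.694e+06 to -1.655e+06 m.

```python
def ground_extent_km(n_pixels, resolution_m):
    """Ground distance spanned by n_pixels of size resolution_m (metres), in km."""
    return n_pixels * abs(resolution_m) / 1000.0


def area_km2(nx, ny, resolution_m):
    """Area of an nx-by-ny grid of square pixels, in km^2."""
    return ground_extent_km(nx, resolution_m) * ground_extent_km(ny, resolution_m)


# Dimensions reported for the default Dakar area, loaded at resolution (-20, 20)
width_km = ground_extent_km(1930, 20)
height_km = ground_extent_km(1854, -20)  # y resolution is negative on a north-up grid
print(f"{width_km:.1f} km x {height_km:.1f} km, {area_km2(1930, 1854, 20):,.0f} km^2")
```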
Filter out water pixels using WOfS¶
[7]:
# Create a water mask using WOfS.
water = dc.load(
product="wofs_ls_summary_alltime",
like=ds.geobox).frequency
# Mask out pixels that WOfS classified as wet in more than 10% of clear observations
water_mask = water > 0.1
ds = ds.where(~water_mask.squeeze())
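The masking step above can be illustrated on a tiny made-up tile with plain NumPy. Here np.where stands in for Dataset.where; the red values are borrowed from the loaded data, but the WOfS frequencies are invented. The sketch shows two side effects worth knowing: masking promotes the uint16 bands to float64 so they can hold NaN, and NaN fails every comparison, so a masked water pixel can never fall inside a (minimum, maximum) threshold range and is later counted as non-urban.

```python
import numpy as np

# Made-up 2x3 tile: red band values (uint16, as loaded) and WOfS wet frequencies
red = np.array([[438, 443, 435], [2113, 2096, 2084]], dtype=np.uint16)
frequency = np.array([[0.95, 0.40, 0.05], [0.00, 0.02, 0.11]])

# Same rule as the cell above: wet in more than 10% of clear observations
water_mask = frequency > 0.1
# np.where stands in for Dataset.where: keep values where the condition holds, else NaN
masked = np.where(~water_mask, red, np.nan)
print(masked.dtype)

# NaN fails every comparison, so masked pixels never land inside a threshold range
in_range = (masked > 400) & (masked < 2100)
print(in_range)
```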
Land Spectral Indices¶
This notebook calculates NDBI and ENDISI as urbanization indices.
NDBI
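The Normalized Difference Built-Up Index (NDBI) is one of the most commonly used proxies of urbanization. It is calculated as (SWIR1 - NIR) / (SWIR1 + NIR), since built-up surfaces typically reflect more strongly in the shortwave infrared than in the near infrared. Like all normalized difference indices, it has a range of [-1, 1].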
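A minimal NumPy sketch of the NDBI ratio, assuming plain arrays rather than the xarray Dataset used in the notebook (calculate_indices does this for you in the next cell). The function name ndbi is hypothetical. The detail worth copying is the cast to float: the geomedian bands are uint16, and subtracting two uint16 arrays wraps around instead of going negative.

```python
import numpy as np


def ndbi(swir_1, nir):
    """NDBI = (SWIR1 - NIR) / (SWIR1 + NIR), computed in float64."""
    # Cast first: uint16 subtraction would wrap around instead of going negative
    swir_1 = np.asarray(swir_1, dtype=np.float64)
    nir = np.asarray(nir, dtype=np.float64)
    # 0/0 (e.g. nodata pixels stored as 0 in both bands) becomes NaN without a warning
    with np.errstate(divide="ignore", invalid="ignore"):
        return (swir_1 - nir) / (swir_1 + nir)


# Two pixels from the first row of the loaded dataset (uint16, as returned by dc.load)
swir_1 = np.array([346, 4866], dtype=np.uint16)
nir = np.array([389, 3089], dtype=np.uint16)

print(swir_1 - nir)       # uint16 arithmetic: the first difference wraps around
print(ndbi(swir_1, nir))  # float arithmetic: one negative and one positive NDBI value
```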
Calculate NDBI and ENDISI¶
[8]:
# Calculate NDBI and ENDISI with the DE Africa calculate_indices function
ds = calculate_indices(ds, index=['NDBI', "ENDISI"], collection='s2')
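ENDISI is not described in the text above. A commonly used formulation is ENDISI = (Blue - α(SWIR1/SWIR2 + MNDWI²)) / (Blue + α(SWIR1/SWIR2 + MNDWI²)), where MNDWI = (Green - SWIR1) / (Green + SWIR1) and α = 2·mean(Blue) / (mean(SWIR1/SWIR2) + mean(MNDWI²)) over the scene. This is why blue, green and swir_2 are loaded alongside swir_1 and nir. The NumPy sketch below assumes calculate_indices follows this definition; check the deafrica_tools.bandindices source for your installed version before relying on it. The function name endisi is hypothetical.

```python
import numpy as np


def endisi(blue, green, swir_1, swir_2):
    """ENDISI with a scene-wide alpha; NaN (masked) pixels are ignored in the means."""
    # Cast to float so uint16 inputs neither wrap around nor truncate
    blue, green, swir_1, swir_2 = (
        np.asarray(band, dtype=np.float64) for band in (blue, green, swir_1, swir_2)
    )
    mndwi = (green - swir_1) / (green + swir_1)  # modified normalized difference water index
    swir_ratio = swir_1 / swir_2
    # alpha is one number for the whole scene, computed from NaN-aware means
    alpha = 2 * np.nanmean(blue) / (np.nanmean(swir_ratio) + np.nanmean(mndwi ** 2))
    term = alpha * (swir_ratio + mndwi ** 2)
    return (blue - term) / (blue + term)


# Two pixels from the first row of the loaded dataset
blue = np.array([526, 941], dtype=np.uint16)
green = np.array([502, 1446], dtype=np.uint16)
swir_1 = np.array([346, 4866], dtype=np.uint16)
swir_2 = np.array([296, 4063], dtype=np.uint16)

single = endisi(blue[:1], green[:1], swir_1[:1], swir_2[:1])  # a "scene" of one pixel
pair = endisi(blue, green, swir_1, swir_2)                     # a "scene" of two pixels
print(single[0], pair[0])
```

Because α is computed from the whole scene, the same pixel gets a different ENDISI value depending on what else is in the area of interest (the two calls above disagree for the first pixel). ENDISI thresholds tuned for Dakar may therefore need re-tuning elsewhere.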
Plot the two urban indices¶
[9]:
#Plot
fig, ax = plt.subplots(1, 2, figsize=(16,6), sharey=True)
ds.NDBI.plot.imshow(ax=ax[0], vmin=-.75, vmax=.75, cmap='RdBu_r')
ds.ENDISI.plot.imshow(ax=ax[1], vmin=-.75, vmax=.75, cmap='RdBu_r')
# Title each panel and hide the axis ticks
for a, title in zip(ax, ['NDBI', 'ENDISI']):
    a.set_title(title)
    a.xaxis.set_visible(False)
    a.yaxis.set_visible(False)
plt.tight_layout();

Determining the thresholds for urbanization¶
These histogram plots show the distribution of values for each index. The urban threshold values are chosen using these histograms.
If a highly urban area is being examined, the histograms should show a visible peak. A good threshold range usually includes the value at the peak (read off the x-axis of the histogram) plus some range of values below and above it.
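Reading the peak off a histogram by eye can be approximated numerically. The sketch below uses hypothetical helpers that are not part of the notebook or of deafrica_tools: histogram_peak returns the centre of the most populated bin, and bracket turns it into a (minimum, maximum) pair with widths you choose. It only makes sense in the highly urban case described above; in a mixed scene the largest peak may belong to vegetation, bare soil or another land cover, and the bracket widths remain a judgment call.

```python
import numpy as np


def histogram_peak(values, bins=1000, value_range=(-1.0, 1.0)):
    """Centre of the most populated histogram bin, ignoring NaN (masked) pixels."""
    values = np.asarray(values, dtype=np.float64).ravel()
    values = values[np.isfinite(values)]
    counts, edges = np.histogram(values, bins=bins, range=value_range)
    i = int(np.argmax(counts))
    return 0.5 * (edges[i] + edges[i + 1])


def bracket(peak, below, above):
    """(minimum, maximum) threshold pair extending `below` under and `above` over the peak."""
    return peak - below, peak + above


# Synthetic index values: a dominant cluster near 0.05, a uniform background and some NaNs
rng = np.random.default_rng(0)
values = np.concatenate([
    rng.normal(0.05, 0.02, 5000),
    rng.uniform(-1.0, 1.0, 1000),
    np.full(200, np.nan),
])
peak = histogram_peak(values, bins=100)
min_threshold, max_threshold = bracket(peak, below=0.15, above=0.10)
print(round(peak, 3), round(min_threshold, 3), round(max_threshold, 3))
```

In the notebook the same helper could be applied to ds.NDBI.values or ds.ENDISI.values.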
[10]:
#NDBI
ds.NDBI.plot.hist(bins=1000, range=(-1,1), facecolor='gray', figsize=(10, 4))
plt.title('NDBI Histogram')
#ENDISI
ds.ENDISI.plot.hist(bins=1000, range=(-1,1), facecolor='gray', figsize=(10, 4))
plt.title('ENDISI Histogram')
plt.tight_layout();


Create Threshold Plots¶
First, define a minimum and a maximum threshold for each index. Then plot the pixels that fall strictly between the two thresholds over the true-colour image, shaded with the Greys colormap.
[11]:
# NDBI (Normalized Difference Built-Up Index) = -1.0 to 1.0 (full range)
# NDBI -0.1 to 0.3 is typical for urban areas
min_ndbi_threshold = -0.1
max_ndbi_threshold = 0.15
# ENDISI = -1.0 to 1.0 (full range)
# ENDISI -0.2 to 0.4 is typical for urban areas
min_endisi_threshold = -0.3
max_endisi_threshold = 0.1
[12]:
# Set up the sub-plots
fig, ax = plt.subplots(1, 2, figsize=(12, 6))
rgb(ds, ax=ax[0])
ds.NDBI.where((ds.NDBI > min_ndbi_threshold) &
(ds.NDBI < max_ndbi_threshold)).plot.imshow(
cmap='Greys',
ax=ax[0],
robust=True,
add_colorbar=False,
add_labels=False)
rgb(ds, ax=ax[1])
ds.ENDISI.where((ds.ENDISI > min_endisi_threshold) &
(ds.ENDISI < max_endisi_threshold)).plot.imshow(
cmap='Greys',
ax=ax[1],
robust=True,
add_colorbar=False,
add_labels=False)
# Remove axis ticks and labels
for a in ax:
    a.xaxis.set_visible(False)
    a.yaxis.set_visible(False)
#set titles
ax[0].set_title(f'NDBI Threshold ({min_ndbi_threshold} < x < {max_ndbi_threshold})')
ax[1].set_title(f'ENDISI Threshold ({min_endisi_threshold} < x < {max_endisi_threshold})')
plt.tight_layout();

Comparison Metrics¶
We will compare the performance of the urban index results against the GHS GeoTIFF product (shown below).
The GHS GeoTIFF for the Senegal region is available by default in the Supplementary_data folder. To find and download GHS GeoTIFFs for other regions, use the following link: https://ghsl.jrc.ec.europa.eu/download.php?ds=bu
[13]:
# Senegal region
tif = '../Supplementary_data/Urban_index_comparison/GHS_BUILT_LDSMT_GLOBE_R2018A_3857_30_V2_0_12_10.tif'
Open and reproject dataset to match the Sentinel-2 geomedian¶
[14]:
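# Open the GHS GeoTIFF, drop the single band dimension, chunk it and assign its CRS
# (xr.open_rasterio is deprecated in newer xarray releases; rioxarray.open_rasterio replaces it)
ghs_ds = assign_crs(xr.open_rasterio(tif).squeeze().chunk({'x': 5000, 'y': 5000}))
# Reproject to match the Sentinel-2 geomedian grid; nearest-neighbour keeps the class codes intact
ghs_ds = xr_reproject(ghs_ds, ds.geobox, "nearest").compute()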
#Threshold GHS to get all the urban areas
actual = (ghs_ds >= 3) & (ghs_ds <= 6)
[15]:
#Plot
actual.plot(figsize=(8,7), add_colorbar=False)
plt.title('Global Human Settlement Urban Areas');
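The threshold (ghs_ds >= 3) & (ghs_ds <= 6) relies on the class coding of the GHS_BUILT_LDSMT_GLOBE_R2018A layer. As we understand the GHSL documentation for this release (verify against the data package description before relying on it), the values are: 0 no data, 1 water surface, 2 land not built up in any epoch, 3 built up between the 2000 and 2014 epochs, 4 between 1990 and 2000, 5 between 1975 and 1990, and 6 built up by 1975. Classes 3 to 6 together therefore mark everything built up by 2014. The sketch below uses a hypothetical helper, built_up_by, to show the equivalent np.isin mask and how the same coding could select an earlier epoch.

```python
import numpy as np

# Assumed class coding of GHS_BUILT_LDSMT_GLOBE_R2018A (verify against the GHSL docs)
GHS_CLASSES = {
    0: "no data",
    1: "water surface",
    2: "land, not built up in any epoch",
    3: "built up 2000-2014",
    4: "built up 1990-2000",
    5: "built up 1975-1990",
    6: "built up by 1975",
}


def built_up_by(ghs, epoch_codes=(3, 4, 5, 6)):
    """Hypothetical helper: True where the GHS class is one of epoch_codes."""
    return np.isin(np.asarray(ghs), epoch_codes)


# One pixel of every class, plus an extra non-built-up pixel
ghs = np.array([[0, 1, 2, 3], [4, 5, 6, 2]], dtype=np.uint8)

urban_2014 = built_up_by(ghs)          # equivalent to (ghs >= 3) & (ghs <= 6)
urban_1990 = built_up_by(ghs, (5, 6))  # built up by the 1990 epoch
for code in np.unique(ghs):
    print(code, GHS_CLASSES[int(code)], bool(urban_2014[ghs == code][0]))
```

Comparing an earlier GHS epoch against the 2017 geomedian would not be a like-for-like check; the second mask only illustrates the coding.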

Metric and Plotting Functions¶
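The code below calculates the true/false positive/negative sums, which make up a confusion matrix, and derives summary scores from them. Accuracy, (TP + TN) / (TP + TN + FP + FN), counts every correctly classified pixel, so it is useful when true positives and true negatives matter equally; when non-urban pixels dominate the scene, it can stay high even if many urban pixels are misclassified. The F1 score, the harmonic mean of precision TP / (TP + FP) and recall TP / (TP + FN), ignores true negatives and is more informative when false positives and false negatives are the main concern.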
[16]:
Metrics = namedtuple('Metrics',
                     'actual predicted true_positive true_negative false_positive false_negative')

def get_metrics(actual, predicted, minimum_threshold, maximum_threshold, filter_size=1):
    """Creates performance metrics.

    Args:
        actual: boolean array to use as truth.
        predicted: index values to threshold and compare against actual.
        minimum_threshold: pixels must be strictly greater than this value to be predicted urban.
        maximum_threshold: pixels must be strictly less than this value to be predicted urban.
        filter_size: size used to remove small objects from, and fill small holes in,
            the predicted mask (passed to skimage as filter_size + 1).

    Returns: A Metrics namedtuple containing the actual mask, the predicted mask
        and the four confusion-matrix counts.
    """
    # NaN (masked water) pixels fail both comparisons, so they are predicted non-urban
    predicted = (predicted > minimum_threshold) & (predicted < maximum_threshold)
    predicted = remove_small_objects(predicted, min_size=filter_size + 1, connectivity=2)
    predicted = remove_small_holes(predicted, area_threshold=filter_size + 1, connectivity=2)
    true_positive = (predicted & actual).sum()
    true_negative = (~predicted & ~actual).sum()
    false_positive = (predicted & ~actual).sum()
    false_negative = (~predicted & actual).sum()
    return Metrics(actual=actual,
                   predicted=predicted,
                   true_positive=true_positive,
                   true_negative=true_negative,
                   false_positive=false_positive,
                   false_negative=false_negative)
def print_metrics(metrics):
    norm = metrics.true_positive + metrics.false_negative + metrics.false_positive + metrics.true_negative
    accuracy = (metrics.true_positive + metrics.true_negative)/norm
    ppv = metrics.true_positive/(metrics.true_positive + metrics.false_positive)
    tpr = metrics.true_positive/(metrics.true_positive + metrics.false_negative)
    f1 = (2*ppv*tpr)/(ppv+tpr)
    print('True Positive (Actual + Model = Urban): {tp}'.format(tp=round(metrics.true_positive/norm*100,3)))
    print('True Negative (Actual + Model = Non-Urban): {tn}'.format(tn=round(metrics.true_negative/norm*100,3)))
    print('False Positive (Actual=Non-Urban, Model=Urban): {fp}'.format(fp=round(metrics.false_positive/norm*100,3)))
    print('False Negative (Actual=Urban, Model=Non-Urban): {fn}'.format(fn=round(metrics.false_negative/norm*100,3)))
    print('\nAccuracy: {accuracy}'.format(accuracy=round(accuracy*100, 3)))
    print('F1 Score: {f1}\n'.format(f1=round(f1*100, 3)))
[17]:
indexes = ['NDBI', 'ENDISI']
min_thresholds = [min_ndbi_threshold, min_endisi_threshold]
max_thresholds = [max_ndbi_threshold, max_endisi_threshold]
index_metrics = []
for index, min_thresh, max_thresh in zip(indexes, min_thresholds, max_thresholds):
    print('\033[1m' + '\033[91m' + index + ' - Comparison Results')  # bold print and red
    print('\033[0m')  # stop bold and red
    metrics = get_metrics(actual.values, ds[index].values, min_thresh, max_thresh)
    index_metrics.append(metrics)
    print_metrics(metrics)
# Create a dictionary mapping each index name to its metrics
index_metrics = dict(zip(indexes, index_metrics))
NDBI - Comparison Results
True Positive (Actual + Model = Urban): 11.682
True Negative (Actual + Model = Non-Urban): 78.626
False Positive (Actual=Non-Urban, Model=Urban): 8.475
False Negative (Actual=Urban, Model=Non-Urban): 1.218
Accuracy: 90.308
F1 Score: 70.679
ENDISI - Comparison Results
True Positive (Actual + Model = Urban): 9.617
True Negative (Actual + Model = Non-Urban): 83.161
False Positive (Actual=Non-Urban, Model=Urban): 3.94
False Negative (Actual=Urban, Model=Non-Urban): 3.282
Accuracy: 92.778
F1 Score: 72.702
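The printed percentages can be checked by hand. The hypothetical helper below applies the same formulas as print_metrics (accuracy, precision as ppv, recall as tpr, and F1) to the rounded percentages above. Because every score is a ratio, percentages and raw pixel counts give the same result, apart from rounding in the last digit.

```python
def confusion_scores(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1 from the four confusion-matrix cells."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)  # share of predicted-urban pixels that GHS calls urban
    recall = tp / (tp + fn)     # share of GHS-urban pixels that the index recovers
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1


# Percentages printed by the comparison cell above
ndbi_scores = confusion_scores(tp=11.682, tn=78.626, fp=8.475, fn=1.218)
endisi_scores = confusion_scores(tp=9.617, tn=83.161, fp=3.94, fn=3.282)
for name, scores in [("NDBI", ndbi_scores), ("ENDISI", endisi_scores)]:
    print(name, [round(s * 100, 2) for s in scores])
```

Worked through, NDBI has the higher recall (about 0.91 against 0.75 for ENDISI) but the lower precision (about 0.58 against 0.71), which is why its F1 score trails ENDISI even though it recovers more of the GHS urban area.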
Output Comparisons¶
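The dstack calls provide the imshow calls with RGB array inputs. For each image, the first channel (red) holds the actual (ground truth, GHS) values, and both the second and third channels (green and blue) hold the predicted values. Pixels that are urban in both therefore appear white (true positive), urban only in GHS appear red (false negative), urban only in the prediction appear cyan (green + blue, false positive), and non-urban in both appear black (true negative).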
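The colour mapping described above can be checked on four hand-made pixels, one for each outcome. comparison_rgb is a hypothetical helper mirroring the np.dstack call in the next cell; the result is named rgb_img so it does not shadow the rgb plotting function imported from deafrica_tools.

```python
import numpy as np


def comparison_rgb(actual, predicted):
    """Stack (actual, predicted, predicted) as a float RGB image."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.dstack((actual, predicted, predicted))


# Pixels, left to right: true positive, false negative, false positive, true negative
actual = np.array([[True, True, False, False]])
predicted = np.array([[True, False, True, False]])
rgb_img = comparison_rgb(actual, predicted)
for label, colour in zip(["TP", "FN", "FP", "TN"], rgb_img[0]):
    print(label, colour)  # white, red, cyan and black respectively
```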
[18]:
fig, ax = plt.subplots(1, 2, figsize=(20,10))
for a, key in zip(ax, index_metrics):
    # Red channel = GHS truth; green and blue channels = prediction
    a.imshow(np.dstack((index_metrics[key].actual.astype(float),
                        index_metrics[key].predicted.astype(float),
                        index_metrics[key].predicted.astype(float))))
    a.legend(
        [Patch(facecolor='cyan'), Patch(facecolor='red'), Patch(facecolor='white')],
        ['False Positive', 'False Negative', 'True Positive'], loc='lower left', fontsize=10)
    a.xaxis.set_visible(False)
    a.yaxis.set_visible(False)
    a.set_title(key + ' Comparison')

Next Steps¶
Machine learning can also be used to measure urbanisation. See this notebook for a guide on using machine learning in the context of the ODC.
Additional information¶
License: The code in this notebook is licensed under the Apache License, Version 2.0. Digital Earth Africa data is licensed under the Creative Commons by Attribution 4.0 license.
Contact: If you need assistance, please post a question on the Open Data Cube Slack channel or on the GIS Stack Exchange using the open-data-cube tag (you can view previously asked questions here). If you would like to report an issue with this notebook, you can file one on GitHub.
Compatible datacube version:
[19]:
print(datacube.__version__)
1.8.5
Last Tested:
[20]:
from datetime import datetime
datetime.today().strftime('%Y-%m-%d')
[20]:
'2021-09-16'