Calculating band indices

Keywords: data used; sentinel-2, band index; NDVI, band index; NDWI, band index; MNDWI

Background

Remote sensing indices are combinations of spectral bands used to highlight features in the data and the underlying landscape. For example, one of the most commonly used indices is the Normalised Difference Vegetation Index (NDVI), which uses the normalised difference of the near-infrared (NIR) and red bands to highlight healthy vegetation. Using Digital Earth Africa’s archive of analysis-ready satellite data, we can easily calculate a wide range of remote sensing indices that can be used to assist in mapping and monitoring features like vegetation and water consistently through time, or as inputs to machine learning or classification algorithms.

Description

This notebook demonstrates how to:

  • Calculate an index manually using xarray

  • Calculate one or multiple indices using the calculate_indices function from deafrica_tools.bandindices


Getting started

To run this analysis, run all the cells in the notebook, starting with the "Load packages" cell.

Load packages

[1]:
%matplotlib inline

import datacube
import matplotlib.pyplot as plt
import xarray as xr
import geopandas as gpd
from datacube.utils.geometry import Geometry

from deafrica_tools.datahandling import load_ard, mostcommon_crs
from deafrica_tools.plotting import rgb
from deafrica_tools.bandindices import calculate_indices
from deafrica_tools.areaofinterest import define_area

Connect to the datacube

[2]:
dc = datacube.Datacube(app='Calculating_band_indices')

Create a query and load satellite data

To demonstrate how to compute a remote sensing index, we first need to load a time series of satellite data for an area. We will use data from the Sentinel-2 satellites.

To define the area of interest, there are two methods available:

  1. By specifying the latitude, longitude, and buffer. This method requires you to input the central latitude, the central longitude, and a buffer value in degrees around the centre point you want to analyse. For example, lat = 10.338, lon = -1.055, and buffer = 0.1 will select an area extending 0.1 degrees in each direction from the point with coordinates (10.338, -1.055).

  2. By uploading a polygon as a GeoJSON or Esri Shapefile. If you choose this option, you will need to upload the GeoJSON or Esri Shapefile into the Sandbox using the Upload Files button in the top left corner of the Jupyter Notebook interface. Esri Shapefiles must be uploaded with all the related files (.cpg, .dbf, .shp, .shx). Once uploaded, you can use the Shapefile or GeoJSON to define the area of interest. Remember to update the code to call the file you have uploaded.
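
As a rough illustration of the first method, the sketch below turns a centre point and a buffer into latitude and longitude ranges. The helper name bbox_from_point is hypothetical and is not part of deafrica_tools; it assumes the buffer is applied equally in every direction, which gives a square box measured in degrees.

```python
def bbox_from_point(lat, lon, buffer):
    """Return (lat_range, lon_range) for a square box around a point.

    Hypothetical helper for illustration only; all values are in degrees.
    """
    lat_range = (lat - buffer, lat + buffer)
    lon_range = (lon - buffer, lon + buffer)
    return lat_range, lon_range


# Example values taken from the description of method 1
lat_range, lon_range = bbox_from_point(lat=10.338, lon=-1.055, buffer=0.1)
print(lat_range, lon_range)
```

Under these assumptions the box spans roughly 10.238 to 10.438 in latitude and -1.155 to -0.955 in longitude.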

To use one of these methods, you can uncomment the relevant line of code and comment out the other one. To comment out a line, add the "#" symbol before the code you want to comment out. By default, the first option which defines the location using latitude, longitude, and buffer is being used.

If running the notebook for the first time, keep the default settings below. This will demonstrate how the analysis works and provide meaningful results.

[3]:
# Method 1: Specify the latitude, longitude, and buffer
aoi = define_area(lat=13.94, lon=-16.54, buffer=0.125)

# Method 2: Use a polygon as a GeoJSON or Esri Shapefile.
# aoi = define_area(vector_path='aoi.shp')

# Create a geopolygon and geodataframe of the area of interest
geopolygon = Geometry(aoi["features"][0]["geometry"], crs="epsg:4326")
geopolygon_gdf = gpd.GeoDataFrame(geometry=[geopolygon], crs=geopolygon.crs)

# Get the latitude and longitude range of the geopolygon
lat_range = (geopolygon_gdf.total_bounds[1], geopolygon_gdf.total_bounds[3])
lon_range = (geopolygon_gdf.total_bounds[0], geopolygon_gdf.total_bounds[2])

# Create a reusable query
query = {
    'x': lon_range,
    'y': lat_range,
    'time': ('2019-06', '2020-06'),
    'resolution': (-30, 30)
}

# Identify the most common projection system in the input query
output_crs = mostcommon_crs(dc=dc, product='s2_l2a', query=query)

# Load available data from Sentinel-2 and filter to retain only times
# with at least 99% good data
ds = load_ard(dc=dc,
              products=['s2_l2a'],
              min_gooddata=0.99,
              measurements=['red', 'green', 'blue', 'swir_1', 'swir_2', 'nir', 'nir_2'],
              output_crs=output_crs,
              **query)

Using pixel quality parameters for Sentinel 2
Finding datasets
    s2_l2a
Counting good quality pixels for each time step
Filtering to 4 out of 78 time steps with at least 99.0% good quality pixels
Applying pixel quality/cloud mask
Loading 4 time steps

It is highly recommended to load data with load_ard when calculating indices, and when using the calculate_indices function introduced below. This is because load_ard performs the necessary data cleaning and scaling for more robust index results.
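
One reason cleaned, floating-point data matters can be shown with a small NumPy sketch. It is not how load_ard is implemented; it only assumes that raw reflectance may be stored as unsigned integers, and the pixel values are made up for illustration. With unsigned integers, the subtraction in a normalised difference wraps around whenever the result should be negative.

```python
import numpy as np

# Integer reflectance for a vegetated pixel and a water pixel.
# The values are invented for illustration.
red = np.array([2067, 900], dtype=np.uint16)
nir = np.array([4196, 300], dtype=np.uint16)

# Unsigned integers cannot hold negative numbers, so nir - red wraps
# around for the water pixel and the result lands far outside [-1, 1].
ndvi_raw = (nir - red) / (nir + red)

# Converting to float before the band math gives the expected values.
red_f = red.astype(np.float32)
nir_f = nir.astype(np.float32)
ndvi_float = (nir_f - red_f) / (nir_f + red_f)

print(ndvi_raw)
print(ndvi_float)
```

The second value of ndvi_raw is much larger than 1, while the float version gives -0.5 for the same pixel.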

[4]:
print(ds)
<xarray.Dataset>
Dimensions:      (time: 4, y: 928, x: 907)
Coordinates:
  * time         (time) datetime64[ns] 2019-10-20T11:47:38 ... 2020-03-18T11:...
  * y            (y) float64 1.556e+06 1.556e+06 ... 1.528e+06 1.528e+06
  * x            (x) float64 3.2e+05 3.201e+05 3.201e+05 ... 3.472e+05 3.472e+05
    spatial_ref  int32 32628
Data variables:
    red          (time, y, x) float32 2.067e+03 1e+03 ... 1.899e+03 1.93e+03
    green        (time, y, x) float32 1.707e+03 1.065e+03 ... 1.337e+03
    blue         (time, y, x) float32 896.0 452.0 520.0 ... 1.009e+03 993.0
    swir_1       (time, y, x) float32 4.049e+03 3.072e+03 ... 4.834e+03
    swir_2       (time, y, x) float32 3.372e+03 2.174e+03 ... 4.112e+03
    nir          (time, y, x) float32 4.196e+03 4.068e+03 ... 2.842e+03
    nir_2        (time, y, x) float32 4.361e+03 4.318e+03 ... 3.163e+03 3.05e+03
Attributes:
    crs:           epsg:32628
    grid_mapping:  spatial_ref

Plot the images to see what our area looks like

We can use the rgb function to plot the timesteps in our dataset as true colour RGB images:

[5]:
# Plot as an RGB image

rgb(ds, col='time')
/usr/local/lib/python3.10/dist-packages/matplotlib/cm.py:478: RuntimeWarning: invalid value encountered in cast
  xx = (xx * 255).astype(np.uint8)
[Figure: true colour RGB image of the area for each of the four time steps]

Manually calculate an index

One of the most commonly used remote sensing indices is the Normalised Difference Vegetation Index or NDVI. This index uses the normalised difference of the near-infrared (NIR) and red bands to identify live green vegetation. The formula for NDVI is:

\[\text{NDVI} = \frac{\text{NIR} - \text{Red}}{\text{NIR} + \text{Red}}\]

When interpreting this index, high values indicate vegetation, and low values indicate soil or water.
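
As a quick check of the arithmetic, the formula can be applied by hand to single pixels. The sketch below uses the approximate nir and red values of the first two pixels as they appear in the rounded dataset printout above, so the results are only as precise as that printout; the function name ndvi is chosen here for illustration.

```python
def ndvi(nir, red):
    """Normalised difference of NIR and red reflectance."""
    return (nir - red) / (nir + red)


# Approximate values of the first two pixels of the first time step,
# read from the rounded dataset printout (nir, red).
print(round(ndvi(4196.0, 2067.0), 4))  # 0.3399
print(round(ndvi(4068.0, 1000.0), 4))  # 0.6054
```

Both values are positive and well above zero, which is what we would expect for vegetated land.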

[6]:
# Calculate NDVI using the formula above
ds['NDVI_manual'] = (ds.nir - ds.red) / (ds.nir + ds.red)

# Plot the results for each time step to see what they look like:
ds.NDVI_manual.plot(col='time', vmin=-0.50, vmax=0.8, cmap='RdYlGn')

[6]:
<xarray.plot.facetgrid.FacetGrid at 0x7f14508813c0>
[Figure: manually calculated NDVI for each time step, RdYlGn colour map]

In the image above, vegetation shows up as green (NDVI > 0). Sand shows up as yellow (NDVI ~ 0) and water shows up as red (NDVI < 0).

Use the calculate_indices function to calculate an index

The calculate_indices function provides an easier way to calculate a wide range of remote sensing indices, including:

  • ASI (Artificial Surface Index, Yongquan Zhao & Zhe Zhu 2022)

  • AWEI_ns (Automated Water Extraction Index, no shadows, Feyisa 2014)

  • AWEI_sh (Automated Water Extraction Index, shadows, Feyisa 2014)

  • BAEI (Built-Up Area Extraction Index, Bouzekri et al. 2015)

  • BAI (Burn Area Index, Martin 1998)

  • BSI (Bare Soil Index, Rikimaru et al. 2002)

  • BUI (Built-Up Index, He et al. 2010)

  • CMR (Clay Minerals Ratio, Drury 1987)

  • ENDISI (Enhanced Normalised Difference for Impervious Surfaces Index, Chen et al. 2019)

  • EVI (Enhanced Vegetation Index, Huete 2002)

  • FMR (Ferrous Minerals Ratio, Segal 1982)

  • IOR (Iron Oxide Ratio, Segal 1982)

  • LAI (Leaf Area Index, Boegh 2002)

  • MBI (Modified Bare Soil Index, Nguyen et al. 2021)

  • MNDWI (Modified Normalised Difference Water Index, Xu 2006)

  • MSAVI (Modified Soil Adjusted Vegetation Index, Qi et al. 1994)

  • NBI (New Built-Up Index, Jieli et al. 2010)

  • NBR (Normalised Burn Ratio, Lopez Garcia 1991)

  • NDBI (Normalised Difference Built-Up Index, Zha 2003)

  • NDCI (Normalised Difference Chlorophyll Index, Mishra & Mishra, 2012)

  • NDMI (Normalised Difference Moisture Index, Gao 1996)

  • NDSI (Normalised Difference Snow Index, Hall 1995)

  • NDTI (Normalised Difference Turbidity Index, Lacaux et al. 2007)

  • NDVI (Normalised Difference Vegetation Index, Rouse 1973)

  • NDWI (Normalised Difference Water Index, McFeeters 1996)

  • SAVI (Soil Adjusted Vegetation Index, Huete 1988)

  • TCB (Tasseled Cap Brightness, Crist 1985)

  • TCG (Tasseled Cap Greenness, Crist 1985)

  • TCW (Tasseled Cap Wetness, Crist 1985)

  • WI (Water Index, Fisher 2016)

The calculate_indices function can be found in the deafrica_tools.bandindices module. The module contains the band math for each index and is worth a look.
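
To give a feel for what band math keyed by index name can look like, here is a minimal sketch. It is not the code in deafrica_tools.bandindices: the names INDEX_FORMULAS and compute_indices are hypothetical, only three indices are included, and the input values are approximate first-pixel values read from the rounded dataset printout above.

```python
import numpy as np

# Hypothetical registry: index name -> function of a dict of band arrays.
# Band names follow the measurements loaded earlier in this notebook.
INDEX_FORMULAS = {
    "NDVI": lambda b: (b["nir"] - b["red"]) / (b["nir"] + b["red"]),
    "NDWI": lambda b: (b["green"] - b["nir"]) / (b["green"] + b["nir"]),
    "MNDWI": lambda b: (b["green"] - b["swir_1"]) / (b["green"] + b["swir_1"]),
}


def compute_indices(bands, names):
    """Return a dict of index arrays; raise on unknown index names."""
    unknown = [n for n in names if n not in INDEX_FORMULAS]
    if unknown:
        raise ValueError(f"Unknown indices: {unknown}")
    return {n: INDEX_FORMULAS[n](bands) for n in names}


# Approximate first-pixel values from the rounded dataset printout.
bands = {
    "red": np.array([2067.0]),
    "green": np.array([1707.0]),
    "nir": np.array([4196.0]),
    "swir_1": np.array([4049.0]),
}
result = compute_indices(bands, ["NDVI", "NDWI", "MNDWI"])
for name, values in result.items():
    print(name, np.round(values, 4))
```

Keeping the formulas in one lookup table means an unknown index name fails early with a clear error instead of silently producing nothing.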

Using calculate_indices to get the same result

The function provides a simple way to calculate band indices without needing to explicitly write code for the band math.

[7]:
calculate_indices(ds, index=['NDVI'], satellite_mission='s2')
[7]:
<xarray.Dataset>
Dimensions:      (time: 4, y: 928, x: 907)
Coordinates:
  * time         (time) datetime64[ns] 2019-10-20T11:47:38 ... 2020-03-18T11:...
  * y            (y) float64 1.556e+06 1.556e+06 ... 1.528e+06 1.528e+06
  * x            (x) float64 3.2e+05 3.201e+05 3.201e+05 ... 3.472e+05 3.472e+05
    spatial_ref  int32 32628
Data variables:
    red          (time, y, x) float32 2.067e+03 1e+03 ... 1.899e+03 1.93e+03
    green        (time, y, x) float32 1.707e+03 1.065e+03 ... 1.337e+03
    blue         (time, y, x) float32 896.0 452.0 520.0 ... 1.009e+03 993.0
    swir_1       (time, y, x) float32 4.049e+03 3.072e+03 ... 4.834e+03
    swir_2       (time, y, x) float32 3.372e+03 2.174e+03 ... 4.112e+03
    nir          (time, y, x) float32 4.196e+03 4.068e+03 ... 2.842e+03
    nir_2        (time, y, x) float32 4.361e+03 4.318e+03 ... 3.163e+03 3.05e+03
    NDVI_manual  (time, y, x) float32 0.3399 0.6054 0.6021 ... 0.2269 0.1911
    NDVI         (time, y, x) float32 0.3399 0.6054 0.6021 ... 0.2269 0.1911
Attributes:
    crs:           epsg:32628
    grid_mapping:  spatial_ref
[8]:
# Calculate NDVI using `calculate_indices`
ds_ndvi = calculate_indices(ds, index='NDVI', satellite_mission='s2')

# Plot the results
ds_ndvi.NDVI.plot(col='time', vmin=-0.50, vmax=0.8, cmap='RdYlGn')

[8]:
<xarray.plot.facetgrid.FacetGrid at 0x7f1448238e50>
[Figure: NDVI from calculate_indices for each time step, RdYlGn colour map]

Note: when using the calculate_indices function, it is important to set the satellite_mission parameter correctly. This is because different satellite missions use different names for the same bands, which can lead to invalid results if not accounted for. For Sentinel-2, specify satellite_mission='s2'. For Landsat Collection 2, specify satellite_mission='ls'.

Using calculate_indices to calculate multiple indices at once

The calculate_indices function makes it straightforward to calculate multiple remote sensing indices in one line of code.

In the example below, we will calculate NDVI as well as two common water indices: the Normalised Difference Water Index (NDWI), and the Modified Normalised Difference Water Index (MNDWI). The new indices will appear in the list of Data variables below:

[9]:
# Calculate multiple indices
ds_multi = calculate_indices(ds, index=['NDVI', 'NDWI', 'MNDWI'], satellite_mission='s2')

print(ds_multi)
<xarray.Dataset>
Dimensions:      (time: 4, y: 928, x: 907)
Coordinates:
  * time         (time) datetime64[ns] 2019-10-20T11:47:38 ... 2020-03-18T11:...
  * y            (y) float64 1.556e+06 1.556e+06 ... 1.528e+06 1.528e+06
  * x            (x) float64 3.2e+05 3.201e+05 3.201e+05 ... 3.472e+05 3.472e+05
    spatial_ref  int32 32628
Data variables:
    red          (time, y, x) float32 2.067e+03 1e+03 ... 1.899e+03 1.93e+03
    green        (time, y, x) float32 1.707e+03 1.065e+03 ... 1.337e+03
    blue         (time, y, x) float32 896.0 452.0 520.0 ... 1.009e+03 993.0
    swir_1       (time, y, x) float32 4.049e+03 3.072e+03 ... 4.834e+03
    swir_2       (time, y, x) float32 3.372e+03 2.174e+03 ... 4.112e+03
    nir          (time, y, x) float32 4.196e+03 4.068e+03 ... 2.842e+03
    nir_2        (time, y, x) float32 4.361e+03 4.318e+03 ... 3.163e+03 3.05e+03
    NDVI_manual  (time, y, x) float32 0.3399 0.6054 0.6021 ... 0.2269 0.1911
    NDVI         (time, y, x) float32 0.3399 0.6054 0.6021 ... 0.2269 0.1911
    NDWI         (time, y, x) float32 -0.4217 -0.585 -0.5577 ... -0.387 -0.3601
    MNDWI        (time, y, x) float32 -0.4069 -0.4851 ... -0.5657 -0.5667
Attributes:
    crs:           epsg:32628
    grid_mapping:  spatial_ref
[10]:
# Plot the NDWI results
ds_multi.NDWI.plot(col='time', robust=True, cmap='PuBu')
[10]:
<xarray.plot.facetgrid.FacetGrid at 0x7f142c3c5cc0>
[Figure: NDWI for each time step, PuBu colour map]
[11]:
# Plot the MNDWI results
ds_multi.MNDWI.plot(col='time', robust=True, cmap='PuBu')
[11]:
<xarray.plot.facetgrid.FacetGrid at 0x7f1448320190>
[Figure: MNDWI for each time step, PuBu colour map]

We can also drop the original satellite bands from the dataset using drop=True. The dataset produced below should now only include the new 'NDVI', 'NDWI' and 'MNDWI' bands under Data variables:

[12]:
# Calculate multiple indices and drop original bands
ds_drop = calculate_indices(ds, index=['NDVI', 'NDWI', 'MNDWI'], drop=True, satellite_mission='s2')
print(ds_drop)
Dropping bands ['red', 'green', 'blue', 'swir_1', 'swir_2', 'nir', 'nir_2', 'NDVI_manual']
<xarray.Dataset>
Dimensions:      (time: 4, y: 928, x: 907)
Coordinates:
  * time         (time) datetime64[ns] 2019-10-20T11:47:38 ... 2020-03-18T11:...
  * y            (y) float64 1.556e+06 1.556e+06 ... 1.528e+06 1.528e+06
  * x            (x) float64 3.2e+05 3.201e+05 3.201e+05 ... 3.472e+05 3.472e+05
    spatial_ref  int32 32628
Data variables:
    NDVI         (time, y, x) float32 0.3399 0.6054 0.6021 ... 0.2269 0.1911
    NDWI         (time, y, x) float32 -0.4217 -0.585 -0.5577 ... -0.387 -0.3601
    MNDWI        (time, y, x) float32 -0.4069 -0.4851 ... -0.5657 -0.5667
Attributes:
    crs:           epsg:32628
    grid_mapping:  spatial_ref
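
The effect of drop=True can be imitated with plain xarray on a small synthetic dataset. This is a sketch of the idea rather than the function's actual implementation: the dataset ds_demo and its values are invented, and it assumes that every variable present before the index is calculated should be removed, which is consistent with the "Dropping bands" message above.

```python
import numpy as np
import xarray as xr

# Small synthetic dataset standing in for the loaded satellite bands.
ds_demo = xr.Dataset(
    {
        "red": (("y", "x"), np.array([[2067.0, 1000.0]])),
        "nir": (("y", "x"), np.array([[4196.0, 4068.0]])),
    }
)

# Remember which variables exist before the index is added.
bands_to_drop = list(ds_demo.data_vars)

# Add the index as a new variable.
ds_demo["NDVI"] = (ds_demo.nir - ds_demo.red) / (ds_demo.nir + ds_demo.red)

# Keep only the newly calculated index.
ds_index_only = ds_demo.drop_vars(bands_to_drop)
print(list(ds_index_only.data_vars))
```

Recording the variable names before adding the index is what makes it possible to tell original bands apart from new results.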

Additional information

License: The code in this notebook is licensed under the Apache License, Version 2.0. Digital Earth Africa data is licensed under the Creative Commons Attribution 4.0 license.

Contact: If you need assistance, please post a question on the Open Data Cube Slack channel or on the GIS Stack Exchange using the open-data-cube tag (where you can also view previously asked questions). If you would like to report an issue with this notebook, you can file one on GitHub.

Compatible datacube version:

[13]:
print(datacube.__version__)
1.8.15

Last Tested:

[14]:
from datetime import date
print(date.today())
2023-08-11