Run analyses on multiple polygons
Products used: s2_l2a
Keywords: spatial analysis; polygons, data used; sentinel-2, python package; GeoPandas, band index; NDVI
Background
Many users need to run analyses on their own areas of interest. A common use case involves running the same analysis across multiple polygons in a vector file (e.g. ESRI Shapefile or GeoJSON). This notebook will demonstrate how to use a vector file and the Open Data Cube to extract satellite data from Digital Earth Africa to match individual polygon geometries.
Description
If we have a vector file containing multiple polygons, we can use the python package geopandas to open it as a GeoDataFrame
. We can then iterate through each geometry and extract satellite data corresponding with the extent of each geometry. Further anlaysis can then be conducted on each resulting xarray.Dataset
.
In this notebook, we demonstrate how to retrieve data for each polygon and perform an analysis. The example analysis in this notebook is to load and plot the normalised difference vegetation index (NDVI). This is conducted through the following steps:
Open the file of polygons using
geopandas
.Iterate through the generated GeoDataFrame, extracting satellite data from DE Africa’s Open Data Cube.
Calculate NDVI as an example analysis on one of the extracted satellite timeseries.
Plot NDVI for the polygon extent.
Getting started
To run this analysis, run all the cells in the notebook, starting with the “Load packages” cell.
Load packages
Please note the use of datacube.utils
package geometry
: this is important for saving the coordinate reference system of the incoming shapefile in a format that the Digital Earth Africa query can understand.
[1]:
%matplotlib inline
import datacube
import matplotlib.pyplot as plt
import geopandas as gpd
from odc.geo.geom import Geometry
from odc.geo.crs import CRS
from deafrica_tools.datahandling import load_ard
from deafrica_tools.bandindices import calculate_indices
from deafrica_tools.plotting import rgb, map_shapefile
from deafrica_tools.spatial import xr_rasterize
from deafrica_tools.classification import HiddenPrints
Connect to the datacube
Connect to the datacube database to enable loading Digital Earth Australia data.
[2]:
dc = datacube.Datacube(app='Analyse_multiple_polygons')
Analysis parameters
time_range
: Enter a time range for your query, e.g.('2019-01', '2019-12')
if you wanted data from all of 2019vector_file
: A path to a vector file (ESRI Shapefile or GeoJSON) containing polygons to load. For this example we have provided a demonstration Shapefileattribute_col
: A column in the vector file used to label the outputxarray
datasets containing satellite images. Each row of this column should have a unique identifierproducts
: A list of product names to load from the datacube, e.g.['ls8_c2l2', 'ls7_c2l2']
. In this example we will use only Sentinel-2 data,['s2_l2a']
measurements
: A list of band names to load from the satellite product, e.g.['red', 'green']
resolution
: The spatial resolution of the loaded satellite data in thex
andy
directions in metres. For this Sentinel-2 example, we have selected(-20, 20)
output_crs
: The coordinate reference system/map projection to load data into, e.g.'EPSG:6933'
to load data in an Equal Area projection for Africa
[3]:
time_range = ('2019-04', '2019-07')
vector_file = '../Supplementary_data/Analyse_multiple_polygons/multiple_polygons.shp'
attribute_col = 'id'
products = ['s2_l2a']
measurements = ['red', 'green', 'blue', 'nir']
resolution = (-20, 20)
output_crs = 'EPSG:6933'
Look at the structure of the vector file
Import the file and take a look at how the file is structured so we understand what we are iterating through. There are three polygons in the file:
We will also update the id
column to give each polygon a unique identifier. This will be used to identify the satelllite data corresponding with the polygon in the results dictionary
[4]:
#read shapefile
gdf = gpd.read_file(vector_file)
# add an ID column
gdf['id']=range(0, len(gdf))
#print gdf
gdf.head()
[4]:
id | class | geometry | |
---|---|---|---|
0 | 0 | 0 | POLYGON ((-0.55603 7.66642, -0.52764 7.67401, ... |
1 | 1 | 0 | POLYGON ((-0.53018 7.64633, -0.49693 7.65515, ... |
2 | 2 | 0 | POLYGON ((-0.566 7.62164, -0.54007 7.6283, -0.... |
We can then plot the geopandas.GeoDataFrame
using the function map_shapefile
to make sure it covers the area of interest we are concerned with:
[5]:
map_shapefile(gdf, attribute=attribute_col)
Create a datacube query object
We then create a dictionary that will contain the parameters that will be used to load data from the Digital Earth Africa datacube:
Note: We do not include the usual
x
andy
spatial query parameters here, as these will be taken directly from each of our vector polygon objects.
[6]:
query = {'time': time_range,
'measurements': measurements,
'resolution': resolution,
'output_crs': output_crs,
}
query
[6]:
{'time': ('2019-04', '2019-07'),
'measurements': ['red', 'green', 'blue', 'nir'],
'resolution': (-20, 20),
'output_crs': 'EPSG:6933'}
Loading satellite data
Here we will iterate through each row of the geopandas.GeoDataFrame
and load satellite data. The results will be appended to a dictionary object which we can later index to analyse each dataset.
[7]:
# Dictionary to save results
results = {}
# A progress indicator
i = 0
# Loop through polygons in geodataframe and extract satellite data
for index, row in gdf.iterrows():
print(" Feature {:02}/{:02}\r".format(i + 1, len(gdf)),
end='')
# Get the geometry
geom = Geometry(row.geometry.__geo_interface__, CRS(f'EPSG:{gdf.crs.to_epsg()}'))
# Update dc query with geometry
query.update({'geopolygon': geom})
# Load landsat (hide print statements)
with HiddenPrints():
ds = load_ard(dc=dc,
products=products,
min_gooddata=0.75,
group_by='solar_day',
**query)
# Generate a polygon mask to keep only data within the polygon
with HiddenPrints():
mask = xr_rasterize(gdf.iloc[[index]], ds)
# Mask dataset to set pixels outside the polygon to `NaN`
ds = ds.where(mask)
# Append results to a dictionary using the attribute
# column as an key
results.update({str(row[attribute_col]) : ds})
# Update counter
i += 1
Feature 01/03
/opt/venv/lib/python3.12/site-packages/rasterio/warp.py:387: NotGeoreferencedWarning: Dataset has no geotransform, gcps, or rpcs. The identity matrix will be returned.
dest = _reproject(
Feature 03/03
Further analysis
Our results
dictionary will contain xarray
objects labelled by the unique attribute_col
values we specified in the Analysis parameters
section:
[8]:
results
[8]:
{'0': <xarray.Dataset> Size: 2MB
Dimensions: (time: 4, y: 192, x: 178)
Coordinates:
* time (time) datetime64[ns] 32B 2019-05-02T10:39:04 ... 2019-06-26...
* y (y) float64 2kB 9.762e+05 9.761e+05 ... 9.724e+05 9.723e+05
* x (x) float64 1kB -5.365e+04 -5.363e+04 ... -5.013e+04 -5.011e+04
spatial_ref int32 4B 6933
Data variables:
red (time, y, x) float32 547kB nan nan nan nan ... nan nan nan nan
green (time, y, x) float32 547kB nan nan nan nan ... nan nan nan nan
blue (time, y, x) float32 547kB nan nan nan nan ... nan nan nan nan
nir (time, y, x) float32 547kB nan nan nan nan ... nan nan nan nan
Attributes:
crs: EPSG:6933
grid_mapping: spatial_ref,
'1': <xarray.Dataset> Size: 3MB
Dimensions: (time: 6, y: 169, x: 178)
Coordinates:
* time (time) datetime64[ns] 48B 2019-04-17T10:38:59 ... 2019-06-26...
* y (y) float64 1kB 9.738e+05 9.738e+05 ... 9.704e+05 9.704e+05
* x (x) float64 1kB -5.115e+04 -5.113e+04 ... -4.763e+04 -4.761e+04
spatial_ref int32 4B 6933
Data variables:
red (time, y, x) float32 722kB nan nan nan nan ... nan nan nan nan
green (time, y, x) float32 722kB nan nan nan nan ... nan nan nan nan
blue (time, y, x) float32 722kB nan nan nan nan ... nan nan nan nan
nir (time, y, x) float32 722kB nan nan nan nan ... nan nan nan nan
Attributes:
crs: EPSG:6933
grid_mapping: spatial_ref,
'2': <xarray.Dataset> Size: 3MB
Dimensions: (time: 5, y: 189, x: 166)
Coordinates:
* time (time) datetime64[ns] 40B 2019-04-12T10:39:01 ... 2019-06-26...
* y (y) float64 2kB 9.704e+05 9.704e+05 ... 9.666e+05 9.666e+05
* x (x) float64 1kB -5.461e+04 -5.459e+04 ... -5.133e+04 -5.131e+04
spatial_ref int32 4B 6933
Data variables:
red (time, y, x) float32 627kB nan nan nan nan ... nan nan nan nan
green (time, y, x) float32 627kB nan nan nan nan ... nan nan nan nan
blue (time, y, x) float32 627kB nan nan nan nan ... nan nan nan nan
nir (time, y, x) float32 627kB nan nan nan nan ... nan nan nan nan
Attributes:
crs: EPSG:6933
grid_mapping: spatial_ref}
Enter one of those values below to index our dictionary and conduct further analsyis on the satellite timeseries for that polygon.
[9]:
key = '2'
Plot an RGB image
We can now use the dea_plotting.rgb
function to plot our loaded data as a three-band RGB plot:
[10]:
rgb(results[key], col='time', size=4)

Calculate NDVI and plot
We can also apply analyses to data loaded for each of our polygons. For example, we can calculate the normalised difference vegetation index (NDVI) to identify areas of growing vegetation.
Note: NDVI can be calculated using the calculate_indices function, imported from deafrica_tools.bandindices. Here, we use
satellite_mission='s2'
since we are working with Sentinel-2 data.
[11]:
# Calculate band index
ndvi = calculate_indices(results[key], index='NDVI', satellite_mission='s2')
# Plot NDVI for each polygon for the time query
ndvi.NDVI.plot(col='time', cmap='YlGn', vmin=0.0, vmax=1, figsize=(25, 4))
plt.show()

Additional information
License: The code in this notebook is licensed under the Apache License, Version 2.0. Digital Earth Africa data is licensed under the Creative Commons by Attribution 4.0 license.
Contact: If you need assistance, please post a question on the Open Data Cube Slack channel or on the GIS Stack Exchange using the open-data-cube
tag (you can view previously asked questions here). If you would like to report an issue with this notebook, you can file one on
Github.
Compatible datacube version:
[12]:
print(datacube.__version__)
1.8.20
Last Tested:
[13]:
from datetime import datetime
datetime.today().strftime('%Y-%m-%d')
[13]:
'2025-01-15'