Determining seasonal extent of waterbodies with Sentinel-2

Keywords: data used; sentinel-2, water; extent, analysis; time series, band index; MNDWI, visualisation; animation


The United Nations have prescribed 17 “Sustainable Development Goals” (SDGs). This notebook attempts to monitor SDG Indicator 6.6.1 - change in the extent of water-related ecosystems. Indicator 6.6.1 has 4 sub-indicators:

  i. The spatial extent of water-related ecosystems

  ii. The quantity of water contained within these ecosystems

  iii. The quality of water within these ecosystems

  iv. The health or state of these ecosystems

This notebook primarily focuses on the first sub-indicator - spatial extents.


The notebook demonstrates how to:

  1. Load satellite data over the water body of interest

  2. Calculate the water index MNDWI

  3. Resample the time-series of MNDWI to seasonal medians

  4. Generate an animation of the water extent time-series

  5. Calculate and plot a time series of seasonal water extent (in square kilometres)

  6. Find the minimum and maximum water extents in the time-series and plot them.

  7. Compare two nominated time-periods, and plot where the water-body extent has changed.

Getting started

To run this analysis, run all the cells in the notebook, starting with the “Load packages” cell.

Load packages

Import Python packages that are used for the analysis.

%matplotlib inline

# Force GeoPandas to use Shapely instead of PyGEOS
# In a future release, GeoPandas will switch to using Shapely by default.
import os
os.environ['USE_PYGEOS'] = '0'

import datacube
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr
import geopandas as gpd
from IPython.display import Image
from matplotlib.colors import ListedColormap
from matplotlib.patches import Patch

from datacube.utils.geometry import Geometry
from deafrica_tools.datahandling import load_ard
from deafrica_tools.bandindices import calculate_indices
from deafrica_tools.plotting import display_map, xr_animation
from deafrica_tools.dask import create_local_dask_cluster
from deafrica_tools.spatial import xr_rasterize
from deafrica_tools.areaofinterest import define_area

Connect to the datacube

Activate the datacube database, which provides functionality for loading and displaying stored Earth observation data.

dc = datacube.Datacube(app='water_extent')

Set up a Dask cluster

Dask can be used to better manage memory use and conduct the analysis in parallel. For an introduction to using Dask with Digital Earth Africa, see the Dask notebook.

Note: We recommend opening the Dask processing window to view the different computations that are being executed; to do this, see the Dask dashboard in DE Africa section of the Dask notebook.

To activate Dask, set up the local computing cluster using the cell below.

create_local_dask_cluster()
Connection method: Cluster object Cluster type: distributed.LocalCluster
Dashboard: /user/

Cluster Info

Analysis parameters

The following cell sets the parameters, which define the area of interest and the length of time to conduct the analysis over.

The parameters are:

  • lat: The central latitude to analyse (e.g. 10.338).

  • lon : The central longitude to analyse (e.g. -1.055).

  • buffer: The number of degrees to load around the central latitude and longitude (e.g. 0.03).

  • start_year and end_year: The date range to analyse (e.g. ('2017', '2020')).

Select location

To define the area of interest, there are two methods available:

  1. By specifying the latitude, longitude, and buffer. This method requires you to input the central latitude, central longitude, and the buffer value in degrees around the centre point you want to analyse. For example, lat = 10.338, lon = -1.055, and buffer = 0.1 will select a square area extending 0.1 degrees around the point with coordinates (10.338, -1.055).

  2. By uploading a polygon as a GeoJSON or Esri Shapefile. If you choose this option, you will need to upload the GeoJSON or Esri Shapefile into the Sandbox using the Upload Files button in the top left corner of the Jupyter Notebook interface. Esri Shapefiles must be uploaded with all of their related files (.cpg, .dbf, .shp, .shx). Once uploaded, you can use the shapefile or GeoJSON to define the area of interest. Remember to update the code to call the file you have uploaded.

To use one of these methods, you can uncomment the relevant line of code and comment out the other one. To comment out a line, add the "#" symbol before the code you want to comment out. By default, the first option which defines the location using latitude, longitude, and buffer is being used.

If running the notebook for the first time, keep the default settings below. This will demonstrate how the analysis works and provide meaningful results. The example covers part of Lake Sulunga, Tanzania.

# Method 1: Specify the latitude, longitude, and buffer
aoi = define_area(lat=-5.9460, lon=35.5188, buffer=0.03)

# Method 2: Use a polygon as a GeoJSON or Esri Shapefile.
# aoi = define_area(vector_path='aoi.shp')

# Create a geopolygon and geodataframe of the area of interest
geopolygon = Geometry(aoi["features"][0]["geometry"], crs="epsg:4326")
geopolygon_gdf = gpd.GeoDataFrame(geometry=[geopolygon], crs=geopolygon.crs)

# Get the latitude and longitude range of the geopolygon
lat_range = (geopolygon_gdf.total_bounds[1], geopolygon_gdf.total_bounds[3])
lon_range = (geopolygon_gdf.total_bounds[0], geopolygon_gdf.total_bounds[2])

# Define the start year and end year
start_year = '2017'
end_year = '2021-05'

View the area of Interest on an interactive map

The next cell will display the selected area on an interactive map. The red border represents the area of interest of the study. Zoom in and out to get a better understanding of the area of interest. Clicking anywhere on the map will reveal the latitude and longitude coordinates of the clicked point.

display_map(lon_range, lat_range)

Load cloud-masked satellite data

The code below will create a query dictionary for our region of interest, and then load Sentinel-2 satellite data. For more information on loading data, see the Loading data notebook.

# Create a query object
query = {
    'x': lon_range,
    'y': lat_range,
    'resolution': (-20, 20),
    'output_crs': 'epsg:6933',
    'time': (start_year, end_year),
}

# Load Sentinel-2 data
ds = load_ard(dc=dc,
              products=['s2_l2a'],
              measurements=['green', 'swir_1'],
              mask_filters=[("opening", 3), ("dilation", 2)],  # improve cloud mask
              group_by='solar_day',
              dask_chunks={'time': 1},
              **query)

Using pixel quality parameters for Sentinel 2
Finding datasets
Applying morphological filters to pq mask [('opening', 3), ('dilation', 2)]
Applying pixel quality/cloud mask
Returning 297 time steps as a dask array
<xarray.Dataset>
Dimensions:      (time: 297, y: 382, x: 290)
Coordinates:
  * time         (time) datetime64[ns] 2017-01-19T07:57:34 ... 2021-05-28T08:...
  * y            (y) float64 -7.534e+05 -7.534e+05 ... -7.61e+05 -7.61e+05
  * x            (x) float64 3.424e+06 3.424e+06 3.424e+06 ... 3.43e+06 3.43e+06
    spatial_ref  int32 6933
Data variables:
    green        (time, y, x) float32 dask.array<chunksize=(1, 382, 290), meta=np.ndarray>
    swir_1       (time, y, x) float32 dask.array<chunksize=(1, 382, 290), meta=np.ndarray>
Attributes:
    crs:           EPSG:6933
    grid_mapping:  spatial_ref

Clip the datasets to the shape of the area of interest

A geopolygon represents the bounds of the area of interest rather than its exact shape: data is loaded for the full rectangular extent defined by those bounds. Clipping the loaded data to the exact shape of the area of interest ensures that the analysis only includes pixels that fall within the polygon, rather than everything inside its bounding box.

# Rasterise the area of interest polygon
aoi_raster = xr_rasterize(gdf=geopolygon_gdf, da=ds, crs=ds.geobox.crs)

# Mask the dataset to the rasterised area of interest
ds = ds.where(aoi_raster == 1)

Calculate the MNDWI water index
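The Modified Normalised Difference Water Index (MNDWI) is calculated from the green and shortwave infrared (SWIR) bands as (green − SWIR1) / (green + SWIR1). Water reflects strongly in green and absorbs strongly in SWIR, so values above 0 typically indicate open water. As a minimal standalone sketch of the formula (the notebook itself uses the calculate_indices helper below; the reflectance values here are hypothetical):

```python
import numpy as np

def mndwi(green, swir1):
    # MNDWI = (green - swir1) / (green + swir1), bounded to [-1, 1]
    green = np.asarray(green, dtype=float)
    swir1 = np.asarray(swir1, dtype=float)
    return (green - swir1) / (green + swir1)

# Hypothetical surface reflectances
print(mndwi(0.06, 0.02))  # water: bright green, dark SWIR -> positive
print(mndwi(0.04, 0.20))  # soil/vegetation: bright SWIR -> negative
```

This is why the analysis below simply thresholds the index at 0 to classify water pixels.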

# Calculate the MNDWI water index and add it to the loaded dataset
ds = calculate_indices(ds=ds, index='MNDWI', satellite_mission='s2', drop=True)
Dropping bands ['green', 'swir_1']

Resample time series

Due to many factors (e.g. cloud obscuring the region, missed cloud cover in the fmask layer) the data will be gappy and noisy. Here, we will resample the data to ensure we are working with a consistent time-series.

To do this, we resample the data to seasonal time-steps using medians.

These calculations will take several minutes to complete as we will run .compute(), triggering all the tasks we scheduled above and bringing the arrays into memory.
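As a self-contained illustration of what the "QS-DEC" frequency does (using synthetic data, not the notebook's dataset), resampling a daily series with it produces one median per December-aligned quarter, i.e. one value per DJF/MAM/JJA/SON season:

```python
import numpy as np
import pandas as pd
import xarray as xr

# One year of synthetic daily values on a time axis
time = pd.date_range("2020-01-01", "2020-12-31", freq="D")
da = xr.DataArray(np.arange(time.size, dtype=float),
                  coords={"time": time}, dims="time")

# "QS-DEC" = quarters starting in December, so the bins align with the
# DJF, MAM, JJA and SON seasons; each bin is reduced to its median
seasonal = da.resample(time="QS-DEC").median()

# The first bin is labelled 2019-12-01: the DJF quarter containing January 2020
print(seasonal["time"].values)
```

Note that the bin labels are quarter start dates, which is why the seasonal time steps in this notebook fall on 1 December, 1 March, 1 June, and 1 September.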

%%time
sample_frequency = "QS-DEC"  # quarterly, starting in December, i.e. seasonal

# Resample using medians
print('calculating MNDWI seasonal medians...')
mndwi = ds['MNDWI'].resample(time=sample_frequency).median().compute()
calculating MNDWI seasonal medians...
CPU times: user 4.84 s, sys: 391 ms, total: 5.23 s
Wall time: 31.8 s

Facet plot the MNDWI time-steps

mndwi.plot(col='time', col_wrap=4, cmap='RdBu', vmax=1, vmin=-1);

Animating time series

In the next cell, we plot the dataset we loaded above as an animated GIF, using the xr_animation function (see the Animated timeseries notebook). The output_path will be saved in the directory where the notebook is located; you can change the file name to prevent overwriting existing files.

out_path = 'water_extent.gif'

xr_animation(ds=mndwi.to_dataset(name='MNDWI'),
             output_path=out_path,
             bands=['MNDWI'],
             show_text='Seasonal MNDWI',
             imshow_kwargs={'cmap': 'RdBu', 'vmin': -0.5, 'vmax': 0.5},
             colorbar_kwargs={'colors': 'black'})

# Plot animated gif
plt.close()
Image(filename=out_path)
Exporting animation to water_extent.gif
<IPython.core.display.Image object>

Calculate the area per pixel

The number of pixels can be used for the area of the waterbody if the pixel area is known. Run the following cell to generate the necessary constants for performing this conversion.

pixel_length = query["resolution"][1]  # in metres
m_per_km = 1000  # conversion from metres to kilometres
area_per_pixel = pixel_length**2 / m_per_km**2
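For the 20 m resolution used in the load above, each pixel covers 20 × 20 m = 400 m², i.e. 0.0004 km². A quick standalone check of the conversion:

```python
pixel_length = 20  # metres, matching the (-20, 20) load resolution
m_per_km = 1000

area_per_pixel = pixel_length**2 / m_per_km**2
print(area_per_pixel)  # km² per pixel

# e.g. 10,000 pixels classified as water correspond to roughly 4 km²
print(10_000 * area_per_pixel)
```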

Calculating the extent of water

Calculate the area of pixels classified as water (MNDWI > 0 indicates water).

water = mndwi.where(mndwi > 0, np.nan)
area_ds = water.where(np.isnan(water),1)
ds_valid_water_area = area_ds.sum(dim=['x', 'y']) * area_per_pixel

Plot seasonal time series from the Start year to End year

plt.figure(figsize=(18, 4))
ds_valid_water_area.plot(marker='o', color='#9467bd')
plt.title(f'Observed Seasonal Area of Water from {start_year} to {end_year}')
plt.ylabel('Waterbody area (km$^2$)')

Determine minimum and maximum water extent

The next cell extracts the minimum and maximum water extents from the dataset using the min and max functions, then adds the corresponding dates to an xarray.DataArray.

min_water_area_date, max_water_area_date = min(ds_valid_water_area), max(ds_valid_water_area)
time_xr = xr.DataArray([min_water_area_date.time.values, max_water_area_date.time.values], dims=["time"])
print(time_xr)

<xarray.DataArray (time: 2)>
array(['2019-09-01T00:00:00.000000000', '2021-03-01T00:00:00.000000000'],
      dtype='datetime64[ns]')
Dimensions without coordinates: time

Plot the dates when the min and max water extent occur

Plot the water-classified pixels for the two dates with the minimum and maximum surface water extent.

area_ds.sel(time=time_xr).plot.imshow(col="time", col_wrap=2, figsize=(14, 6));

Compare two time periods

The following cells determine the extent of water at two different dates:

  • baseline_time: The baseline date for the analysis

  • analysis_time: The date to compare against the baseline

baseline_time = '2019-03-01'
analysis_time = '2020-03-01'

baseline_ds = ds_valid_water_area.sel(time=baseline_time, method='nearest')
analysis_ds = ds_valid_water_area.sel(time=analysis_time, method='nearest')

A new DataArray is created to store the dates of the two selected time steps.

time_xr = xr.DataArray([baseline_ds.time.values, analysis_ds.time.values], dims=["time"])


Plot water extent of the MNDWI product for the two chosen periods.

area_ds.sel(time=time_xr).plot(col="time", col_wrap=2, robust=True, figsize=(10, 5), cmap='viridis', add_colorbar=False);

Calculating the change for the two nominated periods

The cells below calculate the amount of water gained, lost, and unchanged between the two periods.

# Extract the two periods (baseline and analysis) from the dataset
ds_selected = area_ds.where(area_ds == 1, 0).sel(time=time_xr)

analyse_total_value = ds_selected[1]
change = analyse_total_value - ds_selected[0]

water_appeared = change.where(change == 1)
permanent_water = change.where((change == 0) & (analyse_total_value == 1))
permanent_land = change.where((change == 0) & (analyse_total_value == 0))
water_disappeared = change.where(change == -1)
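The classification logic can be sanity-checked on a toy array: with water encoded as 1 and land as 0, the difference analysis − baseline is 1 where water appeared, −1 where it disappeared, and 0 where the class is unchanged; the analysis value then splits the unchanged pixels into permanent water and permanent land. A minimal NumPy sketch with a hypothetical 2 × 2 scene:

```python
import numpy as np

baseline = np.array([[1, 1],
                     [0, 0]])  # water=1, land=0 at the baseline date
analysis = np.array([[1, 0],
                     [1, 0]])  # water=1, land=0 at the analysis date

change = analysis - baseline

water_appeared    = (change == 1)                    # land -> water
water_disappeared = (change == -1)                   # water -> land
permanent_water   = (change == 0) & (analysis == 1)  # water -> water
permanent_land    = (change == 0) & (analysis == 0)  # land -> land

# Each of the four classes covers exactly one pixel in this toy scene
print(water_appeared.sum(), water_disappeared.sum(),
      permanent_water.sum(), permanent_land.sum())
```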

The cell below calculates the areas of water gain, water loss, permanent water, and permanent land.

total_area = analyse_total_value.count().values * area_per_pixel
water_apperaed_area = water_appeared.count().values * area_per_pixel
permanent_water_area = permanent_water.count().values * area_per_pixel
water_disappeared_area = water_disappeared.count().values * area_per_pixel


The water variables are plotted to visualise the result.

water_appeared_color = "Green"
water_disappeared_color = "Yellow"
stable_color = "Blue"
land_color = "Brown"

fig, ax = plt.subplots(1, 1, figsize=(10, 10))

# Plot each change class in its own colour
permanent_land.plot.imshow(cmap=ListedColormap([land_color]), add_colorbar=False, add_labels=False, ax=ax)
permanent_water.plot.imshow(cmap=ListedColormap([stable_color]), add_colorbar=False, add_labels=False, ax=ax)
water_disappeared.plot.imshow(cmap=ListedColormap([water_disappeared_color]), add_colorbar=False, add_labels=False, ax=ax)
water_appeared.plot.imshow(cmap=ListedColormap([water_appeared_color]), add_colorbar=False, add_labels=False, ax=ax)

# Add a legend showing the area of each change class
plt.legend(
    [Patch(facecolor=stable_color),
     Patch(facecolor=water_disappeared_color),
     Patch(facecolor=water_appeared_color)],
    [
        f"Water to Water {round(permanent_water_area, 2)} km2",
        f"Water to No Water {round(water_disappeared_area, 2)} km2",
        f"No Water to Water: {round(water_apperaed_area, 2)} km2",
    ],
    loc="lower left",
)

plt.title("Change in water extent: " + baseline_time + " to " + analysis_time);

Next steps

Return to the “Analysis parameters” section, modify some values (e.g. latitude, longitude, start_year, end_year) and re-run the analysis. You can use the interactive map in the “View the selected location” section to find new central latitude and longitude values by panning and zooming, and then clicking on the area you wish to extract location values for. You can also use Google maps to search for a location you know, then return the latitude and longitude values by clicking the map.

You can also change the dates in the “Compare two time periods” section (e.g. baseline_time, analysis_time) and re-run the analysis.

Additional information

License: The code in this notebook is licensed under the Apache License, Version 2.0. Digital Earth Africa data is licensed under the Creative Commons by Attribution 4.0 license.

Contact: If you need assistance, please post a question on the Open Data Cube Slack channel or on the GIS Stack Exchange using the open-data-cube tag (you can view previously asked questions here). If you would like to report an issue with this notebook, you can file one on Github.

Compatible datacube version:

print(datacube.__version__)

Last Tested:

from datetime import datetime
datetime.today().strftime('%Y-%m-%d')