Per-pixel vegetation phenology in the Ruko Conservancy

Products used: s2_l2a

Background

Phenology is the study of plant and animal life cycles in the context of the seasons. It can be useful in understanding the life cycle trends of crops and how the growing seasons are affected by changes in climate. For more information, see the USGS page on deriving phenology.

Description

This notebook will produce two-dimensional (phenology per-pixel) plot for a given year.

A number of steps are required to produce the desired outputs:

Load satellite data for a region specified by an vector file (shapefile or geojson)
Buffer the cloud masking layer to better mask clouds in the data (Sentinel-2 cloud mask is quite poor)
Further prepare the data for analysis byt removing bad values (infs), masking surafce water, and removing outliers in the vegetation index.
Interpolate and smooth the time-series to ensure a consistent dataset with all gaps and noise removed.
Calculate per-pixel phenology statistics
Plot the results and export the plot to disk as a .png

Getting started

To run this analysis, run all the cells in the notebook, starting with the “Load packages” cell.

Load packages

Load key Python packages and supporting functions for the analysis.

[1]:

%matplotlib inline


import datacube
import numpy as np
import pandas as pd
import xarray as xr
import datetime as dt
import geopandas as gpd
import matplotlib.pyplot as plt
import matplotlib as mpl
from odc.geo.geom import Geometry

from deafrica_tools.datahandling import load_ard
from deafrica_tools.bandindices import calculate_indices
from deafrica_tools.plotting import map_shapefile
import deafrica_tools.temporal as ts
from deafrica_tools.dask import create_local_dask_cluster
from deafrica_tools.spatial import xr_rasterize
from deafrica_tools.classification import HiddenPrints

from datacube.utils.aws import configure_s3_access
configure_s3_access(aws_unsigned=True, cloud_defaults=True)

Set up a Dask cluster

Dask can be used to better manage memory use and conduct the analysis in parallel. For an introduction to using Dask with Digital Earth Africa, see the Dask notebook.

Note: We recommend opening the Dask processing window to view the different computations that are being executed; to do this, see the Dask dashboard in DE Africa section of the Dask notebook.

To use Dask, set up the local computing cluster using the cell below.

[2]:

create_local_dask_cluster(spare_mem='2Gb')

/opt/venv/lib/python3.12/site-packages/distributed/node.py:188: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 34803 instead
  warnings.warn(

Client

Client-b49dd2a1-43c6-11f1-9e1f-ae180e9931c9

Connection method: Cluster object	Cluster type: distributed.LocalCluster
Dashboard: /user/mpho.sadiki@digitalearthafrica.org/proxy/34803/status

Cluster Info

LocalCluster

35e785f3

Dashboard: /user/mpho.sadiki@digitalearthafrica.org/proxy/34803/status	Workers: 1
Total threads: 4	Total memory: 27.14 GiB
Status: running	Using processes: True

Scheduler Info

Scheduler

Scheduler-483a2aa2-4063-4170-98a4-17201d6bb557

Comm: tcp://127.0.0.1:40127	Workers: 0
Dashboard: /user/mpho.sadiki@digitalearthafrica.org/proxy/34803/status	Total threads: 0
Started: Just now	Total memory: 0 B

Workers

Worker: 0

Comm: tcp://127.0.0.1:42097	Total threads: 4
Dashboard: /user/mpho.sadiki@digitalearthafrica.org/proxy/46871/status	Memory: 27.14 GiB
Nanny: tcp://127.0.0.1:34787
Local directory: /tmp/dask-scratch-space/worker-motyduvx

Analysis parameters

The following cell sets important parameters for the analysis:

veg_proxy: Band index to use as a proxy for vegetation health e.g. 'NDVI' or 'EVI'
product: The satellite product to load. Either Sentinel-2: 's2_l2a', or Landsat-8: 'ls8_cl2'
shapefile: The path to the vector file delineating the analysis region. Can be a shapefile or a geojson
year: Entire the year to anlayse, e.g. '2017'
min_gooddata: the fraction of good data (not cloudy) a scene must have before it is returned as a dataset
resolution: The pixel resolution, in metres, of the returned dataset
dask_chunks: The size, in number of pixel, for the dask chunks on each dimension.

[3]:

veg_proxy = 'NDVI'

product = 's2_l2a'

shapefile='data/Ruko_conservancy.geojson'

year = '2018'

resolution = (-20,20)

dask_chunks = {'x':750, 'y':750, 'time':1}

Connect to the datacube

Connect to the datacube so we can access DE Africa data. The app parameter is a unique name for the analysis which is based on the notebook file name.

[4]:

dc = datacube.Datacube(app='Vegetation_phenology')

View the region of interest

The next cell will display the selected area on an web map.

[5]:

#First open the shapefile using geopandas
gdf = gpd.read_file(shapefile)

[6]:

map_shapefile(gdf, attribute='ConsrvName')

Load cloud-masked Sentinel-2 data

The first step is to load Sentinel-2 data for the specified area of interest and time range. The load_ard function is used here to load data that has been masked for cloud, shadow and quality filters, making it ready for analysis.

The cell directly below will create a query object using the first geometry in the shapefile, along with the parameters we defined in the Analysis Parameters section above.

[7]:

# Create a reusable query
geom = Geometry(geom=gdf.iloc[0].geometry, crs=gdf.crs)

query = {
    "geopolygon": geom,
    'time': year,
    'measurements': ['red','nir','green','swir_1'],
    'resolution': resolution,
    'output_crs': 'epsg:6933',
    'group_by':'solar_day'
}

Load available data from S2:

[8]:

filters=[("opening", 3),("dilation", 2)]

ds = load_ard(
    dc=dc,
    products=['s2_l2a'],
    dask_chunks=dask_chunks,
    mask_filters=filters,
    **query,
)

print(ds)

Using pixel quality parameters for Sentinel 2
Finding datasets
    s2_l2a
Applying morphological filters to pq mask [('opening', 3), ('dilation', 2)]

/opt/venv/lib/python3.12/site-packages/odc/algo/_masking.py:425: FutureWarning: `binary_opening` is deprecated since version 0.26 and will be removed in version 0.28. Use `skimage.morphology.opening` instead.
  mask = op(mask, _disk(radius, mask.ndim))
/opt/venv/lib/python3.12/site-packages/odc/algo/_masking.py:425: FutureWarning: `binary_dilation` is deprecated since version 0.26 and will be removed in version 0.28. Use `skimage.morphology.dilation` instead. Note the lack of mirroring for non-symmetric footprints (see docstring notes).
  mask = op(mask, _disk(radius, mask.ndim))

Applying pixel quality/cloud mask

/opt/venv/lib/python3.12/site-packages/odc/algo/_masking.py:425: FutureWarning: `binary_opening` is deprecated since version 0.26 and will be removed in version 0.28. Use `skimage.morphology.opening` instead.
  mask = op(mask, _disk(radius, mask.ndim))
/opt/venv/lib/python3.12/site-packages/odc/algo/_masking.py:425: FutureWarning: `binary_dilation` is deprecated since version 0.26 and will be removed in version 0.28. Use `skimage.morphology.dilation` instead. Note the lack of mirroring for non-symmetric footprints (see docstring notes).
  mask = op(mask, _disk(radius, mask.ndim))

Returning 72 time steps as a dask array

/opt/venv/lib/python3.12/site-packages/deafrica_tools/datahandling.py:565: FutureWarning: In a future version of xarray the default value for compat will change from compat='no_conflicts' to compat='override'. This is likely to lead to different results when combining overlapping variables with the same name. To opt in to new defaults and get rid of these warnings now use `set_options(use_new_combine_kwarg_defaults=True) or set compat explicitly.
  ds = xr.merge([ds_data, ds_masks])

<xarray.Dataset> Size: 835MB
Dimensions:      (time: 72, y: 1194, x: 607)
Coordinates:
  * time         (time) datetime64[ns] 576B 2018-01-02T08:07:58 ... 2018-12-2...
  * y            (y) float64 10kB 9.461e+04 9.459e+04 ... 7.077e+04 7.075e+04
  * x            (x) float64 5kB 3.479e+06 3.479e+06 ... 3.491e+06 3.491e+06
    spatial_ref  int32 4B 6933
Data variables:
    red          (time, y, x) float32 209MB dask.array<chunksize=(1, 750, 607), meta=np.ndarray>
    nir          (time, y, x) float32 209MB dask.array<chunksize=(1, 750, 607), meta=np.ndarray>
    green        (time, y, x) float32 209MB dask.array<chunksize=(1, 750, 607), meta=np.ndarray>
    swir_1       (time, y, x) float32 209MB dask.array<chunksize=(1, 750, 607), meta=np.ndarray>
Attributes:
    crs:           EPSG:6933
    grid_mapping:  spatial_ref

Mask the satellite data with shape

[9]:

#create mask
mask = xr_rasterize(gdf,ds)

#mask data
ds = ds.where(mask)

#convert to float 32 to conserve memory
ds=ds.astype(np.float32)

Calculate vegetation and water indices

[10]:

# Calculate the chosen vegetation proxy index and add it to the loaded data set
ds = calculate_indices(ds, index=[veg_proxy, 'MNDWI'], satellite_mission='s2', drop=True)

Dropping bands ['red', 'nir', 'green', 'swir_1']

Prepare data for analysis

Remove any NaN or infinite values, mask water, remove any outliers in the vegetation index. We then reduce the data to a 1D timeseries by calculting the mean across the x and y dimensions.

This cell will take a couple of minutes to calculate since the water_mask needs to be brought into memory (we use this later to mask the phenology results).

[11]:

# remove any infinite values
ds = ds.where(~np.isinf(ds))

# mask water
ds = ds.where(ds.MNDWI < 0)

#create a all-time water mask for use later
water_mask = ds.MNDWI < 0
water_mask = water_mask.max('time')
water_mask = water_mask.compute()

#remove outliers (if EVI greater than 1.0, set to NaN)
ds[veg_proxy] = xr.where(ds[veg_proxy]>1.0, np.nan, ds[veg_proxy])
ds[veg_proxy] = xr.where(ds[veg_proxy]<0, np.nan, ds[veg_proxy])

# create 1D line plots
veg = ds[veg_proxy]

Smooth and interpolate time series

Due to many factors (e.g. cloud obscuring the region, missed cloud cover in the SCL layer) the data will be gappy and noisy. Here, we will smooth and interpolate the data to ensure we working with a consistent time-series.

To do this we take two steps:

Resample the data to fortnightly time-steps using the median
Calculate a rolling mean with a window of 4 steps

These calculations will take several minutes to complete as we will run .compute(), triggering all the tasks we scheduled above and bringing the arrays into memory.

[12]:

resample_period='2W'
window=4

# calculate median first, then bring into memory with .compute()
veg_smooth = veg.resample(time=resample_period, label='left').median('time')
# Update the time coordinates of the resampled dataset.
veg_smooth = veg_smooth.assign_coords(time=(veg_smooth.time + np.timedelta64(1, 'W')))
veg_smooth = veg_smooth.compute()

#calculate rolling mean to smooth time-series
veg_smooth=veg_smooth.rolling(time=window, min_periods=1).mean()

Compute 2D phenology statistics

Below we specify the statistics to calculate, and the method we’ll use for determining the statistics. Options are ‘first’ & ‘median’ for method_sos, and ‘last’ & ‘median’ for method_eos.

method_sos : str
        If 'first' then vSOS is estimated as the first positive
        slope on the greening side of the curve. If 'median',
        then vSOS is estimated as the median value of the postive
        slopes on the greening side of the curve.

method_eos : str
        If 'last' then vEOS is estimated as the last negative slope
        on the senescing side of the curve. If 'median', then vEOS is
        estimated as the 'median' value of the negative slopes on the
        senescing side of the curve.

[13]:

pheno_stats = ['SOS','vSOS','POS','vPOS','EOS','vEOS','Trough','LOS','AOS','ROG','ROS']
method_sos = 'first'
method_eos = 'last'

[14]:

with HiddenPrints():

    phen=ts.xr_phenology(
            veg_smooth,
            method_sos=method_sos,
            method_eos=method_eos,
            stats=pheno_stats,
                )

print(phen)

<xarray.Dataset> Size: 41MB
Dimensions:      (y: 1194, x: 607)
Coordinates:
  * y            (y) float64 10kB 9.461e+04 9.459e+04 ... 7.077e+04 7.075e+04
  * x            (x) float64 5kB 3.479e+06 3.479e+06 ... 3.491e+06 3.491e+06
    spatial_ref  int32 4B 0
Data variables:
    SOS          (y, x) datetime64[ns] 6MB NaT NaT NaT NaT ... NaT NaT NaT NaT
    vSOS         (y, x) float32 3MB nan nan nan nan nan ... nan nan nan nan nan
    POS          (y, x) datetime64[ns] 6MB NaT NaT NaT NaT ... NaT NaT NaT NaT
    vPOS         (y, x) float32 3MB nan nan nan nan nan ... nan nan nan nan nan
    EOS          (y, x) datetime64[ns] 6MB NaT NaT NaT NaT ... NaT NaT NaT NaT
    vEOS         (y, x) float32 3MB nan nan nan nan nan ... nan nan nan nan nan
    Trough       (y, x) float32 3MB nan nan nan nan nan ... nan nan nan nan nan
    LOS          (y, x) float32 3MB nan nan nan nan nan ... nan nan nan nan nan
    AOS          (y, x) float32 3MB nan nan nan nan nan ... nan nan nan nan nan
    ROG          (y, x) float32 3MB nan nan nan nan nan ... nan nan nan nan nan
    ROS          (y, x) float32 3MB nan nan nan nan nan ... nan nan nan nan nan
Attributes:
    grid_mapping:  spatial_ref

Re-mask data

The phenology code has methods for handling pixels with only NaNs (such as those regions outside of the polygon mask), so the results can have phenology results for regions over water and outside the mask. We will therefore have to mask the data again.

[15]:

#mask with polygon
phen = phen.where(mask)

#mask with water-mask
phen = phen.where(water_mask)

Plot the 2D phenology statistics

The phenology statistics have been calculated seperately for every pixel in the image. Let’s plot each of them to see the results.

[16]:

year_to_plot = year

# Define a few items to aid in plotting.
start_date = dt.date(int(year_to_plot), 1, 1)
end_date = dt.date(int(year_to_plot), 10, 27)

date_list = pd.date_range(start_date, end_date, freq="MS")
bounds = [int(i.strftime("%Y%m%d")) for i in date_list]


@mpl.ticker.FuncFormatter
def float_to_date(x, pos):
    tick_str = str(int(x))
    year = tick_str[:4]
    month = tick_str[4:6]
    day = tick_str[6:]
    return f"{year}-{month}-{day}"


# set up figure
fig, ax = plt.subplots(nrows=2,
                       ncols=5,
                       figsize=(18, 8),
                       sharex=True,
                       sharey=True)

# set colorbar size
cbar_size = 0.7

# set aspect ratios
for a in fig.axes:
    a.set_aspect("equal")

# start of season
# Convert SOS values from np.date64 to float values for plotting
cax = phen.SOS.dt.dayofyear.plot(
    ax=ax[0, 0],
    cmap="magma_r",
#     levels=bounds,
    add_colorbar=True,
    cbar_kwargs=dict(shrink=cbar_size, label=None),
)
ax[0, 0].set_title("Start of Season")

phen.vSOS.plot(ax=ax[0, 1],
               cmap="YlGn",
               vmax=0.8,
               cbar_kwargs=dict(shrink=cbar_size, label=None))
ax[0, 1].set_title(veg_proxy + " at SOS")

# peak of season
# # Convert POS values from np.date64 to float values for plotting
cax = phen.POS.dt.dayofyear.plot(
    ax=ax[0, 2],
    cmap="magma_r",
#     levels=bounds,
    add_colorbar=True,
    cbar_kwargs=dict(shrink=cbar_size, label=None),
)
ax[0, 2].set_title("Peak of Season")

phen.vPOS.plot(ax=ax[0, 3],
               cmap="YlGn",
               vmax=0.8,
               cbar_kwargs=dict(shrink=cbar_size, label=None))
ax[0, 3].set_title(veg_proxy + " at POS")

# end of season
# Convert EOS values from np.date64 to float values for plotting
cax = phen.EOS.dt.dayofyear.plot(
    ax=ax[0, 4],
    cmap="magma_r",
#     levels=bounds,
    add_colorbar=True,
    cbar_kwargs=dict(shrink=cbar_size, label=None),
)
ax[0, 4].set_title("End of Season")

phen.vEOS.plot(ax=ax[1, 0],
               cmap="YlGn",
               vmax=0.8,
               cbar_kwargs=dict(shrink=cbar_size, label=None))
ax[1, 0].set_title(veg_proxy + " at EOS")

# Length of Season
phen.LOS.plot(
    ax=ax[1, 1],
    cmap="magma_r",
    vmax=300,
    vmin=0,
    cbar_kwargs=dict(shrink=cbar_size, label=None),
)
ax[1, 1].set_title("Length of Season (Days)")

# Amplitude
phen.AOS.plot(ax=ax[1, 2],
              cmap="YlGn",
              vmax=0.8,
              cbar_kwargs=dict(shrink=cbar_size, label=None))
ax[1, 2].set_title("Amplitude of Season")

# rate of growth
phen.ROG.plot(
    ax=ax[1, 3],
    cmap="coolwarm_r",
    vmin=-0.02,
    vmax=0.02,
    cbar_kwargs=dict(shrink=cbar_size, label=None),
)
ax[1, 3].set_title("Rate of Growth")

# rate of Sensescence
phen.ROS.plot(
    ax=ax[1, 4],
    cmap="coolwarm_r",
    vmin=-0.02,
    vmax=0.02,
    cbar_kwargs=dict(shrink=cbar_size, label=None),
)
ax[1, 4].set_title("Rate of Senescence")
plt.suptitle("Phenology for " + year_to_plot)
plt.tight_layout();

plt.savefig('results/phenology_2D_'+year+".png");

../../../../_images/sandbox_notebooks_Use_cases_Lake_baringo_grazing_Vegetation_phenology_perpixel_34_0.png

Additional information

License: The code in this notebook is licensed under the Apache License, Version 2.0. Digital Earth Africa data is licensed under the Creative Commons by Attribution 4.0 license.

Contact: If you need assistance, please post a question on the Open Data Cube Slack channel or on the GIS Stack Exchange using the open-data-cube tag (you can view previously asked questions here). If you would like to report an issue with this notebook, you can file one on Github.

Compatible datacube version:

[17]:

print(datacube.__version__)

1.9.13

Last Tested:

[18]:

from datetime import datetime
datetime.today().strftime('%Y-%m-%d')

[18]:

'2026-04-29'