# Cropland seasonal vegetation condition anomalies

Keywords: data used; Landsat 8; vegetation; anomalies; band index; NDVI

## Background

Understanding how the vegetated landscape responds to longer-term environmental drivers, such as the El Niño Southern Oscillation (ENSO) or climate change, requires the calculation of seasonal anomalies. Standardised seasonal anomalies subtract the long-term seasonal mean from a time series and then divide the result by the long-term standard deviation, thus removing seasonal variability and highlighting change related to longer-term drivers.

### Description

This notebook will calculate seasonal standardised NDVI anomalies for any given season and year. The long-term seasonal climatologies (both mean and standard deviation) are calculated on-the-fly.

$$\text{Standardised anomaly }=\frac{x-m}{s}$$

where $x$ is the seasonal mean, $m$ is the long-term mean, and $s$ is the long-term standard deviation.
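
As a quick worked example, the standardisation can be applied to a toy series of seasonal NDVI means (pure Python; all values are hypothetical):

```python
# Hypothetical seasonal NDVI means for a single pixel over several years
seasonal_means = [0.41, 0.38, 0.45, 0.52, 0.35, 0.47]

m = sum(seasonal_means) / len(seasonal_means)  # long-term mean
s = (sum((v - m) ** 2 for v in seasonal_means) / len(seasonal_means)) ** 0.5  # long-term std dev

x = 0.30  # hypothetical seasonal mean for the year of interest
anomaly = (x - m) / s
print(round(anomaly, 2))  # → -2.29
```

A strongly negative value like this indicates vegetation condition well below the long-term average, as might occur during a drought.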

Note: It is a convention to establish climatologies based on a 30-year time range to account for inter-annual and inter-decadal modes of climate variability (often 1980–2010 or 1960–1990). As the Landsat archive over Africa is not consistent before 2000, the climatologies here have been calculated using the date range 1999–2019 (inclusive). While this is not ideal, a 20-year climatology should suffice to capture the bulk of inter-annual and inter-decadal variability; for example, both a major El Niño (2015/16) and a major La Niña (2011) are captured by this time range.

The following steps are taken in the notebook:

1. Load cloud-masked Landsat data over the region of interest for the years over which the climatologies will be computed

2. Create a map of clear-pixel counts

3. Calculate the NDVI long-term mean and long-term standard deviation

4. Calculate standardised NDVI anomalies

5. Focus on agricultural regions using the DE Africa cropland extent map

6. Plot the results

## Getting started

To run this analysis, run all the cells in the notebook, starting with the “Load packages” cell.

After finishing the analysis, return to the “Analysis parameters” cell, modify some values (e.g. choose a different location or time period to analyse) and re-run the analysis.

Load key Python packages and supporting functions for the analysis.

[1]:
%matplotlib inline

import datacube
import xarray as xr
import numpy as np
import matplotlib.pyplot as plt

from deafrica_tools.datahandling import load_ard
from deafrica_tools.bandindices import calculate_indices
from deafrica_tools.plotting import display_map

## Set up a Dask cluster

Dask can be used to better manage memory use and conduct the analysis in parallel. For an introduction to using Dask with Digital Earth Africa, see the Dask notebook.

Note: We recommend opening the Dask processing window to view the different computations that are being executed; to do this, see the Dask dashboard in DE Africa section of the Dask notebook.

To use Dask, set up the local computing cluster using the cell below.

[2]:
from deafrica_tools.dask import create_local_dask_cluster

create_local_dask_cluster()

### Cluster

• Workers: 1
• Cores: 3
• Memory: 28.14 GB

### Connect to the datacube

Activate the datacube database, which provides functionality for loading and displaying stored Earth observation data.

[3]:
dc = datacube.Datacube(app="Vegetation_anomalies")

### Analysis parameters

The following cell sets important parameters for the analysis:

• lat: The central latitude to analyse (e.g. 14.283).

• lon: The central longitude to analyse (e.g. -16.921).

• buffer: The number of square degrees to load around the central latitude and longitude. For reasonable loading times, set this as 0.2 or lower.

• time_range: The date range over which the NDVI climatologies will be calculated (e.g. ('1999', '2019')).

• year: The year for which we will calculate the standardised anomaly, e.g. '2021'.

• season: The season for which we will calculate the standardised anomaly, e.g. 'DJF', 'JFM', 'FMA', etc.

• resolution: The x and y cell resolution of the satellite data, e.g. (-30, 30) will load Landsat data at its native 30 x 30 m resolution.

• dask_chunks: The size of the Dask chunks. Dask breaks data into manageable chunks that can be easily stored in memory, e.g. dict(x=1000, y=1000).

If running the notebook for the first time, keep the default settings below. This will demonstrate how the analysis works and provide meaningful results. The example explores seasonal cropland vegetation condition near Lake Tana, Ethiopia.

To run the notebook for a different area, make sure Landsat data is available for the new location, which you can check at the DE Africa Explorer (use the drop-down menu to view all Landsat products).

[4]:
# Define the area of interest
lat, lon = 12.4917, 37.7649 #near Lake Tana, Ethiopia
buffer = 0.2

# Set the range of dates for the climatology
time_range = ('1999', '2019')
year = '2021'
season = 'MJJ'
resolution = (-30, 30)

# Combine central lat,lon with buffer to get area of interest
lat_range = (lat-buffer, lat+buffer)
lon_range = (lon-buffer, lon+buffer)

## View the selected location

The next cell will display the selected area on an interactive map. Feel free to zoom in and out to get a better understanding of the area you’ll be analysing. Clicking on any point of the map will reveal the latitude and longitude coordinates of that point.

[5]:
display_map(x=lon_range, y=lat_range)
[5]:

### Define a function for filtering data to the season of interest

[6]:
quarter = {'JFM': [1, 2, 3],
           'FMA': [2, 3, 4],
           'MAM': [3, 4, 5],
           'AMJ': [4, 5, 6],
           'MJJ': [5, 6, 7],
           'JJA': [6, 7, 8],
           'JAS': [7, 8, 9],
           'ASO': [8, 9, 10],
           'SON': [9, 10, 11],
           'OND': [10, 11, 12],
           'NDJ': [11, 12, 1],
           'DJF': [12, 1, 2]
           }

def filter_season(dataset):
    # Keep only datasets whose acquisition month falls within the chosen season
    return dataset.time.begin.month in quarter[season]
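
To see what the predicate does, here is a self-contained demo using a minimal stand-in for the datacube dataset objects it receives (the `make_dataset` helper and its `SimpleNamespace` time metadata are hypothetical; the loader only checks the truthiness of the predicate's return value):

```python
from types import SimpleNamespace
import datetime

# Subset of the season lookup used in the notebook
quarter = {'JFM': [1, 2, 3], 'MJJ': [5, 6, 7], 'DJF': [12, 1, 2]}
season = 'MJJ'

def make_dataset(month):
    # Minimal stand-in for a datacube dataset's time metadata (hypothetical)
    return SimpleNamespace(time=SimpleNamespace(begin=datetime.datetime(2021, month, 15)))

def filter_season(dataset):
    # Keep only datasets whose acquisition month falls within the chosen season
    return dataset.time.begin.month in quarter[season]

months_kept = [m for m in range(1, 13) if filter_season(make_dataset(m))]
print(months_kept)  # → [5, 6, 7]
```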

The first step in this analysis is to load in Landsat data for the lat_range, lon_range and time_range we provided above. The code below uses the load_ard function to load in data from the Landsat 5, 7 and 8 satellites for the area and time specified. For more information, see the Using load_ard notebook.

The function will also automatically mask out clouds from the dataset, allowing us to focus on pixels that contain useful data:

[7]:
# Create the 'query' dictionary object, which contains the longitudes,
# latitudes and time provided above
query = {
    'x': lon_range,
    'y': lat_range,
    'time': time_range,
    'measurements': ['red', 'nir', 'pixel_quality'],
    'resolution': resolution,
    'output_crs': 'epsg:6933',
    'dask_chunks': dict(x=1000, y=1000),
}

# Load available Landsat 7 and 8 data
ds = load_ard(dc=dc,
              products=['ls7_sr', 'ls8_sr'],
              group_by='solar_day',
              predicate=filter_season,
              **query)

Using pixel quality parameters for USGS Collection 2
Finding datasets
ls7_sr
ls8_sr
Filtering datasets using filter function
Applying morphological filters to pq mask [('opening', 3), ('dilation', 3)]
Re-scaling Landsat C2 data
Returning 181 time steps as a dask array

## Generate a clear pixel count summary

This will help us understand how many observations are going into the NDVI climatology calculations. Too few observations might indicate a bias in the climatology. Remember, the role of the climatology is to define 'average' conditions; too few observations will not provide a realistic estimate of typical conditions.
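
The counting itself is just a sum of not-cloudy flags along the time dimension. A pure-Python sketch with hypothetical flags:

```python
# Toy clear-pixel count: each inner list is one pixel's time series of
# cloud flags, with True meaning the observation was cloudy (hypothetical values)
cloud_flags = [
    [True, False, False, True],    # pixel 0: 2 clear observations
    [False, False, False, False],  # pixel 1: 4 clear observations
]

# Count clear (not cloudy) observations per pixel,
# mirroring a (~cloud_mask).sum('time') reduction
clear_counts = [sum(not flag for flag in series) for series in cloud_flags]
print(clear_counts)  # → [2, 4]
```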

[8]:
from datacube.utils import masking

# Create bit-flag dictionary
flags_def = ds.pixel_quality.attrs["flags_definition"]
quality_flags = dict(
    cloud="high_confidence",  # True where there is cloud
)

# Calculate the number of clear observations per pixel
cloud_mask = masking.make_mask(ds.pixel_quality, **quality_flags)
pq_count = (~cloud_mask).sum(dim='time').compute()
[9]:
pq_count.plot(figsize=(10,10))
plt.title('Clear pixel count for NDVI climatologies');

## Calculate NDVI climatologies

This will take a few minutes to run because we will bring the climatologies into memory. Check the Dask dashboard to see progress. Access the dashboard by clicking on the Dashboard link generated when you created the cluster.

[10]:
#calculate NDVI
ndvi = calculate_indices(ds, 'NDVI', collection='c2', drop=True)

ndvi = ndvi.persist()

#calculate the climatologies and bring into memory
climatology_mean = ndvi.mean("time").NDVI.compute()
climatology_std = ndvi.std("time").NDVI.compute()
Dropping bands ['red', 'nir', 'pixel_quality', 'cloud_mask']
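
For reference, `calculate_indices` computes NDVI as (NIR - Red) / (NIR + Red). A quick sanity check with hypothetical surface-reflectance values:

```python
def ndvi(nir, red):
    # NDVI = (NIR - Red) / (NIR + Red), ranging from -1 to 1
    return (nir - red) / (nir + red)

print(round(ndvi(nir=0.45, red=0.09), 2))  # dense green vegetation -> high NDVI (0.67)
print(round(ndvi(nir=0.18, red=0.15), 2))  # bare soil -> low NDVI (0.09)
```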

## Calculate standardised anomalies

Step 1: Load data from the season we're analysing.

Step 2: Calculate the mean NDVI during the season.

[11]:
season_query = {
'x': lon_range,
'y': lat_range,
'time': (year+'-'+str(quarter[season][0]), year+'-'+str(quarter[season][-1])),
'measurements': ['red','nir'],
'resolution': resolution,
'output_crs':'epsg:6933',
}

# Load available data, Landsat 8 only
season_ds = load_ard(dc=dc,
                     products=['ls8_sr'],
                     group_by='solar_day',
                     predicate=filter_season,
                     **season_query)

#calculate mean NDVI
season_ndvi = calculate_indices(season_ds, 'NDVI', collection='c2', drop=True)
seasonal_mean = season_ndvi.mean('time').NDVI.compute()
Using pixel quality parameters for USGS Collection 2
Finding datasets
ls8_sr
Filtering datasets using filter function
Re-scaling Landsat C2 data
Returning 10 time steps as a dask array
Dropping bands ['red', 'nir']

Step 3: Now we can calculate the standardised anomalies by subtracting the long-term mean and dividing by the long-term standard deviation

[12]:
stand_anomalies = xr.apply_ufunc(
    lambda x, m, s: (x - m) / s,
    seasonal_mean,
    climatology_mean,
    climatology_std,
)

## Incorporating DE Africa’s cropland extent map

Load the cropland mask over the region of interest. The default area analysed here is in Ethiopia, so we need to load the crop_mask_eastern product, which covers the countries of Ethiopia, Kenya, Tanzania, Rwanda, and Burundi.

[13]:
cm = dc.load(product='crop_mask_eastern',
             time=('2019'),
             measurements='filtered',
             resampling='nearest',
             like=ds.geobox).filtered.squeeze()

cm.plot(figsize=(6, 6), add_colorbar=False)
plt.title('Cropland Extent');

### Plot NDVI climatologies, seasonal mean, and anomalies for cropping regions only

Below we mask out the regions that aren't cropland, revealing only the condition of the cropped regions.

[14]:
climatology_mean=climatology_mean.where(cm, np.nan)
climatology_std=climatology_std.where(cm, np.nan)
seasonal_mean=seasonal_mean.where(cm, np.nan)
stand_anomalies=stand_anomalies.where(cm, np.nan)
[15]:
# Plot all layers
plt.rcParams['axes.facecolor'] = 'gray'  # makes transparent pixels obvious
fig, ax = plt.subplots(2, 2, sharey=True, sharex=True, figsize=(15, 15))

climatology_mean.plot.imshow(ax=ax[0, 0], cmap='YlGn', vmin=0, vmax=0.75)
ax[0, 0].set_title('NDVI: ' + season + ' mean climatology')
climatology_std.plot.imshow(ax=ax[0, 1], vmin=0, vmax=0.25)
ax[0, 1].set_title('NDVI: ' + season + ' std dev climatology')
seasonal_mean.plot.imshow(ax=ax[1, 0], cmap='YlGn', vmin=0, vmax=0.75)
ax[1, 0].set_title('NDVI: ' + year + ' ' + season + ' mean')
stand_anomalies.plot.imshow(ax=ax[1, 1], cmap='BrBG', vmin=-2, vmax=2)
ax[1, 1].set_title('NDVI: ' + year + ' ' + season + ' standardised anomaly')
plt.tight_layout();

## Drawing Conclusions

Here are some questions to think about:

1. How does the seasonal NDVI mean compare with the long-term mean of NDVI?

2. Do the cropping regions tend to have high or low standard deviations in NDVI?

3. Looking at the map of standardised anomalies, are the crops faring better or worse than average? And how unusual are the anomalies compared with average?

4. What other environmental data might help us confirm the drivers of the changes in NDVI?

Contact: If you need assistance, please post a question on the Open Data Cube Slack channel or on the GIS Stack Exchange using the open-data-cube tag (you can view previously asked questions here). If you would like to report an issue with this notebook, you can file one on GitHub.

Compatible datacube version:

[16]:
print(datacube.__version__)
1.8.5

Last Tested:

[17]:
from datetime import datetime
datetime.today().strftime('%Y-%m-%d')
[17]:
'2021-09-22'