Rainfall anomalies from Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS)

Keywords: datasets; CHIRPS, climate, rainfall, monthly

Background

Rainfall anomalies are deviations of rainfall from long-run averages. They are useful for identifying wet and dry periods which can be linked to climatically influenced patterns such as flooding, river flows, and agricultural production.

Description

In this real world example we will calculate rainfall anomalies for a selected African country using the CHIRPS monthly rainfall dataset. Standardised anomaly is calculated as:

\begin{equation} \text{Standardised anomaly }=\frac{x-m}{s} \end{equation}

x is the seasonal mean, m is the long-term mean, and s is the long-term standard deviation.

This means we need a long-term reference period (m) and a period of interest (x) for which we’ll calculate the anomalies. This notebook names datasets ds_rf_m and ds_rf_x accordingly.

The notebook outlines:

  1. Loading a shapefile for African countries and selecting a single country

  2. Loading and rainfall data and masking it to the selected country.

  3. Calculating monthly rainfall anomalies and plotting the result, aggregated over space, as a bar chart.

  4. Calculating and plotting monthly rainfall anomalies spatially.


Getting started

To run this analysis, run all the cells in the notebook, starting with the “Load packages” cell.

Load packages

[1]:
%matplotlib inline

# Force GeoPandas to use Shapely instead of PyGEOS
# In a future release, GeoPandas will switch to using Shapely by default.
import os
os.environ['USE_PYGEOS'] = '0'

import datacube
import numpy as np
import pandas as pd
import geopandas as gpd
import xarray as xr
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from datacube.utils.geometry import Geometry, CRS
from deafrica_tools.spatial import xr_rasterize
from deafrica_tools.dask import create_local_dask_cluster

Set up a Dask cluster

Dask can be used to better manage memory use and conduct the analysis in parallel. For an introduction to using Dask with Digital Earth Africa, see the Dask notebook.

Note: We recommend opening the Dask processing window to view the different computations that are being executed; to do this, see the Dask dashboard in DE Africa section of the Dask notebook.

To use Dask, set up the local computing cluster using the cell below.

[ ]:
create_local_dask_cluster()

Analysis parameters

The following cell sets important parameters for the analysis:

  • country: In this analysis, we’ll select an African country to mask the dataset and analysis.

  • time_m: CHIRPS monthly rainfall is available from 1981. The long-term mean for rainfall anomalies is often calculated on a 30-year period, so we’ll use 1981 to 2011 in this example.

  • time_x: This is the period for which we want to calculate anomalies.

  • resolution: We’ll use 5,000 m, which is approximately equal to the default resolution shown above.

  • dask_chunks: the size of the dask chunks, dask breaks data into manageable chunks that can be easily stored in memory, e.g. dict(x=1000,y=1000)

Standardised anomaly is calculated as:

\begin{equation} \text{Standardised anomaly }=\frac{x-m}{s} \end{equation}

\(x\) is the seasonal mean, \(m\) is the long-term mean, and \(s\) is the long-term standard deviation.

This means we need a long-term reference period (m) and a period of interest (x) for which we’ll calculate the anomalies. This notebook names datasets ds_rf_m and ds_rf_x accordingly.

If running the notebook for the first time, keep the default settings below. This will demonstrate how the analysis works and provide meaningful results.

[3]:
# Select a country, for the example we will use Kenya, a complete list of countries is available below.
country = "Kenya"

# Set the range of dates for the climatology, this will be the reference period (m) for the anomaly calculation.
# Standard practice is to use a 30 year period, so we've used 1981 to 2011 in this example.
time_m = ('1981', '2011')

# time period for monthly anomaly (x)
time_x = ('1981', '2021')

# CHIRPS has a spatial resolution of ~5x5 km
resolution = (-5000, 5000)

#size of dask chunks
dask_chunks = dict(x=500,y=500)

Connect to the datacube

Connect to the datacube so we can access DE Africa data. The app parameter is a unique name for the analysis which is based on the notebook file name.

[4]:
dc = datacube.Datacube(app='rainfall_anomaly')

Load the African Countries shapefile

This shapefile contains polygons for the boundaries of African countries and will allows us to calculate rainfall anomalies within a chosen country

[5]:
african_countries = gpd.read_file('../Supplementary_data/Rainfall_anomaly_CHIRPS/african_countries.geojson')
african_countries.explore()
[5]:
Make this Notebook Trusted to load map: File -> Trust Notebook

List countries

You can change the country in the analysis parameters cell to any African country. A complete list of countries is printed below.

[6]:
print(np.unique(african_countries.COUNTRY))
['Algeria' 'Angola' 'Benin' 'Botswana' 'Burkina Faso' 'Burundi' 'Cameroon'
 'Cape Verde' 'Central African Republic' 'Chad' 'Comoros'
 'Congo-Brazzaville' 'Cote d`Ivoire' 'Democratic Republic of Congo'
 'Djibouti' 'Egypt' 'Equatorial Guinea' 'Eritrea' 'Ethiopia' 'Gabon'
 'Gambia' 'Ghana' 'Guinea' 'Guinea-Bissau' 'Kenya' 'Lesotho' 'Liberia'
 'Libya' 'Madagascar' 'Malawi' 'Mali' 'Mauritania' 'Morocco' 'Mozambique'
 'Namibia' 'Niger' 'Nigeria' 'Rwanda' 'Sao Tome and Principe' 'Senegal'
 'Sierra Leone' 'Somalia' 'South Africa' 'Sudan' 'Swaziland' 'Tanzania'
 'Togo' 'Tunisia' 'Uganda' 'Western Sahara' 'Zambia' 'Zimbabwe']

Setup polygon

The country selected needs to be transformed into a geometry object to be used in the dc.load() function.

[7]:
idx = african_countries[african_countries['COUNTRY'] == country].index[0]
geom = Geometry(geom=african_countries.iloc[idx].geometry, crs=african_countries.crs)

Load rainfall data

First, let’s have a look at the product information for CHIRPS rainfall. We can see that it is stored in monthly timestamps and its native resolution is approximately 0.05 degrees.

[8]:
dc.list_products().loc[dc.list_products()['name'].str.contains('chirps')]
[8]:
name description license default_crs default_resolution
name
rainfall_chirps_daily rainfall_chirps_daily Rainfall Estimates from Rain Gauge and Satelli... None None None
rainfall_chirps_monthly rainfall_chirps_monthly Rainfall Estimates from Rain Gauge and Satelli... None None None

Load data for long-term climatology (m)

Using the analysis parameters defined above, we will load CHIRPS monthly rainfall data for the 30-year reference period (m).

[9]:
query = {'geopolygon': geom,
         'time': time_m,
         'output_crs': 'epsg:6933',
         'resolution': resolution,
         'measurements': ['rainfall'],
         'dask_chunks':dask_chunks
        }

ds_rf = dc.load(product='rainfall_chirps_monthly', **query)
ds_rf
[9]:
<xarray.Dataset>
Dimensions:      (time: 372, y: 238, x: 155)
Coordinates:
  * time         (time) datetime64[ns] 1981-01-16T11:59:59.500000 ... 2011-12...
  * y            (y) float64 5.875e+05 5.825e+05 ... -5.925e+05 -5.975e+05
  * x            (x) float64 3.272e+06 3.278e+06 ... 4.038e+06 4.042e+06
    spatial_ref  int32 6933
Data variables:
    rainfall     (time, y, x) float32 dask.array<chunksize=(1, 238, 155), meta=np.ndarray>
Attributes:
    crs:           EPSG:6933
    grid_mapping:  spatial_ref

Mask the rainfall dataset using the country boundary

Below, the country polygon is rasterized so the rainfall dataset is masked within that raster.

[10]:
african_countries = african_countries.to_crs('epsg:6933')

mask = xr_rasterize(african_countries[african_countries['COUNTRY'] == country], ds_rf)

#mask the rainfall dataset
ds_rf = ds_rf.where(mask)

# Plot the mask
mask.plot()
[10]:
<matplotlib.collections.QuadMesh at 0x7faae435b610>
../../../_images/sandbox_notebooks_Real_world_examples_Rainfall_anomaly_CHIRPS_24_1.png

Calculate monthly rainfall climatology

We want to capture both the monthly mean (i.e. each month averaged over thirty years) and the monthly standard deviation of rainfall within the country polygon for each year from 1981 to 2011. Firstly, rainfall is grouped by month and a mean is calculated, then the standard devation in rainfall total for each month is calculated

[11]:
ds_rf_m = ds_rf.where(ds_rf !=-9999.) #convert missing values to NaN

# monthly means
climatology_mean = ds_rf_m.groupby('time.month').mean('time').compute()

#calculate monthly std dev
climatology_std = ds_rf_m.groupby('time.month').std('time').compute()

/usr/local/lib/python3.10/dist-packages/rasterio/warp.py:344: NotGeoreferencedWarning: Dataset has no geotransform, gcps, or rpcs. The identity matrix will be returned.
  _reproject(

Now we can plot the rainfall mean climatology, this is the average rainfall (over 30 years) for each month

[12]:
climatology_mean['rainfall'].plot.imshow(cmap='viridis', col='month', col_wrap=6, label=False);
../../../_images/sandbox_notebooks_Real_world_examples_Rainfall_anomaly_CHIRPS_28_0.png

Load data for the anomaly period

Using the analysis parameters defined above, we will load CHIRPS rainfall data for the period over which we want to calculate anomalies (x). We also need to mask this dataset to the country polygon.

[13]:
#load rainfall data for the anomaly period matching the spatial extent of the climatologies
ds_rf_x = dc.load(product='rainfall_chirps_monthly',
                  like=ds_rf.geobox,
                  time=time_x,
                  dask_chunks=dask_chunks)

#mask with country polygon
ds_rf_x=ds_rf_x.where(mask)

Calculate standardised anomalies

We can visualise the anomalies spatially and see if they are associated with certain landscape features.

Do the spatial anomalies shown in the plots below align with the aggregated anomalies shown above?

[14]:
stand_anomalies = xr.apply_ufunc(
    lambda x, m, s: (x - m) / s,
    ds_rf_x.groupby("time.month"),
    climatology_mean,
    climatology_std,
    output_dtypes=[ds_rf_x.rainfall.dtype],
    dask="allowed"
).compute()

Plot average rainfall anomalies across the country

Below, the spatial mean is taken so we can present the monthly anomalies aggregated across the selected country.

[15]:
spatial_mean_anoms = stand_anomalies.rainfall.mean(['x','y']).to_dataframe().drop(['spatial_ref', 'month'], axis=1)

Below, we plot a bar graph that will show the average rainfall anomaly across the country

[16]:
fig, ax = plt.subplots(figsize=(21,6))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m"))
ax.set_ylim(-2,3.5)
ax.bar(spatial_mean_anoms.index,
       spatial_mean_anoms.rainfall,
       width=35, align='center',
       color=(spatial_mean_anoms['rainfall'] > 0).map({True: 'b', False: 'r'}))
ax.axhline(0, color='black', linestyle='--')
plt.title('Monthly standardised rainfall anomalies for '+country)
plt.ylabel('Std. deviations from monthly mean');
../../../_images/sandbox_notebooks_Real_world_examples_Rainfall_anomaly_CHIRPS_36_0.png

Per-pixel plots of rainfall anomalies

Average anomalies across the entire country obscure details on how rainfall anomalies are spatially distrbuted within the country. Below, enter a start and end date (in format 'YYYY-MM') that is within the time_x range you entered in the Analysis parameters section to plot per-pixel anomalies for the range of dates you specify.

[17]:
#Select a time-range to plot
start='2020-07'
end='2021-06'

Plot the per-pixel anomalies

[18]:
stand_anomalies['rainfall'].sel(time=slice(start,end)).plot(cmap='RdBu', label=False, col="time", col_wrap=6, robust=True);
../../../_images/sandbox_notebooks_Real_world_examples_Rainfall_anomaly_CHIRPS_40_0.png

Additional information

License: The code in this notebook is licensed under the Apache License, Version 2.0. Digital Earth Africa data is licensed under the Creative Commons by Attribution 4.0 license.

Contact: If you need assistance, please post a question on the Open Data Cube Slack channel or on the GIS Stack Exchange using the open-data-cube tag (you can view previously asked questions here). If you would like to report an issue with this notebook, you can file one on Github.

Compatible datacube version:

[19]:
print(datacube.__version__)
1.8.15

Last Tested:

[20]:
from datetime import datetime
datetime.today().strftime('%Y-%m-%d')
[20]:
'2023-08-14'