Introduction to plotting

Keywords beginner’s guide; visualisation, visualisation; beginner’s guide, data used; landsat 8, visualisation; measurements, visualisation; timesteps, visualisation; colour maps, python package; matplotlib

Background

Data visualisation is an important component of working with Earth Observation data. The xarray Python package provides a range of straightforward data plotting options which allow users to quickly generate simple plots from multi-dimensional datasets. To generate more complex and informative plots from data loaded from Digital Earth Africa (DE Africa), the DE Africa Notebooks repository also provides a custom plotting module with additional easy-to-use functionality.

Description

This introductory notebook demonstrates how to visualise Digital Earth Africa satellite data returned from running a datacube query. The notebook demonstrates commonly-used xarray plotting methods, as well as custom functions provided in deafrica_tools.plotting.

Topics covered in this notebook include:

  • View your area of interest prior to querying the datacube

  • Querying the datacube and loading data

  • Plotting single band data (e.g. a single satellite band)

    • Selecting and plotting individual timesteps

    • Plotting multiple timesteps

    • Customising plot appearance

  • Plotting three-band true or false colour imagery

    • Plotting single timesteps

    • Plotting multiple timesteps

    • Customising plot appearance


Getting started

To run this introduction to plotting data loaded from the datacube, run all the cells in the notebook starting with the “Load packages” cell. For help with running notebook cells, refer back to the Jupyter Notebooks notebook.

Load packages

First we run %matplotlib inline, which ensures figures plot correctly in the Jupyter notebook.

We then need to load the datacube package, which allows us to load data.

Plotting the data requires functions which we can import from deafrica_tools.plotting. For this notebook, we need rgb and display_map. We can import both in one line by separating the names with a comma.

[1]:
%matplotlib inline

import datacube
from deafrica_tools.plotting import rgb, display_map
/env/lib/python3.6/site-packages/geopandas/_compat.py:88: UserWarning: The Shapely GEOS version (3.7.2-CAPI-1.11.0 ) is incompatible with the GEOS version PyGEOS was compiled with (3.9.0-CAPI-1.16.2). Conversions between both will be slow.
  shapely_geos_version, geos_capi_version_string

Connect to the datacube

We then connect to the datacube database so we can load DE Africa data.

[2]:
dc = datacube.Datacube(app="04_Plotting")

Analysis parameters

The following variables are required to establish a query for this notebook: - lat_range: The latitude range to analyse (e.g. (11.72, 11.52)). For reasonable load times, keep this to a range of ~0.1 degrees or less. - lon_range: The longitude range to analyse (e.g. (-15.63, -15.43)). For reasonable load times, keep this to a range of ~0.1 degrees or less. - time_range: The date range to analyse (e.g. ("2018-01-01", "2018-03-30")).

[3]:
lat_range = (11.72, 11.52)
lon_range = (-15.63, -15.43)
time_range = ("2017-01-01", "2017-03-30")

View the queried location

Before running a query and extracting and analysing data, it is useful to double-check that your location is correct. The display_map() function shows your selected area as a red rectangle on an interactive map. Clicking on any point of the map will reveal the latitude and longitude coordinates of that point.

[4]:
display_map(x=lon_range, y=lat_range)
[4]:

Query and view data

The variables determined above are used here to query the DE Africa datacube using the dc.load() function and load data introduced in the Loading data notebook. This notebook uses Landsat 8 Surface reflectance ls8_sr.

[5]:
ds = dc.load(product="ls8_sr",
             measurements=['blue','green','red','nir','swir_1','swir_2'],
             x=lon_range,
             y=lat_range,
             time=time_range,
             output_crs='EPSG:6933',
             resolution=(-30, 30))

print(ds)
<xarray.Dataset>
Dimensions:      (time: 6, x: 644, y: 834)
Coordinates:
  * time         (time) datetime64[ns] 2017-01-06T11:22:16.948666 ... 2017-03...
  * y            (y) float64 1.485e+06 1.485e+06 1.485e+06 ... 1.46e+06 1.46e+06
  * x            (x) float64 -1.508e+06 -1.508e+06 ... -1.489e+06 -1.489e+06
    spatial_ref  int32 6933
Data variables:
    blue         (time, y, x) uint16 10396 10397 10394 10404 ... 8291 8258 8208
    green        (time, y, x) uint16 11703 11698 11703 11714 ... 9794 9747 9688
    red          (time, y, x) uint16 11823 11831 11837 11834 ... 8373 8320 8321
    nir          (time, y, x) uint16 8996 8994 8995 8990 ... 8092 8062 8051 8037
    swir_1       (time, y, x) uint16 7614 7613 7627 7626 ... 8550 8559 8543 8519
    swir_2       (time, y, x) uint16 7542 7530 7532 7535 ... 8449 8449 8444 8421
Attributes:
    crs:           EPSG:6933
    grid_mapping:  spatial_ref

Plotting single band images

The xarray package provides built-in methods for plotting individual data variables or measurements. For example, we might want to make a plot for a single measurement like the swir_1 satellite band in the data we loaded above.

To do this, we first need to access the band we are after as an xarray.DataArray (to revise the difference between xarray.Dataset and xarray.DataArray objects, refer back to the Loading data notebook):

[6]:
print(ds.swir_1)
<xarray.DataArray 'swir_1' (time: 6, y: 834, x: 644)>
array([[[ 7614,  7613,  7627, ..., 13610, 13829, 14045],
        [ 7622,  7623,  7629, ..., 13587, 13795, 14460],
        [ 7610,  7638,  7626, ..., 13435, 13630, 14045],
        ...,
        [ 7594,  7610,  7668, ...,  7672,  7664,  7673],
        [ 7575,  7593,  7634, ...,  7678,  7679,  7674],
        [ 7586,  7578,  7617, ...,  7679,  7668,  7666]],

       [[ 7571,  7583,  7581, ..., 11822, 11993, 12371],
        [ 7575,  7582,  7619, ..., 11779, 12010, 12242],
        [ 7591,  7594,  7613, ..., 11777, 11988, 12195],
        ...,
        [ 8898,  9121,  9137, ..., 11705, 11537, 11535],
        [ 9029,  9085,  9107, ..., 11766, 11709, 11734],
        [ 9010,  9052,  9094, ..., 11885, 11799, 11796]],

       [[ 9024,  9018,  9010, ..., 14730, 15078, 15485],
        [ 9016,  9022,  9025, ..., 14663, 14844, 15442],
        [ 9014,  9020,  9037, ..., 14547, 15005, 15308],
        ...,
...
        ...,
        [ 8147,  8131,  8121, ...,  8488,  8549,  8608],
        [ 8124,  8118,  8110, ...,  8483,  8515,  8581],
        [ 8131,  8078,  8081, ...,  8484,  8514,  8583]],

       [[ 8565,  8563,  8570, ..., 14624, 15405, 15539],
        [ 8575,  8586,  8583, ..., 15189, 15669, 15273],
        [ 8588,  8576,  8579, ..., 14797, 15825, 15656],
        ...,
        [ 8383,  8296,  8291, ...,  8447,  8464,  8441],
        [ 8369,  8285,  8295, ...,  8457,  8436,  8432],
        [ 8362,  8277,  8288, ...,  8453,  8429,  8448]],

       [[ 8497,  8421,  8415, ..., 14707, 15239, 15847],
        [ 8461,  8422,  8394, ..., 15361, 15931, 16456],
        [ 8461,  8401,  8383, ..., 15271, 16247, 16153],
        ...,
        [ 8526,  8531,  8578, ...,  8492,  8542,  8536],
        [ 8517,  8531,  8558, ...,  8516,  8528,  8544],
        [ 8510,  8504,  8539, ...,  8559,  8543,  8519]]], dtype=uint16)
Coordinates:
  * time         (time) datetime64[ns] 2017-01-06T11:22:16.948666 ... 2017-03...
  * y            (y) float64 1.485e+06 1.485e+06 1.485e+06 ... 1.46e+06 1.46e+06
  * x            (x) float64 -1.508e+06 -1.508e+06 ... -1.489e+06 -1.489e+06
    spatial_ref  int32 6933
Attributes:
    units:         1
    nodata:        0
    crs:           EPSG:6933
    grid_mapping:  spatial_ref

Selecting and plotting a single timestep

You can see in the object header that this xarray.DataArray has data for six timesteps (i.e. <xarray.DataArray 'swir_1' (time: 6, y: 834, x: 644)>). To make a plot for a single timestep only, we need to select it using one of the following options:

  1. .isel(): This stands for “index selection”, and lets us easily select individual timesteps from a dataset by providing the number of the observation we want. Counting in Python begins at 0, so to select the first timestep in the xarray.DataArray we can specify .isel(time=0):

[7]:
first_timestep = ds.swir_1.isel(time=0)

print(first_timestep)
<xarray.DataArray 'swir_1' (y: 834, x: 644)>
array([[ 7614,  7613,  7627, ..., 13610, 13829, 14045],
       [ 7622,  7623,  7629, ..., 13587, 13795, 14460],
       [ 7610,  7638,  7626, ..., 13435, 13630, 14045],
       ...,
       [ 7594,  7610,  7668, ...,  7672,  7664,  7673],
       [ 7575,  7593,  7634, ...,  7678,  7679,  7674],
       [ 7586,  7578,  7617, ...,  7679,  7668,  7666]], dtype=uint16)
Coordinates:
    time         datetime64[ns] 2017-01-06T11:22:16.948666
  * y            (y) float64 1.485e+06 1.485e+06 1.485e+06 ... 1.46e+06 1.46e+06
  * x            (x) float64 -1.508e+06 -1.508e+06 ... -1.489e+06 -1.489e+06
    spatial_ref  int32 6933
Attributes:
    units:         1
    nodata:        0
    crs:           EPSG:6933
    grid_mapping:  spatial_ref
  1. .sel(): This allows us to select data using real-world coordinate labels like time. For example, from the Coordinates section, we can select the first timestep (i.e. The observation for January 6th 2017) from the xarray.DataArray by specifying .sel(time='2017-01-06'):

[8]:
first_timestep = ds.swir_1.sel(time='2017-01-06')

print(first_timestep)
<xarray.DataArray 'swir_1' (time: 1, y: 834, x: 644)>
array([[[ 7614,  7613,  7627, ..., 13610, 13829, 14045],
        [ 7622,  7623,  7629, ..., 13587, 13795, 14460],
        [ 7610,  7638,  7626, ..., 13435, 13630, 14045],
        ...,
        [ 7594,  7610,  7668, ...,  7672,  7664,  7673],
        [ 7575,  7593,  7634, ...,  7678,  7679,  7674],
        [ 7586,  7578,  7617, ...,  7679,  7668,  7666]]], dtype=uint16)
Coordinates:
  * time         (time) datetime64[ns] 2017-01-06T11:22:16.948666
  * y            (y) float64 1.485e+06 1.485e+06 1.485e+06 ... 1.46e+06 1.46e+06
  * x            (x) float64 -1.508e+06 -1.508e+06 ... -1.489e+06 -1.489e+06
    spatial_ref  int32 6933
Attributes:
    units:         1
    nodata:        0
    crs:           EPSG:6933
    grid_mapping:  spatial_ref

We can now use the .plot() method to plot swir1 data for our selected timestep:

[9]:
first_timestep.plot()
[9]:
<matplotlib.collections.QuadMesh at 0x7f6e4811e828>
../../../_images/sandbox_notebooks_Beginners_guide_04_Plotting_23_1.png

Plotting multiple timesteps

It is often useful to produce plots for a single measurement across time, for example to compare change between satellite observations or summary datasets. To plot multiple images, we can skip the isel() step above and plot the entire xarray.DataArray directly.

To plot multiple timesteps in one figure, we need to tell the .plot() function to put each timestep in a different column. We can do this by specifying .plot(col="time"):

[10]:
ds.swir_1.plot(col="time")
[10]:
<xarray.plot.facetgrid.FacetGrid at 0x7f6e48036e10>
../../../_images/sandbox_notebooks_Beginners_guide_04_Plotting_25_1.png

Note: This kind of plotting is called “facetted plotting”. For more information, refer to the xarray documentation

Customising plot appearance

You may notice that the plots above are dark and difficult to see clearly. To improve the appearance of xarray plots, you can use the robust=True argument to optimise the plot colours by clipping extreme values or outliers. This will use the 2nd and 98th percentiles of the data to compute the color limits:

[11]:
ds.swir_1.plot(col="time", robust=True)
[11]:
<xarray.plot.facetgrid.FacetGrid at 0x7f6e3b02a128>
../../../_images/sandbox_notebooks_Beginners_guide_04_Plotting_28_1.png

We can also easily use custom colour maps/styles to visualise our data using the cmap parameter.

When choosing a colour map for a plot, it is important to choose a set of colours that are perceived logically by the human eye. The best colour maps are “perceptually uniform”: these colour maps increase logically from dark to light colours, where equal increases in lightness/darkness correspond to equal changes in data values. Some best-practice perceptually uniform colour maps include:

"viridis", "plasma", "inferno", "magma", "cividis"

**Note**: For further reading about perceptually uniform colour maps in data visualisation, refer to the `matplotlib documentation <https://matplotlib.org/3.1.0/tutorials/colors/colormaps.html>`__.

It is also important to consider colour blindness when selecting a colour map. xarray supports many colour maps from the “colorbrewer” family of colour maps which are optimised for colour blindness. You can use the interactive online tool to browse all available colour maps, or choose from one of the following commonly used options:

"Greys", "Purples", "Blues", "Greens", "Oranges", "Reds",
"YlOrBr", "YlOrRd", "OrRd", "PuRd", "RdPu", "BuPu",
"GnBu", "PuBu", "YlGnBu", "PuBuGn", "BuGn", "YlGn"

For a full list of available colour maps you can refer to this list.

For example, to plot our data with the perceptually uniform magma colour map:

[12]:
ds.swir_1.plot(col="time", robust=True, cmap="magma")
[12]:
<xarray.plot.facetgrid.FacetGrid at 0x7f6e3ad9ca58>
../../../_images/sandbox_notebooks_Beginners_guide_04_Plotting_30_1.png

Plotting true or false colour RGB images

Although xarray makes it easy to plot single band images, plotting a three band colour photo-like image is less straightforward.

To make this easier, the deafrica-sandbox-notebooks repository provides a custom rgb() function that is designed for plotting three band images. The rgb() function maps three data variables/measurements from the loaded dataset to the red, green and blue channels that are used to make a three-colour image.

Providing the red, green and blue measurements from a dataset will produce a true colour image (akin to how humans view the landscape). Providing nir, red and green measurements or any other set of three satellite bands from a dataset will produce a false colour image. You can learn more about colour rendering here.

Hence, the rgb() function can be used to visualise the data returned by a query. It requires the minimum input of:

  • ds: The xarray.Dataset object

  • bands: Three bands for display (these must be measurements found in the dataset)

  • index: The timestep to view, default is 0

Plotting a single timestep

The time dimension of your xarray.Dataset describes how many timesteps exist for your location during your nominated time period. In the rgb() function, the index variable is asking for which timestep you want to view (similar to the isel() example above). Remember: counting in Python begins at 0 so to view the earliest timestep set index=0:

[13]:
# View a red, green, blue (true colour) image of the first timestep
rgb(ds, bands=["red", "green", "blue"], index=0)
../../../_images/sandbox_notebooks_Beginners_guide_04_Plotting_33_0.png

By changing the input bands, we can plot a false colour image which can provide different insights in a landscape. This band combination (swir_1, nir, green) emphasises growing vegetation in green, and water in deep blue:

[14]:
# View a swir_1, nir, green (false colour) image of the first timestep
rgb(ds, bands=['swir_1', 'nir', 'green'], index=0)
../../../_images/sandbox_notebooks_Beginners_guide_04_Plotting_35_0.png

Plotting multiple timesteps

As discussed in the single band example above, it can be useful to visualise multiple timesteps in a single plot (e.g. to compare change over time).

The rgb() function allows you to do this by providing a list of multiple images to plot using index=[X, X, ...]. For example, we can plot the first and fifth image in our dataset using index=[0, 4] (remembering that counting in Python starts at 0):

[15]:
# View a true colour image for the first and fith timesteps
rgb(ds, bands=['red', 'green', 'blue'], index=[0, 4])
../../../_images/sandbox_notebooks_Beginners_guide_04_Plotting_37_0.png

It is also possible to use rgb() to plot all timesteps in a dataset using the col="time" syntax we demonstrate in the single band example above:

[16]:
# Plot all timesteps in the dataset
rgb(ds, bands=['red', 'green', 'blue'], col="time")
../../../_images/sandbox_notebooks_Beginners_guide_04_Plotting_39_0.png

Customising plot appearance

By default, rgb() generates plots with robust=True to improve the appearance of the images by clipping out the darkest and brightest 2% of pixels, using the 2nd and 98th percentiles of the data to compute the color limits.

If this default provides poor results, the plot’s colour stretch can be customised using the percentile_stretch parameter. This allows you to clip the most extreme minimum and maximum values in the dataset, to improve the contrast and appearance of the plot.

For example, specifying percentile_stretch=[0.05, 0.95] will clip out the darkest and brightest 5% of pixels, focusing the colour stretch on the remaining 90% of less extreme values:

[17]:
rgb(ds,
    bands=['red', 'green', 'blue'],
    index=0,
    percentile_stretch=[0.05, 0.95])

../../../_images/sandbox_notebooks_Beginners_guide_04_Plotting_41_0.png

Additional information

License: The code in this notebook is licensed under the Apache License, Version 2.0. Digital Earth Africa data is licensed under the Creative Commons by Attribution 4.0 license.

Contact: If you need assistance, please post a question on the Open Data Cube Slack channel or on the GIS Stack Exchange using the open-data-cube tag (you can view previously asked questions here). If you would like to report an issue with this notebook, you can file one on Github.

Compatible datacube version:

[18]:
print(datacube.__version__)
1.8.4.dev52+g07bc51a5

Last Tested:

[19]:
from datetime import date
print(date.today())
2021-05-19