Introduction to plotting
Products used: ls8_sr
Prerequisites: Users of this notebook should have a basic understanding of:
How to run a Jupyter notebook
Inspecting available DE Africa products and measurements
How to load data from DE Africa
Keywords beginner’s guide; visualisation, visualisation; beginner’s guide, data used; landsat 8, visualisation; measurements, visualisation; timesteps, visualisation; colour maps, python package; matplotlib
Background
Data visualisation is an important component of working with Earth Observation data. The xarray
Python package provides a range of straightforward data plotting options which allow users to quickly generate simple plots from multi-dimensional datasets. To generate more complex and informative plots from data loaded from Digital Earth Africa (DE Africa), the DE Africa Notebooks repository also provides a custom plotting module with additional easy-to-use functionality.
Description
This introductory notebook demonstrates how to visualise Digital Earth Africa satellite data returned from running a datacube query. The notebook demonstrates commonly-used xarray
plotting methods, as well as custom functions provided in deafrica_tools.plotting
.
Topics covered in this notebook include:
View your area of interest prior to querying the datacube
Querying the datacube and loading data
Plotting single band data (e.g. a single satellite band)
Selecting and plotting individual timesteps
Plotting multiple timesteps
Customising plot appearance
Plotting three-band true or false colour imagery
Plotting single timesteps
Plotting multiple timesteps
Customising plot appearance
Getting started
To run this introduction to plotting data loaded from the datacube, run all the cells in the notebook starting with the “Load packages” cell. For help with running notebook cells, refer back to the Jupyter Notebooks notebook.
Load packages
First we run %matplotlib inline
, which ensures figures plot correctly in the Jupyter notebook.
We then need to load the datacube
package, which allows us to load data.
Plotting the data requires functions which we can import from deafrica_tools.plotting
. For this notebook, we need rgb
and display_map
. We can import both in one line by separating the names with a comma.
[1]:
%matplotlib inline
import datacube
from deafrica_tools.plotting import rgb, display_map
Connect to the datacube
We then connect to the datacube database so we can load DE Africa data.
[2]:
dc = datacube.Datacube(app="04_Plotting")
Analysis parameters
The following variables are required to establish a query for this notebook: - lat_range
: The latitude range to analyse (e.g. (11.72, 11.52)
). For reasonable load times, keep this to a range of ~0.1 degrees or less. - lon_range
: The longitude range to analyse (e.g. (-15.63, -15.43)
). For reasonable load times, keep this to a range of ~0.1 degrees or less. - time_range
: The date range to analyse (e.g. ("2018-01-01", "2018-03-30")
).
[3]:
lat_range = (11.72, 11.52)
lon_range = (-15.63, -15.43)
time_range = ("2017-01-01", "2017-03-30")
View the queried location
Before running a query and extracting and analysing data, it is useful to double-check that your location is correct. The display_map()
function shows your selected area as a red rectangle on an interactive map. Clicking on any point of the map will reveal the latitude and longitude coordinates of that point.
[4]:
display_map(x=lon_range, y=lat_range)
[4]:
Query and view data
The variables determined above are used here to query the DE Africa datacube using the dc.load()
function and load data introduced in the Loading data notebook. This notebook uses Landsat 8 Surface reflectance ls8_sr
.
[5]:
ds = dc.load(product="ls8_sr",
measurements=['blue','green','red','nir','swir_1','swir_2'],
x=lon_range,
y=lat_range,
time=time_range,
output_crs='EPSG:6933',
resolution=(-30, 30))
print(ds)
<xarray.Dataset>
Dimensions: (time: 6, y: 834, x: 644)
Coordinates:
* time (time) datetime64[ns] 2017-01-06T11:22:16.948666 ... 2017-03...
* y (y) float64 1.485e+06 1.485e+06 1.485e+06 ... 1.46e+06 1.46e+06
* x (x) float64 -1.508e+06 -1.508e+06 ... -1.489e+06 -1.489e+06
spatial_ref int32 6933
Data variables:
blue (time, y, x) uint16 10396 10397 10394 10404 ... 8291 8258 8208
green (time, y, x) uint16 11703 11698 11703 11714 ... 9794 9747 9688
red (time, y, x) uint16 11823 11831 11837 11834 ... 8373 8320 8321
nir (time, y, x) uint16 8996 8994 8995 8990 ... 8092 8062 8051 8037
swir_1 (time, y, x) uint16 7614 7613 7627 7626 ... 8550 8559 8543 8519
swir_2 (time, y, x) uint16 7542 7530 7532 7535 ... 8449 8449 8444 8421
Attributes:
crs: EPSG:6933
grid_mapping: spatial_ref
Plotting single band images
The xarray
package provides built-in methods for plotting individual data variables or measurements. For example, we might want to make a plot for a single measurement like the swir_1
satellite band in the data we loaded above.
To do this, we first need to access the band we are after as an xarray.DataArray
(to revise the difference between xarray.Dataset
and xarray.DataArray
objects, refer back to the Loading data notebook):
[6]:
print(ds.swir_1)
<xarray.DataArray 'swir_1' (time: 6, y: 834, x: 644)>
array([[[ 7614, 7613, 7627, ..., 13610, 13829, 14045],
[ 7622, 7623, 7629, ..., 13587, 13795, 14460],
[ 7610, 7638, 7626, ..., 13435, 13630, 14045],
...,
[ 7594, 7610, 7668, ..., 7672, 7664, 7673],
[ 7575, 7593, 7634, ..., 7678, 7679, 7674],
[ 7586, 7578, 7617, ..., 7679, 7668, 7666]],
[[ 7571, 7583, 7581, ..., 11822, 11993, 12371],
[ 7575, 7582, 7619, ..., 11779, 12010, 12242],
[ 7591, 7594, 7613, ..., 11777, 11988, 12195],
...,
[ 8898, 9121, 9137, ..., 11705, 11537, 11535],
[ 9029, 9085, 9107, ..., 11766, 11709, 11734],
[ 9010, 9052, 9094, ..., 11885, 11799, 11796]],
[[ 9024, 9018, 9010, ..., 14730, 15078, 15485],
[ 9016, 9022, 9025, ..., 14663, 14844, 15442],
[ 9014, 9020, 9037, ..., 14547, 15005, 15308],
...,
...
...,
[ 8147, 8131, 8121, ..., 8488, 8549, 8608],
[ 8124, 8118, 8110, ..., 8483, 8515, 8581],
[ 8131, 8078, 8081, ..., 8484, 8514, 8583]],
[[ 8565, 8563, 8570, ..., 14624, 15405, 15539],
[ 8575, 8586, 8583, ..., 15189, 15669, 15273],
[ 8588, 8576, 8579, ..., 14797, 15825, 15656],
...,
[ 8383, 8296, 8291, ..., 8447, 8464, 8441],
[ 8369, 8285, 8295, ..., 8457, 8436, 8432],
[ 8362, 8277, 8288, ..., 8453, 8429, 8448]],
[[ 8497, 8421, 8415, ..., 14707, 15239, 15847],
[ 8461, 8422, 8394, ..., 15361, 15931, 16456],
[ 8461, 8401, 8383, ..., 15271, 16247, 16153],
...,
[ 8526, 8531, 8578, ..., 8492, 8542, 8536],
[ 8517, 8531, 8558, ..., 8516, 8528, 8544],
[ 8510, 8504, 8539, ..., 8559, 8543, 8519]]], dtype=uint16)
Coordinates:
* time (time) datetime64[ns] 2017-01-06T11:22:16.948666 ... 2017-03...
* y (y) float64 1.485e+06 1.485e+06 1.485e+06 ... 1.46e+06 1.46e+06
* x (x) float64 -1.508e+06 -1.508e+06 ... -1.489e+06 -1.489e+06
spatial_ref int32 6933
Attributes:
units: 1
nodata: 0
crs: EPSG:6933
grid_mapping: spatial_ref
Selecting and plotting a single timestep
You can see in the object header that this xarray.DataArray
has data for six timesteps (i.e. <xarray.DataArray 'swir_1' (time: 6, y: 834, x: 644)>
). To make a plot for a single timestep only, we need to select it using one of the following options:
.isel()
: This stands for “index selection”, and lets us easily select individual timesteps from a dataset by providing the number of the observation we want. Counting in Python begins at 0, so to select the first timestep in thexarray.DataArray
we can specify.isel(time=0)
:
[7]:
first_timestep = ds.swir_1.isel(time=0)
print(first_timestep)
<xarray.DataArray 'swir_1' (y: 834, x: 644)>
array([[ 7614, 7613, 7627, ..., 13610, 13829, 14045],
[ 7622, 7623, 7629, ..., 13587, 13795, 14460],
[ 7610, 7638, 7626, ..., 13435, 13630, 14045],
...,
[ 7594, 7610, 7668, ..., 7672, 7664, 7673],
[ 7575, 7593, 7634, ..., 7678, 7679, 7674],
[ 7586, 7578, 7617, ..., 7679, 7668, 7666]], dtype=uint16)
Coordinates:
time datetime64[ns] 2017-01-06T11:22:16.948666
* y (y) float64 1.485e+06 1.485e+06 1.485e+06 ... 1.46e+06 1.46e+06
* x (x) float64 -1.508e+06 -1.508e+06 ... -1.489e+06 -1.489e+06
spatial_ref int32 6933
Attributes:
units: 1
nodata: 0
crs: EPSG:6933
grid_mapping: spatial_ref
.sel()
: This allows us to select data using real-world coordinate labels liketime
. For example, from the Coordinates section, we can select the first timestep (i.e. The observation for January 6th 2017) from thexarray.DataArray
by specifying.sel(time='2017-01-06')
:
[8]:
first_timestep = ds.swir_1.sel(time='2017-01-06')
print(first_timestep)
<xarray.DataArray 'swir_1' (time: 1, y: 834, x: 644)>
array([[[ 7614, 7613, 7627, ..., 13610, 13829, 14045],
[ 7622, 7623, 7629, ..., 13587, 13795, 14460],
[ 7610, 7638, 7626, ..., 13435, 13630, 14045],
...,
[ 7594, 7610, 7668, ..., 7672, 7664, 7673],
[ 7575, 7593, 7634, ..., 7678, 7679, 7674],
[ 7586, 7578, 7617, ..., 7679, 7668, 7666]]], dtype=uint16)
Coordinates:
* time (time) datetime64[ns] 2017-01-06T11:22:16.948666
* y (y) float64 1.485e+06 1.485e+06 1.485e+06 ... 1.46e+06 1.46e+06
* x (x) float64 -1.508e+06 -1.508e+06 ... -1.489e+06 -1.489e+06
spatial_ref int32 6933
Attributes:
units: 1
nodata: 0
crs: EPSG:6933
grid_mapping: spatial_ref
We can now use the .plot()
method to plot swir1
data for our selected timestep:
[9]:
first_timestep.plot()
[9]:
<matplotlib.collections.QuadMesh at 0x7fd997ef6230>
Plotting multiple timesteps
It is often useful to produce plots for a single measurement across time, for example to compare change between satellite observations or summary datasets. To plot multiple images, we can skip the isel()
step above and plot the entire xarray.DataArray
directly.
To plot multiple timesteps in one figure, we need to tell the .plot()
function to put each timestep in a different column. We can do this by specifying .plot(col="time")
:
[10]:
ds.swir_1.plot(col="time")
[10]:
<xarray.plot.facetgrid.FacetGrid at 0x7fd98df63940>
Note: This kind of plotting is called “facetted plotting”. For more information, refer to the xarray documentation
Customising plot appearance
You may notice that the plots above are dark and difficult to see clearly. To improve the appearance of xarray
plots, you can use the robust=True
argument to optimise the plot colours by clipping extreme values or outliers. This will use the 2nd and 98th percentiles of the data to compute the color limits:
[11]:
ds.swir_1.plot(col="time", robust=True)
[11]:
<xarray.plot.facetgrid.FacetGrid at 0x7fd98dc32590>
We can also easily use custom colour maps/styles to visualise our data using the cmap
parameter.
When choosing a colour map for a plot, it is important to choose a set of colours that are perceived logically by the human eye. The best colour maps are “perceptually uniform”: these colour maps increase logically from dark to light colours, where equal increases in lightness/darkness correspond to equal changes in data values. Some best-practice perceptually uniform colour maps include:
"viridis", "plasma", "inferno", "magma", "cividis"
Note: For further reading about perceptually uniform colour maps in data visualisation, refer to the matplotlib documentation.
It is also important to consider colour blindness when selecting a colour map. xarray
supports many colour maps from the “colorbrewer” family of colour maps which are optimised for colour blindness. You can use the interactive online tool to browse all available colour maps, or choose from one of the following commonly used options:
"Greys", "Purples", "Blues", "Greens", "Oranges", "Reds",
"YlOrBr", "YlOrRd", "OrRd", "PuRd", "RdPu", "BuPu",
"GnBu", "PuBu", "YlGnBu", "PuBuGn", "BuGn", "YlGn"
For a full list of available colour maps you can refer to this list.
For example, to plot our data with the perceptually uniform magma
colour map:
[12]:
ds.swir_1.plot(col="time", robust=True, cmap="magma")
[12]:
<xarray.plot.facetgrid.FacetGrid at 0x7fd98da58df0>
Plotting true or false colour RGB images
Although xarray
makes it easy to plot single band images, plotting a three band colour photo-like image is less straightforward.
To make this easier, the deafrica-sandbox-notebooks
repository provides a custom rgb()
function that is designed for plotting three band images. The rgb()
function maps three data variables/measurements from the loaded dataset to the red, green and blue channels that are used to make a three-colour image.
Providing the red
, green
and blue
measurements from a dataset will produce a true colour image (akin to how humans view the landscape). Providing nir
, red
and green
measurements or any other set of three satellite bands from a dataset will produce a false colour image. You can learn more about colour rendering here.
Hence, the rgb()
function can be used to visualise the data returned by a query. It requires the minimum input of:
ds:
Thexarray.Dataset
objectbands:
Three bands for display (these must be measurements found in the dataset)index:
The timestep to view, default is0
Plotting a single timestep
The time dimension of your xarray.Dataset
describes how many timesteps exist for your location during your nominated time period. In the rgb()
function, the index
variable is asking for which timestep you want to view (similar to the isel()
example above). Remember: counting in Python begins at 0 so to view the earliest timestep set index=0
:
[13]:
# View a red, green, blue (true colour) image of the first timestep
rgb(ds, bands=["red", "green", "blue"], index=0)
By changing the input bands, we can plot a false colour image which can provide different insights in a landscape. This band combination (swir_1
, nir
, green
) emphasises growing vegetation in green, and water in deep blue:
[14]:
# View a swir_1, nir, green (false colour) image of the first timestep
rgb(ds, bands=['swir_1', 'nir', 'green'], index=0)
Plotting multiple timesteps
As discussed in the single band example above, it can be useful to visualise multiple timesteps in a single plot (e.g. to compare change over time).
The rgb()
function allows you to do this by providing a list of multiple images to plot using index=[X, X, ...]
. For example, we can plot the first and fifth image in our dataset using index=[0, 4]
(remembering that counting in Python starts at 0):
[15]:
# View a true colour image for the first and fith timesteps
rgb(ds, bands=['red', 'green', 'blue'], index=[0, 4])
It is also possible to use rgb()
to plot all timesteps in a dataset using the col="time"
syntax we demonstrate in the single band example above:
[16]:
# Plot all timesteps in the dataset
rgb(ds, bands=['red', 'green', 'blue'], col="time")
Customising plot appearance
By default, rgb()
generates plots with robust=True
to improve the appearance of the images by clipping out the darkest and brightest 2% of pixels, using the 2nd and 98th percentiles of the data to compute the color limits.
If this default provides poor results, the plot’s colour stretch can be customised using the percentile_stretch
parameter. This allows you to clip the most extreme minimum and maximum values in the dataset, to improve the contrast and appearance of the plot.
For example, specifying percentile_stretch=[0.05, 0.95]
will clip out the darkest and brightest 5% of pixels, focusing the colour stretch on the remaining 90% of less extreme values:
[17]:
rgb(ds,
bands=['red', 'green', 'blue'],
index=0,
percentile_stretch=[0.05, 0.95])
Recommended next steps
For more advanced information about working with Jupyter Notebooks or JupyterLab, you can explore JupyterLab documentation page.
To continue working through the notebooks in this beginner’s guide, the following notebooks are designed to be worked through in the following order:
Plotting (this notebook)
Once you have you have completed the above tutorials, join advanced users in exploring:
The “Datasets” directory in the repository, where you can explore DE Africa products in depth.
The “Frequently used code” directory, which contains a recipe book of common techniques and methods for analysing DE Africa data.
The “Real-world examples” directory, which provides more complex workflows and analysis case studies.
Additional information
License: The code in this notebook is licensed under the Apache License, Version 2.0. Digital Earth Africa data is licensed under the Creative Commons by Attribution 4.0 license.
Contact: If you need assistance, please post a question on the Open Data Cube Slack channel or on the GIS Stack Exchange using the open-data-cube
tag (you can view previously asked questions here). If you would like to report an issue with this notebook, you can file one on
Github.
Compatible datacube version:
[18]:
print(datacube.__version__)
1.8.15
Last Tested:
[19]:
from datetime import date
print(date.today())
2023-08-11