Cloud and pixel quality masking for Landsat¶
Products used: ls8_sr
Keywords data methods; dilate, data methods; masking, cloud masking, data methods; buffering
Background¶
In the past, remote sensing researchers would reject partly cloud-affected scenes in favour of cloud-free scenes. However, multi-temporal analysis techniques increasingly make use of every quality assured pixel within a time series of observations. The capacity to automatically exclude low quality pixels (e.g. clouds, shadows or other invalid data) is essential for enabling these new remote sensing methods.
Analysis-ready satellite data from Digital Earth Africa includes pixel quality information that can be used to easily “mask” data (i.e. keep only certain pixels in an image) to obtain a time series containing only clear or cloud-free pixels.
Description¶
In this notebook, we show how to mask Digital Earth Africa Landsat satellite data using boolean masks. The notebook demonstrates how to:
Load in a time series of satellite data including the
pixel_qa
pixel quality bandInspect the band’s
flags_definition
attributesCreate a mask where pixels are cloudy, have cloud-shadow, or no-data
Apply binary morphological operators on the cloudy pixels to improve the mask
Masking of high aerosols
Apply the masks to the satellite data so we retain only the good quality observations, and plot the results
Rescaling of Landsat data and masking invalid data
Using
load_ard
to mask poor quality pixels, taking advantage of all its in-built features for masking
Digital Earth Africa provides wrapper functions that will automatically provide cloud-masked satellite data, more information can be found in the Using_load_ard notebook.
Getting started¶
First we import relevant packages and connect to the datacube. Then we define our example area of interest and load in a time series of satellite data.
[1]:
%matplotlib inline
import datacube
from pprint import pprint
from datacube.utils import masking
from odc.algo import mask_cleanup, erase_bad, to_f32
from deafrica_tools.plotting import rgb
from deafrica_tools.datahandling import mostcommon_crs, load_ard
Connect to the datacube¶
[2]:
dc = datacube.Datacube(app="Masking_data")
Create a query and load satellite data¶
To demonstrate how to mask satellite data, we will load Landsat 8 surface reflectance RGB data along with a pixel quality classification band called pixel_quality
.
[3]:
# Region of interest
lat, lon = 35.7718, -5.8096
buffer = 0.03
# Create a reusable query
query = {
'x': (lon-buffer, lon+buffer),
'y': (lat+buffer, lat-buffer),
'time': ('2016-09', '2016-10'),
}
# Identify the most common projection system in the input query
output_crs = mostcommon_crs(dc=dc, product='ls8_sr', query=query)
# Load data from the Landsat-8
data = dc.load(product="ls8_sr",
measurements=["blue", "green", "red",
"pixel_quality", "qa_aerosol"],
output_crs=output_crs,
resolution=(-30, 30),
align=(15, 15),
**query)
print(data)
/env/lib/python3.8/site-packages/deafrica_tools/datahandling.py:721: UserWarning: Multiple UTM zones ['epsg:32630', 'epsg:32629'] were returned for this query. Defaulting to the most common zone: epsg:32630
warnings.warn(
<xarray.Dataset>
Dimensions: (time: 7, y: 228, x: 188)
Coordinates:
* time (time) datetime64[ns] 2016-09-02T11:03:08.816501 ... 2016-...
* y (y) float64 3.966e+06 3.966e+06 ... 3.959e+06 3.959e+06
* x (x) float64 2.432e+05 2.432e+05 ... 2.488e+05 2.488e+05
spatial_ref int32 32630
Data variables:
blue (time, y, x) uint16 8398 8338 8295 8216 ... 16903 19403 21435
green (time, y, x) uint16 8813 8771 8707 8622 ... 20944 23082 23060
red (time, y, x) uint16 8235 8199 8136 8077 ... 22769 24061 24180
pixel_quality (time, y, x) uint16 23888 23888 23888 ... 21824 21824 21824
qa_aerosol (time, y, x) uint8 224 224 224 224 224 ... 194 224 224 64 96
Attributes:
crs: epsg:32630
grid_mapping: spatial_ref
The absence of satellite observation is indicated by a “nodata” value for the band, which is listed under the Attributes category of the returned xarray.DataArray
.
[4]:
data.red.attrs
[4]:
{'units': '1', 'nodata': 0, 'crs': 'epsg:32630', 'grid_mapping': 'spatial_ref'}
We see that the nodata
attribute reports the value 0
.
We can find the classification scheme of the pixel_qa
band in its flags definition.
[5]:
flags_def = data.pixel_quality.attrs["flags_definition"]
pprint(flags_def)
{'cirrus': {'bits': 2,
'values': {'0': 'not_high_confidence', '1': 'high_confidence'}},
'cirrus_confidence': {'bits': [14, 15],
'values': {'0': 'none',
'1': 'low',
'2': 'reserved',
'3': 'high'}},
'clear': {'bits': 6, 'values': {'0': False, '1': True}},
'cloud': {'bits': 3,
'values': {'0': 'not_high_confidence', '1': 'high_confidence'}},
'cloud_confidence': {'bits': [8, 9],
'values': {'0': 'none',
'1': 'low',
'2': 'medium',
'3': 'high'}},
'cloud_shadow': {'bits': 4,
'values': {'0': 'not_high_confidence',
'1': 'high_confidence'}},
'cloud_shadow_confidence': {'bits': [10, 11],
'values': {'0': 'none',
'1': 'low',
'2': 'reserved',
'3': 'high'}},
'dilated_cloud': {'bits': 1, 'values': {'0': 'not_dilated', '1': 'dilated'}},
'nodata': {'bits': 0, 'values': {'0': False, '1': True}},
'snow': {'bits': 5,
'values': {'0': 'not_high_confidence', '1': 'high_confidence'}},
'snow_ice_confidence': {'bits': [12, 13],
'values': {'0': 'none',
'1': 'low',
'2': 'reserved',
'3': 'high'}},
'water': {'bits': 7, 'values': {'0': 'land_or_cloud', '1': 'water'}}}
We see that pixel_quality
also reports the nodata
pixels, and along with the cloud
and cloud_shadow
pixels, it also picks up snow
and water
pixels.
Creating a cloud and pixel quality mask¶
We create a mask by specifying conditions that our pixels must satisfy. But we will only need the labels (not the values) to create a mask.
In this example below, we want to exclude multiple catagories of bad pixels, i.e. if a pixel has any of the flags (cloud, cirrus or cloud_shadow) set, the pixel will be masked.
[6]:
quality_flags = dict(
cloud="high_confidence", # True where there is cloud
cirrus="high_confidence",# True where there is cirrus cloud
cloud_shadow="high_confidence",# True where there is cloud shadow
)
# set cloud_mask: True=cloud, False=non-cloud
mask, _= masking.create_mask_value(flags_def, **quality_flags)
#add the cloud mask to our dataset
data['cloud_mask'] = (data['pixel_quality'] & mask) != 0
Below, we’ll plot the pixel quality mask along with the true colour satellite images.
Does the cloud mask exactly match the clouds you see in the RGB plots? Landsat’s pixel quality algorithm has known limitations that result in bright objects, such as beaches and cities, mistakenly being classified as clouds.
[7]:
#plot the locations where there are clouds and cloud shadows
data['cloud_mask'].plot(col="time", col_wrap=7, add_colorbar=False);

[8]:
# Plot in True colour
rgb(data, col="time", col_wrap=7)

Applying morphological processing on the cloud mask¶
We can improve on the false positives detected by Landsat’s pixel quality mask by applying binary morphological image processing techniques (e.g. binary_closing, binary_erosion etc.). The
Open-Data-Cube library odc-algo has a function, odc.algo.mask_cleanup
that can perform a few of these operations. Below, we will try to improve the cloud mask by apply a number of these filters.
Feel free to experiment with the values in filters
[9]:
# set the filters to apply. The integers refer to the number of pixels
filters = [("opening", 2),("dilation", 2)]
[10]:
# Use the mask_cleanup function to apply the filters
data['cloud_mask_filtered'] = mask_cleanup(data['cloud_mask'], mask_filters=filters)
[11]:
#plot the results
data['cloud_mask_filtered'].plot(col="time", col_wrap=7, add_colorbar=False);

Masking pixels with high aerosol¶
The atmospheric correction algorithm applied to Landsat uses aerosol as an input. When the aerosol value is high, the surface reflectance value may be unreliable especially over dark surfaces. Therefore for some applications, masking data with high aerosol may help remove unreliable observations.
In the example below, we use the masking.make_mask()
function to create a mask for pixels with “high aerosol” and combine it with the filtered cloud mask from above.
[12]:
# find the bit mask flags and print them
aerosol_flags = data.qa_aerosol.attrs["flags_definition"]
pprint(aerosol_flags)
#create a dict of flags we want to mask
aerosol_quality_flags = dict(
aerosol_level="high",
)
#create a boolean mask where high-aerosol = True
data['aerosol_mask'] = masking.make_mask(data['qa_aerosol'], **aerosol_quality_flags)
{'aerosol_level': {'bits': [6, 7],
'values': {'0': 'climatology',
'1': 'low',
'2': 'medium',
'3': 'high'}},
'interpolated_aerosol': {'bits': 5, 'values': {'0': False, '1': True}},
'nodata': {'bits': 0, 'values': {'0': False, '1': True}},
'valid_aerosol_retrieval': {'bits': 1, 'values': {'0': False, '1': True}},
'water': {'bits': 2, 'values': {'0': False, '1': True}}}
[13]:
data['aerosol_mask'].plot(col="time", col_wrap=7);

Now we combine the aerosol mask with the clear_filtered
mask to produce of mask where the data is unreliable either due to cloud or high aerosols
[14]:
data['cloud_aerosol_mask'] = data['aerosol_mask'] | data['cloud_mask_filtered']
#plot the result
data['cloud_aerosol_mask'].plot(col="time", col_wrap=7);

Applying the cloud-mask¶
We can now get the clear images we want by erasing the cloud, non-data, and/or high-aerosol pixels from the data. We’ll do this for the origial PQ mask, the PQ mask that underwent binary morphological image processing, and the mask with additional aerosol masking.
[15]:
# erase pixels with cloud
clear = erase_bad(data.drop_vars(['cloud_mask_filtered', 'cloud_mask', 'cloud_aerosol_mask', 'aerosol_mask']),
data['cloud_mask'])
# erase pixels with the cloud_filtering
clear_filtered = erase_bad(data.drop_vars(['cloud_mask_filtered', 'cloud_mask', 'cloud_aerosol_mask','aerosol_mask']),
data['cloud_mask_filtered'])
# erase pixels with cloud and high aerosol
clear_filtered_aerosol = erase_bad(data.drop_vars(['cloud_mask_filtered', 'cloud_mask', 'cloud_aerosol_mask','aerosol_mask']),
data['cloud_aerosol_mask'])
Plot the results of our ‘clear_filtered’ masking¶
As you can see, the morphological filtering operations have minimised the impact of the false-postives in the cloud mask over the cities and along the coast
[17]:
rgb(clear_filtered, col="time", col_wrap=7)

Plot the results of our ‘clear_filtered’ and ‘aerosol’ masked data¶
Masking high aerosols removes many pixels over the ocean where surface reflectance values are low and unreliable. Noting, however, this might also remove useful observations. Users should use this mask with care.
[18]:
rgb(clear_filtered_aerosol, col="time", col_wrap=7)

Recaling Landsat data and masking invalid data¶
Rescale the Landsat data to ~0-1 in accordance with the scale and offset factors provided by the USGS Landsat Collection 2 documentation.
Note: USGS Landsat data has different scale and offset values depending on the measurement. The example scale and offset values shown below only applies to the surface reflectance bands
Before we can rescale the values, we need to ensure the data is first converted to float32
, this makes sure the no-data values in the native data-type (0
) are converted to NaN
, thus ensuring the no-data values in the rescaled data are also NaN
.
[19]:
# Convert the data to `float32` and set no-data values to `NaN`:
clear_filtered = to_f32(clear_filtered)
[20]:
#Apply the scale and offset factors to the Landsat data
sr_bands = ["blue", "green", "red"]
for band in sr_bands:
clear_filtered[band] = 2.75e-5 * clear_filtered[band] - 0.2
Below we replot clear_filtered
in true colour. Note the land surface looks more vivid because the function rgb
is no longer including the 0
values in the colour ramps, and because the surface reflectance has been rescaled to its intended values (~0-1).
[21]:
rgb(clear_filtered, col="time", col_wrap=7)

Cloud masking, morphological filtering, and rescaling with load_ard
¶
Most of the steps undertaken above can be achieved by using the function deafrica_tools.datahanding.load_ard
. This function will create a cloud mask, apply morphological filters (if desired), and apply the Landsat scale and offset factors.
Note: There are some limitations to
load_ard
due to its design. An important part of its intended functionality is to provide a simple means of concatenating Landsat 5, 7, & 8 together to form a single xarray object. However, this means thatload_ard
can only apply pixel quality masking to pq-categories that are common across Landsat sensors. For example, Landsat-8 has a dedicated bit flag for cirrus bands, but Landsat 5 and 7 do not. This means thatload_ard
cannot accept'cirrus': 'high_confidence'
as a pq-category. The same issue also applies for masking of aerosols, Landat 5 and 7 have different means of recording high aerosol than Landsat 8, and thusload_ard
does not support masking of aerosols. However,load_ard
does remove negative surface reflectance values (negative values are an occassional artefact of the correction algorithms employed by USGS), which often coincide with areas prone to high aerosol values such as large waterbodies and along the coast.
[22]:
# set the filters to apply
filters = [("opening", 2),("dilation", 2)]
# # Load data from the Landsat-8
data = load_ard(dc=dc,
products=["ls8_sr"],
measurements=["blue", "green", "red"],
mask_filters=filters,
output_crs=output_crs,
resolution=(-30, 30),
align=(15,15),
**query)
# plot the data
rgb(data, col="time", col_wrap=7)
Using pixel quality parameters for USGS Collection 2
Finding datasets
ls8_sr
Applying morphological filters to pq mask [('opening', 2), ('dilation', 2)]
Applying pixel quality/cloud mask
Re-scaling Landsat C2 data
Loading 7 time steps

Additional information¶
License: The code in this notebook is licensed under the Apache License, Version 2.0. Digital Earth Africa data is licensed under the Creative Commons by Attribution 4.0 license.
Contact: If you need assistance, please post a question on the Open Data Cube Slack channel or on the GIS Stack Exchange using the open-data-cube
tag (you can view previously asked questions here). If you would like to report an issue with this notebook, you can file one on
Github.
Compatible datacube version:
[23]:
print(datacube.__version__)
1.8.6
Last Tested:
[24]:
from datetime import datetime
datetime.today().strftime('%Y-%m-%d')
[24]:
'2021-11-17'