Masking data

Keywords data methods; dilate, data methods; masking, cloud masking, data methods; buffering

Background

In the past, remote sensing researchers would reject partly cloud-affected scenes in favour of cloud-free scenes. However, multi-temporal analysis techniques increasingly make use of every quality assured pixel within a time series of observations. The capacity to automatically exclude low quality pixels (e.g. clouds, shadows or other invalid data) is essential for enabling these new remote sensing methods.

Analysis-ready satellite data from Digital Earth Africa includes pixel quality information that can be used to easily “mask” data (i.e. keep only certain pixels in an image) to obtain a time series containing only clear or cloud-free pixels.

Description

In this notebook, we show how to mask Digital Earth Africa satellite data using boolean masks. The notebook demonstrates how to:

  1. Load in a time series of satellite data including the pixel_qa pixel quality band

  2. Inspect the band’s flags_definition attributes

  3. Create clear and cloud-free masks and apply these to the satellite data

  4. Buffer cloudy/shadowed pixels by dilating the masks by a specified number of pixels

  5. Mask out invalid nodata values and replace them with NaN

Digital Earth Africa provides wrapper functions that will automatically provide cloud-masked satellite data, more information can be found in the Using_load_ard notebook.


Getting started

First we import relevant packages and connect to the datacube. Then we define our example area of interest and load in a time series of satellite data.

[1]:
%matplotlib inline

import scipy.ndimage
import xarray
import numpy
import datacube
from datacube.storage.masking import make_mask
from datacube.storage.masking import mask_invalid_data

from deafrica_tools.plotting import rgb
from deafrica_tools.datahandling import dilate, mostcommon_crs
/env/lib/python3.6/site-packages/datacube/storage/masking.py:8: DeprecationWarning: datacube.storage.masking has moved to datacube.utils.masking
  category=DeprecationWarning)
/env/lib/python3.6/site-packages/geopandas/_compat.py:88: UserWarning: The Shapely GEOS version (3.7.2-CAPI-1.11.0 ) is incompatible with the GEOS version PyGEOS was compiled with (3.9.0-CAPI-1.16.2). Conversions between both will be slow.
  shapely_geos_version, geos_capi_version_string

Connect to the datacube

[2]:
dc = datacube.Datacube(app="Masking_data")

Create a query and load satellite data

To demonstrate how to mask satellite data, we will load Landsat 8 surface reflectance RGB data along with a pixel quality classification band called pixel_quality.

[3]:
# Region of interest
lat, lon = 12.791, -15.502
buffer = 0.1

# Create a reusable query
query = {
    'x': (lon-buffer, lon+buffer),
    'y': (lat+buffer, lat-buffer),
    'time': ('2016-08', '2016-12-06'),
}

# Identify the most common projection system in the input query
output_crs = mostcommon_crs(dc=dc, product='ls8_sr', query=query)

# Load data from the Landsat-8
data = dc.load(product="ls8_sr",
               measurements=["blue", "green", "red", "pixel_quality"],
               output_crs=output_crs,
               resolution=[-30, 30],
               align=(15, 15),
               **query)
print(data)
<xarray.Dataset>
Dimensions:        (time: 8, x: 726, y: 739)
Coordinates:
  * time           (time) datetime64[ns] 2016-08-15T11:21:45.232634 ... 2016-...
  * y              (y) float64 1.425e+06 1.425e+06 ... 1.403e+06 1.403e+06
  * x              (x) float64 4.346e+05 4.347e+05 ... 4.564e+05 4.564e+05
    spatial_ref    int32 32628
Data variables:
    blue           (time, y, x) uint16 8253 8736 8003 7862 ... 9519 8775 8916
    green          (time, y, x) uint16 10127 10761 9727 9548 ... 10413 9508 9782
    red            (time, y, x) uint16 9519 10424 9318 9010 ... 11601 9684 10011
    pixel_quality  (time, y, x) uint16 21824 21824 21824 ... 21824 21824 21824
Attributes:
    crs:           epsg:32628
    grid_mapping:  spatial_ref
[4]:
# Plot the data
rgb(data, col="time")
../../../_images/sandbox_notebooks_Frequently_used_code_Masking_data_10_0.png

The absence of satellite observation is indicated by a “nodata” value for the band, which is listed under the Attributes category of the returned xarray.DataArray.

[5]:
print(data.red)
<xarray.DataArray 'red' (time: 8, y: 739, x: 726)>
array([[[ 9519, 10424,  9318, ..., 18585, 21366, 22795],
        [ 9497, 10992, 10368, ..., 20004, 22206, 22902],
        [10510, 11042, 10990, ..., 19816, 21748, 22241],
        ...,
        [29451, 30429, 30940, ..., 13357, 12144,  9143],
        [29342, 30736, 30299, ..., 12619, 12096,  9457],
        [30444, 31074, 30231, ..., 11679, 10375,  9420]],

       [[ 9002,  9799,  9508, ..., 25367, 26185, 27477],
        [ 8989,  9733,  9848, ..., 25324, 27525, 28802],
        [ 9608, 10090,  9736, ..., 27321, 30503, 30816],
        ...,
        [11707, 13304, 12363, ..., 10994, 10600,  8600],
        [12216, 13399, 13373, ..., 10039,  9628,  9038],
        [13771, 14817, 14099, ...,  9938,  9012,  8978]],

       [[16202, 16285, 16341, ..., 14601, 14691, 14710],
        [16385, 16443, 16458, ..., 14575, 14657, 14735],
        [16458, 16466, 16469, ..., 14535, 14712, 14785],
        ...,
...
        ...,
        [ 8413,  8399,  8378, ..., 11621, 10354,  9103],
        [ 8340,  8342,  8375, ..., 11133, 10249,  9689],
        [ 8393,  8341,  8362, ..., 10645,  9354,  9495]],

       [[ 9067,  9592,  9057, ...,  8747,  8760,  8794],
        [ 9317,  9671,  9501, ...,  8866,  8822,  8702],
        [ 9770,  9616,  9462, ...,  8769,  8929,  8682],
        ...,
        [ 8607,  8658,  8676, ..., 11697, 10819,  9313],
        [ 8562,  8586,  8659, ..., 11440, 10706,  9864],
        [ 8727,  8602,  8633, ..., 10550,  9657,  9927]],

       [[ 9138,  9567,  9346, ...,  8779,  8817,  8930],
        [ 9190,  9837,  9393, ...,  9060,  8932,  8789],
        [ 9821,  9833,  9505, ...,  9065,  8935,  8764],
        ...,
        [ 8586,  8604,  8561, ..., 12400, 10989,  9317],
        [ 8490,  8515,  8544, ..., 11960, 10946, 10074],
        [ 8600,  8460,  8484, ..., 11601,  9684, 10011]]], dtype=uint16)
Coordinates:
  * time         (time) datetime64[ns] 2016-08-15T11:21:45.232634 ... 2016-12...
  * y            (y) float64 1.425e+06 1.425e+06 ... 1.403e+06 1.403e+06
  * x            (x) float64 4.346e+05 4.347e+05 ... 4.564e+05 4.564e+05
    spatial_ref  int32 32628
Attributes:
    units:         1
    nodata:        0
    crs:           epsg:32628
    grid_mapping:  spatial_ref

We see that the nodata attribute reports the value -9999.

We can find the classification scheme of the pixel_qa band in its flags definition.

[6]:
data.pixel_quality.attrs["flags_definition"]
[6]:
{'snow': {'bits': 5,
  'values': {'0': 'not_high_confidence', '1': 'high_confidence'}},
 'clear': {'bits': 6, 'values': {'0': False, '1': True}},
 'cloud': {'bits': 3,
  'values': {'0': 'not_high_confidence', '1': 'high_confidence'}},
 'water': {'bits': 7, 'values': {'0': 'land_or_cloud', '1': 'water'}},
 'cirrus': {'bits': 2,
  'values': {'0': 'not_high_confidence', '1': 'high_confidence'}},
 'nodata': {'bits': 0, 'values': {'0': False, '1': True}},
 'cloud_shadow': {'bits': 4,
  'values': {'0': 'not_high_confidence', '1': 'high_confidence'}},
 'dilated_cloud': {'bits': 1, 'values': {'0': 'not_dilated', '1': 'dilated'}},
 'cloud_confidence': {'bits': [8, 9],
  'values': {'0': 'none', '1': 'low', '2': 'medium', '3': 'high'}},
 'cirrus_confidence': {'bits': [14, 15],
  'values': {'0': 'none', '1': 'low', '2': 'reserved', '3': 'high'}},
 'snow_ice_confidence': {'bits': [12, 13],
  'values': {'0': 'none', '1': 'low', '2': 'reserved', '3': 'high'}},
 'cloud_shadow_confidence': {'bits': [10, 11],
  'values': {'0': 'none', '1': 'low', '2': 'reserved', '3': 'high'}}}
[7]:
data.pixel_quality.plot(col="time", col_wrap=4)
[7]:
<xarray.plot.facetgrid.FacetGrid at 0x7f6bfc9c7828>
../../../_images/sandbox_notebooks_Frequently_used_code_Masking_data_15_1.png

We see that pixel_quality also reports the nodata pixels, and along with the cloud and shadow pixels, it also picks up aerosol and water pixels.

Cloud-free images

Creating the clear-pixel mask

We create a mask by specifying conditions that our pixels must satisfy. But we will only need the labels (not the values) to create a mask.

[8]:
# Create the mask based on "valid" pixels
quality_flags_prod = {'cloud_shadow': 'not_high_confidence',
                      'cloud': 'not_high_confidence',
                      'nodata': False}

#Use the datacube make_mask fucntion to create the clear pixels mask
clear_mask = make_mask(data['pixel_quality'], **quality_flags_prod)

#plot
clear_mask.plot(col="time", col_wrap=4)
[8]:
<xarray.plot.facetgrid.FacetGrid at 0x7f6bfc665a90>
../../../_images/sandbox_notebooks_Frequently_used_code_Masking_data_19_1.png

Applying the clear-pixel mask

We can now get the clear images we want.

[9]:
# Apply the mask
clear = data.where(clear_mask)

rgb(clear, col="time")
../../../_images/sandbox_notebooks_Frequently_used_code_Masking_data_21_0.png

Mask dilation

Sometimes we want our cloud masks to be more conservative and mask out more than just the pixels that pixel_quality classified as cloud or cloud shadow. That is, sometimes we want a buffer around the cloud and the shadow. We can achieve this by dilating the mask using the dilate function from deafrica_tools.datahandling.

[10]:
# Set the number of pixels we want to buffer cloud and cloud shadow by
dilation_in_pixels = 10

# Apply the `dilate` function to each array in `cloud_free_mask`
# and return a matching xarray.DataArray object
buffered_mask = xarray.apply_ufunc(dilate,
                                   clear_mask,
                                   dilation_in_pixels,
                                   keep_attrs=True)

buffered_mask.plot(col="time", col_wrap=4)
[10]:
<xarray.plot.facetgrid.FacetGrid at 0x7f6bfc268c88>
../../../_images/sandbox_notebooks_Frequently_used_code_Masking_data_24_1.png
[11]:
# Apply the mask
buffered_cloud_free = data.where(buffered_mask)

rgb(buffered_cloud_free, col="time")
../../../_images/sandbox_notebooks_Frequently_used_code_Masking_data_25_0.png

Masking out invalid data

We sometimes may also want to mask out only the nodata pixels and retain everything the satellite observed. In this case we are effectively just replacing our nodata value with the more conventional NaN values.

[12]:
# Set invalid nodata pixels to NaN
valid_data = mask_invalid_data(data)

rgb(valid_data, col='time')
../../../_images/sandbox_notebooks_Frequently_used_code_Masking_data_28_0.png

Fortunately our region of interest did not have missing pixels, so these images look identical to the non-masked images from earlier.


Additional information

License: The code in this notebook is licensed under the Apache License, Version 2.0. Digital Earth Africa data is licensed under the Creative Commons by Attribution 4.0 license.

Contact: If you need assistance, please post a question on the Open Data Cube Slack channel or on the GIS Stack Exchange using the open-data-cube tag (you can view previously asked questions here). If you would like to report an issue with this notebook, you can file one on Github.

Compatible datacube version:

[13]:
print(datacube.__version__)
1.8.4.dev52+g07bc51a5

Last Tested:

[14]:
from datetime import datetime
datetime.today().strftime('%Y-%m-%d')
[14]:
'2021-05-20'