deafrica_tools.datahandling
Functions for loading and handling Digital Earth Africa data.
Functions
| array_to_geotiff | Create a single band GeoTIFF file with data from an array. |
| dilate | Dilate a binary array by a specified number of pixels using a disk-like radial dilation. |
| download_unzip | Downloads and unzips a .zip file from an external URL to a local directory. |
| first | Finds the first occurring non-null value along the given dimension. |
| last | Finds the last occurring non-null value along the given dimension. |
| load_ard | Loads analysis ready data. |
| mostcommon_crs | Takes a given query and returns the most common CRS for observations returned for that spatial extent. |
| nearest | Finds the nearest values to a target label along the given dimension, for all other dimensions. |
| pan_sharpen_brovey | Performs Brovey pan-sharpening on surface reflectance input using numexpr, returning three pan-sharpened bands. |
| wofs_fuser | Fuse two WOfS water measurements represented as ndarray objects. |
- deafrica_tools.datahandling.array_to_geotiff(fname, data, geo_transform, projection, nodata_val=0, dtype=osgeo.gdal.GDT_Float32)
Create a single band GeoTIFF file with data from an array.
Because this works with simple arrays rather than xarray datasets from DEA, it requires geotransform info ((upleft_x, x_size, x_rotation, upleft_y, y_rotation, y_size)) and projection data (in “WKT” format) for the output raster. These are typically obtained from an existing raster using the following GDAL calls:
>>> from osgeo import gdal
>>> gdal_dataset = gdal.Open(raster_path)
>>> geotrans = gdal_dataset.GetGeoTransform()
>>> prj = gdal_dataset.GetProjection()
or alternatively, directly from an xarray dataset:
>>> geotrans = xarraydataset.geobox.transform.to_gdal()
>>> prj = xarraydataset.geobox.crs.wkt
- Parameters
fname (str) – Output geotiff file path including extension
data (numpy array) – Input array to export as a geotiff
geo_transform (tuple) – Geotransform for output raster; e.g. (upleft_x, x_size, x_rotation, upleft_y, y_rotation, y_size)
projection (str) – Projection for output raster (in “WKT” format)
nodata_val (int, optional) – Value to convert to nodata in the output raster; default 0
dtype (gdal dtype object, optional) – Optionally set the dtype of the output raster; can be useful when exporting an array of float or integer values. Defaults to gdal.GDT_Float32
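To make the geo_transform tuple more concrete, the sketch below shows how a GDAL-style geotransform maps a pixel position to map coordinates. The helper name `pixel_to_coords` and the sample values are illustrative assumptions and are not part of deafrica_tools.

```python
def pixel_to_coords(geo_transform, col, row):
    """Map a (col, row) pixel position to the (x, y) of its upper-left corner.

    Hypothetical helper for illustration; it applies the standard GDAL
    affine geotransform in the element order used by array_to_geotiff.
    """
    upleft_x, x_size, x_rotation, upleft_y, y_rotation, y_size = geo_transform
    x = upleft_x + col * x_size + row * x_rotation
    y = upleft_y + col * y_rotation + row * y_size
    return x, y


# Assumed example: a north-up raster with 30 m pixels (y_size is negative
# because row numbers increase southwards).
geotrans = (500000.0, 30.0, 0.0, 9000000.0, 0.0, -30.0)
origin = pixel_to_coords(geotrans, 0, 0)
offset = pixel_to_coords(geotrans, 10, 20)
print(origin, offset)
```

For a north-up raster both rotation terms are zero, so only the origin and the two pixel sizes affect the result.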
- deafrica_tools.datahandling.dilate(array, dilation=10, invert=True)
Dilate a binary array by a specified number of pixels using a disk-like radial dilation.
By default, invalid (e.g. False or 0) values are dilated. This is suitable for applications such as cloud masking (e.g. creating a buffer around cloudy or shadowed pixels). This functionality can be reversed by specifying invert=False.
- Parameters
array (array) – The binary array to dilate.
dilation (int, optional) – An optional integer specifying the number of pixels to dilate by. Defaults to 10, which will dilate array by 10 pixels.
invert (bool, optional) – An optional boolean specifying whether to invert the binary array prior to dilation. The default is True, which dilates the invalid values in the array (e.g. False or 0 values).
- Returns
An array of the same shape as array, with valid data pixels dilated by the number of pixels specified by dilation.
- Return type
array
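The idea of a disk-like dilation of invalid pixels can be sketched with NumPy alone. This is not the library's implementation: the function name `disk_dilate_sketch`, the boolean output convention (True means still valid) and the exact disk definition are assumptions made for illustration.

```python
import numpy as np


def disk_dilate_sketch(mask, radius, invert=True):
    """Illustrative disk dilation of a 2D boolean mask (hypothetical helper)."""
    mask = np.asarray(mask, dtype=bool)
    # With invert=True the invalid (False) pixels are the ones that grow.
    target = ~mask if invert else mask
    rows, cols = target.shape
    padded = np.pad(target, radius, mode="constant", constant_values=False)
    grown = np.zeros_like(target)
    # OR together shifted copies of the mask for every offset inside the disk.
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dy * dy + dx * dx <= radius * radius:
                grown |= padded[
                    radius + dy : radius + dy + rows,
                    radius + dx : radius + dx + cols,
                ]
    # Flip back so that True still means "valid" when invert=True.
    return ~grown if invert else grown


# A 5 x 5 scene with one cloudy (False) pixel in the centre.
valid = np.ones((5, 5), dtype=bool)
valid[2, 2] = False
buffered = disk_dilate_sketch(valid, radius=1)
print(buffered.astype(int))
```

With a radius of 1 the single invalid pixel grows into a plus-shaped buffer, while diagonal neighbours stay valid because they lie outside the disk.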
- deafrica_tools.datahandling.download_unzip(url, output_dir=None, remove_zip=True)
Downloads and unzips a .zip file from an external URL to a local directory.
- Parameters
url (str) – A string giving a URL path to the zip file you wish to download and unzip
output_dir (str, optional) – An optional string giving the directory to unzip files into. Defaults to None, which will unzip files in the current working directory
remove_zip (bool, optional) – An optional boolean indicating whether to remove the downloaded .zip file after files are unzipped. Defaults to True, which will delete the .zip file.
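The behaviour described by these parameters can be approximated with the standard library. The sketch below is an assumption about how such a helper could work, not the source of download_unzip; the name `download_unzip_sketch` is hypothetical, and it saves the archive in the current working directory before extracting it.

```python
import os
import urllib.request
import zipfile


def download_unzip_sketch(url, output_dir=None, remove_zip=True):
    """Download a .zip file, extract it, and optionally delete the archive."""
    if not url.endswith(".zip"):
        raise ValueError("The URL must point to a .zip file")

    # Save the archive in the current working directory under its own name.
    zip_name = os.path.basename(url)
    urllib.request.urlretrieve(url, zip_name)

    # extractall(None) extracts into the current working directory.
    with zipfile.ZipFile(zip_name) as archive:
        archive.extractall(output_dir)

    if remove_zip:
        os.remove(zip_name)
```

Because `urlretrieve` also accepts `file://` URLs, the sketch can be tried against a local archive without network access.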
- deafrica_tools.datahandling.first(array: xarray.DataArray, dim: str, index_name: Optional[str] = None) → xarray.DataArray
Finds the first occurring non-null value along the given dimension.
- Parameters
array (xr.DataArray) – The array to search.
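dim (str) – The name of the dimension to reduce by finding the first non-null value.
index_name (str, optional) – If given, the name of a coordinate to be added containing the index of where on the dimension the first value was found.
- Returns
reduced – An array of the first non-null values. The dim dimension will be removed, and replaced with a coord of the same name, containing the value of that dimension where the first value was found.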
- Return type
xr.DataArray
- deafrica_tools.datahandling.last(array: xarray.DataArray, dim: str, index_name: Optional[str] = None) → xarray.DataArray
Finds the last occurring non-null value along the given dimension.
- Parameters
array (xr.DataArray) – The array to search.
dim (str) – The name of the dimension to reduce by finding the last non-null value.
index_name (str, optional) – If given, the name of a coordinate to be added containing the index of where on the dimension the last value was found.
- Returns
reduced – An array of the last non-null values. The dim dimension will be removed, and replaced with a coord of the same name, containing the value of that dimension where the last value was found.
- Return type
xr.DataArray
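The reduction performed by first and last can be illustrated on a plain NumPy array. This is a simplified sketch under the assumption that null means NaN; the helper names `first_valid` and `last_valid` are hypothetical, and the real functions operate on xarray.DataArray objects and keep coordinates.

```python
import numpy as np


def first_valid(values, axis=0):
    """Return the first non-NaN value along `axis` and its index."""
    valid = ~np.isnan(values)
    # argmax returns the position of the first True along the axis.
    # If a pixel is NaN everywhere, the index is 0 and the value stays NaN.
    idx = valid.argmax(axis=axis)
    picked = np.take_along_axis(values, np.expand_dims(idx, axis), axis=axis)
    return picked.squeeze(axis), idx


def last_valid(values, axis=0):
    """Return the last non-NaN value along `axis` and its index."""
    flipped = np.flip(values, axis=axis)
    picked, idx = first_valid(flipped, axis=axis)
    # Convert the index in the flipped array back to the original order.
    return picked, values.shape[axis] - 1 - idx


# Three time steps (axis 0) for two pixels.
series = np.array([[np.nan, 1.0], [2.0, np.nan], [3.0, 4.0]])
first_values, first_index = first_valid(series)
last_values, last_index = last_valid(series)
print(first_values, first_index, last_values, last_index)
```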
- deafrica_tools.datahandling.load_ard(dc, products=None, min_gooddata=0.0, categories_to_mask_ls={'cloud': 'high_confidence', 'cloud_shadow': 'high_confidence'}, categories_to_mask_s2=['cloud high probability', 'cloud medium probability', 'thin cirrus', 'cloud shadows', 'saturated or defective'], categories_to_mask_s1=['invalid data'], mask_filters=None, mask_pixel_quality=True, ls7_slc_off=True, predicate=None, dtype='auto', verbose=True, **kwargs)
Loads analysis ready data.
Loads and combines Landsat USGS Collection 2, Sentinel-2, and Sentinel-1 data for multiple sensors (i.e. ls5t, ls7e, ls8c and ls9 for Landsat; s2a and s2b for Sentinel-2), optionally applies pixel quality masks, and drops time steps that contain less than a minimum proportion of good quality (e.g. non-cloudy or non-shadowed) pixels.
The function supports loading the following DE Africa products:
- Landsat:
ls5_sr (‘sr’ denotes surface reflectance)
ls7_sr
ls8_sr
ls9_sr
ls5_st (‘st’ denotes surface temperature)
ls7_st
ls8_st
ls9_st
- Sentinel-2:
s2_l2a
- Sentinel-1:
s1_rtc
Last modified: Feb 2021
- Parameters
dc (datacube Datacube object) – The Datacube to connect to, i.e. dc = datacube.Datacube(). This allows you to also use development datacubes if required.
products (list) –
A list of product names to load data from. For example:
Landsat C2:
['ls5_sr', 'ls7_sr', 'ls8_sr', 'ls9_sr']
Sentinel-2:
['s2_l2a']
Sentinel-1:
['s1_rtc']
min_gooddata (float, optional) – An optional float giving the minimum fraction of good quality pixels required for a satellite observation to be loaded. Defaults to 0.0, which will return all observations regardless of pixel quality (set to e.g. 0.99 to return only observations with more than 99% good quality pixels).
categories_to_mask_ls (dict, optional) – An optional dictionary that is used to identify poor quality pixels for masking. This mask is used for both masking out low quality pixels (e.g. cloud or shadow), and for dropping observations entirely based on the min_gooddata calculation.
categories_to_mask_s2 (list, optional) – An optional list of Sentinel-2 Scene Classification Layer (SCL) names that identify poor quality pixels for masking.
categories_to_mask_s1 (list, optional) – An optional list of Sentinel-1 mask names that identify poor quality pixels for masking.
mask_filters (iterable of tuples, optional) –
Iterable of tuples of morphological operations, ("<operation>", <radius>), to apply to the mask, where:
- operation: string, one of these morphological operations:
'closing' = removes small holes in cloud (morphological closing)
'opening' = shrinks away small areas of the mask
'dilation' = adds padding to the mask
'erosion' = shrinks bright regions and enlarges dark regions
- radius: int
E.g. mask_filters=[('erosion', 5), ('opening', 2), ('dilation', 2)]
mask_pixel_quality (bool, optional) – An optional boolean indicating whether to apply the poor data mask to all observations that were not filtered out for having fewer good quality pixels than min_gooddata. E.g. if min_gooddata=0.99, the filtered observations may still contain up to 1% poor quality pixels. False simply returns the resulting observations without masking out these pixels; the default of True masks them and sets them to NaN using the poor data mask. Masking converts numeric values to floating point values, which can cause memory issues; set to False to prevent this.
ls7_slc_off (bool, optional) – An optional boolean indicating whether to include data from after the Landsat 7 SLC failure (i.e. SLC-off). Defaults to True, which keeps all Landsat 7 observations acquired after 31 May 2003.
predicate (function, optional) – An optional function that can be passed in to restrict the datasets that are loaded by the function. A filter function should take a datacube.model.Dataset object as an input (i.e. as returned from dc.find_datasets), and return a boolean. For example, a filter function could be used to return True only for datasets acquired in January:
dataset.time.begin.month == 1
dtype (string, optional) – An optional parameter that controls the data type/dtype that layers are coerced to after loading. Valid values: 'native', 'auto', 'float{16|32|64}'. When 'auto' is used, the data will be converted to 'float32' if masking is used, otherwise data will be returned in the native data type of the data. Be aware that if data is loaded in its native dtype, nodata and masked pixels will be returned with the data's native nodata value (typically -999), not NaN. NOTE: If loading Landsat, the data is automatically rescaled, so the 'native' dtype will raise a value error.
verbose (bool, optional) – If True, print progress statements during loading.
**kwargs (dict, optional) – A set of keyword arguments to dc.load that define the spatiotemporal query used to extract data. This typically includes measurements, x, y, time, resolution, resampling, group_by and crs. Keyword arguments can either be listed directly in the load_ard call like any other parameter (e.g. measurements=['red']), or by passing in a query kwarg dictionary (e.g. **query). For a list of possible options, see the dc.load documentation: https://datacube-core.readthedocs.io/en/latest/dev/api/generate/datacube.Datacube.load.html
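The predicate parameter in the list above expects a function that receives a dataset and returns a boolean. The sketch below expands the January example into a full function and tries it on stand-in objects; `fake_dataset` is a hypothetical helper that only mimics the `time.begin` attribute of datacube.model.Dataset so the example runs without a datacube.

```python
from datetime import datetime
from types import SimpleNamespace


def acquired_in_january(dataset):
    """Filter function: True only for datasets acquired in January."""
    return dataset.time.begin.month == 1


def fake_dataset(begin):
    """Stand-in exposing only `time.begin` (an assumption for illustration)."""
    return SimpleNamespace(time=SimpleNamespace(begin=begin))


datasets = [
    fake_dataset(datetime(2020, 1, 15)),
    fake_dataset(datetime(2020, 6, 3)),
    fake_dataset(datetime(2021, 1, 2)),
]
kept = [ds for ds in datasets if acquired_in_january(ds)]
print(len(kept))
```

In a real call, the function would be passed as predicate=acquired_in_january alongside the other load_ard arguments.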
- Returns
combined_ds – An xarray dataset containing only satellite observations that contain greater than min_gooddata proportion of good quality pixels.
- Return type
xarray Dataset
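The min_gooddata filtering can be pictured as computing the fraction of good pixels in each time step and keeping only the time steps that reach the threshold. The sketch below is an assumption about the logic rather than the load_ard source; the name `keep_good_timesteps` is hypothetical, and the comparison is taken to be inclusive because the default of 0.0 is documented to return every observation.

```python
import numpy as np


def keep_good_timesteps(good_mask, min_gooddata=0.0):
    """Flag time steps whose fraction of good pixels reaches min_gooddata.

    good_mask is a boolean array with dimensions (time, y, x), where True
    marks a good quality pixel.
    """
    good_mask = np.asarray(good_mask, dtype=bool)
    # The mean of a boolean array over y and x is the fraction of True pixels.
    good_fraction = good_mask.mean(axis=(1, 2))
    # Assumed inclusive comparison, so min_gooddata=0.0 keeps everything.
    keep = good_fraction >= min_gooddata
    return keep, good_fraction


# Three 2 x 2 time steps: fully clear, one bad pixel, fully cloudy.
mask = np.array(
    [
        [[True, True], [True, True]],
        [[True, True], [True, False]],
        [[False, False], [False, False]],
    ]
)
keep, fraction = keep_good_timesteps(mask, min_gooddata=0.8)
print(keep, fraction)
```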
- deafrica_tools.datahandling.mostcommon_crs(dc, product, query)
Takes a given query and returns the most common CRS for observations returned for that spatial extent. This can be useful when your study area lies on the boundary of two UTM zones, forcing you to decide which CRS to use for your output_crs in dc.load.
- Parameters
dc (datacube Datacube object) – The Datacube to connect to, i.e. dc = datacube.Datacube(). This allows you to also use development datacubes if required.
product (str) – A product name to load CRSs from
query (dict) – A datacube query including x, y and time range to assess for the most common CRS
- Returns
An EPSG string giving the most common CRS from all datasets returned by the query above
- Return type
str
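Once the CRS of every matching dataset is known, picking the most common one is a simple counting step. The sketch below shows only that step, using made-up CRS strings; the name `most_common_crs_sketch` is hypothetical, and how the real function collects CRSs from the datacube is not shown here.

```python
from collections import Counter


def most_common_crs_sketch(crs_list):
    """Return the CRS string that occurs most often in crs_list."""
    if not crs_list:
        raise ValueError("No datasets were found for this query")
    return Counter(crs_list).most_common(1)[0][0]


# Assumed example: a study area straddling two UTM zones.
crs_per_dataset = ["epsg:32636", "epsg:32637", "epsg:32636"]
print(most_common_crs_sketch(crs_per_dataset))
```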
- deafrica_tools.datahandling.nearest(array: xarray.DataArray, dim: str, target, index_name: Optional[str] = None) → xarray.DataArray
Finds the nearest values to a target label along the given dimension, for all other dimensions.
E.g. for a DataArray with dimensions ('time', 'x', 'y'):
nearest_array = nearest(array, 'time', '2017-03-12')
will return an array with the dimensions ('x', 'y'), containing for each (x, y) pixel the non-null value found closest to that location along the time dimension.
The returned array will include the 'time' coordinate at which the nearest value was found for each (x, y) pixel.
- Parameters
array (xr.DataArray) – The array to search.
dim (str) – The name of the dimension to look for the target label.
target (same type as array[dim]) – The value to look up along the given dimension.
index_name (str, optional) – If given, the name of a coordinate to be added containing the index of where on the dimension the nearest value was found.
- Returns
nearest_array – An array of the nearest non-null values to the target label. The dim dimension will be removed, and replaced with a coord of the same name, containing the value of that dimension closest to the given target label.
- Return type
xr.DataArray
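The per-pixel search can be sketched on a NumPy array with a datetime axis. This is an illustration under the assumption that null means NaN and that the searched dimension is the first axis; the helper name `nearest_valid` is hypothetical and the real function works on xarray.DataArray objects.

```python
import numpy as np


def nearest_valid(values, coords, target):
    """For each pixel, pick the non-NaN value closest to `target` on axis 0."""
    # Distance of every time step from the target, in days.
    distance = np.abs((coords - target) / np.timedelta64(1, "D"))
    # Reshape to (time, 1, 1, ...) so it broadcasts against the data.
    distance = distance.reshape((-1,) + (1,) * (values.ndim - 1))
    # NaN observations are pushed infinitely far away so they are never picked.
    distance = np.where(np.isnan(values), np.inf, distance)
    idx = distance.argmin(axis=0)
    picked = np.take_along_axis(values, idx[np.newaxis], axis=0)[0]
    return picked, coords[idx]


times = np.array(["2017-03-01", "2017-03-10", "2017-03-20"], dtype="datetime64[D]")
# Dimensions (time, y, x) with one row of two pixels.
data = np.array([[[1.0, 1.0]], [[np.nan, 2.0]], [[3.0, 3.0]]])
nearest_values, nearest_times = nearest_valid(data, times, np.datetime64("2017-03-12"))
print(nearest_values, nearest_times)
```

The first pixel falls back to 20 March because its 10 March observation is null, while the second pixel uses 10 March directly.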
- deafrica_tools.datahandling.pan_sharpen_brovey(band_1, band_2, band_3, pan_band)
Performs Brovey pan-sharpening on surface reflectance input using numexpr, returning three pan-sharpened bands.
- Parameters
band_1 (xarray.DataArray or numpy.array) – Three input multispectral bands, either as xarray.DataArrays or numpy.arrays. These bands should have already been resampled to the spatial resolution of the panchromatic band.
band_2 (xarray.DataArray or numpy.array) – Three input multispectral bands, either as xarray.DataArrays or numpy.arrays. These bands should have already been resampled to the spatial resolution of the panchromatic band.
band_3 (xarray.DataArray or numpy.array) – Three input multispectral bands, either as xarray.DataArrays or numpy.arrays. These bands should have already been resampled to the spatial resolution of the panchromatic band.
pan_band (xarray.DataArray or numpy.array) – A panchromatic band corresponding to the above multispectral bands that will be used to pan-sharpen the data.
- Returns
band_1_sharpen, band_2_sharpen, band_3_sharpen – Three numpy arrays equivalent to band_1, band_2 and band_3 pan-sharpened to the spatial resolution of pan_band.
- Return type
numpy.arrays
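For orientation, the classic Brovey transform scales each band by the ratio of the panchromatic band to the sum of the three multispectral bands. The NumPy sketch below implements that textbook form; it is an assumption that the library follows the same formula, and the name `brovey_sketch` is hypothetical (the real function uses numexpr).

```python
import numpy as np


def brovey_sketch(band_1, band_2, band_3, pan_band):
    """Textbook Brovey transform: band * pan / (band_1 + band_2 + band_3)."""
    b1, b2, b3, pan = (
        np.asarray(band, dtype="float64")
        for band in (band_1, band_2, band_3, pan_band)
    )
    total = b1 + b2 + b3
    # Pixels where the three bands sum to zero produce inf or NaN.
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = pan / total
    return b1 * ratio, b2 * ratio, b3 * ratio


# Assumed reflectance values for a single pixel.
sharp_1, sharp_2, sharp_3 = brovey_sketch([0.1], [0.2], [0.3], [1.2])
print(sharp_1, sharp_2, sharp_3)
```

The relative proportions of the three bands are preserved, and their sum equals the panchromatic value after sharpening.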
- deafrica_tools.datahandling.wofs_fuser(dest, src)
Fuse two WOfS water measurements represented as ndarray objects.
Note: this is a copy of the function located here: https://github.com/GeoscienceAustralia/digitalearthau/blob/develop/digitalearthau/utils.py
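A fuse function of this kind merges two overlapping observations in place. The sketch below assumes that the lowest bit of each WOfS value flags nodata: empty destination pixels are filled from the source, and pixels valid in both have their flags combined. The name `wofs_fuser_sketch` and this reading of the bit flags are assumptions, so consult the linked source for the authoritative version.

```python
import numpy as np


def wofs_fuser_sketch(dest, src):
    """Merge `src` into `dest` in place (bit 0 is assumed to mean nodata)."""
    empty = (dest & 1).astype(bool)
    both = ~empty & ~((src & 1).astype(bool))
    # Fill pixels that have no data in dest with whatever src holds.
    dest[empty] = src[empty]
    # Where both observations are valid, combine their bit flags.
    dest[both] |= src[both]


dest = np.array([1, 0, 128], dtype="uint8")
src = np.array([128, 1, 2], dtype="uint8")
wofs_fuser_sketch(dest, src)
print(dest)
```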