deafrica_tools.spatial

Spatial analysis functions for Digital Earth Africa data.

Functions

add_geobox(ds[, crs])

Ensure that an xarray DataArray has a GeoBox and .odc.* accessor using odc.geo.

contours_to_arrays(gdf, col)

This function converts a polyline shapefile into an array with three columns giving the X, Y and Z coordinates of each vertex.

interpolate_2d(ds, x_coords, y_coords, z_coords)

This function takes points with X, Y and Z coordinates, and interpolates Z-values across the extent of an existing xarray dataset.

largest_region(bool_array, **kwargs)

Takes a boolean array and identifies the largest contiguous region of connected True values.

reverse_geocode(coords[, site_classes, ...])

Takes a latitude and longitude coordinate, and performs a reverse geocode to return a plain-text description of the location in the form ‘Site, State’.

subpixel_contours(da[, z_values, crs, ...])

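Uses skimage.measure.find_contours to extract multiple z-value contour lines from a two-dimensional array (e.g. multiple elevations from a single DEM), or one z-value for each array along a specified dimension of a multi-dimensional array.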

sun_angles(dc, query)

For a given spatiotemporal query, calculate mean sun azimuth and elevation for each satellite observation, and return these as a new xarray.Dataset with 'sun_elevation' and 'sun_azimuth' variables.

transform_geojson_wgs_to_epsg(geojson, EPSG)

Takes a geojson dictionary and converts it from WGS84 (EPSG:4326) to the desired EPSG.

xr_rasterize(gdf, da[, attribute_col, crs, ...])

Rasterizes a vector geopandas.GeoDataFrame into a raster xarray.DataArray.

xr_vectorize(da[, attribute_col, crs, ...])

Vectorises a raster xarray.DataArray into a vector geopandas.GeoDataFrame.

zonal_stats_parallel(shp, raster, ...)

Summarizes raster datasets based on vector geometries in parallel.

deafrica_tools.spatial.add_geobox(ds, crs=None)

Ensure that an xarray DataArray has a GeoBox and .odc.* accessor using odc.geo.

If ds is missing a Coordinate Reference System (CRS), this can be supplied using the crs param.

Parameters
  • ds (xarray.Dataset or xarray.DataArray) – Input xarray object that needs to be checked for spatial information.

  • crs (str, optional) – Coordinate Reference System (CRS) information for the input ds array. If ds already has a CRS, then crs is not required. Default is None.

Returns

The input xarray object with added .odc.* attributes to access spatial information.

Return type

xarray.Dataset or xarray.DataArray

deafrica_tools.spatial.contours_to_arrays(gdf, col)

This function converts a polyline shapefile into an array with three columns giving the X, Y and Z coordinates of each vertex. This data can then be used as an input to interpolation procedures (e.g. using a function like interpolate_2d).

Last modified: October 2021

Parameters
  • gdf (Geopandas GeoDataFrame) – A GeoPandas GeoDataFrame of lines to convert into point coordinates.

  • col (str) – A string giving the name of the GeoDataFrame field to use as Z-values.

Returns

A numpy array with three columns giving the X, Y and Z coordinates of each vertex in the input GeoDataFrame.
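
As a rough illustration of the output layout, the sketch below builds the same kind of three-column array from plain Python data. It does not call contours_to_arrays or use GeoPandas; the lines and z_values inputs are hypothetical stand-ins for the line geometries in gdf and the values in the col field.

```python
import numpy as np

# Hypothetical stand-ins: each line is a list of (x, y) vertices, and
# z_values holds one Z value per line (the role played by `col`).
lines = [
    [(0.0, 0.0), (1.0, 0.0), (2.0, 1.0)],
    [(0.0, 2.0), (2.0, 2.0)],
]
z_values = [10.0, 20.0]

# Repeat each line's Z value for every vertex on that line
rows = [(x, y, z) for line, z in zip(lines, z_values) for x, y in line]
xyz_array = np.array(rows)

# Split the columns so they can be passed on as separate arrays
x_coords, y_coords, z_coords = xyz_array.T
```

The three column arrays have the shape expected by the x_coords, y_coords and z_coords parameters of interpolate_2d.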

deafrica_tools.spatial.interpolate_2d(ds, x_coords, y_coords, z_coords, method='linear', factor=1, verbose=False, **kwargs)

This function takes points with X, Y and Z coordinates, and interpolates Z-values across the extent of an existing xarray dataset. This can be useful for producing smooth surfaces from point data that can be compared directly against satellite data derived from an OpenDataCube query.

Supported interpolation methods include ‘linear’, ‘nearest’ and ‘cubic’ (using scipy.interpolate.griddata), and ‘rbf’ (using scipy.interpolate.Rbf).

Last modified: February 2020

Parameters
  • ds (xarray DataArray or Dataset) – A two-dimensional or multi-dimensional array from which x and y dimensions will be copied and used for the area in which to interpolate point data.

  • x_coords (numpy array) – An array containing X coordinates for all points (e.g. longitudes).

  • y_coords (numpy array) – An array containing Y coordinates for all points (e.g. latitudes).

  • z_coords (numpy array) – An array containing Z coordinates for all points (e.g. elevations). These are the values you wish to interpolate between.

  • method (string, optional) – The method used to interpolate between point values. This string is either passed to scipy.interpolate.griddata (for ‘linear’, ‘nearest’ and ‘cubic’ methods), or used to specify Radial Basis Function interpolation using scipy.interpolate.Rbf (‘rbf’). Defaults to ‘linear’.

  • factor (int, optional) – An optional integer that can be used to subsample the spatial interpolation extent to obtain faster interpolation times, then up-sample this array back to the original dimensions of the data as a final step. For example, setting factor=10 will interpolate data into a grid that has one tenth of the resolution of ds. This approach will be significantly faster than interpolating at full resolution, but will potentially produce less accurate or reliable results.

  • verbose (bool, optional) – Print debugging messages. Default False.

  • **kwargs – Optional keyword arguments to pass to either scipy.interpolate.griddata (if method is ‘linear’, ‘nearest’ or ‘cubic’), or scipy.interpolate.Rbf (if method is ‘rbf’).

Returns

interp_2d_array – An xarray DataArray with x and y coordinates copied from ds, and Z-values interpolated from the points data.

Return type

xarray DataArray
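
The underlying griddata step can be sketched with NumPy and SciPy alone. This is a minimal illustration under stated assumptions, not the function's implementation: the sample points and the 5 x 5 target grid are made up, and the grid stands in for the x and y coordinates that interpolate_2d copies from ds.

```python
import numpy as np
from scipy.interpolate import griddata

# Hypothetical sample points: Z values lie on the plane z = x + y
x_coords = np.array([0.0, 1.0, 0.0, 1.0])
y_coords = np.array([0.0, 0.0, 1.0, 1.0])
z_coords = x_coords + y_coords

# Target grid, standing in for the x and y coordinates copied from ds
grid_x = np.linspace(0.0, 1.0, 5)
grid_y = np.linspace(0.0, 1.0, 5)
grid_xx, grid_yy = np.meshgrid(grid_x, grid_y)

# Linear interpolation of the scattered Z values onto the grid
interp_grid = griddata(
    points=np.column_stack([x_coords, y_coords]),
    values=z_coords,
    xi=(grid_xx, grid_yy),
    method="linear",
)
```

Because the sample values lie on a plane, linear interpolation reproduces x + y at interior grid cells, which makes the result easy to check by eye.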

deafrica_tools.spatial.largest_region(bool_array, **kwargs)

Takes a boolean array and identifies the largest contiguous region of connected True values. This is returned as a new array with cells in the largest region marked as True, and all other cells marked as False.

Parameters
  • bool_array (boolean array) – A boolean array (numpy or xarray.DataArray) with True values for the areas that will be inspected to find the largest group of connected cells.

  • **kwargs – Optional keyword arguments to pass to measure.label

Returns

largest_region – A boolean array with cells in the largest region marked as True, and all other cells marked as False.

Return type

boolean array
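
The idea can be sketched as: label connected regions, count the cells in each, and keep the biggest. The example below is an approximation rather than the function's own code. It swaps in scipy.ndimage.label for the measure.label call named above, and the two labelling functions use different default connectivity, so results may differ for diagonally touching cells. The helper name largest_true_region is hypothetical.

```python
import numpy as np
from scipy import ndimage


def largest_true_region(bool_array):
    """Return a boolean mask of the largest connected group of True cells."""
    # Label each connected group of True cells with a unique integer (0 = False)
    labels, n_regions = ndimage.label(bool_array)
    if n_regions == 0:
        return np.zeros_like(bool_array, dtype=bool)
    # Count cells per label, ignoring the background label 0
    sizes = np.bincount(labels.ravel())
    sizes[0] = 0
    return labels == sizes.argmax()


mask = np.array(
    [[1, 1, 0, 0],
     [1, 0, 0, 1],
     [0, 0, 0, 1]],
    dtype=bool,
)
result = largest_true_region(mask)
```

Here the three-cell region in the top-left corner is kept and the two-cell region on the right is set to False.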

deafrica_tools.spatial.reverse_geocode(coords, site_classes=None, state_classes=None)

Takes a latitude and longitude coordinate, and performs a reverse geocode to return a plain-text description of the location in the form:

Site, State

E.g.: reverse_geocode(coords=(-35.282163, 149.128835))

‘Canberra, Australian Capital Territory’

Parameters
  • coords (tuple of floats) – A tuple of (latitude, longitude) coordinates used to perform the reverse geocode.

  • site_classes (list of strings, optional) –

    A list of strings used to define the site part of the plain text location description. Because the contents of the geocoded address can vary greatly depending on location, these strings are tested against the address one by one until a match is made.

    Defaults to:

    ['city', 'town', 'village', 'suburb', 'hamlet', 'county', 'municipality']

  • state_classes (list of strings, optional) – A list of strings used to define the state part of the plain text location description. These strings are tested against the address one by one until a match is made. Defaults to: [‘state’, ‘territory’].

Returns

  • If a valid geocoded address is found, a plain text location description will be returned – ‘Site, State’

  • If no valid address is found, formatted coordinates will be returned instead – ‘XX.XX S, XX.XX E’
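
The first-match lookup and the coordinate fallback described above can be sketched in plain Python. Everything here is an assumption made for illustration: describe_location is a hypothetical helper, the address dictionary is a made-up stand-in for a geocoder response, and falling back to coordinates when only one of site or state is found is a choice of this sketch, not documented behaviour.

```python
def describe_location(coords, address, site_classes, state_classes):
    """Hypothetical helper: build 'Site, State' or fall back to coordinates."""
    # Test each class against the address in order and keep the first match
    site = next((address[k] for k in site_classes if k in address), None)
    state = next((address[k] for k in state_classes if k in address), None)
    if site and state:
        return f"{site}, {state}"

    # Fallback: format the coordinates with hemisphere letters
    lat, lon = coords
    ns = "S" if lat < 0 else "N"
    ew = "W" if lon < 0 else "E"
    return f"{abs(lat):.2f} {ns}, {abs(lon):.2f} {ew}"


site_classes = ["city", "town", "village", "suburb", "hamlet", "county", "municipality"]
state_classes = ["state", "territory"]
coords = (-35.282163, 149.128835)

# Made-up address fields for illustration only
address = {"suburb": "City", "city": "Canberra", "state": "Australian Capital Territory"}

found = describe_location(coords, address, site_classes, state_classes)
fallback = describe_location(coords, {}, site_classes, state_classes)
```

Because ‘city’ comes before ‘suburb’ in site_classes, the city name wins even though both keys are present in the address.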

deafrica_tools.spatial.subpixel_contours(da, z_values=[0.0], crs=None, attribute_df=None, output_path=None, min_vertices=2, dim='time', time_format='%Y-%m-%d', errors='ignore', verbose=True)

Uses skimage.measure.find_contours to extract multiple z-value contour lines from a two-dimensional array (e.g. multiple elevations from a single DEM), or one z-value for each array along a specified dimension of a multi-dimensional array (e.g. to map waterlines across time by extracting a 0 NDWI contour from each individual timestep in an xarray timeseries).

Contours are returned as a geopandas.GeoDataFrame with one row per z-value or one row per array along a specified dimension. The attribute_df parameter can be used to pass custom attributes to the output contour features.

Last modified: May 2023

Parameters
  • da (xarray DataArray) – A two-dimensional or multi-dimensional array from which contours are extracted. If a two-dimensional array is provided, the analysis will run in ‘single array, multiple z-values’ mode which allows you to specify multiple z_values to be extracted. If a multi-dimensional array is provided, the analysis will run in ‘single z-value, multiple arrays’ mode allowing you to extract contours for each array along the dimension specified by the dim parameter.

  • z_values (int, float or list of ints, floats) – An individual z-value or list of multiple z-values to extract from the array. If operating in ‘single z-value, multiple arrays’ mode specify only a single z-value.

  • crs (string or CRS object, optional) – If da’s coordinate reference system (CRS) cannot be determined, provide a CRS using this parameter (e.g. ‘EPSG:3577’).

  • output_path (string, optional) – The path and filename for the output shapefile.

  • attribute_df (pandas.DataFrame, optional) – A pandas.DataFrame containing attributes to pass to the output contour features. The dataframe must contain either the same number of rows as supplied z_values (in ‘single array, multiple z-values’ mode), or the same number of rows as the number of arrays along the dim dimension (in ‘single z-value, multiple arrays’ mode).

  • min_vertices (int, optional) – The minimum number of vertices required for a contour to be extracted. The default (and minimum) value is 2, which is the smallest number required to produce a contour line (i.e. a start and end point). Higher values remove smaller contours, potentially removing noise from the output dataset.

  • dim (string, optional) – The name of the dimension along which to extract contours when operating in ‘single z-value, multiple arrays’ mode. The default is ‘time’, which extracts contours for each array along the time dimension.

  • time_format (string, optional) – The format used to convert numpy.datetime64 values to strings if applied to data with a “time” dimension. Defaults to “%Y-%m-%d”.

  • errors (string, optional) – If ‘raise’, then any failed contours will raise an exception. If ‘ignore’ (the default), a list of failed contours will be printed. If no contours are returned, an exception will always be raised.

  • verbose (bool, optional) – Print debugging messages. Default is True.

Returns

output_gdf – A geopandas geodataframe object with one feature per z-value (‘single array, multiple z-values’ mode), or one row per array along the dimension specified by the dim parameter (‘single z-value, multiple arrays’ mode). If attribute_df was provided, these values will be included in the shapefile’s attribute table.

Return type

geopandas geodataframe
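
To give a feel for what "subpixel" means here, the sketch below locates a z-value crossing between two neighbouring pixels by linear interpolation, along a single row of values. It is a one-dimensional simplification written for illustration, not the two-dimensional contouring that skimage.measure.find_contours performs; the function name, the sample NDWI values and the 30 m pixel spacing are all hypothetical.

```python
def crossing_positions(values, coords, z):
    """Return coordinates where `values` crosses `z` between adjacent pixels."""
    positions = []
    for i in range(len(values) - 1):
        v0, v1 = values[i], values[i + 1]
        # A crossing exists when the two pixels fall on opposite sides of z
        if (v0 - z) * (v1 - z) < 0:
            # Fraction of the way from pixel i to pixel i + 1
            t = (z - v0) / (v1 - v0)
            positions.append(coords[i] + t * (coords[i + 1] - coords[i]))
    return positions


# Hypothetical NDWI values along one row, with pixel centres 30 m apart
ndwi_row = [-0.4, -0.1, 0.3, 0.5]
x_centres = [0.0, 30.0, 60.0, 90.0]

crossings = crossing_positions(ndwi_row, x_centres, z=0.0)
```

The 0 NDWI crossing falls a quarter of the way between the second and third pixel centres, rather than snapping to either one.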

deafrica_tools.spatial.sun_angles(dc, query)

For a given spatiotemporal query, calculate mean sun azimuth and elevation for each satellite observation, and return these as a new xarray.Dataset with ‘sun_elevation’ and ‘sun_azimuth’ variables.

Parameters
  • dc (datacube.Datacube object) – Datacube instance used to load data.

  • query (dict) – A dictionary containing query parameters used to identify satellite observations and load metadata.

Returns

sun_angles_ds – An xarray.Dataset containing ‘sun_elevation’ and ‘sun_azimuth’ variables.

Return type

xarray.Dataset

deafrica_tools.spatial.transform_geojson_wgs_to_epsg(geojson, EPSG)

Takes a geojson dictionary and converts it from WGS84 (EPSG:4326) to the desired EPSG.

Parameters
  • geojson (dict) – a geojson dictionary containing a ‘geometry’ key, in WGS84 coordinates

  • EPSG (int) – numeric code for the EPSG coordinate reference system to transform into

Returns

transformed_geojson – a geojson dictionary containing a ‘coordinates’ key, in the desired CRS

Return type

dict
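
The input and output shapes can be illustrated with a hand-written transform for a single target CRS. This sketch does not call transform_geojson_wgs_to_epsg and handles only EPSG:3857, using the spherical Web Mercator formula in place of a projection library; the helper names and the sample polygon are hypothetical.

```python
import math

EARTH_RADIUS_M = 6378137.0  # sphere radius used by EPSG:3857


def lonlat_to_web_mercator(lon, lat):
    """Project one WGS84 (lon, lat) position to EPSG:3857 metres."""
    x = EARTH_RADIUS_M * math.radians(lon)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return [x, y]


def transform_coords(coords):
    """Recursively transform nested GeoJSON coordinate lists."""
    # A position is a list whose first element is a number
    if isinstance(coords[0], (int, float)):
        return lonlat_to_web_mercator(coords[0], coords[1])
    return [transform_coords(c) for c in coords]


# Made-up input: a dictionary with a 'geometry' key in WGS84 coordinates
geojson = {
    "type": "Feature",
    "properties": {},
    "geometry": {
        "type": "Polygon",
        "coordinates": [[[0.0, 0.0], [10.0, 0.0], [10.0, 10.0], [0.0, 0.0]]],
    },
}

# Output: a dictionary with a 'coordinates' key in the target CRS
geometry = geojson["geometry"]
transformed = {
    "type": geometry["type"],
    "coordinates": transform_coords(geometry["coordinates"]),
}
```

Note that the input carries its geometry under a ‘geometry’ key, while the result exposes ‘coordinates’ directly, mirroring the parameter and return descriptions above.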

deafrica_tools.spatial.xr_rasterize(gdf, da, attribute_col=None, crs=None, name=None, output_path=None, verbose=True, **rasterio_kwargs)

Rasterizes a vector geopandas.GeoDataFrame into a raster xarray.DataArray.

Parameters
  • gdf (geopandas.GeoDataFrame) – A geopandas.GeoDataFrame object containing the vector data you want to rasterise.

  • da (xarray.DataArray or xarray.Dataset) – The shape, coordinates, dimensions, and transform of this object are used to define the array that gdf is rasterized into. It effectively provides a spatial template.

  • attribute_col (string, optional) – Name of the attribute column in gdf containing values for each vector feature that will be rasterized. If None, the output will be a boolean array of 1’s and 0’s.

  • crs (str or CRS object, optional) – If da’s coordinate reference system (CRS) cannot be determined, provide a CRS using this parameter (e.g. ‘EPSG:3577’).

  • name (str, optional) – An optional name used for the output xarray.DataArray.

  • output_path (string, optional) – Provide an optional string file path to export the rasterized data as a GeoTIFF file.

  • verbose (bool, optional) – Print debugging messages. Default True.

  • **rasterio_kwargs – A set of keyword arguments to rasterio.features.rasterize. Can include: ‘all_touched’, ‘merge_alg’, ‘dtype’.

Returns

da_rasterized – The rasterized vector data.

Return type

xarray.DataArray
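
Conceptually, rasterizing burns a value into every cell of the template grid that a geometry covers. The NumPy sketch below marks cells whose centre falls inside a polygon using a ray-casting test. It is an illustration only and does not use rasterio.features.rasterize: the helper name, the 5 x 5 grid and the square polygon are hypothetical, and the y axis increases with row index for simplicity.

```python
import numpy as np


def rasterize_polygon(polygon, x_centres, y_centres):
    """Return a uint8 mask: 1 where a pixel centre lies inside `polygon`."""
    px, py = np.meshgrid(x_centres, y_centres)
    inside = np.zeros(px.shape, dtype=bool)
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if y1 == y2:
            continue  # horizontal edges never cross a horizontal ray
        # Rows whose centre lies between the edge's two end points
        crosses = (y1 > py) != (y2 > py)
        # X position where the edge crosses each row
        x_at_py = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
        # Toggle cells to the left of the crossing (even-odd rule)
        inside ^= crosses & (px < x_at_py)
    return inside.astype("uint8")


# Template grid of pixel centres, standing in for the coordinates of `da`
x_centres = np.arange(0.5, 5.0, 1.0)
y_centres = np.arange(0.5, 5.0, 1.0)

# Hypothetical square polygon covering the middle of the grid
square = [(1.0, 1.0), (4.0, 1.0), (4.0, 4.0), (1.0, 4.0)]
mask = rasterize_polygon(square, x_centres, y_centres)
```

The result is an array of 1s and 0s on the template grid, comparable to the output described for attribute_col=None.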

deafrica_tools.spatial.xr_vectorize(da, attribute_col=None, crs=None, dtype='float32', output_path=None, verbose=True, **rasterio_kwargs)

Vectorises a raster xarray.DataArray into a vector geopandas.GeoDataFrame.

Parameters
  • da (xarray.DataArray) – The input xarray.DataArray data to vectorise.

  • attribute_col (str, optional) – Name of the attribute column in the resulting geopandas.GeoDataFrame. Values from da converted to polygons will be assigned to this column. If None, the column name will default to ‘attribute’.

  • crs (str or CRS object, optional) – If da’s coordinate reference system (CRS) cannot be determined, provide a CRS using this parameter (e.g. ‘EPSG:3577’).

  • dtype (str, optional) – Data type; must be one of int16, int32, uint8, uint16, or float32. Defaults to ‘float32’.

  • output_path (string, optional) – Provide an optional string file path to export the vectorised data to file. Supports any vector file formats supported by geopandas.GeoDataFrame.to_file().

  • verbose (bool, optional) – Print debugging messages. Default True.

  • **rasterio_kwargs – A set of keyword arguments to rasterio.features.shapes. Can include mask and connectivity.

Returns

gdf

Return type

geopandas.GeoDataFrame

deafrica_tools.spatial.zonal_stats_parallel(shp, raster, statistics, out_shp, ncpus, **kwargs)

Summarizes raster datasets based on vector geometries in parallel. Each CPU receives an equal chunk of the dataset. Uses the perrygeo/rasterstats package.

Parameters
  • shp (str) – Path to shapefile that contains polygons over which zonal statistics are calculated

  • raster (str) – Path to the raster from which the statistics are calculated. This can be a virtual raster (.vrt).

  • statistics (list) –

    list of statistics to calculate. e.g.

    [‘min’, ‘max’, ‘median’, ‘majority’, ‘sum’]

  • out_shp (str) – Path to export shapefile containing zonal statistics.

  • ncpus (int) – number of cores to parallelize the operations over.

  • kwargs – Any other keyword arguments to rasterstats.zonal_stats(). See https://github.com/perrygeo/python-rasterstats for all options.

Returns

Exports a shapefile to disk containing the zonal statistics requested.
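
The "equal chunk per CPU" idea can be sketched as a simple splitting step. This is a guess at the general approach, not the function's actual chunking code: split_into_chunks is a hypothetical helper and the integer feature IDs stand in for the polygons read from shp.

```python
def split_into_chunks(items, n_chunks):
    """Split `items` into `n_chunks` contiguous chunks of near-equal size."""
    base, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional item each
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


# Hypothetical feature IDs, standing in for the polygons in the shapefile
feature_ids = list(range(10))
chunks = split_into_chunks(feature_ids, n_chunks=4)
```

Each chunk could then be handed to a separate worker process, with the per-chunk results concatenated in order before being written out.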