deafrica_tools.spatial

Description: This file contains a set of python functions for conducting spatial analyses on Digital Earth Australia data.

License: The code in this notebook is licensed under the Apache License, Version 2.0 (https://www.apache.org/licenses/LICENSE-2.0). Digital Earth Australia data is licensed under the Creative Commons by Attribution 4.0 license (https://creativecommons.org/licenses/by/4.0/).

Contact: If you need assistance, please post a question on the Open Data Cube Slack channel (http://slack.opendatacube.org/) or on the GIS Stack Exchange (https://gis.stackexchange.com/questions/ask?tags=open-data-cube) using the open-data-cube tag (you can view previously asked questions here: https://gis.stackexchange.com/questions/tagged/open-data-cube).

If you would like to report an issue with this script, file one on Github: https://github.com/digitalearthafrica/deafrica-sandbox-notebooks

Functions included:

xr_vectorize xr_rasterize subpixel_contours interpolate_2d contours_to_array largest_region transform_geojson_wgs_to_epsg zonal_stats_parallel

Last modified: November 2020

Functions

contours_to_arrays(gdf, col)

This function converts a polyline shapefile into an array with three columns giving the X, Y and Z coordinates of each vertex.

interpolate_2d(ds, x_coords, y_coords, z_coords)

This function takes points with X, Y and Z coordinates, and interpolates Z-values across the extent of an existing xarray dataset.

largest_region(bool_array, **kwargs)

Takes a boolean array and identifies the largest contiguous region of connected True values.

subpixel_contours(da[, z_values, crs, …])

Uses skimage.measure.find_contours to extract multiple z-value contour lines from a two-dimensional array (e.g.

transform_geojson_wgs_to_epsg(geojson, EPSG)

Takes a geojson dictionary and converts it from WGS84 (EPSG:4326) to desired EPSG

xr_rasterize(gdf, da[, attribute_col, crs, …])

Rasterizes a geopandas.GeoDataFrame into an xarray.DataArray.

xr_vectorize(da[, attribute_col, transform, …])

Vectorises a xarray.DataArray into a geopandas.GeoDataFrame.

zonal_stats_parallel(shp, raster, …)

Summarizing raster datasets based on vector geometries in parallel.

deafrica_tools.spatial.contours_to_arrays(gdf, col)

This function converts a polyline shapefile into an array with three columns giving the X, Y and Z coordinates of each vertex. This data can then be used as an input to interpolation procedures (e.g. using a function like interpolate_2d.

Last modified: October 2019

Parameters
  • gdf (Geopandas GeoDataFrame) – A GeoPandas GeoDataFrame of lines to convert into point coordinates.

  • col (str) – A string giving the name of the GeoDataFrame field to use as Z-values.

Returns

  • A numpy array with three columns giving the X, Y and Z coordinates

  • of each vertex in the input GeoDataFrame.

deafrica_tools.spatial.interpolate_2d(ds, x_coords, y_coords, z_coords, method='linear', factor=1, verbose=False, **kwargs)

This function takes points with X, Y and Z coordinates, and interpolates Z-values across the extent of an existing xarray dataset. This can be useful for producing smooth surfaces from point data that can be compared directly against satellite data derived from an OpenDataCube query.

Supported interpolation methods include ‘linear’, ‘nearest’ and ‘cubic (using scipy.interpolate.griddata), and ‘rbf’ (using scipy.interpolate.Rbf).

Last modified: February 2020

Parameters
  • ds (xarray DataArray or Dataset) – A two-dimensional or multi-dimensional array from which x and y dimensions will be copied and used for the area in which to interpolate point data.

  • x_coords (numpy array) – Arrays containing X and Y coordinates for all points (e.g. longitudes and latitudes).

  • y_coords (numpy array) – Arrays containing X and Y coordinates for all points (e.g. longitudes and latitudes).

  • z_coords (numpy array) – An array containing Z coordinates for all points (e.g. elevations). These are the values you wish to interpolate between.

  • method (string, optional) – The method used to interpolate between point values. This string is either passed to scipy.interpolate.griddata (for ‘linear’, ‘nearest’ and ‘cubic’ methods), or used to specify Radial Basis Function interpolation using scipy.interpolate.Rbf (‘rbf’). Defaults to ‘linear’.

  • factor (int, optional) – An optional integer that can be used to subsample the spatial interpolation extent to obtain faster interpolation times, then up-sample this array back to the original dimensions of the data as a final step. For example, setting factor=10 will interpolate data into a grid that has one tenth of the resolution of ds. This approach will be significantly faster than interpolating at full resolution, but will potentially produce less accurate or reliable results.

  • verbose (bool, optional) – Print debugging messages. Default False.

  • **kwargs – Optional keyword arguments to pass to either scipy.interpolate.griddata (if method is ‘linear’, ‘nearest’ or ‘cubic’), or scipy.interpolate.Rbf (is method is ‘rbf’).

Returns

interp_2d_array – An xarray DataArray containing with x and y coordinates copied from ds_array, and Z-values interpolated from the points data.

Return type

xarray DataArray

deafrica_tools.spatial.largest_region(bool_array, **kwargs)

Takes a boolean array and identifies the largest contiguous region of connected True values. This is returned as a new array with cells in the largest region marked as True, and all other cells marked as False.

Parameters
  • bool_array (boolean array) – A boolean array (numpy or xarray.DataArray) with True values for the areas that will be inspected to find the largest group of connected cells

  • **kwargs – Optional keyword arguments to pass to measure.label

Returns

largest_region – A boolean array with cells in the largest region marked as True, and all other cells marked as False.

Return type

boolean array

deafrica_tools.spatial.subpixel_contours(da, z_values=[0.0], crs=None, affine=None, attribute_df=None, output_path=None, min_vertices=2, dim='time', errors='ignore', verbose=False)

Uses skimage.measure.find_contours to extract multiple z-value contour lines from a two-dimensional array (e.g. multiple elevations from a single DEM), or one z-value for each array along a specified dimension of a multi-dimensional array (e.g. to map waterlines across time by extracting a 0 NDWI contour from each individual timestep in an xarray timeseries).

Contours are returned as a geopandas.GeoDataFrame with one row per z-value or one row per array along a specified dimension. The attribute_df parameter can be used to pass custom attributes to the output contour features.

Last modified: November 2020

Parameters
  • da (xarray DataArray) – A two-dimensional or multi-dimensional array from which contours are extracted. If a two-dimensional array is provided, the analysis will run in ‘single array, multiple z-values’ mode which allows you to specify multiple z_values to be extracted. If a multi-dimensional array is provided, the analysis will run in ‘single z-value, multiple arrays’ mode allowing you to extract contours for each array along the dimension specified by the dim parameter.

  • z_values (int, float or list of ints, floats) – An individual z-value or list of multiple z-values to extract from the array. If operating in ‘single z-value, multiple arrays’ mode specify only a single z-value.

  • crs (string or CRS object, optional) – An EPSG string giving the coordinate system of the array (e.g. ‘EPSG:3577’). If none is provided, the function will attempt to extract a CRS from the xarray object’s crs attribute.

  • affine (affine.Affine object, optional) – An affine.Affine object (e.g. from affine import Affine; Affine(30.0, 0.0, 548040.0, 0.0, -30.0, “6886890.0) giving the affine transformation used to convert raster coordinates (e.g. [0, 0]) to geographic coordinates. If none is provided, the function will attempt to obtain an affine transformation from the xarray object (e.g. either at `da.transform or da.geobox.transform).

  • output_path (string, optional) – The path and filename for the output shapefile.

  • attribute_df (pandas.Dataframe, optional) – A pandas.Dataframe containing attributes to pass to the output contour features. The dataframe must contain either the same number of rows as supplied z_values (in ‘multiple z-value, single array’ mode), or the same number of rows as the number of arrays along the dim dimension (‘single z-value, multiple arrays mode’).

  • min_vertices (int, optional) – The minimum number of vertices required for a contour to be extracted. The default (and minimum) value is 2, which is the smallest number required to produce a contour line (i.e. a start and end point). Higher values remove smaller contours, potentially removing noise from the output dataset.

  • dim (string, optional) – The name of the dimension along which to extract contours when operating in ‘single z-value, multiple arrays’ mode. The default is ‘time’, which extracts contours for each array along the time dimension.

  • errors (string, optional) – If ‘raise’, then any failed contours will raise an exception. If ‘ignore’ (the default), a list of failed contours will be printed. If no contours are returned, an exception will always be raised.

  • verbose (bool, optional) – Print debugging messages. Default False.

Returns

output_gdf – A geopandas geodataframe object with one feature per z-value (‘single array, multiple z-values’ mode), or one row per array along the dimension specified by the dim parameter (‘single z-value, multiple arrays’ mode). If attribute_df was provided, these values will be included in the shapefile’s attribute table.

Return type

geopandas geodataframe

deafrica_tools.spatial.transform_geojson_wgs_to_epsg(geojson, EPSG)

Takes a geojson dictionary and converts it from WGS84 (EPSG:4326) to desired EPSG

Parameters
  • geojson (dict) – a geojson dictionary containing a ‘geometry’ key, in WGS84 coordinates

  • EPSG (int) – numeric code for the EPSG coordinate referecnce system to transform into

Returns

transformed_geojson – a geojson dictionary containing a ‘coordinates’ key, in the desired CRS

Return type

dict

deafrica_tools.spatial.xr_rasterize(gdf, da, attribute_col=False, crs=None, transform=None, name=None, x_dim='x', y_dim='y', export_tiff=None, verbose=False, **rasterio_kwargs)

Rasterizes a geopandas.GeoDataFrame into an xarray.DataArray.

Parameters
  • gdf (geopandas.GeoDataFrame) – A geopandas.GeoDataFrame object containing the vector/shapefile data you want to rasterise.

  • da (xarray.DataArray or xarray.Dataset) – The shape, coordinates, dimensions, and transform of this object are used to build the rasterized shapefile. It effectively provides a template. The attributes of this object are also appended to the output xarray.DataArray.

  • attribute_col (string, optional) – Name of the attribute column in the geodataframe that the pixels in the raster will contain. If set to False, output will be a boolean array of 1’s and 0’s.

  • crs (str, optional) – CRS metadata to add to the output xarray. e.g. ‘epsg:3577’. The function will attempt get this info from the input GeoDataFrame first.

  • transform (affine.Affine object, optional) – An affine.Affine object (e.g. from affine import Affine; Affine(30.0, 0.0, 548040.0, 0.0, -30.0, “6886890.0) giving the affine transformation used to convert raster coordinates (e.g. [0, 0]) to geographic coordinates. If none is provided, the function will attempt to obtain an affine transformation from the xarray object (e.g. either at `da.transform or da.geobox.transform).

  • x_dim (str, optional) – An optional string allowing you to override the xarray dimension used for x coordinates. Defaults to ‘x’. Useful, for example, if x and y dims instead called ‘lat’ and ‘lon’.

  • y_dim (str, optional) – An optional string allowing you to override the xarray dimension used for y coordinates. Defaults to ‘y’. Useful, for example, if x and y dims instead called ‘lat’ and ‘lon’.

  • export_tiff (str, optional) – If a filepath is provided (e.g ‘output/output.tif’), will export a geotiff file. A named array is required for this operation, if one is not supplied by the user a default name, ‘data’, is used

  • verbose (bool, optional) – Print debugging messages. Default False.

  • **rasterio_kwargs – A set of keyword arguments to rasterio.features.rasterize Can include: ‘all_touched’, ‘merge_alg’, ‘dtype’.

Returns

xarr

Return type

xarray.DataArray

deafrica_tools.spatial.xr_vectorize(da, attribute_col='attribute', transform=None, crs=None, dtype='float32', export_shp=False, verbose=False, **rasterio_kwargs)

Vectorises a xarray.DataArray into a geopandas.GeoDataFrame.

Parameters
  • da (xarray dataarray or a numpy ndarray) –

  • attribute_col (str, optional) – Name of the attribute column in the resulting geodataframe. Values of the raster object converted to polygons will be assigned to this column. Defaults to ‘attribute’.

  • transform (affine.Affine object, optional) – An affine.Affine object (e.g. from affine import Affine; Affine(30.0, 0.0, 548040.0, 0.0, -30.0, “6886890.0) giving the affine transformation used to convert raster coordinates (e.g. [0, 0]) to geographic coordinates. If none is provided, the function will attempt to obtain an affine transformation from the xarray object (e.g. either at `da.transform or da.geobox.transform).

  • crs (str or CRS object, optional) – An EPSG string giving the coordinate system of the array (e.g. ‘EPSG:3577’). If none is provided, the function will attempt to extract a CRS from the xarray object’s crs attribute.

  • dtype (str, optional) – Data type must be one of int16, int32, uint8, uint16, or float32

  • export_shp (Boolean or string path, optional) – To export the output vectorised features to a shapefile, supply an output path (e.g. ‘output_dir/output.shp’. The default is False, which will not write out a shapefile.

  • verbose (bool, optional) – Print debugging messages. Default False.

  • **rasterio_kwargs – A set of keyword arguments to rasterio.features.shapes Can include mask and connectivity.

Returns

gdf

Return type

Geopandas GeoDataFrame

deafrica_tools.spatial.zonal_stats_parallel(shp, raster, statistics, out_shp, ncpus, **kwargs)

Summarizing raster datasets based on vector geometries in parallel. Each cpu recieves an equal chunk of the dataset. Utilizes the perrygeo/rasterstats package.

Parameters
  • shp (str) – Path to shapefile that contains polygons over which zonal statistics are calculated

  • raster (str) – Path to the raster from which the statistics are calculated. This can be a virtual raster (.vrt).

  • statistics (list) –

    list of statistics to calculate. e.g.

    [‘min’, ‘max’, ‘median’, ‘majority’, ‘sum’]

  • out_shp (str) – Path to export shapefile containing zonal statistics.

  • ncpus (int) – number of cores to parallelize the operations over.

  • kwargs – Any other keyword arguments to rasterstats.zonal_stats() See https://github.com/perrygeo/python-rasterstats for all options

Returns

Return type

Exports a shapefile to disk containing the zonal statistics requested