Rasterising vectors & vectorising rasters¶
Products used: wofs_ls_summary_annual
Keywords: data used; WOfS, data methods; rasterize, data methods; vectorize, data format; GeoTIFF, data format; shapefile
Background¶
Many remote sensing and geospatial workflows require converting between vector data (e.g. shapefiles) and raster data (e.g. pixel-based data such as an xarray.DataArray). For example, we may need to use a shapefile as a mask to limit the analysis extent of a raster, or we may have raster data that we want to convert into vector data to allow for easy geometry operations.
Description¶
In this notebook, we show how to use the Digital Earth Africa functions xr_rasterize and xr_vectorize in deafrica_tools.spatial. The notebook demonstrates how to:
Load in data from the Water Observations from Space (WOfS) product
Vectorise the pixel-based xarray.DataArray WOfS object into a vector-based geopandas.GeoDataFrame object containing persistent water bodies as polygons
Export the geopandas.GeoDataFrame as a shapefile
Rasterise the geopandas.GeoDataFrame vector data back into an xarray.DataArray object and export the results as a GeoTIFF
Getting started¶
To run this analysis, run all the cells in the notebook, starting with the “Load packages” cell.
Load packages¶
[1]:
%matplotlib inline
import datacube
from deafrica_tools.datahandling import mostcommon_crs
from deafrica_tools.spatial import xr_vectorize, xr_rasterize
/env/lib/python3.8/site-packages/datacube/storage/masking.py:7: DeprecationWarning: datacube.storage.masking has moved to datacube.utils.masking
warnings.warn("datacube.storage.masking has moved to datacube.utils.masking",
/env/lib/python3.8/site-packages/geopandas/_compat.py:106: UserWarning: The Shapely GEOS version (3.8.0-CAPI-1.13.1 ) is incompatible with the GEOS version PyGEOS was compiled with (3.9.1-CAPI-1.14.2). Conversions between both will be slow.
warnings.warn(
Connect to the datacube¶
[2]:
dc = datacube.Datacube(app='Rasterise_vectorise')
Load WOfS data from the datacube¶
We will load in an annual summary from the Water Observations from Space (WOfS) product to provide us with some data to work with.
[3]:
# Enter a location (latitude, longitude) and a buffer in degrees
lat, lon = 13.50, -15.42
buffer = 0.2
# Create a reusable query
query = {
    'x': (lon-buffer, lon+buffer),
    'y': (lat+buffer, lat-buffer),
    'time': ('2017')
}
# Identify the most common projection system in the input query
output_crs = mostcommon_crs(dc=dc, product='ls8_sr', query=query)
# Load WOfS through the datacube
ds = dc.load(product='wofs_ls_summary_annual',
             output_crs=output_crs,
             align=(15, 15),
             resolution=(-30, 30),
             **query)
print(ds)
<xarray.Dataset>
Dimensions: (time: 1, x: 1447, y: 1478)
Coordinates:
* time (time) datetime64[ns] 2017-07-02T11:59:59.999999
* y (y) float64 1.515e+06 1.515e+06 1.515e+06 ... 1.47e+06 1.47e+06
* x (x) float64 4.328e+05 4.329e+05 ... 4.762e+05 4.762e+05
spatial_ref int32 32628
Data variables:
count_wet (time, y, x) int16 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
count_clear (time, y, x) int16 30 30 30 30 30 30 30 ... 26 26 26 26 26 26
frequency (time, y, x) float32 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0
Attributes:
crs: epsg:32628
grid_mapping: spatial_ref
Plot the WOfS summary¶
Let’s plot the WOfS data to get an idea of the objects we will be transforming. In the code below, we first select the pixels where the satellite observed water more than 25% of the year. This isolates the more persistent water bodies and reduces some of the noise before we vectorise the raster.
[4]:
# Select pixels that are classified as water > 25 % of the year
water_bodies = ds.frequency > 0.25
# Plot the data
water_bodies.plot(size=5)
[4]:
<matplotlib.collections.QuadMesh at 0x7fe314278ac0>
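The thresholding step above can be sketched on a toy array. This is only an illustration: the counts below are made up, and it assumes the frequency band is the fraction of clear observations that were wet (count_wet divided by count_clear). The comparison returns a boolean array, which is why mask=water_bodies.values==1 later selects the True pixels.

```python
import numpy as np
import xarray as xr

# Hypothetical 3x3 grids of WOfS-style counts (illustrative values, not real data)
count_wet = xr.DataArray(
    np.array([[0, 2, 9], [1, 5, 10], [0, 0, 3]]), dims=("y", "x")
)
count_clear = xr.DataArray(np.full((3, 3), 10), dims=("y", "x"))

# Assumption: frequency is the share of clear observations classified as wet
frequency = count_wet / count_clear

# Comparing against a threshold yields a boolean mask of persistent water
water_bodies = frequency > 0.25

print(water_bodies.values)
print("Water pixels:", int(water_bodies.sum()))
```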

Vectorising an xarray.DataArray¶
To convert our xarray.DataArray object into a vector-based geopandas.GeoDataFrame, we can use the DE Africa function xr_vectorize (https://docs.digitalearthafrica.org/en/latest/sandbox/notebooks/Tools/gen/deafrica_tools.spatial.html#deafrica_tools.spatial.xr_vectorize) in the deafrica_tools.spatial module. This tool is based on the rasterio.features.shapes function, and can accept any of the arguments in rasterio.features.shapes using the same syntax.
In the cell below, we use the argument mask=water_bodies.values==1 to indicate that we only want to convert the values in the xarray object that are equal to 1.
Note: Both xr_rasterize and xr_vectorize will attempt to automatically obtain the crs and transform from the input data, but if the data does not contain this information, you will need to provide it manually. In the cell below, we will get the crs and transform from the original dataset.
[5]:
gdf = xr_vectorize(water_bodies,
                   crs=ds.crs,
                   transform=ds.geobox.transform,
                   mask=water_bodies.values==1)
print(gdf.head())
attribute geometry
0 1.0 POLYGON ((462495.000 1514655.000, 462495.000 1...
1 1.0 POLYGON ((465945.000 1514655.000, 465945.000 1...
2 1.0 POLYGON ((466125.000 1514655.000, 466125.000 1...
3 1.0 POLYGON ((466275.000 1514655.000, 466275.000 1...
4 1.0 POLYGON ((462315.000 1514625.000, 462315.000 1...
/env/lib/python3.8/site-packages/pyproj/crs/crs.py:53: FutureWarning: '+init=<authority>:<code>' syntax is deprecated. '<authority>:<code>' is the preferred initialization method. When making the change, be mindful of axis order changes: https://pyproj4.github.io/pyproj/stable/gotchas.html#axis-order-changes-in-proj-6
return _prepare_from_string(" ".join(pjargs))
Export as shapefile¶
The xr_vectorize function also lets us export the GeoDataFrame as a shapefile for use in other applications, using the export_shp parameter.
[7]:
gdf = xr_vectorize(da=water_bodies,
                   crs=ds.crs,
                   transform=ds.geobox.transform,
                   mask=water_bodies.values == 1.,
                   export_shp='test.shp')
/env/lib/python3.8/site-packages/pyproj/crs/crs.py:53: FutureWarning: '+init=<authority>:<code>' syntax is deprecated. '<authority>:<code>' is the preferred initialization method. When making the change, be mindful of axis order changes: https://pyproj4.github.io/pyproj/stable/gotchas.html#axis-order-changes-in-proj-6
return _prepare_from_string(" ".join(pjargs))
Rasterising a shapefile¶
Using the xr_rasterize (https://docs.digitalearthafrica.org/en/latest/sandbox/notebooks/Tools/gen/deafrica_tools.spatial.html#deafrica_tools.spatial.xr_rasterize) function in the deafrica_tools.spatial module, we can turn the geopandas.GeoDataFrame back into an xarray.DataArray. The function is based on rasterio.features.rasterize and can accept any of that function’s arguments using the same syntax.
As we already have the GeoDataFrame loaded, we don’t need to read in the shapefile. To start from a shapefile on disk instead, read it in first with the geopandas function gpd.read_file().
The xr_rasterize function uses an xarray.DataArray object as a template for converting the GeoDataFrame into a raster object (the template provides the size, crs, dimensions, transform, and attributes of the output array).
[8]:
water_bodies_again = xr_rasterize(gdf=gdf,
                                  da=water_bodies,
                                  transform=ds.geobox.transform,
                                  crs=ds.crs)
print(water_bodies_again)
<xarray.DataArray (y: 1478, x: 1447)>
array([[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
...,
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0]], dtype=uint8)
Coordinates:
* y (y) float64 1.515e+06 1.515e+06 1.515e+06 ... 1.47e+06 1.47e+06
* x (x) float64 4.328e+05 4.329e+05 4.329e+05 ... 4.762e+05 4.762e+05
Export as GeoTIFF¶
xr_rasterize also allows for exporting the results as a GeoTIFF using the parameter export_tiff. To do this, a named array is required. If one is not provided, the function will provide a default one.
[9]:
water_bodies_again = xr_rasterize(gdf=gdf,
                                  da=water_bodies,
                                  transform=ds.geobox.transform,
                                  crs=ds.crs,
                                  export_tiff='test.tif')
Additional information¶
License: The code in this notebook is licensed under the Apache License, Version 2.0. Digital Earth Africa data is licensed under the Creative Commons by Attribution 4.0 license.
Contact: If you need assistance, please post a question on the Open Data Cube Slack channel or on the GIS Stack Exchange using the open-data-cube tag (you can view previously asked questions here). If you would like to report an issue with this notebook, you can file one on GitHub.
Compatible datacube version:
[10]:
print(datacube.__version__)
1.8.5
Last Tested:
[11]:
from datetime import datetime
datetime.today().strftime('%Y-%m-%d')
[11]:
'2021-09-16'