Rasterising vectors & vectorising rasters

Keywords: data used; WOfS, data methods; rasterize, data methods; vectorize, data format; GeoTIFF, data format; shapefile

Background

Many remote sensing and/or geospatial workflows require converting between vector data (e.g. shapefiles) and raster data (e.g. pixel-based data like that in an xarray.DataArray). For example, we may need to use a shapefile as a mask to limit the analysis extent of a raster, or have raster data that we want to convert into vector data to allow for easy geometry operations.

Description

In this notebook, we show how to use the Digital Earth Africa functions xr_rasterize and xr_vectorize from deafrica_tools.spatial. The notebook demonstrates how to:

  1. Load in data from the Water Observations from Space (WOfS) product

  2. Vectorise the pixel-based xarray.DataArray WOfS object into a vector-based geopandas.GeoDataFrame object containing persistent water-bodies as polygons

  3. Export the geopandas.GeoDataFrame as a shapefile

  4. Rasterise the geopandas.GeoDataFrame vector data back into an xarray.DataArray object and export the results as a GeoTIFF


Getting started

To run this analysis, run all the cells in the notebook, starting with the “Load packages” cell.

Load packages

[1]:
%matplotlib inline

import datacube
import geopandas as gpd
from datacube.utils.geometry import Geometry

from deafrica_tools.datahandling import mostcommon_crs
from deafrica_tools.spatial import xr_vectorize, xr_rasterize
from deafrica_tools.areaofinterest import define_area

Connect to the datacube

[2]:
dc = datacube.Datacube(app='Rasterise_vectorise')

Load WOfS data from the datacube

We will load in an annual summary from the Water Observations from Space (WOfS) product to provide us with some data to work with.

To define the area of interest, there are two methods available:

  1. By specifying the latitude, longitude, and buffer. This method requires you to input the central latitude, the central longitude, and a buffer value in degrees around the center point you want to analyze. For example, lat = 10.338, lon = -1.055, and buffer = 0.1 will select an area extending 0.1 degrees in each direction from the point with coordinates (10.338, -1.055).

  2. By uploading a polygon as a GeoJSON or Esri Shapefile. If you choose this option, you will need to upload the GeoJSON or Esri Shapefile into the Sandbox using the Upload Files button in the top left corner of the Jupyter Notebook interface. Esri Shapefiles must be uploaded with all the related files (.cpg, .dbf, .shp, .shx). Once uploaded, you can use the shapefile or GeoJSON to define the area of interest. Remember to update the code to call the file you have uploaded.

To use one of these methods, you can uncomment the relevant line of code and comment out the other one. To comment out a line, add the "#" symbol before the code you want to comment out. By default, the first option which defines the location using latitude, longitude, and buffer is being used.
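
For Method 1, the extent implied by a centre point and buffer can be sketched in plain Python. This is an illustration only: bbox_from_point is a hypothetical helper, not part of deafrica_tools, and it assumes the buffer is applied equally on every side of the point.

```python
def bbox_from_point(lat, lon, buffer):
    """Return (lat_range, lon_range) for a box centred on (lat, lon).

    Hypothetical helper for illustration only; it is not part of
    deafrica_tools. Assumes the buffer (in degrees) is applied equally
    on every side of the centre point.
    """
    lat_range = (lat - buffer, lat + buffer)
    lon_range = (lon - buffer, lon + buffer)
    return lat_range, lon_range


# Same values as the default area of interest used in this notebook
lat_range, lon_range = bbox_from_point(lat=13.50, lon=-15.42, buffer=0.2)
print(lat_range, lon_range)
```

Under that assumption, a buffer of 0.2 gives a box roughly 0.4 degrees wide and 0.4 degrees tall.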

[3]:
# Select a location
# Method 1: Specify the latitude, longitude, and buffer
aoi = define_area(lat=13.50, lon=-15.42, buffer=0.2)

# Method 2: Use a polygon as a GeoJSON or Esri Shapefile.
# aoi = define_area(vector_path='aoi.shp')

# Create a geopolygon and GeoDataFrame of the area of interest
geopolygon = Geometry(aoi["features"][0]["geometry"], crs="epsg:4326")
geopolygon_gdf = gpd.GeoDataFrame(geometry=[geopolygon], crs=geopolygon.crs)

# Get the latitude and longitude range of the geopolygon
lat_range = (geopolygon_gdf.total_bounds[1], geopolygon_gdf.total_bounds[3])
lon_range = (geopolygon_gdf.total_bounds[0], geopolygon_gdf.total_bounds[2])

# Create a reusable query
query = {
    'x': lon_range,
    'y': lat_range,
    'time': '2017'
}
# Identify the most common projection system in the input query
output_crs = mostcommon_crs(dc=dc, product='ls8_sr', query=query)

# Load WOfS through the datacube
ds = dc.load(product='wofs_ls_summary_annual',
             output_crs=output_crs,
             align=(15, 15),
             resolution=(-30, 30),
             **query)

print(ds)

<xarray.Dataset>
Dimensions:      (time: 1, y: 1478, x: 1447)
Coordinates:
  * time         (time) datetime64[ns] 2017-07-02T11:59:59.999999
  * y            (y) float64 1.515e+06 1.515e+06 1.515e+06 ... 1.47e+06 1.47e+06
  * x            (x) float64 4.328e+05 4.329e+05 ... 4.762e+05 4.762e+05
    spatial_ref  int32 32628
Data variables:
    count_wet    (time, y, x) int16 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
    count_clear  (time, y, x) int16 30 30 30 30 30 30 30 ... 26 26 26 26 26 26
    frequency    (time, y, x) float32 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0
Attributes:
    crs:           epsg:32628
    grid_mapping:  spatial_ref

Plot the WOfS summary

Let’s plot the WOfS data to get an idea of the objects we will be transforming. In the code below, we first select the pixels where the satellite has observed water for more than 25% of the year. This isolates the more persistent water bodies and reduces some of the noise before we vectorise the raster.

[4]:
# Select pixels that are classified as water > 25 % of the year
water_bodies = ds.frequency > 0.25

# Plot the data
water_bodies.plot(size=5)
[4]:
<matplotlib.collections.QuadMesh at 0x7fbaa0c3fd60>
[Figure: plot of the water_bodies mask, showing pixels classified as water more than 25% of the year]
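
The thresholding step can be tried on a small array without a datacube connection. The sketch below uses made-up counts rather than real WOfS values, and assumes that frequency is the ratio of count_wet to count_clear, as the variable names in the dataset suggest.

```python
import numpy as np

# Made-up counts for a 2 x 3 grid (not real WOfS values)
count_wet = np.array([[0, 5, 20], [8, 0, 30]], dtype="int16")
count_clear = np.array([[30, 20, 40], [32, 26, 30]], dtype="int16")

# Fraction of clear observations in which water was seen (assumed definition)
frequency = (count_wet / count_clear).astype("float32")

# Same comparison as in the notebook cell: True where frequency exceeds 0.25
water_bodies = frequency > 0.25

print(water_bodies)
```

Because the comparison is strict, pixels with a frequency of exactly 0.25 are excluded from the mask.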

Vectorising an xarray.DataArray

To convert our xarray.DataArray object into a vector-based geopandas.GeoDataFrame, we can use the DE Africa function xr_vectorize (https://docs.digitalearthafrica.org/en/latest/sandbox/notebooks/Tools/gen/deafrica_tools.spatial.html#deafrica_tools.spatial.xr_vectorize) in the deafrica_tools.spatial module. This tool is based on the rasterio.features.shapes function, and can accept any of the arguments in rasterio.features.shapes using the same syntax.

In the cell below, we use the argument mask=water_bodies.values==1 to indicate we only want to convert the values in the xarray object that are equal to 1.

Note: Both xr_rasterize and xr_vectorize will attempt to automatically obtain the crs and transform from the input data, but if the data does not contain this information, you will need to provide it manually. In the cell below, we pass the crs from the original dataset (ds.crs).

[5]:
gdf = xr_vectorize(water_bodies,
                   crs=ds.crs,
                   mask=water_bodies.values==1)

print(gdf.head())

   attribute                                           geometry
0        1.0  POLYGON ((462495.000 1514655.000, 462495.000 1...
1        1.0  POLYGON ((465945.000 1514655.000, 465945.000 1...
2        1.0  POLYGON ((466125.000 1514655.000, 466125.000 1...
3        1.0  POLYGON ((466275.000 1514655.000, 466275.000 1...
4        1.0  POLYGON ((462315.000 1514625.000, 462315.000 1...
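
Each row in the output above is one polygon. Conceptually, vectorising groups neighbouring pixels that share a value into a single shape. The sketch below counts such groups in a tiny mask using 4-connectivity (pixels joined along an edge, not a corner), which is the default connectivity in rasterio.features.shapes. It is a conceptual illustration, not the rasterio implementation, and count_regions is a hypothetical helper.

```python
def count_regions(mask):
    """Count groups of truthy pixels that touch along an edge (4-connectivity).

    Hypothetical helper for illustration; not part of rasterio or deafrica_tools.
    """
    n_rows, n_cols = len(mask), len(mask[0])
    seen = [[False] * n_cols for _ in range(n_rows)]
    regions = 0
    for r in range(n_rows):
        for c in range(n_cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Found an unvisited water pixel: start a new region and flood-fill it
            regions += 1
            stack = [(r, c)]
            seen[r][c] = True
            while stack:
                i, j = stack.pop()
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ni, nj = i + di, j + dj
                    if (0 <= ni < n_rows and 0 <= nj < n_cols
                            and mask[ni][nj] and not seen[ni][nj]):
                        seen[ni][nj] = True
                        stack.append((ni, nj))
    return regions


mask = [
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 1, 0, 1],
]
print(count_regions(mask))
```

Here the mask contains three separate groups of water pixels, so a vectoriser using the same connectivity would be expected to return three polygons for the value 1.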

Plot our vectorised raster

[6]:
gdf.plot(figsize=(6, 6))
[6]:
<Axes: >
[Figure: plot of the vectorised water body polygons in gdf]

Export as shapefile

Our function also allows us to easily export the GeoDataFrame as a shapefile for use in other applications, using the output_path parameter.

[7]:
gdf = xr_vectorize(da=water_bodies,
                   crs=ds.crs,
                   mask=water_bodies.values == 1.,
                   output_path='test.shp')
Exporting vector data to test.shp

Rasterising a shapefile

The xr_rasterize (https://docs.digitalearthafrica.org/en/latest/sandbox/notebooks/Tools/gen/deafrica_tools.spatial.html#deafrica_tools.spatial.xr_rasterize) function in the deafrica_tools.spatial module turns the geopandas.GeoDataFrame back into an xarray.DataArray. It is based on the rasterio function rasterio.features.rasterize, and can accept any of the arguments in rasterio.features.rasterize using the same syntax.

As we already have the GeoDataFrame loaded, we don’t need to read in the shapefile. If we wanted to read in a shapefile first, we could use gpd.read_file().

This function uses an xarray.DataArray object as a template for converting the GeoDataFrame into a raster object (the template provides the size, crs, dimensions, transform, and attributes of the output array).

[8]:
water_bodies_again = xr_rasterize(gdf=gdf,
                                  da=water_bodies,
                                  crs=ds.crs)

print(water_bodies_again)

<xarray.DataArray (y: 1478, x: 1447)>
array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]], dtype=uint8)
Coordinates:
  * y            (y) float64 1.515e+06 1.515e+06 1.515e+06 ... 1.47e+06 1.47e+06
  * x            (x) float64 4.328e+05 4.329e+05 ... 4.762e+05 4.762e+05
    spatial_ref  int32 32628
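
The role of the template grid can be illustrated with a small NumPy sketch. It burns a hypothetical axis-aligned rectangle onto a 4 x 4 grid by testing whether each pixel centre falls inside the rectangle, which mirrors the default behaviour of rasterio.features.rasterize (all_touched=False). The coordinates are illustrative values, not taken from the WOfS data, and real polygons need a proper point-in-polygon test.

```python
import numpy as np

# Template grid: pixel-centre coordinates with 30 m spacing (illustrative values)
x = np.array([15.0, 45.0, 75.0, 105.0])
y = np.array([105.0, 75.0, 45.0, 15.0])  # y decreases from top to bottom

# A hypothetical axis-aligned rectangle: (xmin, ymin, xmax, ymax)
xmin, ymin, xmax, ymax = 30.0, 30.0, 90.0, 90.0

# A pixel is burned when its centre lies inside the rectangle
inside_x = (x >= xmin) & (x <= xmax)
inside_y = (y >= ymin) & (y <= ymax)
raster = (inside_y[:, None] & inside_x[None, :]).astype("uint8")

print(raster)
```

The output has the same shape as the template grid, which is why the rasterised result above matches the dimensions of water_bodies.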

Export as GeoTIFF

xr_rasterize also allows for exporting the results as a GeoTIFF using the output_path parameter. To do this, a named array is required. If one is not provided, the function will provide a default one.

[9]:
water_bodies_again = xr_rasterize(gdf=gdf,
                                  da=water_bodies,
                                  crs=ds.crs,
                                  output_path='test.tif')
Exporting raster data to test.tif

Additional information

License: The code in this notebook is licensed under the Apache License, Version 2.0. Digital Earth Africa data is licensed under the Creative Commons by Attribution 4.0 license.

Contact: If you need assistance, please post a question on the Open Data Cube Slack channel or on the GIS Stack Exchange using the open-data-cube tag (you can view previously asked questions here). If you would like to report an issue with this notebook, you can file one on GitHub.

Compatible datacube version:

[10]:
print(datacube.__version__)
1.8.15

Last Tested:

[11]:
from datetime import datetime
datetime.today().strftime('%Y-%m-%d')
[11]:
'2023-09-19'