# Object-based filtering of pixel classifications ¶

Keywords data methods; image segmentation, data methods; GEOBIA

## Background¶

Geographic Object-Based Image Analysis (GEOBIA), aims to group pixels together into meaningful image-objects. There are two advantages to a GEOBIA workflow: one, we can reduce the ‘salt and pepper’ effect typical of classifying pixels; and two, we can increase the computational efficiency of our workflow by grouping pixels into fewer, larger, but more meaningful objects. A review of the emerging trends in GEOBIA can be found in Chen et al. (2017).

## Description¶

In this notebook, we take the pixel-based classifications generated in the 4_Classify_satellite_data.ipynb notebook, and filter the classifications by image-objects. To do this, we first need to conduct image segmentation using the function rsgislib.segmentation.runShepherdSegmentation. This image segmentation algorithm is fast and scalable. The image segmentation is conducted on the NDVI layer output in the previous notebook. To filter the pixel observations, we assign to each segment the majority (mode) pixel classification using the scipy.ndimage.measurements import _stats module.

1. Convert the NDVI layer to a .kea file format (a requirement for the Remote Sensing and GIS Software Library, RSGISLib)

2. Run the image segmentation

3. Calculate the mode statistic for each segment

4. Write the new object-based classification to disk as a COG

5. An advanced section that demonstrates running a tiled, parallel image segmentation (useful if segmenting a very large GeoTIFF)

## Getting started¶

To run this analysis, run all the cells in the notebook, starting with the “Load packages” cell.

[1]:

import os
import sys
import gdal
import shutil
import xarray as xr
import numpy as np
import subprocess as sp
import matplotlib.pyplot as plt
from odc.io.cgroups import get_cpu_quota
from datacube.utils.cog import write_cog
from rsgislib.segmentation import segutils
from scipy.ndimage.measurements import _stats

from deafrica_tools.classification import HiddenPrints

/env/lib/python3.6/site-packages/geopandas/_compat.py:88: UserWarning: The Shapely GEOS version (3.7.2-CAPI-1.11.0 ) is incompatible with the GEOS version PyGEOS was compiled with (3.9.0-CAPI-1.16.2). Conversions between both will be slow.
shapely_geos_version, geos_capi_version_string
category=DeprecationWarning)


## Analysis Parameters¶

• pred_tif: The path and name of the prediction GeoTIFF output in the previous notebook.

• tif_to_seg: The geotiff to use as an input to the image segmentation, in the default example this was an NDVI layer output in the last notebook.

• min_seg_size: An integer which specifies the minimum number of pixels within a segment; segments with fewer than then minimum number of pixels are merged with adjacent segments.

• numClusters: An integer which specifies the number of clusters within the KMeans clustering. A good default is 60.

• results: A folder location to store the classified GeoTIFFs.

[2]:

pred_tif = 'results/prediction.tif'

tif_to_seg = 'results/NDVI.tif'

min_seg_size = 50 #in number of pixels

numClusters = 60 #number of k-means clusters

results = 'results/'



## Generate an object-based classification¶

### Convert to .kea format¶

[3]:

#inputs to image seg
kea_file = tif_to_seg[:-4]+'.kea'
segmented_kea_file = tif_to_seg[:-4]+'_segmented.kea'

#convert tiff to kea
gdal.Translate(destName=kea_file,
srcDS=tif_to_seg,
format='KEA',
outputSRS='EPSG:6933')

[3]:

<osgeo.gdal.Dataset; proxy of <Swig Object of type 'GDALDatasetShadow *' at 0x7fb0302cbf90> >


### Run image segmentation¶

[4]:

%%time
#store temp files somewhere
tmp='tmp/'
if not os.path.exists(tmp):
os.mkdir(tmp)

#run image seg
with HiddenPrints():
segutils.runShepherdSegmentation(inputImg=kea_file,
outputClumps=segmented_kea_file,
tmpath=tmp,
numClusters=numClusters,
minPxls=min_seg_size)

CPU times: user 1min, sys: 334 ms, total: 1min 1s
Wall time: 1min 1s


### Open segments and pixel-based predictions¶

[5]:

segments=xr.open_rasterio(segmented_kea_file).squeeze().values
pred = xr.open_rasterio(pred_tif).squeeze().drop_vars('band')


### Calculate mode¶

Within each segment, the majority classification is calculated and assigned to that segment.

[6]:

count, _sum =_stats(pred, labels=segments, index=segments)
mode = _sum > (count/2)
mode = xr.DataArray(mode,  coords=pred.coords, dims=pred.dims, attrs=pred.attrs).astype(np.int16)


### Clean up intermediate files¶

[7]:

shutil.rmtree(tmp)
os.remove(kea_file)
os.remove(segmented_kea_file)


### Write result to disk¶

[8]:

write_cog(mode, results+ 'prediction_object_.tif', overwrite=True)

/env/lib/python3.6/site-packages/pyproj/crs/crs.py:280: FutureWarning: '+init=<authority>:<code>' syntax is deprecated. '<authority>:<code>' is the preferred initialization method. When making the change, be mindful of axis order changes: https://pyproj4.github.io/pyproj/stable/gotchas.html#axis-order-changes-in-proj-6
projstring = _prepare_from_string(projparams)

[8]:

PosixPath('results/prediction_object_.tif')


### Plot result¶

Below we plot the the pixel-based classification alongside the newly created object-based classification. You can see the ‘salt and pepper’ effect of individual pixels being classified as crop has been removed in the object based classification, resulting in a ‘cleaner’ classification.

[9]:

fig, axes = plt.subplots(1, 2, sharey=True, figsize=(16, 8))
axes[0].set_title('Pixel-based Classification')
axes[1].set_title('Object-based Classification (mode)')
plt.tight_layout()


## Optional: Tiled, parallel image segmentation¶

Image segmentation at large scales can be both time and memory intensive. The module deafrica_segmentation.performTiledSegmentation builds upon the image segmentation algorithm developed by Shepherd et al. (2019) (implemented in the package RSGISLib) to run image segmentation across multiple CPUs. A full description of their approach can be found in Clewey et al. (2014) A Python-Based Open Source System for Geographic Object-Based Image Analysis (GEOBIA) Utilizing Raster Attribute Tables. The code below demonstrates how to use the deafrica_segmentation.performTiledSegmentation module to conduct a tiled, parallel image segmentation.

The tiling approach is based on the bounding coordinates of the GeoTIFF. If a GeoTIFF is irregularly shaped such that a tile(s) contains none of the input GeoTIFF, then the segmentation will fail. If this occurs, check the <>S1Tiles.shp file output during stage 1 of the algorithm. Overlay this file on top of your input GeoTIFF to check if there are empty tiles. At the moment, the only solution is to change the extent of the GeoTIFF to be more regularly shaped. The validDataTileFraction variable will handle tiles that contain a small fraction of the input GeoTIFF, tiles containing less than the specified fraction are merged with a neighbouring tile. The image below shows an example of the tiling approach with merged tiles:

Below, we will conduct the same analysis as we did in the first example above, but this time the image segmentation will be conducted using the deafrica_segmentation.performTiledSegmentation() function. For the default example, this will be slower than the serial version, however, when conducting image segmentation over very large GeoTIFFs, this option will be preferred.

[10]:

#import the parallel segementation module
sys.path.append('../../Scripts')
from deafrica_segmentation import performTiledSegmentation


## Analysis Parameters¶

• validDataTileFraction: The fraction of a tile that should contain valid data. Below this threshold, a tile will be merged with its neighbour. e.g. 0.3

• tile_width, tile_height: The tile size parameters in number of pixels

[11]:

#new parameters to add
validDataTileFraction=0.2
tile_width, tile_height = 1500, 1500

#previous parameters we added above, reposting here
pred_tif = 'results/prediction.tif'
tif_to_seg = 'results/NDVI.tif'
min_seg_size = 100
results = 'results/'



### Automatically find the number of CPUs¶

[12]:

ncpus=round(get_cpu_quota())
print('ncpus = '+str(ncpus))

ncpus = 2


## Tiled, parallel image segmentation¶

### Convert .tif to .kea¶

[13]:

#store temp files somewhere
tmp='tmp/'
if not os.path.exists(tmp):
os.mkdir(tmp)

#inputs to image seg
kea_file = tif_to_seg[:-4]+'.kea'
segmented_kea_file = tif_to_seg[:-4]+'_segmented.kea'

#convert tiff to kea
gdal.Translate(destName=kea_file,
srcDS=tif_to_seg,
format='KEA',
outputSRS='EPSG:6933')


[13]:

<osgeo.gdal.Dataset; proxy of <Swig Object of type 'GDALDatasetShadow *' at 0x7fb0d4f850c0> >


### Run the parallel, tiled segmentation¶

This will take a couple of minutes to run.

[14]:

#run the segmentation
with HiddenPrints():
performTiledSegmentation(kea_file,
segmented_kea_file,
tmpDIR=tmp,
numClusters=numClusters,
validDataThreshold=validDataTileFraction,
tileWidth=tile_width,
tileHeight=tile_height,
minPxls=min_seg_size,
ncpus=ncpus)

#remove tmp folder
shutil.rmtree(tmp)



### Open segments and pixel-based predictions¶

[15]:

segments=xr.open_rasterio(segmented_kea_file).squeeze().values
pred = xr.open_rasterio(pred_tif).squeeze().drop_vars('band')


### Calculate mode¶

[16]:

count, _sum =_stats(pred, labels=segments, index=segments)
mode = _sum > (count/2)
mode = xr.DataArray(mode,  coords=pred.coords, dims=pred.dims, attrs=pred.attrs).astype(np.int16)


### Clean up intermediate files¶

[17]:

os.remove(kea_file)
os.remove(segmented_kea_file)


### Plot the result¶

[18]:

mode.plot(size=6);


Contact: If you need assistance, please post a question on the Open Data Cube Slack channel or on the GIS Stack Exchange using the open-data-cube tag (you can view previously asked questions here). If you would like to report an issue with this notebook, you can file one on Github.

Last Tested:

[19]:

from datetime import datetime
datetime.today().strftime('%Y-%m-%d')

[19]:

'2021-03-05'