Detection of Floating Marine Debris

Background

Marine plastic pollution is a global concern that threatens marine ecosystems and human health. Detecting and quantifying plastic waste in the marine environment is crucial for addressing this problem. Remote sensing techniques, particularly using satellite imagery, have emerged as a promising tool for identifying and monitoring plastic debris in the ocean. In this notebook, we will explore the use of the Floating Debris Index (FDI) and Normalized Difference Vegetation Index (NDVI) for detecting floating plastics in Sentinel-2 satellite imagery. These indices have demonstrated superior performance in detecting floating marine plastics in various locations such as Greece (Topouzelis et al., 2019, 2020), Cyprus (Themistocleous et al., 2020), and Durban (Biermann et al., 2020).

Detecting Microplastics:

Microplastics are small plastic particles that pose a significant threat to marine life and human health. Detecting their presence in water bodies has become a global priority. To address this, we will utilize the FDI and NDVI applied to Sentinel-2 satellite imagery.

  1. Floating Debris Index (FDI): The Floating Debris Index (FDI) is a spectral index specifically designed to detect and quantify floating debris, including microplastics, on the water surface. By calculating the FDI using Sentinel-2 satellite imagery, we can effectively identify areas where microplastics are present in the study area. The FDI takes advantage of the unique spectral characteristics exhibited by floating debris, allowing us to distinguish them from other features in the image. A thresholding technique will be applied to the FDI values, enabling the separation of microplastics from the surrounding water and other materials.

  2. Normalized Difference Vegetation Index (NDVI): Although primarily designed for vegetation monitoring, the Normalized Difference Vegetation Index (NDVI) can be adapted to detect floating microplastics as well. Microplastics often exhibit spectral properties that distinguish them from natural features. By calculating the NDVI using Sentinel-2 imagery, we can leverage these spectral differences to identify potential microplastic locations. The NDVI quantifies the difference between the near-infrared and red spectral bands, providing an indication of the presence of microplastics based on their unique spectral responses. This allows us to distinguish microplastics from other materials in the image, aiding in the detection and characterization of floating plastic debris.

Description

Notebook Workflow: The following steps will be covered in this notebook to detect floating microplastics:

  1. Data Preprocessing:

    • Obtain Sentinel-2 satellite imagery for the study area.

    • Perform necessary pre-processing steps such as atmospheric correction and radiometric calibration.

  2. Calculation of Spectral Indices:

    • Calculate the FDI and NDVI based on the spectral bands of the Sentinel-2 imagery.

  3. Thresholding Techniques:

    • Apply thresholding techniques to the FDI and NDVI to separate microplastics from other features in the imagery.

    • Determine suitable threshold values based on previous studies or experimentation.

  4. Microplastic Detection and Quantification:

    • Identify and extract potential microplastic locations based on the thresholded indices.

    • Generate statistics and metrics to quantify the extent and distribution of microplastics in the study area.

  5. Result Analysis:

    • Analyze the detected microplastic distribution patterns and their potential sources.

    • Compare the results with available ground-truth data to validate the findings.

For assessing the accuracy of our microplastic detection method, this notebook uses the Marine Debris Dataset for Object Detection in Planetscope Imagery Dataset. This dataset comprises 256x256 pixel images extracted from PlanetScope optical imagery, annotated with bounding boxes and geographical coordinates. The dataset covers diverse marine debris types, including plastics, algae, sargassum, wood, and other artifacts. This dataset will serve as ground truth, enabling us to quantitatively measure our detection model’s accuracy.

Import necessary libraries

[1]:
# Load the necessary python packages.
%matplotlib inline
import os
import datacube
import seaborn as sns
import numpy as np
import xarray as xr
import geopandas as gpd
import matplotlib.pyplot as plt
from datacube.utils.geometry import Geometry
from shapely.geometry import LineString
import matplotlib.patches as mpatches
import matplotlib.lines as mlines
from sklearn import preprocessing

from deafrica_tools.spatial import xr_vectorize
from deafrica_tools.plotting import rgb
from deafrica_tools.plotting import display_map
from deafrica_tools.spatial import xr_rasterize
from deafrica_tools.areaofinterest import define_area
from deafrica_tools.bandindices import calculate_indices

Connect to the datacube

[2]:
dc = datacube.Datacube(app='detect_microplastics')

Analysis parameters

Select location

To define the area of interest, there are two methods available:

  1. By specifying the latitude, longitude, and buffer. This method requires you to input the central latitude, central longitude, and the buffer value in square degrees around the center point you want to analyze. For example, lat = 10.338, lon = -1.055, and buffer = 0.1 will select an area with a radius of 0.1 square degrees around the point with coordinates (10.338, -1.055).

  2. By uploading a polygon as a GeoJSON or Esri Shapefile. If you choose this option, you will need to upload the geojson or ESRI shapefile into the Sandbox using Upload Files button upload_files_icon.png in the top left corner of the Jupyter Notebook interface. ESRI shapefiles must be uploaded with all the related files (.cpg, .dbf, .shp, .shx). Once uploaded, you can use the shapefile or geojson to define the area of interest. Remember to update the code to call the file you have uploaded.

To use one of these methods, you can uncomment the relevant line of code and comment out the other one. To comment out a line, add the "#" symbol before the code you want to comment out. By default, the first option which defines the location using latitude, longitude, and buffer is being used.

[3]:
# Method 1: Specify the latitude, longitude, and buffer
# aoi = define_area(lat=10.338, lon=-1.055, buffer=0.1)

# Method 2: Use a polygon as a GeoJSON or Esri Shapefile.
aoi = define_area(
    vector_path=
    f"../Supplementary_data/Floating_microplastics/nasa_marine_debris_labels_ghana_bounding_box.geojson"
)

#Create a geopolygon and geodataframe of the area of interest
geopolygon = Geometry(aoi["features"][0]["geometry"], crs="epsg:4326")
geopolygon_gdf = gpd.GeoDataFrame(geometry=[geopolygon], crs=geopolygon.crs)

# Get the latitude and longitude range of the geopolygon
lat_range = (geopolygon_gdf.total_bounds[1], geopolygon_gdf.total_bounds[3])
lon_range = (geopolygon_gdf.total_bounds[0], geopolygon_gdf.total_bounds[2])
[4]:
display_map(x=lon_range, y=lat_range)
[4]:
Make this Notebook Trusted to load map: File -> Trust Notebook

Load the Sentinel-2 data for the area of interest and time range

Note: In this notebook, the presented example focuses on a single timestamp, specifically ‘2018-10-31,’ for which we have access to ground truth data in the form of the Marine Debris Dataset for Object Detection in Planetscope Imagery Dataset within the designated area of interest. However, the workflow’s flexibility allows for seamless adaptation to analye data over a range of timestamps. This capability enables you to examine the presence and distribution of debris across different points in time within the specified date range. Whether you’re interested in a specific time period or conducting comparative analyses across multiple timestamps, the workflow can be customized to suit your research needs.

[5]:
product = 's2_l2a'
query = {
    'geopolygon': geopolygon,
    'time': ('2018-10-31'),
    'output_crs': 'EPSG:6933',
    'resolution': (-10, 10),
    'measurements': ['blue', 'green', 'red', 'red_edge_2', 'nir', 'swir_1'],
    'group_by': "solar_day",
}

ds = dc.load(product=product, **query)
ds
[5]:
<xarray.Dataset>
Dimensions:      (time: 1, y: 1633, x: 1538)
Coordinates:
  * time         (time) datetime64[ns] 2018-10-31T10:19:11
  * y            (y) float64 7.068e+05 7.068e+05 ... 6.905e+05 6.905e+05
  * x            (x) float64 -3.416e+04 -3.416e+04 ... -1.88e+04 -1.88e+04
    spatial_ref  int32 6933
Data variables:
    blue         (time, y, x) uint16 852 1020 1320 1630 1394 ... 440 424 411 402
    green        (time, y, x) uint16 1180 1510 1864 2096 ... 319 294 301 312
    red          (time, y, x) uint16 1430 2192 2200 2498 ... 214 211 220 218
    red_edge_2   (time, y, x) uint16 2978 2653 2653 2750 ... 203 209 209 199
    nir          (time, y, x) uint16 3194 3164 2832 2812 ... 198 189 180 197
    swir_1       (time, y, x) uint16 3112 3588 3588 3547 ... 168 172 172 160
Attributes:
    crs:           EPSG:6933
    grid_mapping:  spatial_ref

Normalise the intensity of the bands

Normalizing the intensity of the bands used in the Floating Debris Index (FDI) is important to ensure that the FDI is not biased by differences in brightness or reflectance values across the image. Normalization helps to standardize the range of values across the bands, so that each band contributes equally to the index regardless of its original intensity values.

If the intensity values of the bands are not normalized, it can lead to inaccuracies in the FDI results, such as false positives or false negatives. For example, if one band has significantly higher intensity values than the others, it may dominate the FDI calculation and lead to an overestimation of the amount of floating debris present in the image.

By normalizing the intensity values, the FDI is calculated based on the relative differences in reflectance values between the two bands, rather than the absolute intensity values. This helps to ensure that the FDI is a more accurate indicator of the presence of floating debris, and reduces the impact of other factors that may affect the brightness or reflectance values of the image, such as cloud cover or varying illumination conditions.

[6]:
# Function to normalize the band values using scikit-learn.
def normalize_bands(ds):
    bands = list(ds.data_vars)
    for band in bands:
        # Get the band's data as a 2D array (time x pixel).
        band_data = ds[band].values.reshape(-1, 1)

        # Initialize the MinMaxScaler.
        scaler = preprocessing.MinMaxScaler()

        # Fit the scaler on the data and transform it to normalize the values.
        normalized_data = scaler.fit_transform(band_data)

        # Update the dataset with the normalized values.
        ds[band] = xr.DataArray(normalized_data.reshape(ds[band].shape),
                                dims=ds[band].dims)

    return ds


# Normalize the bands.
normalized_ds = normalize_bands(ds)
rgb(normalized_ds, col="time")
../../../_images/sandbox_notebooks_Real_world_examples_Floating_marine_debris_12_0.png

Mask non-marine areas

World Bank-approved coastlines downloaded from https://datacatalog.worldbank.org/search/dataset/0038272

[7]:
world_coastlines = gpd.read_file(
    "../Supplementary_data/Floating_microplastics/WB_Coastlines_10m.shp")

#Clip the coastline to the area of interest
coastline = gpd.clip(world_coastlines, geopolygon_gdf)
coastline.explore(column='featurecla', cmap='inferno', categorical=True)
[7]:
Make this Notebook Trusted to load map: File -> Trust Notebook
[8]:
# Check the CRS and reproject if necessary
if coastline.crs != 'epsg:6933':
    coastline = coastline.to_crs('epsg:6933')

# Assuming you have a GeoDataFrame called 'coastline' with a LineString geometry column
side = "left"  # Choose the side of the line for buffering ("right" or "left")
buffer_distance = 3000  # Specify the buffer distance in the appropriate units

# Convert LineString to a list of LineString geometries
line_geometries = list(coastline.geometry)

# Create a buffered line on the desired side
buffered_geometries = []
for line in line_geometries:
    if side == "right":
        offset_line = line.parallel_offset(buffer_distance, "right")
    elif side == "left":
        offset_line = line.parallel_offset(buffer_distance, "left")
    buffer = offset_line.buffer(buffer_distance)
    buffer_on_side = buffer.difference(line)
    buffered_geometries.append(buffer_on_side)

# Create a new GeoDataFrame with the buffered geometries
coastline_buffer = gpd.GeoDataFrame(geometry=buffered_geometries,
                                    crs=coastline.crs)

# Plot the buffer on the image to see which parts of the image will be masked.
# This can help with determining if the buffer size is sufficient and removes inland pixels

# Create a figure and axis with a larger figsize
fig, ax = plt.subplots(figsize=(12, 8))

# Plot the rgb
rgb(normalized_ds.isel(time=0), ax=ax)

# Plot the coastline buffer over the dataset
coastline_buffer.plot(ax=ax, color='red', alpha=0.5)

# Plot the coastline over the coastline buffer
coastline.plot(ax=ax, color='blue', alpha=0.5)

# Set the aspect ratio of the plot
ax.set_aspect('equal')

# Set the plot limits to match the extent of the normalized_ds data
ax.set_xlim(normalized_ds.x.min(), normalized_ds.x.max())
ax.set_ylim(normalized_ds.y.min(), normalized_ds.y.max())

# Show the plot
plt.show()
../../../_images/sandbox_notebooks_Real_world_examples_Floating_marine_debris_16_0.png
[9]:
coastline_raster = xr_rasterize(gdf=coastline_buffer,
                                da=ds,
                                crs=ds.crs)

ds_masked_coastline = normalized_ds.where(coastline_raster == 0)
rgb(ds_masked_coastline.isel(time=0))
../../../_images/sandbox_notebooks_Real_world_examples_Floating_marine_debris_18_0.png

Removing additional land areas using the WOfS ‘all-time’ Summary

The World Bank coastlines dataset is used to mask out land areas. However, due to the dataset’s global nature, inaccuracies can occur where coastlines extend into the ocean or further inland. To address this, the Water Observation from Space (WOfS) ‘all-time’ summary layer is employed to remove erroneously included land areas within the region of interest. Unfortunately, there are no provisions to re-include accidentally excluded water or marine areas in the current notebook.

[10]:
wofs_alltime = dc.load(product='wofs_ls_summary_alltime',
                       geopolygon=geopolygon,
                       resolution=(-30, 30))

wofs_alltime.frequency.plot(size=6,
                            cmap=sns.color_palette("mako_r", as_cmap=True))
plt.title('WOfS All-Time Summary');
../../../_images/sandbox_notebooks_Real_world_examples_Floating_marine_debris_20_0.png
[11]:
wofs_alltime_wet = wofs_alltime.where(wofs_alltime.frequency >= 0.5)
wofs_alltime_wet = wofs_alltime_wet.isel(time=0)

ds_masked_coastline = ds_masked_coastline.where(
    wofs_alltime_wet.frequency >= 0.5)
rgb(ds_masked_coastline.isel(time=0))
../../../_images/sandbox_notebooks_Real_world_examples_Floating_marine_debris_21_0.png

Calculate and plot the Floating Debris Index (FDI) and the Normalized Difference Vegetation Index (NDVI)

  1. Floating Debris Index (FDI): The Floating Debris Index (FDI) is a spectral index specifically designed to detect and quantify floating debris, including microplastics, on the water surface. By calculating the FDI using Sentinel-2 satellite imagery, we can effectively identify areas where microplastics are present in the study area. The FDI takes advantage of the unique spectral characteristics exhibited by floating debris, allowing us to distinguish them from other features in the image. A thresholding technique will be applied to the FDI values, enabling the separation of microplastics from the surrounding water and other materials.

    Floating Debris Index (FDI)

    \begin{equation} \text{FDI} = R_{rs},NIR - (R_{rs},RE2 + (R_{rs}, SWIR1 - R_{rs}, RE2) * \frac{(\lambda_{NIR} - \lambda_{RED})}{(\lambda_{SWIR1} - \lambda_{RED})} * 10) \end{equation}

    where \(R_{rs,NIR}\), \(R_{rs, RE2}\), \(R_{rs, SWIR1}\) are the reflectance values in the near infrared, red edge 2, and short-wave infrared bands, respectively, \(λ_i\) are the wavelength with subscript band name.

    Note wavelength values are the central wavelengths from https://www.usgs.gov/centers/eros/science/usgs-eros-archive-sentinel-2

  2. Normalized Difference Vegetation Index (NDVI): Although primarily designed for vegetation monitoring, the Normalized Difference Vegetation Index (NDVI) can be adapted to detect floating microplastics as well. Microplastics often exhibit spectral properties that distinguish them from natural features. By calculating the NDVI using Sentinel-2 imagery, we can leverage these spectral differences to identify potential microplastic locations. The NDVI quantifies the difference between the near-infrared and red spectral bands, providing an indication of the presence of microplastics based on their unique spectral responses. This allows us to distinguish microplastics from other materials in the image, aiding in the detection and characterization of floating plastic debris.

    Normalized Difference Vegetation Index (NDVI)

    \begin{equation} \text{NDVI} = \frac{R_{rs},NIR - R_{rs},RED}{R_{rs},NIR + R_{rs},RED} \end{equation}

    where \(R_{rs,NIR}\), \(R_{rs, RED}\) are the reflectance values in the near infrared, and red bands, respectively.

[12]:
wavelength_nir = 842 * 1e-9
wavelength_red = 665 * 1e-9
wavelength_swir1 = 1610 * 1e-9

ds_masked_coastline["FDI"] = ds_masked_coastline.nir - (
    ds_masked_coastline.red_edge_2 +
    (ds_masked_coastline.swir_1 - ds_masked_coastline.red_edge_2) *
    ((wavelength_nir - wavelength_red) /
     (wavelength_swir1 - wavelength_red)) * 10)
ds_masked_coastline = calculate_indices(ds_masked_coastline,
                                        index=['NDVI'],
                                        satellite_mission='s2')

ds_masked_coastline
[12]:
<xarray.Dataset>
Dimensions:      (time: 1, y: 545, x: 513)
Coordinates:
  * time         (time) datetime64[ns] 2018-10-31T10:19:11
  * y            (y) float64 7.068e+05 7.068e+05 ... 6.905e+05 6.905e+05
  * x            (x) float64 -3.416e+04 -3.412e+04 ... -1.882e+04 -1.88e+04
    spatial_ref  int32 6933
Data variables:
    blue         (time, y, x) float64 nan nan nan ... 0.02091 0.02373 0.02076
    green        (time, y, x) float64 nan nan nan ... 0.005249 0.006422 0.006031
    red          (time, y, x) float64 nan nan nan ... 0.007245 0.00716 0.007497
    red_edge_2   (time, y, x) float64 nan nan nan ... 0.01487 0.01732 0.01671
    nir          (time, y, x) float64 nan nan nan ... 0.003983 0.006143 0.006076
    swir_1       (time, y, x) float64 nan nan nan ... 0.005977 0.007571 0.006508
    FDI          (time, y, x) float64 nan nan nan ... 0.005768 0.007084 0.008472
    NDVI         (time, y, x) float64 nan nan nan ... -0.2905 -0.07645 -0.1047
Attributes:
    crs:           EPSG:6933
    grid_mapping:  spatial_ref

This code defines the affine and CRS (Coordinate Reference System) variables of the dataset ds_masked_coastline, which are required for plotting the data. It then creates a new dataset ds_fdi which contains the FDI values obtained from the previous step.

The output plot can be used to visualize the distribution of floating debris, including microplastics, in water bodies over time.

[13]:
ds_fdi = ds_masked_coastline["FDI"].squeeze()
ds_ndvi = ds_masked_coastline["NDVI"].squeeze()

# Calculate the number of rows and columns needed
try:
    num_plots = ds_fdi.sizes['time']
except KeyError:
    num_plots = 1  # if only one timestamp
num_cols = 2
num_rows = num_plots

# Create the figure and subplots
fig, axs = plt.subplots(num_rows, num_cols, figsize=(15, 5 * num_rows))

# Loop over each timestamp and plot the corresponding FDI and NDVI images
for i in range(num_plots):
    time_val = str(ds_masked_coastline["FDI"].time.values[i]
                  )[:-13]  # Extract timestamp and convert to string

    # Plot FDI
    if num_plots == 1:
        ds_fdi.plot(ax=axs[0], cmap='RdBu_r')
        axs[0].set_title(f'FDI - Timestamp: {time_val}')
    else:
        ds_fdi.isel(time=i).plot(ax=axs[i, 0], cmap='RdBu_r')
        axs[i, 0].set_title(f'FDI - Timestamp: {time_val}')

    # Plot NDVI
    if num_plots == 1:
        ds_ndvi.plot(ax=axs[1], cmap='RdYlGn')
        axs[1].set_title(f'NDVI - Timestamp: {time_val}')
    else:
        ds_ndvi.isel(time=i).plot(ax=axs[i, 1], cmap='RdYlGn')
        axs[i, 1].set_title(f'NDVI - Timestamp: {time_val}')

# Remove any unused subplots
for i in range(num_plots, num_rows):
    fig.delaxes(axs[i, 0])
    fig.delaxes(axs[i, 1])

plt.tight_layout()
plt.show()
../../../_images/sandbox_notebooks_Real_world_examples_Floating_marine_debris_25_0.png

Select and apply threshold

The best threshold for identifying microplastics from the FDI will depend on the characteristics of your data and the specific application you are working on.

One way to determine the best threshold is to create a histogram of the FDI values and visually inspect it to identify a threshold value that separates the distribution into two distinct groups, one representing microplastics and the other representing non-microplastics.

Threshold for each timestamp and plot histograms

The 3-sigma rule is a statistical method used to determine if a data point is an outlier, which means it is significantly different from the other data points. In this case, we used it to identify pixels in the satellite imagery that were likely to be microplastics.

To do this, we first calculated the mean and standard deviation of values for all pixels in the satellite imagery. We then set a threshold at 3 standard deviations above the mean, which means any pixel with a value above this threshold was classified as a microplastic.

Plot histograms of the FDI and NDVI values for each timestamp, with the threshold value plotted as a vertical red dashed line. This is a good way to visualize the distribution of the FDI and NDVI values and how the threshold value relates to the data.

[14]:
fdi_values = ds_fdi.values
ndvi_values = ds_ndvi.values

# Calculate thresholds for each timestamp
threshold_per_timestamp_fdi = []
threshold_per_timestamp_ndvi = []

if len(fdi_values.shape) == 2:
    # Only one timestamp
    fdi_values = fdi_values[None, :]  # Add time dimension
    ndvi_values = ndvi_values[None, :]  # Add time dimension

    # FDI threshold calculation
    fdi_values_i = fdi_values[0, :]
    fdi_values_i = fdi_values_i[~np.isnan(fdi_values_i)]
    threshold_i_fdi = np.mean(fdi_values_i) + (3 * np.std(fdi_values_i))
    threshold_per_timestamp_fdi.append(threshold_i_fdi)

    # NDVI threshold calculation
    ndvi_values_i = ndvi_values[0, :]
    ndvi_values_i = ndvi_values_i[~np.isnan(ndvi_values_i)]
    threshold_i_ndvi = np.mean(ndvi_values_i) + (3 * np.std(ndvi_values_i))
    threshold_per_timestamp_ndvi.append(threshold_i_ndvi)
else:
    # Multiple timestamps
    for i in range(fdi_values.shape[0]):
        # Remove missing values for this timestamp
        fdi_values_i = fdi_values[i, :]
        ndvi_values_i = ndvi_values[i, :]
        fdi_values_i = fdi_values_i[~np.isnan(fdi_values_i)]
        ndvi_values_i = ndvi_values_i[~np.isnan(ndvi_values_i)]

        # FDI threshold calculation
        threshold_i_fdi = np.mean(fdi_values_i) + (3 * np.std(fdi_values_i))
        threshold_per_timestamp_fdi.append(threshold_i_fdi)

        # NDVI threshold calculation
        threshold_i_ndvi = np.mean(ndvi_values_i) + (3 * np.std(ndvi_values_i))
        threshold_per_timestamp_ndvi.append(threshold_i_ndvi)

# Print the calculated thresholds for FDI and NDVI
print("Threshold per timestamp (FDI):", threshold_per_timestamp_fdi)
print("Threshold per timestamp (NDVI):", threshold_per_timestamp_ndvi)

# Create the figure and subplots
fig, axs = plt.subplots(num_rows, num_cols, figsize=(15, 5 * num_rows))

# Loop over each timestamp and plot the corresponding histograms for FDI and NDVI
for i in range(num_plots):
    time_val = str(ds_masked_coastline["FDI"].time.values[i]
                  )[:-13]  # Extract timestamp and convert to string

    # FDI histogram
    if num_plots == 1:
        axs[0].hist(fdi_values.flatten(), bins=500)
        axs[0].axvline(threshold_per_timestamp_fdi[i],
                       color='r',
                       linestyle='dashed',
                       linewidth=1)
        axs[0].set_xlabel('FDI')
        axs[0].set_ylabel('Frequency')
        axs[0].set_title(f'FDI - Timestamp: {time_val}')
    else:
        axs[i, 0].hist(fdi_values[i].flatten(), bins=500)
        axs[i, 0].axvline(threshold_per_timestamp_fdi[i],
                          color='r',
                          linestyle='dashed',
                          linewidth=1)
        axs[i, 0].set_xlabel('FDI')
        axs[i, 0].set_ylabel('Frequency')
        axs[i, 0].set_title(f'FDI - Timestamp: {time_val}')

    # NDVI histogram
    if num_plots == 1:
        axs[1].hist(ndvi_values.flatten(), bins=500)
        axs[1].axvline(threshold_per_timestamp_ndvi[i],
                       color='r',
                       linestyle='dashed',
                       linewidth=1)
        axs[1].set_xlabel('NDVI')
        axs[1].set_ylabel('Frequency')
        axs[1].set_title(f'NDVI - Timestamp: {time_val}')
    else:
        axs[i, 1].hist(ndvi_values[i].flatten(), bins=500)
        axs[i, 1].axvline(threshold_per_timestamp_ndvi[i],
                          color='r',
                          linestyle='dashed',
                          linewidth=1)
        axs[i, 1].set_xlabel('NDVI')
        axs[i, 1].set_ylabel('Frequency')
        axs[i, 1].set_title(f'NDVI - Timestamp: {time_val}')

# Remove any unused subplots
for i in range(num_plots, num_rows):
    fig.delaxes(axs[i, 0])
    fig.delaxes(axs[i, 1])

plt.tight_layout()
plt.show()
Threshold per timestamp (FDI): [0.040771627583107706]
Threshold per timestamp (NDVI): [0.27205848613712474]
../../../_images/sandbox_notebooks_Real_world_examples_Floating_marine_debris_28_1.png

Create and plot FDI and NDVI binary masks

This code creates a series of binary masks for a dataset of satellite imagery at different time steps. The masks are created based on a threshold value calculated for each time step, and the resulting areas of the masks that correspond to microplastic pollution are calculated and stored in a list. The number of microplastics within the area of interest for each time step is also calculated and stored in a separate list. The masks are then plotted as subplots in a grid format with custom colors. The code also removes any unused subplots and adjusts the spacing between subplots. The final output is a figure showing the masks for each time step and the corresponding microplastic area and number of microplastics.

[15]:
# Create binary maps based on thresholds
binary_fdi = ds_fdi > threshold_per_timestamp_fdi[0]
binary_ndvi = ds_ndvi > threshold_per_timestamp_ndvi[0]

# Create the figure and subplots
fig, axs = plt.subplots(nrows=num_plots, ncols=2, figsize=(14, 6 * num_plots))

# Loop over each timestamp and plot the corresponding microplastic maps
for i in range(num_plots):
    time_val = str(ds_masked_coastline["FDI"].time.values[i]
                  )[:-13]  # Extract timestamp and convert to string

    # FDI microplastic map
    if num_plots == 1:
        microplastics_map_fdi = binary_fdi == 1
        microplastics_map_fdi = microplastics_map_fdi.where(
            microplastics_map_fdi, np.nan)
        plot_fdi = microplastics_map_fdi.plot(ax=axs[0], cmap='Reds')
        axs[0].set_title(f'FDI Microplastic Map - Timestamp: {time_val}')
    else:
        microplastics_map_fdi = binary_fdi.isel(time=i) == 1
        microplastics_map_fdi = microplastics_map_fdi.where(
            microplastics_map_fdi, np.nan)
        plot_fdi = microplastics_map_fdi.plot(ax=axs[i, 0], cmap='Reds')
        axs[i, 0].set_title(f'FDI Microplastic Map - Timestamp: {time_val}')

    # NDVI microplastic map
    if num_plots == 1:
        microplastics_map_ndvi = binary_ndvi == 1
        microplastics_map_ndvi = microplastics_map_ndvi.where(
            microplastics_map_ndvi, np.nan)
        plot_ndvi = microplastics_map_ndvi.plot(ax=axs[1], cmap='Greens')
        axs[1].set_title(f'NDVI Microplastic Map - Timestamp: {time_val}')
    else:
        microplastics_map_ndvi = binary_ndvi.isel(time=i) == 1
        microplastics_map_ndvi = microplastics_map_ndvi.where(
            microplastics_map_ndvi, np.nan)
        plot_ndvi = microplastics_map_ndvi.plot(ax=axs[i, 1], cmap='Greens')
        axs[i, 1].set_title(f'NDVI Microplastic Map - Timestamp: {time_val}')

    # Hide color bar
    plot_fdi.colorbar.remove()
    plot_ndvi.colorbar.remove()

# Remove any unused subplots
for i in range(num_plots, num_rows):
    fig.delaxes(axs[i, 0])
    fig.delaxes(axs[i, 1])

plt.tight_layout()
plt.show()
../../../_images/sandbox_notebooks_Real_world_examples_Floating_marine_debris_30_0.png

Create a single binary map of pixels that are classified as microplastics using both FDI and NDVI.

[16]:
# Determine the number of plots
num_plots = 1 if 'time' not in ds_fdi.dims else ds_fdi.sizes['time']

# Create the figure and subplots
fig, axs = plt.subplots(nrows=num_plots,
                        ncols=1,
                        figsize=(7, 6 * num_plots),
                        squeeze=False)

# Create an array to store the microplastics masks
microplastics_masks = []

# Loop over each timestamp and plot the microplastics map
for i in range(num_plots):
    if num_plots > 1:
        time_val = str(ds_masked_coastline["FDI"].time.values[i]
                      )[:-13]  # Extract timestamp and convert to string
        ax = axs[i, 0]
        microplastics_map = binary_fdi.isel(time=i) & binary_ndvi.isel(time=i)
    else:
        time_val = str(ds_masked_coastline["FDI"].time.values
                      )[:-13]  # Extract timestamp and convert to string
        ax = axs[0, 0]
        microplastics_map = binary_fdi & binary_ndvi

    # Store the microplastics map in the array
    microplastics_masks.append(microplastics_map)

    # Plot the microplastics map with custom colormap
    microplastics_map.where(microplastics_map == 1).plot(ax=ax,
                                                         cmap='binary',
                                                         vmin=0,
                                                         vmax=1)
    ax.set_title(f'Predicted Microplastics: {time_val}')

    # Remove the colorbar
    ax.collections[0].colorbar.remove()

plt.tight_layout()
plt.show()
../../../_images/sandbox_notebooks_Real_world_examples_Floating_marine_debris_32_0.png

Vectorise the identified microplastics pixels for a timestamp of choice

[17]:
microplastics_gdf = xr_vectorize(microplastics_masks[0],
                                 crs=ds_masked_coastline.crs,
                                 mask=microplastics_masks[0] == 1)

microplastics_gdf.explore()
[17]:
Make this Notebook Trusted to load map: File -> Trust Notebook

Accuracy assessment

This notebook uses the NASA Marine Debris Dataset for Object Detection in Planetscope Imagery dataset as validation data for the area of interest (timestamp: (2018-10-31)). The dataset, provided in geojson format, is sourced from the Supplementary_data folder. It encompasses annotated vector data, including bounding boxes and geographical coordinates, serving as ground truth for marine debris object detection.

In the accuracy assessment, it is essential to acknowledge that the floating microplastics observed in the Sentinel image used may have moved between its acquisition time and the acquisition of the PlanetScope imagery used to create the NASA Marine Debris Dataset for Object Detection in Planetscope Imagery dataset. Due to the temporal gap and the dynamic nature of floating objects, it is challenging to determine the exact positions of the debris at the time of both image acquisitions. To account for this uncertainty, a buffer has been introduced around the validation dataset, allowing for potential movement of the debris during the time lapse between the two datasets. This approach acknowledges the possibility of displacement and provides a more comprehensive assessment of accuracy by considering the potential shifts in the positions of the floating microplastics.

[18]:
# Load the NASA Marine Debris Dataset for Object Detection in Planetscope Imagery dataset
marine_debris_dataset = gpd.read_file(
    f"../Supplementary_data/Floating_microplastics/nasa_marine_debris_labels_ghana.geojson"
)
marine_debris_dataset = marine_debris_dataset.to_crs('epsg:6933')

fig, ax = plt.subplots(figsize=(15, 10))
# Plot the identified microplastics
microplastics_gdf.plot(ax=ax, facecolor='yellow')
# Plot marine_debris_dataset on the second axis
rgb(ds.isel(time=0), ax=ax)
marine_debris_dataset.plot(ax=ax, facecolor='none', edgecolor='blue')
ax.set_title('Detected microplastics and NASA Marine Debris Dataset')

# Create custom legend handles and labels
legend_handles = [
    mpatches.Patch(facecolor='none',
                   edgecolor='blue',
                   label='NASA Marine Dataset'),
    mpatches.Patch(facecolor='yellow', label='Predicted Microplastic'),
]
ax.legend(handles=legend_handles)
plt.show()
../../../_images/sandbox_notebooks_Real_world_examples_Floating_marine_debris_37_0.png

The plot above illustrates that the predicted microplastics and the validation data are not perfectly aligned, which is expected due to the movement of floating marine objects. To account for this, introducing buffers around the validation data is necessary, considering the potential movement of microplastics between the time of the Sentinel image capture and the Planetscope Imagery used for validation. These buffers allow for a more robust analysis, accommodating the dynamic behavior of marine debris and enabling a comprehensive examination of potential microplastic locations.

[19]:
# Convert geopolygon_gdf to EPSG 6933
geopolygon_gdf = geopolygon_gdf.to_crs("EPSG:6933")

# Plot the marine dataset buffers, microplastics, geopolygon, and clipped coastline
fig, ax = plt.subplots(figsize=(15, 10))

# Create buffers around the marine debris points
buffer_distance = 100  # Specify the buffer distance in the CRS units (meters)
marine_debris_dataset['buffer'] = marine_debris_dataset.geometry.buffer(
    buffer_distance)

# Plot the rgb image
rgb(ds.isel(time=0), ax=ax)
# Plot the marine dataset buffers
buffer_gdf = gpd.GeoDataFrame(geometry=marine_debris_dataset['buffer'])
buffer_gdf.plot(ax=ax, facecolor='none', edgecolor='blue')

# Plot the microplastics
microplastics_gdf.plot(ax=ax, facecolor='yellow')

# Plot the geopolygon
geopolygon_gdf.plot(ax=ax, facecolor='none', edgecolor='red')

# Set plot title and labels
ax.set_title('Detected microplastics and NASA Marine Debris Dataset buffers')
ax.set_xlabel('Longitude')
ax.set_ylabel('Latitude')

# Create custom legend handles and labels
legend_handles = [
    mpatches.Patch(facecolor='none',
                   edgecolor='blue',
                   label='NASA Marine Dataset Buffer'),
    mpatches.Patch(facecolor='yellow', label='Predicted Microplastic'),
    mpatches.Patch(facecolor='none', edgecolor='red', label='Area of Interest'),
]

# Add legend
ax.legend(handles=legend_handles)
plt.show()
../../../_images/sandbox_notebooks_Real_world_examples_Floating_marine_debris_40_0.png

In this accuracy assessment, we sought to evaluate the performance of microplastic detection in satellite imagery by comparing it with the marine dataset buffers representing potential marine debris areas. To ensure accuracy, we assigned unique identifiers to each buffer and microplastic polygon. By conducting a spatial join operation, we identified the areas of overlap between the buffers and detected microplastics. The goal was to determine how many of the marine dataset buffers contained microplastics. Through this approach, we avoided overcounting buffers with multiple microplastics, as each buffer was counted only once, regardless of the number of intersecting microplastics.

Disclaimer: It is important to note that the accuracy assessment results are dependent on the size of the buffer defined. In this case, the buffer size was selected arbitrarily and not based on specific criteria or scientific guidelines. Therefore, the accuracy percentage obtained should be interpreted within the context of the chosen buffer size. Different buffer sizes may yield different accuracy outcomes.

[20]:
# Assign a unique identifier to each marine dataset buffer
buffer_gdf['buffer_id'] = range(len(buffer_gdf))

# Assign a unique identifier to each microplastic polygon
microplastics_gdf['microplastic_id'] = range(len(microplastics_gdf))

# Spatial join between the marine dataset buffers and the microplastics
intersection = gpd.overlay(buffer_gdf, microplastics_gdf, how='intersection')

# Count the number of unique buffer IDs with intersecting microplastics
num_buffers_with_microplastics = len(intersection['buffer_id'].unique())

# Calculate the total number of marine dataset buffers
num_buffers = len(buffer_gdf)

# Calculate the accuracy
accuracy = (num_buffers_with_microplastics / num_buffers) * 100

# Print the accuracy assessment
print("Accuracy Assessment:")
print(f"Number of marine dataset buffers: {num_buffers}")
print(
    f"Number of marine dataset buffers with microplastics: {num_buffers_with_microplastics}"
)
print(f"Accuracy: {accuracy:.2f}%")
Accuracy Assessment:
Number of marine dataset buffers: 51
Number of marine dataset buffers with microplastics: 36
Accuracy: 70.59%

The accuracy assessment of microplastic identification within a 100m buffer zone of marine dataset buffers revealed promising results. Although the buffers and the detected microplastics did not overlap perfectly due to movement and natural variations, visual examination indicated a discernible pattern of microplastic presence that aligned with our method of identification. While the accuracy rate of 70.59% indicates that our approach captured a significant portion of the actual microplastics within the buffers, it is important to consider the visual evidence that supports the effectiveness of our method. The offset observed between the detected microplastics and the buffers is expected due to the dynamic nature of microplastic dispersion in marine environments. Overall, the visual observations strengthen our confidence in the accuracy assessment, highlighting the ability of our method to identify microplastics even when perfect spatial alignment is not achieved.

  1. Biermann, L., Clewley, D., Martinez-Vicente, V., Topouzelis, K., 2020. Finding Plastic Patches in Coastal Waters using Optical Satellite Data. Sci. Rep. 10, 1–10. https://doi.org/10.1038/s41598-020-62298-z

  2. Themistocleous, K., Papoutsa, C., Michaelides, S., Hadjimitsis, D., 2020. Investigating detection of floating plastic litter from space using sentinel-2 imagery. Remote Sens. 12. https://doi.org/10.3390/RS12162648

  3. Topouzelis, K., Papageorgiou, D., Karagaitanakis, A., Papakonstantinou, A., Ballesteros, M.A., 2020. Remote sensing of sea surface artificial floating plastic targets with Sentinel2 and unmanned aerial systems (plastic litter project 2019). Remote Sens. 12. https://doi.org/10.3390/rs12122013

  4. Topouzelis, K., Papakonstantinou, A., Garaba, S.P., 2019. Detection of floating plastics from satellite and unmanned aerial systems (Plastic Litter Project 2018). Int. J. Appl. Earth Obs. Geoinf. 79, 175–183. https://doi.org/10.1016/j.jag.2019.03.011


Additional information

License: The code in this notebook is licensed under the Apache License, Version 2.0. Digital Earth Africa data is licensed under the Creative Commons by Attribution 4.0 license.

Contact: If you need assistance, please post a question on the Open Data Cube Slack channel or on the GIS Stack Exchange using the open-data-cube tag (you can view previously asked questions here). If you would like to report an issue with this notebook, you can file one on Github.

Compatible datacube version:

[21]:
from datetime import datetime

datetime.today().strftime('%Y-%m-%d')
[21]:
'2023-09-19'