Identifying ships with Sentinel-1

**Tags**: :index:`data used; sentinel-1`, :index:`analysis; ship detection`


Being able to spot ships and shipping lanes from satellite imagery can be useful for gaining a holistic picture of maritime traffic. The properties of radar data can be utilised to detect where ships appear over time, as well as highlighting the presence of shipping lanes.

When working with radar data, water commonly appears dark due to its relatively smooth surface resulting in very low backscatter, and consequently, low intensities are recorded by the satellite in both polarisation bands. However, if a large ship is on the water, the backscatter at the ship’s location will be much higher than the water due to double-bounce scattering effects.

The ESA/EC Copernicus Sentinel-1 mission (Sentinel-1A and 1B) has a frequent revisit time of a few days. This helps to build a large catalogue of observations that can be leveraged to identify shipping lanes.

In this example, we attempt to identify ships around the Suez Canal in Egypt during March 2021. Significant changes in the number and distribution pattern of the ships are detected, showing the impact of a blockage. More about the event can be found in this thread on Twitter.


Ships are identified by taking advantage of the fact that ships on the water result in a large radar backscatter signal.

The steps demonstrated in this notebook include:

  1. Loading Sentinel-1 backscatter image for an area of interest.

  2. Extracting open water mask using the Water Observations from Space (WOfS) annual summary.

  3. Counting the number of vessels before and after the event and save the results.

  4. Visualiinge the maximum backscatter values from a time series to identify shipping lanes.

Getting started

To run this analysis, run all the cells in the notebook, starting with the “Load packages” cell.

Load packages

Import Python packages that are used for the analysis.

%matplotlib inline

# Force GeoPandas to use Shapely instead of PyGEOS
# In a future release, GeoPandas will switch to using Shapely by default.
import os
os.environ['USE_PYGEOS'] = '0'

import xarray as xr
import numpy as np
import geopandas as gpd
import matplotlib.pyplot as plt

import datacube
from datacube.utils.geometry import Geometry
from deafrica_tools.spatial import xr_vectorize, xr_rasterize
from deafrica_tools.plotting import display_map
from deafrica_tools.datahandling import load_ard, wofs_fuser, dilate
from deafrica_tools.areaofinterest import define_area

Connect to the datacube

Connect to the datacube so we can access DE Africa data. The app parameter is a unique name for the analysis which is based on the notebook file name.

dc = datacube.Datacube(app="Ship_detection")

Analysis parameters

The following cell sets the parameters, which define the area of interest and the length of time to conduct the analysis over. The parameters are

  • lat: The central latitude to analyse (e.g. 10.338).

  • lon: The central longitude to analyse (e.g. -1.055).

  • buffer: The number of square degrees to load around the central latitude and longitude. For reasonable loading times, set this as 0.1 or lower.

  • time_range: The date range to analyse (e.g. ('2021'))

Select location

To define the area of interest, there are two methods available:

  1. By specifying the latitude, longitude, and buffer. This method requires you to input the central latitude, central longitude, and the buffer value in square degrees around the center point you want to analyze. For example, lat = 10.338, lon = -1.055, and buffer = 0.1 will select an area with a radius of 0.1 square degrees around the point with coordinates (10.338, -1.055).

  2. By uploading a polygon as a GeoJSON or Esri Shapefile. If you choose this option, you will need to upload the geojson or ESRI shapefile into the Sandbox using Upload Files button 9a7c8a35267548e9b92b746ad5793892 in the top left corner of the Jupyter Notebook interface. ESRI shapefiles must be uploaded with all the related files (.cpg, .dbf, .shp, .shx). Once uploaded, you can use the shapefile or geojson to define the area of interest. Remember to update the code to call the file you have uploaded.

To use one of these methods, you can uncomment the relevant line of code and comment out the other one. To comment out a line, add the "#" symbol before the code you want to comment out. By default, the first option which defines the location using latitude, longitude, and buffer is being used.

If running the notebook for the first time, keep the default settings below. This will demonstrate how the analysis works and provide meaningful results. The example covers the Suez Canel in Egypt during March 2021.

# Method 1: Specify the latitude, longitude, and buffer
aoi = define_area(lat=29.95, lon=32.536, buffer=0.1)

# Method 2: Use a polygon as a GeoJSON or Esri Shapefile.
# aoi = define_area(vector_path='aoi.shp')

#Create a geopolygon and geodataframe of the area of interest
geopolygon = Geometry(aoi["features"][0]["geometry"], crs="epsg:4326")
geopolygon_gdf = gpd.GeoDataFrame(geometry=[geopolygon],

# Get the latitude and longitude range of the geopolygon
lat_range = (geopolygon_gdf.total_bounds[1], geopolygon_gdf.total_bounds[3])
lon_range = (geopolygon_gdf.total_bounds[0], geopolygon_gdf.total_bounds[2])

# timeframe
timerange = ('2021-03-21', '2021-03-25')

View the selected location

The next cell will display the selected area on an interactive map. Feel free to zoom in and out to get a better understanding of the area you’ll be analysing. Clicking on any point of the map will reveal the latitude and longitude coordinates of that point.

display_map(x=lon_range, y=lat_range)
Make this Notebook Trusted to load map: File -> Trust Notebook

Load and view Sentinel-1 data

The first step in the analysis is to load Sentinel-1 backscatter data for the specified area of interest.

# Load the Sentinel-1 data
S1 = load_ard(dc=dc,
              measurements=['vv', 'vh'],
Using pixel quality parameters for Sentinel 1
Finding datasets
Applying pixel quality/cloud mask
Loading 3 time steps
Dimensions:      (time: 3, latitude: 1000, longitude: 1000)
  * time         (time) datetime64[ns] 2021-03-21T03:44:49.596587 ... 2021-03...
  * latitude     (latitude) float64 30.05 30.05 30.05 ... 29.85 29.85 29.85
  * longitude    (longitude) float64 32.44 32.44 32.44 ... 32.64 32.64 32.64
    spatial_ref  int32 4326
Data variables:
    vv           (time, latitude, longitude) float32 0.04583 0.03337 ... 0.1556
    vh           (time, latitude, longitude) float32 0.009784 ... 0.004633
    crs:           EPSG:4326
    grid_mapping:  spatial_ref

Clip the datasets to the shape of the area of interest

A geopolygon represents the bounds and not the actual shape because it is designed to represent the extent of the geographic feature being mapped, rather than the exact shape. In other words, the geopolygon is used to define the outer boundary of the area of interest, rather than the internal features and characteristics.

Clipping the data to the exact shape of the area of interest is important because it helps ensure that the data being used is relevant to the specific study area of interest. While a geopolygon provides information about the boundary of the geographic feature being represented, it does not necessarily reflect the exact shape or extent of the area of interest.

#Rasterise the area of interest polygon
aoi_raster = xr_rasterize(gdf=geopolygon_gdf,da=S1, transform=S1.geobox.transform,
#Mask the dataset to the rasterised area of interest
S1 = S1.where(aoi_raster==1)

Optional speckle filtering

Specke filtering is not applied in this example because there is a high contrast between water and ship signals. Applying speckle filtering with a small smoothing window may help improve sensitivity for smaller ships.

An example of how to apply a speckle filter can be found in the radar water detection notebook.

Convert data to decibels (dB)

Sentinel-1 backscatter data has two measurements, VV and VH, which correspond to the polarisation of the light sent and received by the satellite. VV refers to the satellite sending out vertically-polarised light and receiving vertically-polarised light back, whereas VH refers to the satellite sending out vertically-polarised light and receiving horizontally-polarised light back. These two measurement bands can tell us different information about the area we’re studying.

When working with radar backscatter, it is common to work with the data in units of decibels (dB), rather than linear intensity. To convert from recorded Digital Number (DN) to dB in Sentinel-1 imagery, we use the following formula:

\[\begin{aligned} \text{dB} = 10 \times \log_{10}(\text{DN}). \end{aligned}\]
# Scale to plot data in decibels
S1["vh_dB"] = 10 * np.log10(S1.vh)
S1["vv_dB"] = 10 * np.log10(S1.vv)
/usr/local/lib/python3.10/dist-packages/xarray/core/ RuntimeWarning: divide by zero encountered in log10
  result_data = func(*input_data)

Visualise data before and after the event

We focus on the first and the last observations within this period of time. The ship blockage incident started on 23 March 2021 and lasted almost a week, so we inspect one image from before the event, and one during.

Images below show a high constrast between dark water surface and bright ships.

fig, (ax1, ax2) = plt.subplots(nrows=1, ncols=2, figsize=(18, 7))

# Visualise baseline image before the event
S1.vv_dB.isel(time=0).plot(robust=True, ax=ax1)

# Visualise the image after the event
S1.vv_dB.isel(time=2).plot(robust=True, ax=ax2);

Extract open water area

Surface water can be mapped by thresholding radar backscatter. An detailed example is provided in the radar water detection notebook. To eliminate the impact of ships and waves, both causing elevated backscatter, minimum backscatter values detected over time for each pixel can be used. e.g.

water_mask = S1.vv.min(dim="time")<0.015

For this notebook, however, we use another readily available product in DE Africa, namely the Water Observations from Space (WOfS) annual summary to extract the open water area.

The water detection frequency measurement of WOfS annual summary from the latest available year is loaded to match the Sentinel-1 pixel grid using the option like in dc.load() and bilinear resampling.

# Load WOfS summary through the datacube
wofs = dc.load(product='wofs_ls_summary_annual',

Open water surface is extracted where water has been detected more than 80% of the year. For an optimal result, the mask is further adjusted to remove gaps and small water bodies.

# Select pixels that are classified as water > 80 % of the year
water_mask = wofs.frequency > 0.8
# close small holes within the water mask and remove a few pixels on the edge for cleaner result
water_mask = xr.DataArray(dilate(~dilate(water_mask, 3, invert=False), 5, invert=True),
# optional step to select only the largest water body

water_bodies = xr_vectorize(water_mask,
                  ,  # wofs crs is not recoganized, so using S1 instead as they are the same
                            mask=water_mask.values == 1)

largest = water_bodies[water_bodies.area == water_bodies.area.max()]

# create mask
water_mask = xr_rasterize(largest, S1)
/tmp/ipykernel_7576/ UserWarning: Geometry is in a geographic CRS. Results from 'area' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.

  largest = water_bodies[water_bodies.area == water_bodies.area.max()]
# final water mask

Apply water mask and threshold for ship detection

In this example, a threshold of 0 dB is chosen. Ships are detected where backscatter values are higher than this threshold. The binary image is vectorised so pixels from the same ship are grouped as one object.

Visual inspection confirms reasonable detection of large cargo ships. The threshold can be adjusted for different area and vessel types. With known ship locations, the threshold can be optimised using training data.

# set ship detection threshold in vv to 0 dB
ship_vv_db = 0
def detect_ships(da, time_idx, thresh,, transform=S1.geobox.transform):
    S1_ships = da.isel(time=time_idx) > thresh
    ships_vector = xr_vectorize(S1_ships.values,
                                mask=S1_ships.values == 1)
    return ships_vector
time_idx = 0
ships_time0 = detect_ships(S1.vv_dB.where(water_mask), time_idx, ship_vv_db)
    f'ships_{str(S1.time.values[time_idx])[0:10]}.geojson', driver='GeoJSON')
print("Number of ships detected at this time:", len(ships_time0))
Number of ships detected at this time: 51
time_idx = 2
ships_time2 = detect_ships(S1.vv_dB.where(water_mask), time_idx, ship_vv_db)
    f'ships_{str(S1.time.values[time_idx])[0:10]}.geojson', driver='GeoJSON')
print("Number of ships detected at this time:", len(ships_time2))
Number of ships detected at this time: 69
# visualize the ship locations

fig, ax = plt.subplots(1, 2, figsize=(10, 10), sharex=True, sharey=True)
ax[0].set_title('vessels before the event')
ax[1].set_title('vessels after the event');

Caveats and possible improvements

We have applied a simple thresholding method to idenfity ships in the above sections. Only VV backscatter has been used and no speckle filtering has been done. This method is based on the assumption that ships produce very high backscatter signals and all bright objects within the water area are ships.

Additional analysis may be help improve the method: * Threshold for ship pixels are chosen based on visual assessment. The threshold is relatively high so smaller ships may be missed. This threshold may be optimzed with labeled training data for specific use cases. * It is not clear if all the bright objects near the piers are ships. The location and shape of the objects may be used to remove false positives. * Rigid structures onboard the ships may result in multiple disconnected bright spots over one ship. These smaller objects may be grouped to give more reliable ship count.

Despite the above limitations, we demonstrate that with a few analysis steps, DE Africa’s Sentinel-1 backscatter can be used to detect and count large ships.

Identify shipping lanes

Ship locations detected across time can be used to map out popular routes.

In the cells below, we load all Sentinel-1 observations from 2020. Plotting maximum backscatter values over time allows clear identification of the shipping lanes.

Data is lazy-loaded using the dask_chunks options to reduce memory requirement.

# Load the Sentinel-1 data
S1 = load_ard(dc=dc,
              measurements=['vv', 'vh'],
Using pixel quality parameters for Sentinel 1
Finding datasets
Applying pixel quality/cloud mask
Returning 227 time steps as a dask array
S1.vh.where(water_mask).max(dim='time').plot.imshow(robust=True, size=10);
/usr/local/lib/python3.10/dist-packages/dask/array/ RuntimeWarning: All-NaN slice encountered
  return np.nanmax(x_chunk, axis=axis, keepdims=keepdims)

Next steps

When you are done, return to the “Set up analysis” cell, modify some values (e.g. latitude and longitude) and rerun the analysis.

There are a number of key ports covered by Sentinel-1 data in Africa. The available data can be viewed on the DEAfrica Explorer, but we also list the latitude and longitude coordinates for a few key ports below.

Port of Durban in South Africa

latitude = -29.87
longitude = 31.03

Port of Dar Es Salaam in Tanzania

latitude = -6.83
longitude = 39.29

Port de Tanger Med in Morocco

latitude = 35.86
longitude = -5.53

Additional information

License: The code in this notebook is licensed under the Apache License, Version 2.0. Digital Earth Africa data is licensed under the Creative Commons by Attribution 4.0 license.

Contact: If you need assistance, please post a question on the Open Data Cube Slack channel or on the GIS Stack Exchange using the open-data-cube tag (you can view previously asked questions here). If you would like to report an issue with this notebook, you can file one on Github.

Compatible datacube version:


Last Tested:

from datetime import datetime'%Y-%m-%d')