Loading data from Digital Earth Africa

Keywords beginner’s guide; loading data, loading data; beginner’s guide, data used; landsat 8 geomedian, data used; WOfS, data methods; resampling data methods; reprojecting, data attributes; coordinate reference system

Background

Loading data from the Digital Earth Africa (DE Africa) instance of the Open Data Cube requires the construction of a data query that specifies the what, where, and when of the data request. Each query returns a multi-dimensional xarray object containing the contents of your query. It is essential to understand the xarray data structures as they are fundamental to the structure of data loaded from the datacube. Manipulations, transformations and visualisation of xarray objects provide datacube users with the ability to explore and analyse DE Africa datasets, as well as pose and answer scientific questions.

Description

This notebook will introduce how to load data from the Digital Earth Africa datacube through the construction of a query and use of the dc.load() function. Topics covered include:

  • Loading data using dc.load()

  • Interpreting the resulting xarray.Dataset object

    • Inspecting an individual xarray.DataArray

  • Customising parameters passed to the dc.load() function

    • Loading specific measurements

    • Loading data for coordinates in a custom coordinate reference system (CRS)

    • Projecting data to a new CRS and spatial resolution

    • Specifying a specific spatial resampling method

  • Loading data using a reusable dictionary query

  • Loading matching data from multiple products using like

  • Adding a progress bar to the data load


Getting started

To run this introduction to loading data from DE Africa, run all the cells in the notebook starting with the “Load packages” cell. For help with running notebook cells, refer back to the Jupyter Notebooks notebook.

Load packages

First we need to load the datacube package. This will allow us to query the datacube database and load some data. The with_ui_cbk function from odc.ui will allow us to show a progress bar when loading large amounts of data.

[1]:
import datacube
from odc.ui import with_ui_cbk

Connect to the datacube

We then need to connect to the datacube database. We will then be able to use the dc datacube object to load data. The app parameter is a unique name used to identify the notebook that does not have any effect on the analysis.

[2]:
dc = datacube.Datacube(app="03_Loading_data")

Loading data using dc.load()

Loading data from the datacube uses the dc.load() function.

The function requires the following minimum arguments:

  • product: A specific product to load (to revise DE Africa products, see the Products and measurements notebook).

  • x: Defines the spatial region in the x dimension. By default, the x and y arguments accept queries in a geographical co-ordinate system WGS84, identified by the EPSG code 4326.

  • y: Defines the spatial region in the y dimension. The dimensions longitude/latitude and x/y can be used interchangeably.

  • time: Defines the temporal extent. The time dimension can be specified using a tuple of datetime objects or strings in the “YYYY”, “YYYY-MM” or “YYYY-MM-DD” format.

An optional arguement which provides ease of use and ease of identification of the measurements to load is:

  • measurements: This argument is used to provide a list of measurement names to load, as listed in dc.list_measurements(). For satellite datasets, measurements contain data for each individual satellite band (e.g. near infrared). If not provided, all measurements for the product will be returned, and they will have the default names from the satellite data.

Let’s run a query to load 2018 data from the Sentinel 2 annual geomedian product for part of Nxai Pan National park in Botswana. For this example, we can use the following parameters:

  • product: gm_s2_annual

  • x: (24.60, 24.80)

  • y: (-20.05, -20.25)

  • time: ("2018-01-01", "2018-12-31")

  • measurements: ['blue', 'green', 'red', 'nir', 'swir_1', 'swir_2', 'emad', 'bcmad', 'smad', 'red_edge_1']

Run the following cell to load all datasets from the gm_s2_annual product that match this spatial and temporal extent:

[3]:
ds = dc.load(product="gm_s2_annual",
             x=(24.65, 24.75),
             y=(-20.05, -20.15),
             time=("2018-01-01", "2018-12-31"),
             measurements=['blue', 'green', 'red', 'nir', 'swir_1', 'swir_2', 'emad', 'bcmad', 'smad', 'red_edge_1'])

print(ds)
<xarray.Dataset>
Dimensions:      (time: 1, y: 1201, x: 966)
Coordinates:
  * time         (time) datetime64[ns] 2018-07-02T11:59:59.999999
  * y            (y) float64 -2.507e+06 -2.507e+06 ... -2.519e+06 -2.519e+06
  * x            (x) float64 2.378e+06 2.378e+06 ... 2.388e+06 2.388e+06
    spatial_ref  int32 6933
Data variables:
    blue         (time, y, x) uint16 789 814 827 825 820 ... 765 707 650 647
    green        (time, y, x) uint16 1137 1162 1174 1177 ... 1294 1245 1207 1215
    red          (time, y, x) uint16 1419 1453 1479 1494 ... 1151 1063 993 1006
    nir          (time, y, x) uint16 2607 2631 2644 2677 ... 1840 1761 1693 1636
    swir_1       (time, y, x) uint16 3350 3362 3392 3387 ... 750 657 544 403
    swir_2       (time, y, x) uint16 2442 2453 2482 2467 ... 509 442 375 303
    emad         (time, y, x) float32 784.9 801.8 832.2 ... 1.629e+03 1.695e+03
    bcmad        (time, y, x) float32 0.0516 0.05199 0.05344 ... 0.2089 0.2288
    smad         (time, y, x) float32 0.001915 0.001836 ... 0.03017 0.04047
    red_edge_1   (time, y, x) uint16 1824 1835 1867 1877 ... 1805 1782 1722 1679
Attributes:
    crs:           epsg:6933
    grid_mapping:  spatial_ref

Interpreting the resulting xarray.Dataset

The variable ds has returned an xarray.Dataset containing all data that matched the spatial and temporal query parameters inputted into dc.load.

Dimensions

  • Identifies the number of timesteps returned in the search (time: 1) as well as the number of pixels in the x and y directions of the data query.

Coordinates

  • time identifies the date attributed to each returned timestep.

  • x and y are the coordinates for each pixel within the spatial bounds of your query.

Data variables

  • These are the measurements available for the nominated product. For every date (time) returned by the query, the measured value at each pixel (y, x) is returned as an array for each measurement. Each data variable is itself an xarray.DataArray object (see below).

Attributes

  • crs identifies the coordinate reference system (CRS) of the loaded data.

Inspecting an individual xarray.DataArray

The xarray.Dataset we loaded above is itself a collection of individual xarray.DataArray objects that hold the actual data for each data variable/measurement. For example, all measurements listed under Data variables above (e.g. blue, green, red, nir, swir_1, swir_2) are xarray.DataArray objects.

We can inspect the data in these xarray.DataArray objects using either of the following syntaxes:

ds["measurement_name"]

or:

ds.measurement_name

Being able to access data from individual data variables/measurements allows us to manipulate and analyse data from individual satellite bands or specific layers in a dataset. For example, we can access data from the near infra-red satellite band (i.e. nir):

[4]:
print(ds.nir)
<xarray.DataArray 'nir' (time: 1, y: 1201, x: 966)>
array([[[2607, 2631, 2644, ..., 2293, 2256, 2268],
        [2646, 2655, 2681, ..., 2263, 2251, 2273],
        [2688, 2672, 2693, ..., 2225, 2233, 2265],
        ...,
        [2721, 2733, 2755, ..., 3054, 2691, 2148],
        [2704, 2728, 2745, ..., 2357, 1948, 1755],
        [2708, 2741, 2734, ..., 1761, 1693, 1636]]], dtype=uint16)
Coordinates:
  * time         (time) datetime64[ns] 2018-07-02T11:59:59.999999
  * y            (y) float64 -2.507e+06 -2.507e+06 ... -2.519e+06 -2.519e+06
  * x            (x) float64 2.378e+06 2.378e+06 ... 2.388e+06 2.388e+06
    spatial_ref  int32 6933
Attributes:
    units:         1
    nodata:        0
    crs:           epsg:6933
    grid_mapping:  spatial_ref

Note that the object header informs us that it is an xarray.DataArray containing data for the nir satellite band.

Like an xarray.Dataset, the array also includes information about the data’s dimensions (i.e. (time: 1, y: 801, x: 644)), coordinates and attributes. This particular data variable/measurement contains some additional information that is specific to the nir band, including details of array’s nodata value (i.e. nodata: -999).

Note: For a more in-depth introduction to xarray data structures, refer to the official xarray documentation

Customising the dc.load() function

The dc.load() function can be tailored to refine a query.

Customisation options include:

  • measurements: This argument is used to provide a list of measurement names to load, as listed in dc.list_measurements(). For satellite datasets, measurements contain data for each individual satellite band (e.g. near infrared). If not provided, all measurements for the product will be returned.

  • crs: The coordinate reference system (CRS) of the query’s x and y coordinates is assumed to be WGS84/EPSG:4326 unless the crs field is supplied, even if the stored data is in another projection or the output_crs is specified. The crs parameter is required if your query’s coordinates are in any other CRS.

  • group_by: Satellite datasets based around scenes can have multiple observations per day with slightly different time stamps as the satellite collects data along its path. These observations can be combined by reducing the time dimension to the day level using group_by=solar_day.

  • output_crs and resolution: To reproject or change the resolution the data, supply the output_crs and resolution fields.

  • resampling: This argument allows you to specify a custom spatial resampling method to use when data is reprojected into a different CRS.

Example syntax on the use of these options follows in the cells below.

For help or more customisation options, run help(dc.load) in an empty cell or visit the function’s documentation page

Specifying measurements

By default, dc.load() will load all measurements in a product.

To load data from the red, green and blue satellite bands only, we can add measurements=["red", "green", "blue"] to our query:

[5]:
# Note the optional inclusion of the measurements list
ds_rgb = dc.load(product="gm_s2_annual",
                 measurements=["red", "green", "blue"],
                 x=(24.60, 24.80),
                 y=(-20.05, -20.25),
                 time=("2018-01-01", "2018-12-31"))

print(ds_rgb)
<xarray.Dataset>
Dimensions:      (time: 1, y: 2400, x: 1930)
Coordinates:
  * time         (time) datetime64[ns] 2018-07-02T11:59:59.999999
  * y            (y) float64 -2.507e+06 -2.507e+06 ... -2.531e+06 -2.531e+06
  * x            (x) float64 2.374e+06 2.374e+06 ... 2.393e+06 2.393e+06
    spatial_ref  int32 6933
Data variables:
    red          (time, y, x) uint16 1306 1272 1320 1299 ... 1385 1394 1363 1365
    green        (time, y, x) uint16 989 996 1037 1010 ... 1102 1108 1086 1095
    blue         (time, y, x) uint16 695 700 746 709 752 ... 784 785 791 766 771
Attributes:
    crs:           epsg:6933
    grid_mapping:  spatial_ref

Note that the Data variables component of the xarray.Dataset now includes only the measurements specified in the query (i.e. the red, green and blue satellite bands).

Loading data for coordinates in any CRS

By default, dc.load() assumes that your query x and y coordinates are provided in degrees in the WGS84/EPSG:4326 CRS. If your coordinates are in a different coordinate system, you need to specify this using the crs parameter.

In the example below, we load data for a set of x and y coordinates defined in Africa Albers Equal Area Conic (EPSG:6933), and ensure that the dc.load() function accounts for this by including crs="EPSG:6933":

[6]:
# Note the new `x` and `y` coordinates and `crs` parameter
ds_custom_crs = dc.load(product="gm_s2_annual",
                        time=("2018-01-01", "2018-12-31"),
                        x=(2373555, 2392845),
                        y=(-2507265, -2531265),
                        crs="EPSG:6933")

print(ds_custom_crs)
<xarray.Dataset>
Dimensions:      (time: 1, y: 2401, x: 1930)
Coordinates:
  * time         (time) datetime64[ns] 2018-07-02T11:59:59.999999
  * y            (y) float64 -2.507e+06 -2.507e+06 ... -2.531e+06 -2.531e+06
  * x            (x) float64 2.374e+06 2.374e+06 ... 2.393e+06 2.393e+06
    spatial_ref  int32 6933
Data variables: (12/14)
    B02          (time, y, x) uint16 723 667 670 683 672 ... 775 784 785 791 766
    B03          (time, y, x) uint16 1016 961 976 982 ... 1108 1102 1108 1086
    B04          (time, y, x) uint16 1338 1258 1254 1269 ... 1392 1385 1394 1363
    B05          (time, y, x) uint16 1638 1625 1592 1587 ... 1755 1756 1743 1735
    B06          (time, y, x) uint16 1929 1923 1931 1931 ... 2213 2213 2185 2181
    B07          (time, y, x) uint16 2098 2092 2103 2105 ... 2440 2440 2405 2399
    ...           ...
    B11          (time, y, x) uint16 3139 3107 3034 3026 ... 3361 3368 3345 3327
    B12          (time, y, x) uint16 2298 2255 2172 2161 ... 2458 2465 2429 2410
    SMAD         (time, y, x) float32 0.001018 0.00113 ... 0.003506 0.003435
    EMAD         (time, y, x) float32 654.1 681.0 685.7 ... 905.2 855.7 845.2
    BCMAD        (time, y, x) float32 0.04878 0.05154 ... 0.05118 0.05052
    COUNT        (time, y, x) uint16 55 55 55 55 55 55 55 ... 55 54 54 54 54 54
Attributes:
    crs:           epsg:6933
    grid_mapping:  spatial_ref

CRS reprojection

Certain applications may require that you output your data into a specific CRS. You can reproject your output data by specifying the new output_crs and identifying the resolution required.

In this example, we will reproject our data to a new CRS (UTM Zone 34S, EPSG:32734) and resolution (250 x 250 m). Note that for most CRSs, the first resolution value is negative (e.g. (-250, 250)):

[7]:
ds_reprojected = dc.load(product="gm_s2_annual",
                         x=(24.60, 24.80),
                         y=(-20.05, -20.25),
                         time=("2018-01-01", "2018-12-31"),
                         output_crs="EPSG:32734",
                         resolution=(-250, 250))

print(ds_reprojected)
<xarray.Dataset>
Dimensions:      (time: 1, y: 91, x: 87)
Coordinates:
  * time         (time) datetime64[ns] 2018-07-02T11:59:59.999999
  * y            (y) float64 7.779e+06 7.779e+06 ... 7.757e+06 7.756e+06
  * x            (x) float64 8.761e+05 8.764e+05 ... 8.974e+05 8.976e+05
    spatial_ref  int32 32734
Data variables: (12/14)
    B02          (time, y, x) uint16 683 715 778 817 704 ... 792 747 706 704 744
    B03          (time, y, x) uint16 940 997 1077 1117 ... 1046 1009 1006 1042
    B04          (time, y, x) uint16 1189 1277 1373 1419 ... 1307 1251 1258 1284
    B05          (time, y, x) uint16 1494 1604 1705 1744 ... 1664 1609 1618 1638
    B06          (time, y, x) uint16 1818 1950 2033 2061 ... 2084 2026 2047 2069
    B07          (time, y, x) uint16 1984 2128 2214 2239 ... 2293 2216 2248 2270
    ...           ...
    B11          (time, y, x) uint16 3198 3269 3321 3373 ... 3277 3069 3122 3181
    B12          (time, y, x) uint16 2389 2407 2484 2533 ... 2375 2144 2199 2282
    SMAD         (time, y, x) float32 0.001581 0.001958 ... 0.003856 0.003551
    EMAD         (time, y, x) float32 780.1 794.3 768.2 ... 735.2 777.5 815.4
    BCMAD        (time, y, x) float32 0.06216 0.05727 ... 0.05445 0.05552
    COUNT        (time, y, x) uint16 55 56 56 54 55 54 54 ... 54 55 55 54 53 53
Attributes:
    crs:           EPSG:32734
    grid_mapping:  spatial_ref

Note that the crs attribute in the Attributes section has changed to EPSG:32734. Due to the larger 250 m resolution, there are also now less pixels on the x and y dimensions (e.g. x: 87, y: 91 compared to x: 1930, y: 2100 in earlier examples).

Spatial resampling methods

When a product is re-projected to a different CRS and/or resolution, the new pixel grid may differ from the original input pixels by size, number and alignment. It is therefore necessary to apply a spatial “resampling” rule that allocates input pixel values into the new pixel grid.

By default, dc.load() resamples pixel values using “nearest neighbour” resampling, which allocates each new pixel with the value of the closest input pixel. Depending on the type of data and the analysis being run, this may not be the most appropriate choice (e.g. for continuous data).

The resampling parameter in dc.load() allows you to choose a custom resampling method from the following options:

"nearest", "cubic", "bilinear", "cubic_spline", "lanczos",
"average", "mode", "gauss", "max", "min", "med", "q1", "q3"

For example, we can request that all loaded data is resampled using “average” resampling:

[8]:
# Note the additional `resampling` parameter
ds_averageresampling = dc.load(product="gm_s2_annual",
                                x=(24.60, 24.80),
                                y=(-20.05, -20.25),
                                time=("2018-01-01", "2018-12-31"),
                                resolution=(-250, 250),
                                resampling="average")

print(ds_averageresampling)

<xarray.Dataset>
Dimensions:      (time: 1, y: 96, x: 78)
Coordinates:
  * time         (time) datetime64[ns] 2018-07-02T11:59:59.999999
  * y            (y) float64 -2.507e+06 -2.508e+06 ... -2.531e+06 -2.531e+06
  * x            (x) float64 2.374e+06 2.374e+06 ... 2.393e+06 2.393e+06
    spatial_ref  int32 6933
Data variables: (12/14)
    B02          (time, y, x) uint16 795 827 656 701 808 ... 753 768 765 750 717
    B03          (time, y, x) uint16 1086 1083 917 989 ... 1075 1067 1058 1025
    B04          (time, y, x) uint16 1390 1406 1172 1271 ... 1335 1328 1310 1270
    B05          (time, y, x) uint16 1713 1655 1494 1613 ... 1695 1687 1673 1634
    B06          (time, y, x) uint16 2032 1862 1898 2049 ... 2122 2109 2112 2060
    B07          (time, y, x) uint16 2211 2019 2087 2255 ... 2336 2319 2319 2254
    ...           ...
    B11          (time, y, x) uint16 3547 3632 3145 3249 ... 3381 3283 3209 3092
    B12          (time, y, x) uint16 2752 2959 2254 2301 ... 2487 2367 2288 2184
    SMAD         (time, y, x) float32 0.001492 0.0008584 ... 0.003857 0.003664
    EMAD         (time, y, x) float32 811.6 630.3 642.9 ... 865.0 836.6 757.6
    BCMAD        (time, y, x) float32 0.05379 0.04321 ... 0.05563 0.05322
    COUNT        (time, y, x) uint16 54 55 55 54 53 53 52 ... 55 53 53 55 53 54
Attributes:
    crs:           epsg:6933
    grid_mapping:  spatial_ref

You can also provide a Python dictionary to request a different sampling method for different measurements. This can be particularly useful when some measurements contain contain categorical data which require resampling methods such as “nearest” or “mode” that do not modify the input pixel values.

In the example below, we specify resampling={"red": "nearest", "*": "average"}, which will use “nearest” neighbour resampling for the red satellite band only. "*": "average" will apply “average” resampling for all other satellite bands:

[9]:
ds_customresampling = dc.load(product="gm_s2_annual",
                                    x=(24.60, 24.80),
                                    y=(-20.05, -20.25),
                                    time=("2018-01-01", "2018-12-31"),
                                    resolution=(-250, 250),
                                    resampling={"red": "nearest", "*": "average"})

print(ds_customresampling)
<xarray.Dataset>
Dimensions:      (time: 1, y: 96, x: 78)
Coordinates:
  * time         (time) datetime64[ns] 2018-07-02T11:59:59.999999
  * y            (y) float64 -2.507e+06 -2.508e+06 ... -2.531e+06 -2.531e+06
  * x            (x) float64 2.374e+06 2.374e+06 ... 2.393e+06 2.393e+06
    spatial_ref  int32 6933
Data variables: (12/14)
    B02          (time, y, x) uint16 795 827 656 701 808 ... 753 768 765 750 717
    B03          (time, y, x) uint16 1086 1083 917 989 ... 1075 1067 1058 1025
    B04          (time, y, x) uint16 1390 1406 1172 1271 ... 1335 1328 1310 1270
    B05          (time, y, x) uint16 1713 1655 1494 1613 ... 1695 1687 1673 1634
    B06          (time, y, x) uint16 2032 1862 1898 2049 ... 2122 2109 2112 2060
    B07          (time, y, x) uint16 2211 2019 2087 2255 ... 2336 2319 2319 2254
    ...           ...
    B11          (time, y, x) uint16 3547 3632 3145 3249 ... 3381 3283 3209 3092
    B12          (time, y, x) uint16 2752 2959 2254 2301 ... 2487 2367 2288 2184
    SMAD         (time, y, x) float32 0.001492 0.0008584 ... 0.003857 0.003664
    EMAD         (time, y, x) float32 811.6 630.3 642.9 ... 865.0 836.6 757.6
    BCMAD        (time, y, x) float32 0.05379 0.04321 ... 0.05563 0.05322
    COUNT        (time, y, x) uint16 54 55 55 54 53 53 52 ... 55 53 53 55 53 54
Attributes:
    crs:           epsg:6933
    grid_mapping:  spatial_ref

Note: For more information about spatial resampling methods, see the following guide

Loading data using the query dictionary syntax

It is often useful to re-use a set of query parameters to load data from multiple products. To achieve this, we can load data using the “query dictionary” syntax. This involves placing the query parameters we used to load data above inside a Python dictionary object which we can re-use for multiple data loads:

[10]:
query = {"x": (24.60, 24.80),
         "y": (-20.05, -20.25),
         "time": ("2018-01-01", "2018-12-31")}

We can then use this query dictionary object as an input to dc.load().

The ** syntax below is Python’s “keyword argument unpacking” operator. This operator takes the named query parameters listed in the dictionary we created (e.g. "x": (153.3, 153.4)), and “unpacks” them into the dc.load() function as new arguments. For more information about unpacking operators, refer to the Python documentation

[11]:
ds = dc.load(product="gm_s2_annual",
             **query)

print(ds)
<xarray.Dataset>
Dimensions:      (time: 1, y: 2400, x: 1930)
Coordinates:
  * time         (time) datetime64[ns] 2018-07-02T11:59:59.999999
  * y            (y) float64 -2.507e+06 -2.507e+06 ... -2.531e+06 -2.531e+06
  * x            (x) float64 2.374e+06 2.374e+06 ... 2.393e+06 2.393e+06
    spatial_ref  int32 6933
Data variables: (12/14)
    B02          (time, y, x) uint16 695 700 746 709 752 ... 784 785 791 766 771
    B03          (time, y, x) uint16 989 996 1037 1010 ... 1102 1108 1086 1095
    B04          (time, y, x) uint16 1306 1272 1320 1299 ... 1385 1394 1363 1365
    B05          (time, y, x) uint16 1637 1589 1587 1624 ... 1756 1743 1735 1718
    B06          (time, y, x) uint16 1934 1924 1926 1963 ... 2213 2185 2181 2161
    B07          (time, y, x) uint16 2103 2095 2100 2137 ... 2440 2405 2399 2374
    ...           ...
    B11          (time, y, x) uint16 3118 3034 3038 3118 ... 3368 3345 3327 3253
    B12          (time, y, x) uint16 2266 2180 2183 2234 ... 2465 2429 2410 2343
    SMAD         (time, y, x) float32 0.001054 0.001198 ... 0.003435 0.00359
    EMAD         (time, y, x) float32 696.9 698.4 701.8 ... 855.7 845.2 803.1
    BCMAD        (time, y, x) float32 0.05079 0.05238 ... 0.05052 0.05244
    COUNT        (time, y, x) uint16 55 55 55 55 55 55 55 ... 54 54 54 54 54 54
Attributes:
    crs:           epsg:6933
    grid_mapping:  spatial_ref

Query dictionaries can contain any set of parameters that would usually be provided to dc.load():

[12]:
query = {"x": (24.60, 24.80),
         "y": (-20.05, -20.25),
         "time": ("2018-01-01", "2018-12-31"),
         "output_crs": "EPSG:32734",
         "resolution": (-250, 250)}

ds_s2 = dc.load(product="gm_s2_annual",
                 **query)

print(ds_s2)

<xarray.Dataset>
Dimensions:      (time: 1, y: 91, x: 87)
Coordinates:
  * time         (time) datetime64[ns] 2018-07-02T11:59:59.999999
  * y            (y) float64 7.779e+06 7.779e+06 ... 7.757e+06 7.756e+06
  * x            (x) float64 8.761e+05 8.764e+05 ... 8.974e+05 8.976e+05
    spatial_ref  int32 32734
Data variables: (12/14)
    B02          (time, y, x) uint16 683 715 778 817 704 ... 792 747 706 704 744
    B03          (time, y, x) uint16 940 997 1077 1117 ... 1046 1009 1006 1042
    B04          (time, y, x) uint16 1189 1277 1373 1419 ... 1307 1251 1258 1284
    B05          (time, y, x) uint16 1494 1604 1705 1744 ... 1664 1609 1618 1638
    B06          (time, y, x) uint16 1818 1950 2033 2061 ... 2084 2026 2047 2069
    B07          (time, y, x) uint16 1984 2128 2214 2239 ... 2293 2216 2248 2270
    ...           ...
    B11          (time, y, x) uint16 3198 3269 3321 3373 ... 3277 3069 3122 3181
    B12          (time, y, x) uint16 2389 2407 2484 2533 ... 2375 2144 2199 2282
    SMAD         (time, y, x) float32 0.001581 0.001958 ... 0.003856 0.003551
    EMAD         (time, y, x) float32 780.1 794.3 768.2 ... 735.2 777.5 815.4
    BCMAD        (time, y, x) float32 0.06216 0.05727 ... 0.05445 0.05552
    COUNT        (time, y, x) uint16 55 56 56 54 55 54 54 ... 54 55 55 54 53 53
Attributes:
    crs:           EPSG:32734
    grid_mapping:  spatial_ref

Now we have a reusable query, we can easily use it to load data from a different product. For example, we can load Water Observations from Space (WOfS) Annual Summary data for the same extent, time, output CRS and resolution that we just loaded Landsat 8 Geomedian data for:

[13]:
ds_wofs = dc.load(product="wofs_ls_summary_annual",
                 **query)

print(ds_wofs)
<xarray.Dataset>
Dimensions:      (time: 1, y: 91, x: 87)
Coordinates:
  * time         (time) datetime64[ns] 2018-07-02T11:59:59.999999
  * y            (y) float64 7.779e+06 7.779e+06 ... 7.757e+06 7.756e+06
  * x            (x) float64 8.761e+05 8.764e+05 ... 8.974e+05 8.976e+05
    spatial_ref  int32 32734
Data variables:
    count_wet    (time, y, x) int16 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
    count_clear  (time, y, x) int16 55 55 56 57 57 56 57 ... 30 30 30 30 29 29
    frequency    (time, y, x) float32 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0
Attributes:
    crs:           EPSG:32734
    grid_mapping:  spatial_ref

Other helpful tricks

Loading data “like” another dataset

Another option for loading matching data from multiple products is to use dc.load()’s like parameter. This will copy the spatial and temporal extent and the CRS/resolution from an existing dataset, and use these parameters to load a new data from a new product.

In the example below, we load another WOfS dataset that exactly matches the ds_s2 dataset we loaded earlier:

[14]:
ds_wofs = dc.load(product="wofs_ls_summary_annual",
                 like=ds_s2)

print(ds_wofs)
<xarray.Dataset>
Dimensions:      (time: 1, y: 91, x: 87)
Coordinates:
  * time         (time) datetime64[ns] 2018-07-02T11:59:59.999999
  * y            (y) float64 7.779e+06 7.779e+06 ... 7.757e+06 7.756e+06
  * x            (x) float64 8.761e+05 8.764e+05 ... 8.974e+05 8.976e+05
    spatial_ref  int32 32734
Data variables:
    count_wet    (time, y, x) int16 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
    count_clear  (time, y, x) int16 55 55 56 57 57 56 57 ... 30 30 30 30 29 29
    frequency    (time, y, x) float32 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0
Attributes:
    crs:           PROJCS["WGS 84 / UTM zone 34S",GEOGCS["WGS 84",DATUM["WGS_...
    grid_mapping:  spatial_ref

Adding a progress bar

When loading large amounts of data, it can be useful to view the progress of the data load. The progress_cbk parameter in dc.load() allows us to add a progress bar which will indicate how the load is progressing. In this example, we will load 5 years of data (2013, 2014, 2015, 2016 and 2017) from the ga_ls8c_wofs_2_annual_summary product with a progress bar:

[15]:
query = {"x": (24.60, 24.80),
         "y": (-20.05, -20.25),
         "time": ("2013", "2016")}

ds_progress = dc.load(product="wofs_ls_summary_annual",
                      progress_cbk=with_ui_cbk(),
                      **query)

print(ds_progress)
<xarray.Dataset>
Dimensions:      (time: 4, y: 801, x: 644)
Coordinates:
  * time         (time) datetime64[ns] 2013-07-02T11:59:59.999999 ... 2016-07...
  * y            (y) float64 -2.507e+06 -2.507e+06 ... -2.531e+06 -2.531e+06
  * x            (x) float64 2.374e+06 2.374e+06 ... 2.393e+06 2.393e+06
    spatial_ref  int32 6933
Data variables:
    count_wet    (time, y, x) int16 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
    count_clear  (time, y, x) int16 45 45 45 45 45 44 44 ... 32 32 32 32 33 33
    frequency    (time, y, x) float32 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0
Attributes:
    crs:           epsg:6933
    grid_mapping:  spatial_ref

Additional information

License: The code in this notebook is licensed under the Apache License, Version 2.0. Digital Earth Africa data is licensed under the Creative Commons by Attribution 4.0 license.

Contact: If you need assistance, please post a question on the Open Data Cube Slack channel or on the GIS Stack Exchange using the open-data-cube tag (you can view previously asked questions here). If you would like to report an issue with this notebook, you can file one on Github.

Compatible datacube version:

[16]:
print(datacube.__version__)
1.8.15

Last Tested:

[17]:
from datetime import datetime
datetime.today().strftime('%Y-%m-%d')
[17]:
'2023-08-11'