GeoCroissant Demo - GeoMind
This notebook demonstrates the full GeoMind + GeoCroissant workflow:
- Validate the generated JSON-LD file against the Croissant 1.1 and GeoCroissant 1.0 specifications
- Inspect the GeoCroissant-specific metadata fields (CRS, resolution, bands, conformance)
- Access the raw Sentinel-2 Zarr store referenced in the metadata
- Visualise the imagery: True Colour RGB, NDVI, a single band, and a spatial subset
The JSON-LD file in docs/notebooks/ was produced by GeoMind's tool in response to a plain-English prompt.
Import Libraries
In [1]:
import xarray as xr
import json
import zarr
import numpy as np
import matplotlib.pyplot as plt
import mlcroissant as mlc
Validate GeoCroissant Metadata
The mlcroissant validate command checks that the JSON-LD file conforms to both Croissant 1.1 and GeoCroissant 1.0. An output of just Done., with no errors or warnings, means the metadata is structurally valid and can be consumed by Croissant-aware ML frameworks and data catalogues. The check is structural only: the file below passes even though its sha256 value is the literal string "placeholder".
In [ ]:
!mlcroissant validate --jsonld=croissant_S2B_MSIL2A_20260301T125259_N0512_R138_T27WXN_20260301T163056_6007.json
I0302 11:48:59.455116 16608 validate.py:53] Done.
Load GeoCroissant Metadata JSON
In [ ]:
json_path = r'croissant_S2B_MSIL2A_20260301T125259_N0512_R138_T27WXN_20260301T163056_6007.json'
with open(json_path) as meta_f:
    croissant_metadata = json.load(meta_f)
Pretty Print GeoCroissant JSON-LD
In [ ]:
import json
# Load the Croissant JSON-LD file
with open("croissant_S2B_MSIL2A_20260301T125259_N0512_R138_T27WXN_20260301T163056_6007.json", "r") as f:
    croissant_data = json.load(f)
# Pretty-print JSON to console
print(json.dumps(croissant_data, indent=2))
{
"@context": {
"@language": "en",
"@vocab": "https://schema.org/",
"citeAs": "cr:citeAs",
"column": "cr:column",
"conformsTo": "dct:conformsTo",
"cr": "http://mlcommons.org/croissant/",
"geocr": "http://mlcommons.org/croissant/geo/",
"rai": "http://mlcommons.org/croissant/RAI/",
"dct": "http://purl.org/dc/terms/",
"sc": "https://schema.org/",
"data": {
"@id": "cr:data",
"@type": "@json"
},
"examples": {
"@id": "cr:examples",
"@type": "@json"
},
"dataType": {
"@id": "cr:dataType",
"@type": "@vocab"
},
"extract": "cr:extract",
"field": "cr:field",
"fileProperty": "cr:fileProperty",
"fileObject": "cr:fileObject",
"fileSet": "cr:fileSet",
"format": "cr:format",
"includes": "cr:includes",
"isLiveDataset": "cr:isLiveDataset",
"jsonPath": "cr:jsonPath",
"key": "cr:key",
"md5": "cr:md5",
"parentField": "cr:parentField",
"path": "cr:path",
"recordSet": "cr:recordSet",
"references": "cr:references",
"regex": "cr:regex",
"repeated": "cr:repeated",
"replace": "cr:replace",
"samplingRate": "cr:samplingRate",
"separator": "cr:separator",
"source": "cr:source",
"subField": "cr:subField",
"transform": "cr:transform",
"equivalentProperty": "cr:equivalentProperty"
},
"@type": "sc:Dataset",
"@id": "S2B_MSIL2A_20260301T125259_N0512_R138_T27WXN_20260301T163056",
"name": "S2B_MSIL2A_20260301T125259_N0512_R138_T27WXN_20260301T163056",
"description": "Sentinel-2 L2A Imagery for Iceland",
"version": "1.0.0",
"license": "https://creativecommons.org/licenses/by/4.0/",
"conformsTo": [
"http://mlcommons.org/croissant/1.1",
"http://mlcommons.org/croissant/geo/1.0"
],
"citeAs": "@dataset{sentinel2_S2B_MSIL2A_20260301T125259_N0512_R138_T27WXN_20260301T163056, title={Sentinel-2 S2B_MSIL2A_20260301T125259_N0512_R138_T27WXN_20260301T163056}, year={2026}, url={https://stac.core.eopf.eodc.eu/}}",
"datePublished": "2026-03-02",
"spatialCoverage": {
"@type": "Place",
"geo": {
"@type": "GeoShape",
"box": "64.77074135158136 -18.89315632774026 65.80593069527565 -16.41850676127295"
}
},
"geocr:coordinateReferenceSystem": "EPSG:4326",
"geocr:spatialResolution": {
"@type": "QuantitativeValue",
"value": 10.0,
"unitText": "meters"
},
"temporalCoverage": "2026-03-01T12:52:59.024000Z/2026-03-01T12:52:59.024000Z",
"keywords": [
"sentinel-2",
"satellite",
"imagery",
"earth-observation"
],
"distribution": [
{
"@type": "cr:FileObject",
"@id": "asset_product",
"name": "product",
"contentUrl": "https://objects.eodc.eu:443/e05ab01a9d56408d82ac32d69a5aae2a:202603-s02msil2a-eu/01/products/cpm_v262/S2B_MSIL2A_20260301T125259_N0512_R138_T27WXN_20260301T163056.zarr",
"encodingFormat": "application/vnd+zarr",
"sha256": "placeholder"
}
],
"recordSet": [
{
"@type": "cr:RecordSet",
"@id": "record_set_imagery_bands",
"name": "imagery_bands",
"field": [
{
"@type": "cr:Field",
"@id": "field_SR_10m",
"name": "SR_10m",
"description": "Surface Reflectance - 10m",
"dataType": "sc:Float",
"source": {
"fileObject": {
"@id": "asset_product"
},
"extract": {
"column": "measurements/reflectance/r10m"
}
}
},
{
"@type": "cr:Field",
"@id": "field_SR_20m",
"name": "SR_20m",
"description": "Surface Reflectance - 20m",
"dataType": "sc:Float",
"source": {
"fileObject": {
"@id": "asset_product"
},
"extract": {
"column": "measurements/reflectance/r20m"
}
}
},
{
"@type": "cr:Field",
"@id": "field_SR_60m",
"name": "SR_60m",
"description": "Surface Reflectance - 60m",
"dataType": "sc:Float",
"source": {
"fileObject": {
"@id": "asset_product"
},
"extract": {
"column": "measurements/reflectance/r60m"
}
}
},
{
"@type": "cr:Field",
"@id": "field_AOT_10m",
"name": "AOT_10m",
"description": "Aerosol optical thickness (AOT)",
"dataType": "sc:Float",
"source": {
"fileObject": {
"@id": "asset_product"
},
"extract": {
"column": "quality/atmosphere/r10m/aot"
}
}
},
{
"@type": "cr:Field",
"@id": "field_B01_20m",
"name": "B01_20m",
"description": "Coastal aerosol (band 1) - 20m",
"dataType": "sc:Float",
"source": {
"fileObject": {
"@id": "asset_product"
},
"extract": {
"column": "measurements/reflectance/r20m/b01"
}
}
},
{
"@type": "cr:Field",
"@id": "field_B02_10m",
"name": "B02_10m",
"description": "Blue (band 2) - 10m",
"dataType": "sc:Float",
"source": {
"fileObject": {
"@id": "asset_product"
},
"extract": {
"column": "measurements/reflectance/r10m/b02"
}
}
},
{
"@type": "cr:Field",
"@id": "field_B03_10m",
"name": "B03_10m",
"description": "Green (band 3) - 10m",
"dataType": "sc:Float",
"source": {
"fileObject": {
"@id": "asset_product"
},
"extract": {
"column": "measurements/reflectance/r10m/b03"
}
}
},
{
"@type": "cr:Field",
"@id": "field_B04_10m",
"name": "B04_10m",
"description": "Red (band 4) - 10m",
"dataType": "sc:Float",
"source": {
"fileObject": {
"@id": "asset_product"
},
"extract": {
"column": "measurements/reflectance/r10m/b04"
}
}
},
{
"@type": "cr:Field",
"@id": "field_B05_20m",
"name": "B05_20m",
"description": "Red edge 1 (band 5) - 20m",
"dataType": "sc:Float",
"source": {
"fileObject": {
"@id": "asset_product"
},
"extract": {
"column": "measurements/reflectance/r20m/b05"
}
}
},
{
"@type": "cr:Field",
"@id": "field_B06_20m",
"name": "B06_20m",
"description": "Red edge 2 (band 6) - 20m",
"dataType": "sc:Float",
"source": {
"fileObject": {
"@id": "asset_product"
},
"extract": {
"column": "measurements/reflectance/r20m/b06"
}
}
},
{
"@type": "cr:Field",
"@id": "field_B07_20m",
"name": "B07_20m",
"description": "Red edge 3 (band 7) - 20m",
"dataType": "sc:Float",
"source": {
"fileObject": {
"@id": "asset_product"
},
"extract": {
"column": "measurements/reflectance/r20m/b07"
}
}
},
{
"@type": "cr:Field",
"@id": "field_B08_10m",
"name": "B08_10m",
"description": "NIR 1 (band 8) - 10m",
"dataType": "sc:Float",
"source": {
"fileObject": {
"@id": "asset_product"
},
"extract": {
"column": "measurements/reflectance/r10m/b08"
}
}
},
{
"@type": "cr:Field",
"@id": "field_B09_60m",
"name": "B09_60m",
"description": "NIR 3 (band 9) - 60m",
"dataType": "sc:Float",
"source": {
"fileObject": {
"@id": "asset_product"
},
"extract": {
"column": "measurements/reflectance/r60m/b09"
}
}
},
{
"@type": "cr:Field",
"@id": "field_B11_20m",
"name": "B11_20m",
"description": "SWIR 1 (band 11) - 20m",
"dataType": "sc:Float",
"source": {
"fileObject": {
"@id": "asset_product"
},
"extract": {
"column": "measurements/reflectance/r20m/b11"
}
}
},
{
"@type": "cr:Field",
"@id": "field_B12_20m",
"name": "B12_20m",
"description": "SWIR 2 (band 12) - 20m",
"dataType": "sc:Float",
"source": {
"fileObject": {
"@id": "asset_product"
},
"extract": {
"column": "measurements/reflectance/r20m/b12"
}
}
},
{
"@type": "cr:Field",
"@id": "field_B8A_20m",
"name": "B8A_20m",
"description": "NIR 2 (band 8A) - 20m",
"dataType": "sc:Float",
"source": {
"fileObject": {
"@id": "asset_product"
},
"extract": {
"column": "measurements/reflectance/r20m/b8a"
}
}
},
{
"@type": "cr:Field",
"@id": "field_SCL_20m",
"name": "SCL_20m",
"description": "Scene classification map (SCL)",
"dataType": "sc:Float",
"source": {
"fileObject": {
"@id": "asset_product"
},
"extract": {
"column": "conditions/mask/l2a_classification/r20m/scl"
}
}
},
{
"@type": "cr:Field",
"@id": "field_TCI_10m",
"name": "TCI_10m",
"description": "True color image",
"dataType": "sc:Float",
"source": {
"fileObject": {
"@id": "asset_product"
},
"extract": {
"column": "quality/l2a_quicklook/r10m/tci"
}
}
},
{
"@type": "cr:Field",
"@id": "field_WVP_10m",
"name": "WVP_10m",
"description": "Water vapour (WVP)",
"dataType": "sc:Float",
"source": {
"fileObject": {
"@id": "asset_product"
},
"extract": {
"column": "quality/atmosphere/r10m/wvp"
}
}
},
{
"@type": "cr:Field",
"@id": "field_product",
"name": "product",
"description": "EOPF Product",
"dataType": "sc:Float",
"source": {
"fileObject": {
"@id": "asset_product"
}
}
}
]
}
],
"geocr:bandConfiguration": {
"@type": "geocr:BandConfiguration",
"geocr:totalBands": 19,
"geocr:bandNamesList": [
"SR_10m",
"SR_20m",
"SR_60m",
"AOT_10m",
"B01_20m",
"B02_10m",
"B03_10m",
"B04_10m",
"B05_20m",
"B06_20m",
"B07_20m",
"B08_10m",
"B09_60m",
"B11_20m",
"B12_20m",
"B8A_20m",
"SCL_20m",
"TCI_10m",
"WVP_10m"
]
}
}
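The pretty-printed document is long, so it can help to pull out only the GeoCroissant-specific fields (conformance, CRS, resolution, bounding box, bands). The following is a minimal sketch that works on a hand-copied excerpt of the output above rather than the real file, so it runs without the JSON-LD on disk. The helper name summarise_geocroissant is hypothetical, and reading the box string as south, west, north, east is an assumption based on the schema.org GeoShape convention.

```python
# Excerpt copied by hand from the pretty-printed metadata above.
croissant_excerpt = {
    "conformsTo": [
        "http://mlcommons.org/croissant/1.1",
        "http://mlcommons.org/croissant/geo/1.0",
    ],
    "spatialCoverage": {
        "@type": "Place",
        "geo": {
            "@type": "GeoShape",
            "box": "64.77074135158136 -18.89315632774026 "
                   "65.80593069527565 -16.41850676127295",
        },
    },
    "geocr:coordinateReferenceSystem": "EPSG:4326",
    "geocr:spatialResolution": {
        "@type": "QuantitativeValue",
        "value": 10.0,
        "unitText": "meters",
    },
    "geocr:bandConfiguration": {
        "@type": "geocr:BandConfiguration",
        "geocr:totalBands": 19,
        "geocr:bandNamesList": [
            "SR_10m", "SR_20m", "SR_60m", "AOT_10m", "B01_20m", "B02_10m",
            "B03_10m", "B04_10m", "B05_20m", "B06_20m", "B07_20m", "B08_10m",
            "B09_60m", "B11_20m", "B12_20m", "B8A_20m", "SCL_20m", "TCI_10m",
            "WVP_10m",
        ],
    },
}


def summarise_geocroissant(metadata):
    """Hypothetical helper: collect the geo-specific fields into a flat dict."""
    # Assumption: the box is ordered "south west north east" (lat lon lat lon).
    south, west, north, east = (
        float(v) for v in metadata["spatialCoverage"]["geo"]["box"].split()
    )
    resolution = metadata["geocr:spatialResolution"]
    bands = metadata["geocr:bandConfiguration"]
    return {
        "conforms_to": metadata["conformsTo"],
        "crs": metadata["geocr:coordinateReferenceSystem"],
        "resolution": (resolution["value"], resolution["unitText"]),
        "bbox_south_west_north_east": (south, west, north, east),
        "band_count": bands["geocr:totalBands"],
        "band_names": bands["geocr:bandNamesList"],
    }


summary = summarise_geocroissant(croissant_excerpt)
for key, value in summary.items():
    print(f"{key}: {value}")
```

On the real file, the same function could be called with croissant_metadata in place of the excerpt.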
Extract Zarr URL
In [3]:
zarr_url = croissant_metadata["distribution"][0]["contentUrl"]
print(f"Accessing Zarr Store directly at: {zarr_url}")
Accessing Zarr Store directly at: https://objects.eodc.eu:443/e05ab01a9d56408d82ac32d69a5aae2a:202603-s02msil2a-eu/01/products/cpm_v262/S2B_MSIL2A_20260301T125259_N0512_R138_T27WXN_20260301T163056.zarr
Open Zarr Store
In [4]:
store = zarr.open(zarr_url, mode='r')
Access Single Band
In [7]:
b02_path = "measurements/reflectance/r10m/b02"
b02_band = store[b02_path]
print("Type:", type(b02_band))
print("Shape:", b02_band.shape)
print("Dtype:", b02_band.dtype)
print("Chunks:", b02_band.chunks)
Type: <class 'zarr.core.Array'>
Shape: (10980, 10980)
Dtype: uint16
Chunks: (1830, 1830)
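The hardcoded path in this cell is the same string stored under source.extract.column for the B02_10m field in the metadata. A minimal sketch of deriving it from the record set instead is shown here, again on a hand-copied excerpt (three of the twenty fields) so that it runs standalone; band_paths is a hypothetical helper name.

```python
# Three fields copied from the "imagery_bands" record set printed earlier.
record_set_excerpt = {
    "@type": "cr:RecordSet",
    "@id": "record_set_imagery_bands",
    "name": "imagery_bands",
    "field": [
        {
            "@type": "cr:Field",
            "name": "B02_10m",
            "source": {
                "fileObject": {"@id": "asset_product"},
                "extract": {"column": "measurements/reflectance/r10m/b02"},
            },
        },
        {
            "@type": "cr:Field",
            "name": "B08_10m",
            "source": {
                "fileObject": {"@id": "asset_product"},
                "extract": {"column": "measurements/reflectance/r10m/b08"},
            },
        },
        {
            # The "product" field has no extract.column in the metadata.
            "@type": "cr:Field",
            "name": "product",
            "source": {"fileObject": {"@id": "asset_product"}},
        },
    ],
}


def band_paths(record_set):
    """Map each field name to its Zarr-relative path, skipping fields without one."""
    paths = {}
    for field in record_set["field"]:
        column = field.get("source", {}).get("extract", {}).get("column")
        if column is not None:
            paths[field["name"]] = column
    return paths


paths = band_paths(record_set_excerpt)
print(paths["B02_10m"])
```

With the real metadata, band_paths(croissant_metadata["recordSet"][0]) would give the full mapping, and store[paths["B02_10m"]] could then replace the hardcoded lookup.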
Open with Xarray
In [5]:
ds10m = xr.open_zarr(
    zarr_url,
    group="measurements/reflectance/r10m"
)
print(ds10m)
<xarray.Dataset> Size: 4GB
Dimensions: (y: 10980, x: 10980)
Coordinates:
* x (x) float32 44kB 6e+05 6e+05 6e+05 ... 7.098e+05 7.098e+05
* y (y) float32 44kB 7.3e+06 7.3e+06 7.3e+06 ... 7.19e+06 7.19e+06
Data variables:
b02 (y, x) float64 964MB dask.array<chunksize=(1830, 1830), meta=np.ndarray>
b03 (y, x) float64 964MB dask.array<chunksize=(1830, 1830), meta=np.ndarray>
b04 (y, x) float64 964MB dask.array<chunksize=(1830, 1830), meta=np.ndarray>
b08 (y, x) float64 964MB dask.array<chunksize=(1830, 1830), meta=np.ndarray>
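The sizes in this summary can be checked by hand. The sketch below reproduces them from the shape, dtypes and chunk size printed above, plus the 10 m resolution declared in the metadata. Nothing is read from the store, and the variable names are illustrative. The jump from uint16 in the raw Zarr array to float64 here is presumably due to xarray decoding scale and offset attributes; that is an assumption, not something the output confirms.

```python
height = width = 10980      # pixels, from the Dimensions line
chunk_edge = 1830           # pixels per chunk edge
n_bands = 4                 # b02, b03, b04, b08
resolution_m = 10.0         # geocr:spatialResolution in the metadata

pixels = height * width
band_bytes_float64 = pixels * 8   # as decoded by xarray
band_bytes_uint16 = pixels * 2    # as stored in the raw Zarr array
dataset_bytes = band_bytes_float64 * n_bands
chunks_per_band = (height // chunk_edge) * (width // chunk_edge)
scene_width_km = width * resolution_m / 1000

print(f"One band (float64): {band_bytes_float64 / 1e6:.0f} MB")   # 964 MB
print(f"One band (uint16):  {band_bytes_uint16 / 1e6:.0f} MB")    # 241 MB
print(f"All four bands:     {dataset_bytes / 1e9:.2f} GB")        # 3.86 GB
print(f"Chunks per band:    {chunks_per_band}")                   # 36
print(f"Scene width:        {scene_width_km:.1f} km")             # 109.8 km
```

The 964 MB per band matches the xarray summary, which rounds the total to 4GB, and the 109.8 km width is consistent with the (rounded) x coordinate range of 6e+05 to 7.098e+05.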
True Colour Image (RGB)
In [9]:
rgb = np.stack(
    [ds10m.b04.values, ds10m.b03.values, ds10m.b02.values],
    axis=-1
).astype(np.float32)
# 2nd-98th percentile stretch, computed once over all three channels
p2, p98 = np.nanpercentile(rgb, 2), np.nanpercentile(rgb, 98)
rgb = np.clip((rgb - p2) / (p98 - p2), 0, 1)
plt.figure(figsize=(8, 8))
plt.imshow(rgb)
plt.title("True Colour RGB (B04 / B03 / B02)")
plt.axis("off")
plt.show()
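The stretch in this cell uses a single pair of percentiles for the whole array. A per-channel variant only needs an axis argument to np.nanpercentile. The sketch below compares the two on a small synthetic array rather than the real scene; percentile_stretch is a hypothetical helper, not part of the notebook.

```python
import numpy as np


def percentile_stretch(img, low=2, high=98, per_channel=False):
    """Rescale an (H, W, C) array to [0, 1] between two percentiles."""
    img = np.asarray(img, dtype=np.float32)
    # axis=(0, 1) gives one percentile pair per channel; None gives one overall.
    axis = (0, 1) if per_channel else None
    lo, hi = np.nanpercentile(img, [low, high], axis=axis)
    return np.clip((img - lo) / (hi - lo), 0.0, 1.0)


rng = np.random.default_rng(0)
# Synthetic 64 x 64 "scene" whose third channel is systematically brighter.
fake = rng.uniform(0.0, 0.3, size=(64, 64, 3))
fake[..., 2] += 0.2

shared = percentile_stretch(fake)
independent = percentile_stretch(fake, per_channel=True)

print("channel means, shared stretch:     ", shared.mean(axis=(0, 1)).round(2))
print("channel means, per-channel stretch:", independent.mean(axis=(0, 1)).round(2))
```

A shared stretch keeps the relative brightness of the bands, so the brighter channel stays brighter. A per-channel stretch equalises the channels, which can remove a colour cast but also changes the colour balance of the scene.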
NDVI (Normalised Difference Vegetation Index)
In [6]:
ndvi = (ds10m.b08 - ds10m.b04) / (ds10m.b08 + ds10m.b04)
plt.figure(figsize=(8,8))
ndvi.plot(cmap="RdYlGn")
plt.title("NDVI")
plt.show()
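NDVI is (NIR - Red) / (NIR + Red), with B08 as the near-infrared band and B04 as the red band. Where both bands sum to zero the division is undefined. The following sketch shows the formula on a tiny hand-made array with an explicit guard for that case; the helper name safe_ndvi and the sample values are illustrative and are not taken from the scene.

```python
import numpy as np


def safe_ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red); NaN where the denominator is zero."""
    nir = np.asarray(nir, dtype=np.float64)
    red = np.asarray(red, dtype=np.float64)
    denom = nir + red
    out = np.full(denom.shape, np.nan)
    # Only divide where the denominator is non-zero; other cells stay NaN.
    np.divide(nir - red, denom, out=out, where=denom != 0)
    return out


nir_sample = np.array([[0.50, 0.30], [0.10, 0.0]])
red_sample = np.array([[0.10, 0.30], [0.30, 0.0]])
ndvi_sample = safe_ndvi(nir_sample, red_sample)
print(ndvi_sample)
```

The four sample pixels cover the typical cases: NIR well above red (about 0.67, vegetation-like), equal bands (0), red above NIR (-0.5), and an empty pixel (NaN).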
Single Band Visualisation
In [7]:
plt.figure(figsize=(8,8))
ds10m.b04.plot(cmap="gray")
plt.title("Red band")
plt.show()
Spatial Subset (2000 × 2000 px crop)
The full scene is 10980 × 10980 pixels, so this cell crops a 2000 × 2000 pixel window (20 km × 20 km at 10 m resolution) for quick inspection.
In [8]:
subset = ds10m.isel(
    x=slice(2000, 4000),
    y=slice(2000, 4000)
)
rgb_sub = np.stack(
    [subset.b04.values, subset.b03.values, subset.b02.values],
    axis=-1
).astype(np.float32)
p2, p98 = np.nanpercentile(rgb_sub, 2), np.nanpercentile(rgb_sub, 98)
rgb_sub = np.clip((rgb_sub - p2) / (p98 - p2), 0, 1)
plt.figure(figsize=(8, 8))
plt.imshow(rgb_sub)
plt.title("True Colour RGB — Spatial Subset (x 2000–4000, y 2000–4000)")
plt.axis("off")
plt.show()