Google Summer of Code: AI-ready Dataset Metadata as a Service

This summer, I participated as an open-source student developer in the Google Summer of Code 2025 program with the Open Source Geospatial Foundation (OSGeo), working on the ZOO-Project.

My project focused on developing metadata-as-a-service utilities. Specifically, it integrates GeoCroissant metadata, Data-Centric AI workflows, and OGC API – Processes to make Earth observation datasets AI-ready and described in a standardized way.


Brief Description

The project aims to enhance the ZOO-Project with native support for GeoCroissant metadata, so that geospatial datasets can be published in an AI-ready form. It will provide tools for metadata generation, validation, and integration with STAC catalogs, Earth Engine, and HuggingFace, along with data-centric AI workflows for improving dataset quality.
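
To give a feel for what "metadata generation and validation" means in practice, here is a minimal sketch in Python. It is not the project's actual code: the function names `build_metadata` and `validate_metadata` and the list of required fields are my own assumptions for illustration, and the record uses plain schema.org terms (`Dataset`, `spatialCoverage`, `GeoShape`) rather than the real GeoCroissant vocabulary.

```python
import json

# Hypothetical minimal field set; the real GeoCroissant requirements may differ.
REQUIRED_FIELDS = ("@context", "@type", "name", "description", "license", "url")


def build_metadata(name, description, license_url, url, bbox):
    """Build a small JSON-LD dataset record from basic properties.

    bbox is (west, south, east, north) in degrees.
    """
    west, south, east, north = bbox
    return {
        "@context": {"@vocab": "https://schema.org/"},
        "@type": "Dataset",
        "name": name,
        "description": description,
        "license": license_url,
        "url": url,
        "spatialCoverage": {
            "@type": "Place",
            # Lower corner first, then upper corner (assumed lat/lon order).
            "geo": {"@type": "GeoShape", "box": f"{south} {west} {north} {east}"},
        },
    }


def validate_metadata(record):
    """Return a list of problems; an empty list means the record passed."""
    problems = [
        f"missing or empty field: {key}"
        for key in REQUIRED_FIELDS
        if not record.get(key)
    ]
    if "spatialCoverage" not in record:
        problems.append("missing spatialCoverage")
    return problems


record = build_metadata(
    name="Demo EO tiles",
    description="Illustrative Earth observation dataset.",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    url="https://example.org/datasets/demo",
    bbox=(5.0, 45.0, 6.0, 46.0),
)
print(json.dumps(record, indent=2))
print(validate_metadata(record))
```

A service version of this idea would run the same two steps behind an HTTP endpoint instead of in a local script.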


State of the Project Before GSoC

Before this project, the ZOO-Project already offered solid support for OGC-compliant geoprocessing, but it had no built-in support for GeoCroissant, a metadata standard designed specifically for AI-ready geospatial datasets. ZOO had no tools to help users create or validate this kind of metadata, or to connect easily with STAC catalogs, Earth Engine, or machine learning hubs like HuggingFace and Kaggle. It also lacked workflows for checking the quality of training data or fixing common issues such as annotation errors and bias. This project aims to fill those gaps and bring these much-needed features to the ZOO-Project.
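
As an illustration of the kind of training-data check mentioned above, the sketch below flags labels that look suspicious given a classifier's predicted probabilities. It is a simplified, pure-Python take on the confident-learning idea and does not use or reproduce Cleanlab's API; the function names and the toy data are assumptions made up for this example.

```python
from collections import defaultdict


def class_thresholds(labels, pred_probs):
    """Per-class threshold: mean predicted probability of class c
    over the examples that are labeled c."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for label, probs in zip(labels, pred_probs):
        sums[label] += probs[label]
        counts[label] += 1
    return {c: sums[c] / counts[c] for c in counts}


def find_suspect_labels(labels, pred_probs):
    """Return indices whose given label disagrees with a confident prediction."""
    thresholds = class_thresholds(labels, pred_probs)
    suspects = []
    for i, (label, probs) in enumerate(zip(labels, pred_probs)):
        best = max(range(len(probs)), key=probs.__getitem__)
        # Flag only when the model prefers another class AND is at least as
        # confident as it typically is for examples carrying that class label.
        if best != label and probs[best] >= thresholds.get(best, 1.0):
            suspects.append(i)
    return suspects


# Toy data: example 2 is labeled 0, but the model is confident it is class 1.
labels = [0, 0, 0, 1, 1, 1]
pred_probs = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],
    [0.2, 0.8],
    [0.1, 0.9],
    [0.3, 0.7],
]
print(find_suspect_labels(labels, pred_probs))  # [2]
```

In a real workflow the probabilities would come from out-of-sample predictions (for example via cross-validation), and flagged examples would be reviewed by a person rather than relabeled automatically.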


Deliverables

  • Integration of GeoCroissant metadata support into OGC API – Processes.
  • Services for metadata generation, validation, and conversion from STAC, Earth Engine, HuggingFace, and Kaggle.
  • REST endpoints for metadata hosting and JSON-LD-based service chaining.
  • Implementation of Data-Centric AI workflows using Cleanlab for label noise and bias detection.
  • Interoperability tools for STAC, OGC TrainingDML, and MLCommons Croissant formats.
  • Full test suite, example datasets, and usage tutorials.
  • Comprehensive documentation and project wiki with deployment guides.
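
To make the first deliverable more concrete, here is a rough sketch of how a client could ask an OGC API – Processes server to run a metadata service. The `/processes/{processId}/execution` path and the `inputs` request body follow OGC API – Processes Part 1; the base URL, the process identifier `geocroissant-validate`, and the input name `metadata` are hypothetical placeholders, not the names used in the ZOO-Project deployment.

```python
import json
from urllib.request import Request


def build_execute_request(base_url, process_id, inputs):
    """Prepare (but do not send) an OGC API - Processes execute request."""
    url = f"{base_url.rstrip('/')}/processes/{process_id}/execution"
    body = json.dumps({"inputs": inputs}).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


# Hypothetical server, process id, and input name, for illustration only.
request = build_execute_request(
    base_url="http://localhost:8080/ogc-api/",
    process_id="geocroissant-validate",
    inputs={"metadata": {"@type": "Dataset", "name": "Demo EO tiles"}},
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Sending the request is then a matter of passing it to `urllib.request.urlopen` against a running server; I left that out so the sketch runs without network access.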

Detailed Proposal

You can read our full, detailed project proposal and follow along with the source updates on the official wiki:


Participants

The mentors and student developer behind the project are listed below:

Role               Name            GitHub Handle
1st Mentor         Chetan Mahajan  @cOsprey
2nd Mentor         Gérald Fenoy    @gfenoy
Student Developer  Harsh Shinde    @HarshShinde0