AI-ready Dataset Metadata as a Service: Google Summer of Code 2025

August 31, 2025 • Harsh Shinde

Brief Description

The project aims to enhance the ZOO-Project with native support for GeoCroissant metadata, enabling AI-ready geospatial datasets. It will provide tools for generating and validating metadata and for integrating with platforms such as STAC, Earth Engine, and Hugging Face, along with data-centric AI workflows for improving dataset quality.

State of the Project Before GSoC

While the ZOO-Project already offered solid support for OGC-compliant geoprocessing, it had no built-in support for GeoCroissant, a metadata standard designed specifically for AI-ready geospatial datasets. ZOO provided no tools to help users create or validate this kind of metadata, or to connect it with existing platforms such as STAC and Earth Engine or with machine learning hubs such as Hugging Face and Kaggle. It also lacked workflows to help users check the quality of their training data or fix common issues such as annotation errors and bias. This project aims to fill those gaps and bring these much-needed features to the ZOO-Project.

Deliverables

Detailed Proposal

Check out the full project proposal: Detailed Proposal Link (GitHub Wiki)

Participants

Role: Name (GitHub handle)

1st Mentor: Chetan Mahajan (@cOsprey)
2nd Mentor: Gérald Fenoy (@gfenoy)
Student Developer: Harsh Shinde (@HarshShinde0)