Tree / Orchard Detection & Counting from High-Resolution Satellite Images

Summary

This project presents an end-to-end geospatial machine learning pipeline that detects and counts individual trees and orchards in very high-resolution multispectral satellite imagery. The pipeline reads raw 4-band GeoTIFF data (Blue, Green, Red, and NIR) and trains a custom PyTorch U-Net with density map regression. Instead of bounding-box object detection, point annotations are convolved with Gaussian kernels to produce smooth density heatmaps, and summing the pixels of a heatmap gives the tree count. The framework supports tile slicing (512×512 px), automated VOC XML-to-point-mask conversion, and preservation of full-image spatial metadata, for use in environmental monitoring and sustainable agricultural planning.
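As a rough illustration of the VOC XML-to-point-mask step, the sketch below reduces each Pascal VOC bounding box to its center pixel. The function names `voc_boxes_to_points` and `points_to_mask` are hypothetical, and using the box center as the point annotation is an assumption, not necessarily what the project's converter does.

```python
import xml.etree.ElementTree as ET

import numpy as np


def voc_boxes_to_points(xml_text):
    """Return the (row, col) center of every <bndbox> in a Pascal VOC annotation.

    Assumption: one point per box, placed at the box center.
    """
    root = ET.fromstring(xml_text)
    points = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        if box is None:
            continue  # skip objects without a bounding box
        xmin = float(box.findtext("xmin"))
        ymin = float(box.findtext("ymin"))
        xmax = float(box.findtext("xmax"))
        ymax = float(box.findtext("ymax"))
        points.append(((ymin + ymax) / 2.0, (xmin + xmax) / 2.0))
    return points


def points_to_mask(points, height, width):
    """Build a point mask: each tree adds 1 at its center pixel (clamped to the image)."""
    mask = np.zeros((height, width), dtype=np.float32)
    for row, col in points:
        r = min(max(int(round(row)), 0), height - 1)
        c = min(max(int(round(col)), 0), width - 1)
        mask[r, c] += 1.0
    return mask
```

The mask sums to the number of annotated trees, which is the property the density-map stage relies on.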




Model Architecture and Training

Approach - Density Map Regression

Instead of traditional object detection, we use density map regression. Each point annotation is convolved with a normalized Gaussian kernel (sigma = 2) to produce a smooth density map, so every annotated tree contributes a total mass of 1. The integral (pixel sum) of the predicted density map therefore gives the tree count directly.
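A minimal sketch of building a ground-truth density map, assuming point annotations are given as (row, col) pixel coordinates. The helper name `make_density_map` is hypothetical, and `scipy.ndimage.gaussian_filter` stands in for whatever convolution the project actually uses.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def make_density_map(points, height, width, sigma=2.0):
    """Place a unit impulse at each annotated point, then blur with a Gaussian.

    The Gaussian kernel is normalized, so each point away from the image
    border contributes a total mass of 1 to the map.
    """
    impulses = np.zeros((height, width), dtype=np.float64)
    for row, col in points:
        r = min(max(int(round(row)), 0), height - 1)
        c = min(max(int(round(col)), 0), width - 1)
        impulses[r, c] += 1.0
    return gaussian_filter(impulses, sigma=sigma)


# Two annotated trees in a 128x128 tile: the map sums to roughly 2.
density = make_density_map([(30, 20), (110, 105)], height=128, width=128)
estimated_count = float(density.sum())
```

Because the target integrates to the count, a network trained to regress this map can be read out with a single sum over its output.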

U-Net Architecture

A lightweight 3-level U-Net built in PyTorch:

[Figure: U-Net architecture]
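For orientation, here is a sketch of what a small 3-level U-Net for density regression could look like in PyTorch. The class name `SmallUNet`, the channel widths, the use of batch normalization, and the final ReLU (to keep densities non-negative) are all assumptions for illustration; the project's actual layer configuration may differ.

```python
import torch
import torch.nn as nn


def conv_block(in_ch, out_ch):
    """Two 3x3 convolutions, each followed by batch norm and ReLU."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )


class SmallUNet(nn.Module):
    """Hypothetical U-Net with three resolution levels.

    Input:  (N, 4, H, W) with H and W divisible by 4 (Blue, Green, Red, NIR).
    Output: (N, 1, H, W) non-negative density map.
    """

    def __init__(self, in_channels=4, base=16):
        super().__init__()
        self.enc1 = conv_block(in_channels, base)
        self.enc2 = conv_block(base, base * 2)
        self.enc3 = conv_block(base * 2, base * 4)
        self.pool = nn.MaxPool2d(2)
        self.up2 = nn.ConvTranspose2d(base * 4, base * 2, kernel_size=2, stride=2)
        self.dec2 = conv_block(base * 4, base * 2)
        self.up1 = nn.ConvTranspose2d(base * 2, base, kernel_size=2, stride=2)
        self.dec1 = conv_block(base * 2, base)
        self.head = nn.Conv2d(base, 1, kernel_size=1)

    def forward(self, x):
        e1 = self.enc1(x)                 # full resolution
        e2 = self.enc2(self.pool(e1))     # 1/2 resolution
        e3 = self.enc3(self.pool(e2))     # 1/4 resolution
        d2 = self.dec2(torch.cat([self.up2(e3), e2], dim=1))
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))
        return torch.relu(self.head(d1))
```

The skip connections concatenate encoder features with the upsampled decoder features, which helps keep the predicted density peaks aligned with individual tree crowns.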

Training Sample Visualization

Below is a sample from the training set showing the RGB image, ground-truth mask, and the corresponding Gaussian density map:

[Figure: Training sample]

Training Curves

[Figure: Training curves]


Predictions and Results

Prediction on Train Data

[Figure: Prediction on train data]

Prediction on Test Data

[Figure: Prediction on test data]

Each prediction panel shows:

  1. Original RGB - the input satellite tile rendered as true-color.
  2. Predicted Density Map - heatmap whose pixel sum gives the predicted tree count.
  3. Overlay - predicted density heatmap blended on top of the original image, highlighting detected tree locations.
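The second and third panels can be approximated with a few lines of NumPy. This is a sketch only: `count_from_density` and `overlay_density` are hypothetical names, and the red tint with `alpha=0.5` is an arbitrary choice, not the colormap used in the project's figures.

```python
import numpy as np


def count_from_density(density):
    """Predicted tree count is the sum of all density values."""
    return float(np.sum(density))


def overlay_density(rgb, density, alpha=0.5):
    """Blend a density map onto an RGB image as a red tint.

    rgb:     (H, W, 3) float array with values in [0, 1]
    density: (H, W) non-negative float array
    """
    peak = float(density.max())
    heat = density / peak if peak > 0 else np.zeros_like(density)
    weight = (alpha * heat)[..., None]      # per-pixel blend weight in [0, alpha]
    red = np.array([1.0, 0.0, 0.0])
    return (1.0 - weight) * rgb + weight * red
```

Pixels with zero predicted density are left unchanged, so only the areas around detected trees are tinted.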

Notebooks

Kaggle - tree-orchard-detection

| Notebook | Description |
|---|---|
| code.ipynb | Complete end-to-end workflow - data loading, model definition, training, evaluation, and visualization |
| inference.ipynb | Load the pre-trained best_model.pth and run inference on any GeoTIFF - no training required |
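Running inference on a full scene usually means cutting it into tiles, predicting each tile, and stitching the results. The sketch below shows that loop with non-overlapping 512×512 tiles and zero padding at the edges. `predict_tiled` and `predict_fn` are hypothetical names; reading the GeoTIFF into an array and wrapping best_model.pth as `predict_fn` are left out, and the notebook may handle tile borders differently.

```python
import numpy as np


def predict_tiled(image, predict_fn, tile=512):
    """Predict a full-scene density map tile by tile.

    image:      (bands, H, W) array, e.g. the 4-band scene
    predict_fn: callable mapping a (bands, tile, tile) array to a
                (tile, tile) density map (assumed to wrap the trained model)
    """
    bands, height, width = image.shape
    pad_h = (-height) % tile   # rows needed to reach a multiple of the tile size
    pad_w = (-width) % tile
    padded = np.pad(image, ((0, 0), (0, pad_h), (0, pad_w)), mode="constant")
    density = np.zeros(padded.shape[1:], dtype=np.float32)
    for top in range(0, padded.shape[1], tile):
        for left in range(0, padded.shape[2], tile):
            patch = padded[:, top:top + tile, left:left + tile]
            density[top:top + tile, left:left + tile] = predict_fn(patch)
    # Crop the padding away so the output matches the input footprint.
    return density[:height, :width]
```

Because the output has the same height and width as the input, the source raster's georeferencing can be reused when the density map is written back out, and the scene-level count is the sum of the stitched map.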