This paper proposes HE-VPR++, an extension for aerial Visual Place Recognition (VPR) that introduces learned, height-adaptive patch sizes to address scale variation across extreme altitude ranges. The method adjusts patch granularity dynamically based on estimated altitude, using smaller patches for high-altitude imagery and larger patches for low-altitude imagery. A lightweight Height-Conditioned Patch Selector (HCPS) predicts optimal patch configurations from frozen DINOv2 features, enabling efficient multi-scale representation without retraining the backbone. Combined with a hierarchical height-partitioned database structure and center-weighted masking, HE-VPR++ is expected to improve Recall@1 by 8-12% over state-of-the-art ViT-based baselines on datasets with extreme altitude ranges, while reducing memory usage by up to 85%.
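The altitude-to-patch-size idea and the center-weighted masking can be illustrated with a minimal sketch. This is not the paper's HCPS, which is learned from frozen DINOv2 features; a fixed threshold rule stands in for it here. The altitude bands, the patch sizes, the Gaussian form of the mask, and the names `select_patch_size` and `center_weight_mask` are all assumptions made for illustration.

```python
import math
from typing import List

# Hypothetical altitude bands (metres) and patch sizes (pixels); not from the paper.
# Each entry is (upper altitude bound, patch size used below that bound).
ALTITUDE_BANDS = [(150.0, 28), (400.0, 14)]
DEFAULT_PATCH = 7  # used at or above the last bound (highest altitudes)


def select_patch_size(altitude_m: float) -> int:
    """Return a smaller patch size as altitude increases (fixed rule, not learned)."""
    for upper_bound, patch in ALTITUDE_BANDS:
        if altitude_m < upper_bound:
            return patch
    return DEFAULT_PATCH


def center_weight_mask(image_size: int, patch: int, sigma: float = 0.5) -> List[List[float]]:
    """Gaussian weights over the patch grid, largest at the image centre.

    The Gaussian shape and the sigma parameter are assumptions; the source
    only states that the masking is center-weighted.
    """
    n = image_size // patch          # patches per side
    centre = (n - 1) / 2.0           # grid coordinate of the image centre
    scale = sigma * n                # spread of the Gaussian in grid units
    return [
        [
            math.exp(-((row - centre) ** 2 + (col - centre) ** 2) / (2 * scale ** 2))
            for col in range(n)
        ]
        for row in range(n)
    ]


if __name__ == "__main__":
    for altitude in (100.0, 300.0, 1000.0):
        patch = select_patch_size(altitude)
        mask = center_weight_mask(224, patch)
        print(altitude, patch, len(mask))
```

In this sketch a higher altitude yields a finer patch grid, and the mask down-weights patches near the image border; a learned selector would replace the threshold table with a prediction from image features.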
Key findings
HE-VPR++ dynamically adjusts patch sizes based on altitude for aerial VPR.
A lightweight Height-Conditioned Patch Selector (HCPS) predicts optimal patch configurations from frozen DINOv2 features.
A hierarchical height-partitioned database structure reduces the search space and memory requirements.
Recall@1 is expected to improve by 8-12% over state-of-the-art ViT-based baselines on datasets with extreme altitude ranges.
Memory usage is reduced by up to 85%.
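To make the height-partitioned database concrete, the following is a minimal sketch of how partitioning by altitude can shrink the search space at query time. It assumes a single flat level of fixed-width altitude bins and cosine-similarity matching; the bin width, the class `HeightPartitionedIndex`, and its methods are hypothetical and are not taken from the paper, which describes a hierarchical structure.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

BIN_WIDTH_M = 100.0  # hypothetical partition width in metres


def height_bin(altitude_m: float) -> int:
    """Map an altitude to the index of its fixed-width partition."""
    return int(altitude_m // BIN_WIDTH_M)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two descriptors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class HeightPartitionedIndex:
    """Reference descriptors grouped by altitude bin (illustrative only)."""

    def __init__(self) -> None:
        self._bins: Dict[int, List[Tuple[str, List[float]]]] = defaultdict(list)

    def add(self, place_id: str, altitude_m: float, descriptor: Sequence[float]) -> None:
        self._bins[height_bin(altitude_m)].append((place_id, list(descriptor)))

    def query(
        self, altitude_m: float, descriptor: Sequence[float], top_k: int = 1
    ) -> List[Tuple[str, float]]:
        # Only the partition matching the estimated altitude is searched.
        candidates = self._bins.get(height_bin(altitude_m), [])
        scored = sorted(
            ((cosine(descriptor, ref), place_id) for place_id, ref in candidates),
            reverse=True,
        )
        return [(place_id, score) for score, place_id in scored[:top_k]]


if __name__ == "__main__":
    index = HeightPartitionedIndex()
    index.add("low_a", 50.0, [1.0, 0.0])
    index.add("low_b", 80.0, [0.0, 1.0])
    index.add("high_a", 950.0, [1.0, 0.0])
    print(index.query(60.0, [0.9, 0.1]))
```

A query compares against references in one bin only, so the number of comparisons scales with the partition size instead of the full database. One design caveat of this simplified version is that queries whose estimated altitude falls near a bin boundary may miss the correct reference; searching adjacent bins as well would be one way to mitigate that.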
Limitations & open questions
The approach has not been tested in real-world aerial robotics applications.
The performance of HE-VPR++ may be affected by extreme weather conditions or low visibility.