Unsupervised object discovery is a fundamental challenge in computer vision, requiring the identification and segmentation of object instances without manual annotations. This paper proposes FCOD-HVC, a novel framework that integrates multi-scale feature coherence with hierarchical visual cues for robust unsupervised object discovery. The method introduces a Hierarchical Feature Coherence Module and a Scale-Adaptive Slot Attention mechanism, along with a Cross-Level Feature Alignment loss, achieving state-of-the-art performance on PASCAL VOC, COCO, and CLEVRTex.
Key findings
FCOD-HVC integrates multi-scale feature coherence with hierarchical visual cues for unsupervised object discovery.
The Hierarchical Feature Coherence Module enforces consistency across different levels of visual abstraction.
Scale-Adaptive Slot Attention dynamically adjusts to object scale, improving the representation of objects of different sizes.
Cross-Level Feature Alignment loss ensures semantic coherence between hierarchical representations.
FCOD-HVC is reported to improve unsupervised object discovery accuracy by 8.3% over prior methods on multiple benchmarks.
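To make the idea of a cross-level alignment objective concrete, here is a minimal sketch in plain Python. It is not the paper's loss, whose exact form this summary does not give. It assumes that fine-level features are average-pooled down to the coarse resolution and then compared with the coarse-level features by cosine distance. The function names (avg_pool_1d, cosine_similarity, cross_level_alignment_loss), the pooling step, and the choice of cosine distance are all assumptions made for illustration.

```python
import math


def avg_pool_1d(features, factor):
    """Average consecutive groups of `factor` feature vectors (hypothetical helper)."""
    pooled = []
    for start in range(0, len(features), factor):
        group = features[start:start + factor]
        dim = len(group[0])
        pooled.append([sum(vec[d] for vec in group) / len(group) for d in range(dim)])
    return pooled


def cosine_similarity(a, b, eps=1e-8):
    """Cosine similarity between two vectors; eps avoids division by zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)


def cross_level_alignment_loss(fine, coarse):
    """Mean cosine distance between pooled fine features and coarse features.

    Assumed form for illustration only: 0 when the two levels point in the
    same direction at every position, larger when they disagree.
    """
    if len(fine) % len(coarse) != 0:
        raise ValueError("fine length must be a multiple of coarse length")
    factor = len(fine) // len(coarse)
    pooled = avg_pool_1d(fine, factor)
    distances = [1.0 - cosine_similarity(p, c) for p, c in zip(pooled, coarse)]
    return sum(distances) / len(distances)


# Four fine-level positions, two coarse-level positions, 2-D features.
fine = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
aligned = [[2.0, 0.0], [0.0, 3.0]]      # same directions as the pooled fine features
misaligned = [[0.0, 1.0], [1.0, 0.0]]   # orthogonal to the pooled fine features

loss_aligned = cross_level_alignment_loss(fine, aligned)
loss_misaligned = cross_level_alignment_loss(fine, misaligned)
print(loss_aligned, loss_misaligned)
```

In this toy setup the loss is close to zero when the coarse features agree in direction with the pooled fine features and close to one when they are orthogonal, which is the behaviour one would want from a term that encourages semantic coherence across levels.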
Limitations & open questions
The framework's performance on highly occluded objects or in extremely cluttered scenes is not explicitly evaluated.
The generalization of FCOD-HVC to other domains beyond the tested benchmarks is not discussed.