This research proposes Agent-Amodal, a framework that enables unsupervised amodal completion without paired annotations by framing the task as active perception. An embodied agent learns to explore visual scenes and accumulate multi-view evidence through curiosity-driven viewpoint selection. The approach uses a temporally consistent completion network trained with self-supervised geometric-consistency losses and occlusion-aware contrastive learning. The method addresses the limitations of supervised and synthetic-data approaches while laying the groundwork for autonomous amodal perception systems.
Key findings
Curiosity-driven exploration module strategically selects viewpoints to disambiguate occlusions, using reinforcement learning with information-gain rewards (see the reward sketch after this list)
Temporal-aggregation completion network maintains and updates shape hypotheses while enforcing geometric consistency across multiple observations (see the aggregation sketch below)
Self-supervised objectives, including reconstruction losses and occlusion-aware contrastive learning, enable training without ground-truth amodal masks (see the contrastive-loss sketch below)
Framework designed to handle novel categories and embodied-AI scenarios without expensive human annotations or the domain gaps of synthetic data
Comprehensive validation planned on KINS, COCOA, and D2SA benchmarks with ablation studies to isolate component contributions
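To make the information-gain reward concrete, here is a minimal sketch of one plausible formulation: the agent keeps a per-pixel Bernoulli belief over the amodal mask, and each new viewpoint is rewarded by how much it reduces that belief's entropy. The function names and the entropy-reduction form are illustrative assumptions, not details taken from the proposal.

```python
import numpy as np

def belief_entropy(belief: np.ndarray) -> float:
    # Mean per-pixel Bernoulli entropy of an amodal-mask belief map in [0, 1].
    eps = 1e-8
    p = np.clip(belief, eps, 1.0 - eps)
    return float((-(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))).mean())

def information_gain_reward(belief_before: np.ndarray,
                            belief_after: np.ndarray) -> float:
    # Reward = uncertainty about the occluded shape removed by the new view.
    return belief_entropy(belief_before) - belief_entropy(belief_after)
```

Under this formulation, a viewpoint that reveals previously occluded regions pushes pixel beliefs toward 0 or 1 and yields a positive reward, which a standard RL policy (e.g., PPO) can maximize.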
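For the temporal-aggregation network, the proposal does not specify a fusion rule; the sketch below assumes one simple possibility, an exponential moving average over mask logits, plus a geometric-consistency term that compares one view's prediction against another view's prediction warped into the same frame. `TemporalMaskAggregator` and `geometric_consistency_loss` are hypothetical names.

```python
import torch

class TemporalMaskAggregator:
    """Running amodal-shape hypothesis fused across views (illustrative
    logit-space EMA; the actual fusion rule is unspecified)."""

    def __init__(self, shape: tuple, momentum: float = 0.8):
        self.logits = torch.zeros(shape)
        self.momentum = momentum

    def update(self, view_logits: torch.Tensor) -> torch.Tensor:
        # Blend the new single-view prediction into the running hypothesis.
        self.logits = (self.momentum * self.logits
                       + (1.0 - self.momentum) * view_logits)
        return torch.sigmoid(self.logits)

def geometric_consistency_loss(pred_a: torch.Tensor,
                               pred_b_warped: torch.Tensor,
                               valid: torch.Tensor) -> torch.Tensor:
    # Penalize disagreement between view A's mask and view B's mask warped
    # into A's frame, averaged over pixels that are valid in both views.
    diff = (pred_a - pred_b_warped).abs()
    return (valid * diff).sum() / valid.sum().clamp(min=1.0)
```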
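Similarly, "occlusion-aware contrastive learning" could take many forms; one plausible reading is an InfoNCE loss over object-region embeddings, where matching regions across views are positives and fully occluded anchors are excluded because they carry no visible evidence. Everything below, including the masking choice, is an assumption for illustration.

```python
import torch
import torch.nn.functional as F

def occlusion_aware_info_nce(anchor: torch.Tensor,     # (B, D) region embeddings
                             positive: torch.Tensor,   # (B, D) same regions, other view
                             negatives: torch.Tensor,  # (K, D) other-object regions
                             occluded: torch.Tensor,   # (B,) bool: anchor fully occluded
                             temperature: float = 0.1) -> torch.Tensor:
    # Standard InfoNCE with one occlusion-aware twist: pairs whose anchor
    # region is fully occluded are dropped from the loss.
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    negatives = F.normalize(negatives, dim=-1)

    pos = (anchor * positive).sum(dim=-1, keepdim=True) / temperature  # (B, 1)
    neg = anchor @ negatives.T / temperature                           # (B, K)
    logits = torch.cat([pos, neg], dim=1)
    labels = torch.zeros(anchor.size(0), dtype=torch.long)  # positive at index 0

    per_pair = F.cross_entropy(logits, labels, reduction="none")
    keep = (~occluded).float()
    return (per_pair * keep).sum() / keep.sum().clamp(min=1.0)
```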
Limitations & open questions
Method is currently a research proposal; no empirical validation or implemented system is reported
Curiosity-driven exploration may face challenges in highly cluttered scenes with severe occlusions
Self-supervised learning objectives may produce ambiguous completions for objects with high shape variability