HiMemVLN: Hierarchical Streaming Visual Memory for Continuous Environment Adaptation

ABSTRACT

This paper proposes HiMemVLN, an architecture for Vision-Language Navigation (VLN) that introduces a hierarchically organized Streaming Visual Memory for continuous environment adaptation. The method comprises a Multi-Resolution Memory Bank, a Dynamic Attention Routing mechanism, and an Episodic Consolidation process, and it achieves state-of-the-art performance on the VLN-CE, R2R, and REVERIE benchmarks.

Key findings

HiMemVLN introduces a hierarchical streaming visual memory architecture for continuous environment adaptation in VLN.

The architecture includes a Multi-Resolution Memory Bank, Dynamic Attention Routing, and Episodic Consolidation.

HiMemVLN achieves state-of-the-art performance, with a 4.2% improvement in success rate on unseen environments and a 35% smaller memory footprint.
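To make the three named components more concrete, here is a toy sketch of what a hierarchical streaming memory could look like. It is not the paper's implementation: this summary gives no architectural details, so the class name `ToyHierarchicalMemory`, the two-level layout, mean-pooling as the consolidation step, and dot-product softmax attention as the read path are all assumptions chosen for illustration.

```python
import math
from collections import deque


class ToyHierarchicalMemory:
    """Illustrative two-level streaming memory (hypothetical, not the paper's design)."""

    def __init__(self, fine_capacity=4, chunk=2):
        self.fine = deque()   # recent features, kept at full resolution
        self.coarse = []      # averaged summaries of older features
        self.fine_capacity = fine_capacity
        self.chunk = chunk    # how many old entries are merged per consolidation

    def write(self, feature):
        """Stream in one feature vector; consolidate when the fine level overflows."""
        self.fine.append(list(feature))
        if len(self.fine) > self.fine_capacity:
            self._consolidate()

    def _consolidate(self):
        # Assumed consolidation rule: mean-pool the oldest `chunk` entries
        # into a single coarse entry, which shrinks the memory footprint.
        oldest = [self.fine.popleft() for _ in range(self.chunk)]
        mean = [sum(col) / len(oldest) for col in zip(*oldest)]
        self.coarse.append(mean)

    def read(self, query):
        """Softmax attention over both levels, scored by dot product with the query."""
        entries = list(self.fine) + self.coarse
        if not entries:
            return [0.0] * len(query)
        scores = [sum(q * e for q, e in zip(query, entry)) for entry in entries]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        return [
            sum(w * entry[i] for w, entry in zip(weights, entries)) / total
            for i in range(len(query))
        ]


mem = ToyHierarchicalMemory(fine_capacity=4, chunk=2)
for feat in ([1, 0], [3, 0], [0, 1], [0, 1], [0, 1]):
    mem.write(feat)
readout = mem.read([0, 0])  # a zero query attends uniformly to all entries
```

In this sketch the fifth write overflows the fine level, so the two oldest features are merged into one coarse entry and the memory holds four entries instead of five. A real system would presumably use learned features and learned routing rather than these fixed rules.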

Limitations & open questions

The paper does not discuss in depth how HiMemVLN scales to navigation tasks beyond VLN.
