This paper introduces a novel Event-to-Frame Attention Fusion (EFAF) framework for hybrid visual-inertial odometry (VIO) that leverages transformer-based cross-attention to dynamically align and fuse event-based and frame-based visual features. The proposed method addresses limitations of current hybrid VIO systems in handling high-speed motion, low-illumination scenes, and motion blur.
Key findings
The core of EFAF is a transformer-based cross-attention mechanism that dynamically aligns and fuses event-based and frame-based visual features.
The method introduces an asynchronous event representation module, a cross-modal attention mechanism, and a unified temporal fusion module.
The authors project improvements in absolute trajectory error (ATE) of 15-25% over state-of-the-art event-based VIO methods while maintaining real-time performance.
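The paper does not detail the cross-modal attention architecture here, so the following is only a minimal sketch of generic scaled dot-product cross-attention between the two modalities, in which frame-feature tokens query event-feature tokens and fold the attended result back in via a residual connection. The learned query/key/value projections of a full transformer layer are omitted, and the token shapes are illustrative assumptions, not the paper's design.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_fuse(frame_feats, event_feats):
    """Frame tokens attend over event tokens (hypothetical sketch).

    frame_feats: (N_f, d) frame-based feature tokens
    event_feats: (N_e, d) event-based feature tokens
    Returns fused features of shape (N_f, d).
    """
    d = frame_feats.shape[-1]
    # Scaled dot-product similarity between every frame/event token pair.
    scores = frame_feats @ event_feats.T / np.sqrt(d)   # (N_f, N_e)
    weights = softmax(scores, axis=-1)                  # rows sum to 1
    attended = weights @ event_feats                    # (N_f, d)
    return frame_feats + attended                       # residual fusion

rng = np.random.default_rng(0)
frames = rng.standard_normal((4, 16))   # 4 frame tokens, dim 16 (assumed)
events = rng.standard_normal((8, 16))   # 8 event tokens, dim 16 (assumed)
fused = cross_attention_fuse(frames, events)
print(fused.shape)
```

In a full transformer layer, `frame_feats` and `event_feats` would first pass through learned linear projections and multiple attention heads; the attention weights here play the role of the dynamic, per-token fusion the summary describes.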
Limitations & open questions
The paper does not discuss limitations or practical challenges of deploying the proposed framework in real-world scenarios.