This research addresses human pose estimation under occlusion and varying lighting by introducing MM-MoE-Pose, a multi-modal fusion framework that routes visual and inertial features through expert networks according to the estimated reliability of each input, so that pose estimates remain robust when one modality degrades.
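The summary does not spell out how the reliability-based routing is computed, so the following is only a minimal sketch of one plausible formulation: a sparse top-k gate whose logits are conditioned on per-modality reliability scores. The class name ReliabilityGatedRouter, the two-score reliability vector, and all layer sizes are illustrative assumptions, not the paper's actual design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ReliabilityGatedRouter(nn.Module):
    """Sparse top-k gate conditioned on per-modality reliability scores.
    All names, shapes, and the conditioning scheme are assumptions made
    for illustration; the summary does not specify the actual gate."""

    def __init__(self, feat_dim: int, num_experts: int, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # The gate sees the fused feature plus one reliability score per
        # modality (visual, inertial), so a degraded sensor can shift
        # routing toward experts that lean on the other modality.
        self.gate = nn.Linear(feat_dim + 2, num_experts)

    def forward(self, x: torch.Tensor, reliability: torch.Tensor):
        # x: (batch, feat_dim); reliability: (batch, 2), scores in [0, 1]
        logits = self.gate(torch.cat([x, reliability], dim=-1))
        topk_vals, topk_idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(topk_vals, dim=-1)  # mixture weights, (batch, top_k)
        return topk_idx, weights

router = ReliabilityGatedRouter(feat_dim=256, num_experts=8)
x = torch.randn(4, 256)
reliability = torch.tensor([[0.9, 0.2]] * 4)  # clear view, noisy IMU
expert_idx, expert_w = router(x, reliability)
print(expert_idx.shape, expert_w.shape)  # torch.Size([4, 2]) torch.Size([4, 2])
```

Under this kind of conditioning, a sample with a low inertial reliability score can be routed toward experts that weight visual evidence more heavily, matching the behavior the summary describes.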
Key findings
MM-MoE-Pose dynamically selects expert networks based on sensor reliability for robust pose estimation.
The framework comprises four stages: modality-specific encoders, a sparse MoE fusion layer, a cross-modal calibration module, and a kinematic decoder (see the sketch after this list).
It achieves state-of-the-art results in challenging scenarios while maintaining real-time inference.
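To make the four-stage layout concrete, here is a minimal PyTorch sketch wiring the named components together. Everything beyond the component names is an assumption for illustration: the layer sizes, input feature dimensions, the residual form of the calibration module, the 17-joint output, and the plain linear gate are not taken from the paper.

```python
import torch
import torch.nn as nn

class MMPosePipeline(nn.Module):
    """Toy end-to-end layout: modality-specific encoders -> sparse MoE
    fusion -> cross-modal calibration -> kinematic decoder. Sizes and
    the internals of each stage are illustrative assumptions."""

    def __init__(self, d=256, num_experts=8, top_k=2, num_joints=17):
        super().__init__()
        self.top_k, self.num_joints = top_k, num_joints
        # Modality-specific encoders (stand-ins for the real backbones).
        self.visual_enc = nn.Sequential(nn.Linear(512, d), nn.ReLU())
        self.inertial_enc = nn.Sequential(nn.Linear(60, d), nn.ReLU())
        # Sparse MoE fusion: a gate plus a pool of expert MLPs. A
        # reliability-conditioned gate (see the earlier sketch) would
        # replace this plain linear gate.
        self.gate = nn.Linear(2 * d, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(2 * d, d), nn.ReLU()) for _ in range(num_experts)
        )
        # Cross-modal calibration, sketched here as a residual MLP.
        self.calib = nn.Sequential(nn.Linear(d, d), nn.Tanh())
        # Kinematic decoder regressing 3D joint coordinates.
        self.decoder = nn.Linear(d, num_joints * 3)

    def forward(self, rgb_feat, imu_feat):
        fused = torch.cat([self.visual_enc(rgb_feat), self.inertial_enc(imu_feat)], -1)
        gate_vals, gate_idx = self.gate(fused).topk(self.top_k, dim=-1)
        weights = gate_vals.softmax(dim=-1)
        # Evaluate only the selected experts for each sample (sparse routing).
        mixed = torch.zeros(fused.size(0), self.decoder.in_features)
        for b in range(fused.size(0)):
            for k in range(self.top_k):
                expert = self.experts[int(gate_idx[b, k])]
                mixed[b] += weights[b, k] * expert(fused[b:b + 1]).squeeze(0)
        pose = self.decoder(mixed + self.calib(mixed))  # residual calibration
        return pose.view(-1, self.num_joints, 3)

pipeline = MMPosePipeline()
pose = pipeline(torch.randn(4, 512), torch.randn(4, 60))
print(pose.shape)  # torch.Size([4, 17, 3])
```

Because only top_k of the num_experts expert MLPs run per sample, per-frame compute stays close to that of a single dense fusion layer, which would be consistent with the real-time claim above.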
Limitations & open questions
The paper does not discuss the computational overhead of the proposed framework.
Further research is needed to scale the framework to broader applications.