DisenMoE: Disentangled Mixture of Experts for 3D Keypoint Lifting

ABSTRACT

Monocular 3D keypoint lifting from 2D observations is a fundamental challenge in computer vision with applications in human pose estimation, robotics, and augmented reality. Current approaches either entangle depth and 2D pose features or rely on domain-specific architectures. We propose DisenMoE, a novel architecture that combines disentangled representation learning with a Mixture-of-Experts routing mechanism to achieve general-purpose 3D keypoint lifting.

Key findings

DisenMoE separates 2D pose features from depth estimation through specialized expert modules.

A learnable router dynamically assigns input keypoints to the most suitable experts based on skeletal topology and joint characteristics.

The design enables cross-domain generalization, efficient computation, and modular scalability.
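The findings above can be made concrete with a small sketch. It is not the authors' implementation: the summary gives no layer sizes, routing rule, or feature definitions, so everything here is an assumption chosen for illustration. The class name `ToyDisenMoE`, the top-k softmax gating, the single router shared by both branches, and the use of a normalized joint index as a stand-in for "skeletal topology" are all hypothetical. The sketch only shows the general shape of the idea: each 2D keypoint is routed to a few experts, and separate expert sets produce the 2D pose output and the depth output.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def make_linear(n_out, n_in, rng):
    """Create a randomly initialised (weights, bias) pair for a linear map."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    bias = [0.0] * n_out
    return weights, bias


def linear(weights, bias, x):
    """Apply y = W x + b using plain Python lists."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


class ToyDisenMoE:
    """Hypothetical toy model: one router, separate pose and depth experts."""

    def __init__(self, n_experts=4, top_k=2, feat_dim=3, seed=0):
        rng = random.Random(seed)
        self.top_k = top_k
        # Assumption: a single linear router scores every expert per keypoint.
        self.router = make_linear(n_experts, feat_dim, rng)
        # Disentangled branches: pose experts emit (x, y), depth experts emit z.
        self.pose_experts = [make_linear(2, feat_dim, rng) for _ in range(n_experts)]
        self.depth_experts = [make_linear(1, feat_dim, rng) for _ in range(n_experts)]

    def route(self, feat):
        """Return [(expert_index, gate_weight)] for the top-k scoring experts."""
        logits = linear(*self.router, feat)
        order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
        top = order[: self.top_k]
        gates = softmax([logits[i] for i in top])  # renormalise over the top-k
        return list(zip(top, gates))

    def lift(self, keypoints_2d):
        """Map a list of (x, y) keypoints to a list of (x, y, z) tuples."""
        n = len(keypoints_2d)
        lifted = []
        for j, (x, y) in enumerate(keypoints_2d):
            # Assumption: the normalised joint index stands in for topology.
            feat = [x, y, j / max(n - 1, 1)]
            out_x, out_y, out_z = 0.0, 0.0, 0.0
            for idx, gate in self.route(feat):
                px, py = linear(*self.pose_experts[idx], feat)
                (dz,) = linear(*self.depth_experts[idx], feat)
                out_x += gate * px
                out_y += gate * py
                out_z += gate * dz
            lifted.append((out_x, out_y, out_z))
        return lifted


if __name__ == "__main__":
    model = ToyDisenMoE()
    for point in model.lift([(0.1, 0.2), (0.3, 0.4), (0.5, 0.6)]):
        print(point)
```

Because only `top_k` experts run per keypoint, compute grows with `top_k` rather than with the total number of experts, which is the usual argument for the efficiency of sparse Mixture-of-Experts designs. The weights here are random and untrained, so the printed coordinates carry no meaning.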

Limitations & open questions

Risks include expert collapse, routing instability, and domain gap issues.
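As an illustration of what expert collapse looks like in practice, the sketch below measures how evenly a router spreads keypoints across experts. It is a generic diagnostic under stated assumptions, not something the paper describes: the function names `expert_load` and `normalized_entropy` are hypothetical, and the input is assumed to be a flat list holding the index of the top-scoring expert for each routed keypoint.

```python
import math
from collections import Counter


def expert_load(assignments, n_experts):
    """Fraction of routed keypoints that each expert received."""
    if not assignments:
        raise ValueError("assignments must not be empty")
    counts = Counter(assignments)
    total = len(assignments)
    return [counts.get(e, 0) / total for e in range(n_experts)]


def normalized_entropy(load):
    """Entropy of the load distribution scaled to [0, 1].

    Values near 1 mean balanced usage; values near 0 mean most traffic
    goes to a single expert. Requires at least two experts.
    """
    if len(load) < 2:
        raise ValueError("need at least two experts")
    h = -sum(p * math.log(p) for p in load if p > 0)
    return h / math.log(len(load))


if __name__ == "__main__":
    balanced = expert_load([0, 1, 2, 3, 0, 1, 2, 3], n_experts=4)
    collapsed = expert_load([2, 2, 2, 2, 2, 2, 2, 2], n_experts=4)
    print(normalized_entropy(balanced), normalized_entropy(collapsed))
```

Tracking a value like this over training is one simple way to notice collapse or unstable routing early; whether the authors monitor or regularize routing this way is not stated in the summary.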
