This paper introduces a framework for transferring Diffusion Transformer (DiT) policies trained with Flow Matching to unseen manipulator morphologies without fine-tuning. The policy learns morphology-agnostic representations, and a Cross-Embodiment Morphology Encoder maps diverse robot kinematics to a unified latent space.
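For readers unfamiliar with Flow Matching, the following is a minimal sketch of the generic training target and sampler, assuming the common linear interpolation path between a noise sample and a data sample. The summary does not state which probability path, network, or solver the paper uses, so the function names (`flow_matching_pair`, `flow_matching_loss`, `euler_sample`) and the `velocity_fn` callable are hypothetical stand-ins rather than the authors' implementation.

```python
import numpy as np

def flow_matching_pair(x0, x1, t):
    """Interpolate between noise x0 and data x1 along a linear path.

    Assumption: x_t = (1 - t) * x0 + t * x1, whose velocity is x1 - x0.
    x0, x1 have shape (batch, dim); t has shape (batch,).
    """
    t = np.asarray(t, dtype=float).reshape(-1, 1)
    x_t = (1.0 - t) * x0 + t * x1
    u_t = x1 - x0  # target velocity, constant along the linear path
    return x_t, u_t

def flow_matching_loss(velocity_fn, x0, x1, t):
    """Mean squared error between predicted and target velocity."""
    x_t, u_t = flow_matching_pair(x0, x1, t)
    pred = velocity_fn(x_t, t)  # hypothetical model: (x_t, t) -> velocity
    return float(np.mean((pred - u_t) ** 2))

def euler_sample(velocity_fn, x0, num_steps=10):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with Euler steps."""
    x = np.array(x0, dtype=float)
    dt = 1.0 / num_steps
    for step in range(num_steps):
        t = np.full(x.shape[0], step * dt)
        x = x + dt * velocity_fn(x, t)
    return x
```

In a policy setting, `x1` would be an action chunk from demonstrations and `velocity_fn` would be the DiT conditioned on observations and the morphology embedding; both of those roles are assumptions here, not details taken from the paper.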
Key findings
Achieves zero-shot transfer across manipulators with varying degrees of freedom and joint configurations.
Introduces a unified action representation based on SE(3) end-effector trajectories and residual joint corrections.
Develops a morphology-aware attention mechanism that conditions the DiT on robot kinematic graphs.
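The summary does not describe how the attention mechanism consumes the kinematic graph. One common way to condition attention on a graph is to add a bias to the attention logits based on hop distance between nodes. The sketch below illustrates that idea under stated assumptions: the robot is a kinematic tree given as a parent-index array, and the names `hop_distances`, `graph_biased_attention`, `max_hops`, and `slope` are hypothetical, not taken from the paper.

```python
import numpy as np

def hop_distances(parents):
    """All-pairs hop distance on a kinematic tree.

    parents[i] is the parent link index of link i, or -1 for the base link.
    """
    n = len(parents)
    dist = np.full((n, n), np.inf)
    np.fill_diagonal(dist, 0.0)
    for child, parent in enumerate(parents):
        if parent >= 0:
            dist[child, parent] = dist[parent, child] = 1.0
    # Floyd-Warshall relaxation, vectorized over all (i, j) pairs per pivot k.
    for k in range(n):
        dist = np.minimum(dist, dist[:, [k]] + dist[[k], :])
    return dist

def graph_biased_attention(q, k, v, dist, max_hops=2, slope=0.5):
    """Single-head attention with a hop-distance bias on the logits.

    Links farther apart than max_hops are masked out; closer links are
    penalized linearly by slope * distance (an assumed bias shape).
    """
    scores = q @ k.T / np.sqrt(q.shape[-1])
    bias = np.where(dist <= max_hops, -slope * dist, -np.inf)
    logits = scores + bias
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights
```

Because the bias depends only on the graph, the same attention weights can in principle be applied to arms with different numbers of links: a 4-link serial chain is `parents = [-1, 0, 1, 2]`, and a 7-link arm only changes the length of that array.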
Limitations & open questions
The paper does not discuss how the proposed method scales to a larger number of morphologies.
The effectiveness of the framework in real-world scenarios with varying environmental conditions is not fully explored.