This paper proposes DiffPose-GNN, an end-to-end differentiable framework that learns to estimate camera poses directly from OpenStreetMap (OSM) graph representations without requiring explicit 2D-3D correspondences. The key innovation is to represent OSM data as learnable geometric graphs that are differentiable with respect to the camera pose parameters, so that pose estimates can be refined by gradient-based optimization through neural message passing.
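To make the idea of gradient-based pose refinement against map geometry concrete, here is a minimal toy sketch. It is not the paper's method: it contains no neural network or message passing, works in 2D, and replaces the learned graph features with a soft nearest-neighbor alignment loss, so no explicit point-to-node correspondences are supplied. The map nodes, the poses, and the names `loss_and_grad`, `optimize`, and `tau` are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical map nodes in the world frame (stand-ins for OSM graph nodes).
map_nodes = [(0.0, 0.0), (2.0, 0.0), (4.0, 0.0), (4.0, 3.0), (1.0, 5.0)]

# Hypothetical ground-truth pose (tx, ty, theta), used only to simulate observations.
true_pose = (1.0, 0.5, 0.3)


def to_camera(pose, pt):
    """World frame -> camera frame for a 2D rigid pose."""
    tx, ty, th = pose
    c, s = math.cos(th), math.sin(th)
    dx, dy = pt[0] - tx, pt[1] - ty
    return (c * dx + s * dy, -s * dx + c * dy)


# Simulated observations: the map nodes as seen from the true pose.
observations = [to_camera(true_pose, m) for m in map_nodes]


def loss_and_grad(pose, observations, map_nodes, tau=0.1):
    """Soft nearest-neighbor alignment loss and its analytic gradient.

    Each observation is moved into the world frame with the current pose and
    compared against ALL map nodes through a soft minimum (log-sum-exp), so
    the loss is differentiable in the pose without fixed correspondences.
    """
    tx, ty, th = pose
    c, s = math.cos(th), math.sin(th)
    loss = 0.0
    grad = [0.0, 0.0, 0.0]
    for ox, oy in observations:
        # Camera frame -> world frame under the current pose estimate.
        qx = c * ox - s * oy + tx
        qy = s * ox + c * oy + ty
        d = [(qx - mx) ** 2 + (qy - my) ** 2 for mx, my in map_nodes]
        d_min = min(d)
        # Shift by d_min before exponentiating for numerical stability.
        w = [math.exp(-(dj - d_min) / tau) for dj in d]
        z = sum(w)
        loss += d_min - tau * math.log(z)
        # Gradient of the soft minimum with respect to the world point q.
        gx = sum(wj / z * 2.0 * (qx - mx) for wj, (mx, _) in zip(w, map_nodes))
        gy = sum(wj / z * 2.0 * (qy - my) for wj, (_, my) in zip(w, map_nodes))
        # Chain rule: dq/dtx = (1, 0), dq/dty = (0, 1), dq/dtheta = R'(theta) o.
        grad[0] += gx
        grad[1] += gy
        grad[2] += gx * (-s * ox - c * oy) + gy * (c * ox - s * oy)
    return loss, grad


def optimize(pose, observations, map_nodes, lr=0.01, steps=500):
    """Plain gradient descent on the pose parameters."""
    pose = list(pose)
    for _ in range(steps):
        _, g = loss_and_grad(pose, observations, map_nodes)
        pose = [p - lr * gi for p, gi in zip(pose, g)]
    return pose


initial_pose = (0.8, 0.7, 0.2)  # hypothetical coarse prior, e.g. from GPS
initial_loss, _ = loss_and_grad(initial_pose, observations, map_nodes)
estimate = optimize(initial_pose, observations, map_nodes)
final_loss, _ = loss_and_grad(estimate, observations, map_nodes)
print("estimated pose:", [round(v, 4) for v in estimate])
```

The temperature `tau` controls how sharply each observation commits to its closest map node. The sketch starts from a pose close to the true one; like any local gradient method, it should not be expected to recover from a poor initialization.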
Key findings
DiffPose-GNN achieves inference speeds of 45 FPS while maintaining competitive localization accuracy.
Experiments on nuScenes, KITTI, and Argoverse show that DiffPose-GNN reduces median translation error by 18% and orientation error by 23% relative to existing OSM-based methods.
Limitations & open questions
The paper does not discuss the limitations of the proposed method.