This paper establishes theoretical bounds on the prediction horizon required for stable online learning in diffusion policies, connecting Langevin dynamics, receding horizon control, and online learning theory to derive convergence guarantees.
Key findings
Derives explicit bounds relating the prediction horizon to the diffusion timestep, score estimation error, and online learning rate.
Proves that for stable online learning, the prediction horizon must satisfy a specific scaling with respect to action dimension and diffusion step size.
Validates theoretical predictions through analysis of regret bounds and proposes an adaptive prediction horizon algorithm.
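The paper's exact bound and adaptive algorithm are not reproduced here; the following is a minimal illustrative sketch of the idea, assuming a hypothetical bound in which the stable horizon grows with action dimension and shrinks with the diffusion step size and score error, and a simple multiplicative-increase rule driven by observed regret. All function names, the functional form of the bound, and the constants are assumptions, not the paper's specification.

```python
import math

def stable_horizon(action_dim, diffusion_step, score_error, c=1.0):
    """Hypothetical horizon bound: grows with action dimension,
    shrinks with diffusion step size and score estimation error.
    The functional form is illustrative, not the paper's result."""
    return max(1, math.ceil(c * action_dim / (diffusion_step * (1.0 + score_error))))

def adapt_horizon(horizon, observed_regret, target_regret, h_min=1, h_max=64):
    """Illustrative adaptive rule: double the prediction horizon when
    observed regret exceeds the target, shrink it slowly when there
    is slack (multiplicative increase, additive decrease)."""
    if observed_regret > target_regret:
        return min(h_max, horizon * 2)
    return max(h_min, horizon - 1)
```

In this sketch the horizon reacts asymmetrically: instability (high regret) triggers a fast increase, while slack only trims the horizon gradually, keeping the controller near the smallest horizon that satisfies the stability condition.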
Limitations & open questions
The theoretical framework has not yet been empirically validated across diverse robotic systems.
The practical performance of the adaptive prediction horizon algorithm in real-world scenarios remains to be tested.