This research proposes Hierarchical Progress Milestone Networks (HPMN) to address the challenge of temporal credit assignment in reinforcement learning, particularly in tasks with long horizons and sparse rewards. HPMN introduces a structured decomposition of credit assignment through learnable progress milestones, combining hierarchical value decomposition with explicit progress estimation for effective credit propagation.
Key findings
HPMN introduces hierarchical progress decomposition, dense progress signals, and multi-scale credit propagation.
The method provides convergence guarantees and is evaluated on robotic manipulation, locomotion, and navigation benchmarks.
HPMN targets two limitations of existing methods: the bias-variance tradeoff in return estimation and inefficient exploration in sparse-reward settings.
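To make the idea of a dense progress signal concrete, here is a minimal sketch of how a per-state progress estimate could turn a sparse reward into a per-step signal. This summary does not describe HPMN's actual formulation, so the sketch substitutes classic potential-based reward shaping, using the progress estimate as the potential. The function name `shaped_rewards`, the progress values, and the convention of setting the terminal potential to zero are all assumptions made for illustration.

```python
def shaped_rewards(rewards, progress, gamma=0.99):
    """Potential-based shaping: r'_t = r_t + gamma * phi(s_{t+1}) - phi(s_t).

    rewards:  list of T environment rewards (typically sparse).
    progress: list of T + 1 progress estimates phi(s_0), ..., phi(s_T).
              These are hypothetical values; in a learned system they
              would come from a trained progress estimator.
    """
    if len(progress) != len(rewards) + 1:
        raise ValueError("progress must have one more entry than rewards")
    return [
        r + gamma * progress[t + 1] - progress[t]
        for t, r in enumerate(rewards)
    ]


# Hypothetical 4-step episode: the environment pays out only at the end.
sparse = [0.0, 0.0, 0.0, 1.0]
# Assumed progress estimates; the terminal potential is set to 0 by convention.
phi = [0.0, 0.25, 0.5, 0.75, 0.0]

# gamma = 1.0 keeps the arithmetic easy to follow.
dense = shaped_rewards(sparse, phi, gamma=1.0)
print(dense)
```

In this example every step receives a nonzero signal, while the undiscounted episode return is unchanged because the shaping terms telescope and the first and last potentials are both zero. Whether HPMN preserves returns in this way is not stated in the summary.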
Limitations & open questions
The paper does not discuss the computational complexity of HPMN or its scalability to very large tasks.