This research proposes DAC-Len, a training framework that optimizes for both reasoning accuracy and efficiency by jointly considering problem difficulty and target reasoning length. It combines a difficulty-aware length scheduler, a length-regularized reward function, and a dynamic curriculum sampler, and targets a 40-60% reduction in inference costs while maintaining accuracy.
Key findings
DAC-Len is designed to optimize reasoning accuracy and efficiency together through curriculum scheduling.
Introduces a difficulty-aware length scheduler, length-regularized reward, and dynamic curriculum sampler.
Expected to achieve comparable or superior accuracy while reducing inference costs by a targeted 40-60%.
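The three components named above can be illustrated with a minimal Python sketch. The summary does not specify how any of them is defined, so everything here is an assumption made for illustration: the function names, the linear difficulty-to-budget mapping, the overshoot penalty, the exponential sampling weights, and all default values are hypothetical and are not taken from DAC-Len itself.

```python
import math
import random


def target_length(difficulty: float, min_tokens: int = 256, max_tokens: int = 4096) -> int:
    """Hypothetical length scheduler: map difficulty in [0, 1] to a token budget.

    Assumes a simple linear interpolation; harder problems get a larger budget.
    """
    d = min(max(difficulty, 0.0), 1.0)  # clamp out-of-range difficulty scores
    return round(min_tokens + d * (max_tokens - min_tokens))


def length_regularized_reward(correct: bool, length: int, budget: int, penalty: float = 0.5) -> float:
    """Hypothetical reward: correctness minus a penalty for exceeding the budget.

    The penalty grows with the relative overshoot and is capped at `penalty`.
    Responses at or under budget are not penalized.
    """
    base = 1.0 if correct else 0.0
    overshoot = max(0, length - budget) / budget
    return base - penalty * min(overshoot, 1.0)


def sample_batch(difficulties, progress, batch_size, rng, sharpness=4.0):
    """Hypothetical curriculum sampler: favor problems whose difficulty is
    close to the current training progress (both in [0, 1]).

    Returns a list of indices into `difficulties`, sampled with replacement.
    """
    weights = [math.exp(-sharpness * abs(d - progress)) for d in difficulties]
    return rng.choices(range(len(difficulties)), weights=weights, k=batch_size)


if __name__ == "__main__":
    rng = random.Random(0)
    difficulties = [0.1, 0.3, 0.5, 0.7, 0.9]

    # Early in training (progress = 0.2), easier problems are sampled more often.
    for idx in sample_batch(difficulties, progress=0.2, batch_size=4, rng=rng):
        budget = target_length(difficulties[idx])
        # Pretend the model answered correctly but used 1.5x the budget.
        reward = length_regularized_reward(True, int(1.5 * budget), budget)
        print(f"difficulty={difficulties[idx]:.1f} budget={budget} reward={reward:.2f}")
```

In this sketch the scheduler sets a per-problem token budget, the reward only penalizes tokens beyond that budget, and the sampler shifts toward harder problems as `progress` increases. Other choices (for example, penalizing undershoot, or using a nonlinear budget) would be equally consistent with the description given here.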
Limitations & open questions
The framework's effectiveness has not yet been empirically validated on the proposed benchmarks, so the accuracy and cost figures are targets rather than measured results.