This paper addresses document set optimization with diversity constraints, a fundamental problem in information retrieval where existing greedy approaches (MMR, DPPs) suffer from myopic decision-making. We propose NG-DivRL, a novel reinforcement learning framework that formulates the problem as a Markov Decision Process with a submodular diversity-aware reward shaping mechanism. Using an actor-critic architecture with set-to-sequence encoding, our method learns to optimize for final set quality rather than locally optimal choices. Empirical evaluation on TREC diversity benchmarks demonstrates 12-18% gains in α-NDCG and 15-22% improvements in subtopic coverage over greedy baselines.
Key findings
NG-DivRL achieves 12-18% improvements in α-NDCG and 15-22% better subtopic coverage compared to greedy baselines (MMR, DPPs) on TREC diversity benchmarks.
The MDP formulation with novel state representation enables non-greedy decisions that consider future diversity contributions rather than just immediate marginal gains.
Submodular diversity-aware reward shaping provides dense training signals while preserving theoretical approximation guarantees for the learned policy.
Set-to-sequence encoding combined with actor-critic architecture effectively handles variable-size document collections and complex interactions between documents.
This represents the first end-to-end learned approach that directly optimizes for set-level diversity without relying on greedy approximation.