Non-Greedy Policy for Diverse Document Set Optimization |...

ABSTRACT

This paper addresses document set optimization with diversity constraints, a fundamental problem in information retrieval where existing greedy approaches (MMR, DPPs) suffer from myopic decision-making. We propose NG-DivRL, a novel reinforcement learning framework that formulates the problem as a Markov Decision Process with a submodular diversity-aware reward shaping mechanism. Using an actor-critic architecture with set-to-sequence encoding, our method learns to optimize for final set quality rather than locally optimal choices. Empirical evaluation on TREC diversity benchmarks demonstrates 12-18% gains in α-NDCG and 15-22% improvements in subtopic coverage over greedy baselines.

PAPER · PDF

manuscript.pdf ↓ Download PDF

Loading PDF...

↓ View full paper PDF →

Key findings

NG-DivRL achieves 12-18% improvements in α-NDCG and 15-22% better subtopic coverage compared to greedy baselines (MMR, DPPs) on TREC diversity benchmarks.

The MDP formulation with novel state representation enables non-greedy decisions that consider future diversity contributions rather than just immediate marginal gains.

Submodular diversity-aware reward shaping provides dense training signals while preserving theoretical approximation guarantees for the learned policy.

Set-to-sequence encoding combined with actor-critic architecture effectively handles variable-size document collections and complex interactions between documents.

This represents the first end-to-end learned approach that directly optimizes for set-level diversity without relying on greedy approximation.

Non-Greedy Policy for Diverse Document Set Optimization

Key findings

Related Papers