Theoretical Analysis of Token-Frequency Bias in Policy Gradient Methods for Clinical Text Generation

ABSTRACT

This paper presents a theoretical analysis of token-frequency bias in policy gradient methods used for clinical text generation. It formalizes the bias mechanism, proves that these methods inherently amplify high-frequency tokens, and proposes Frequency-Aware Policy Optimization (FAPO) to reduce this bias and improve recall of medical terminology.
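
As background for the bias mechanism the abstract refers to, the standard gradient identity for a single-step softmax policy already shows why rare tokens are disadvantaged. This is a textbook result, not the paper's own formalization. The one-step setting and the symbols z_k (logit of token k), r_k (reward for emitting token k) and \bar r (expected reward under the policy) are assumptions introduced here for illustration:

```latex
\begin{aligned}
\pi_\theta(k) &= \frac{\exp(z_k)}{\sum_j \exp(z_j)}, \qquad
J(z) = \mathbb{E}_{a \sim \pi_\theta}[r_a] = \sum_j \pi_\theta(j)\, r_j \\
\frac{\partial \pi_\theta(j)}{\partial z_k} &= \pi_\theta(j)\,\big(\mathbb{1}[j = k] - \pi_\theta(k)\big) \\
\frac{\partial J}{\partial z_k}
  &= \sum_j r_j\, \pi_\theta(j)\,\big(\mathbb{1}[j = k] - \pi_\theta(k)\big)
   = \pi_\theta(k)\,\big(r_k - \bar r\big),
\qquad \bar r = \sum_j \pi_\theta(j)\, r_j
\end{aligned}
```

Writing A_k = r_k - \bar r for the advantage, the expected update to logit z_k is \pi_\theta(k) A_k. The REINFORCE estimator (e_a - \pi_\theta) r_a, where a is the sampled token and e_a is its one-hot vector, has exactly this expectation. A rare token therefore receives a proportionally smaller push than a frequent token with the same advantage.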

Key findings

Policy gradient methods exhibit a systematic bias towards high-frequency tokens.

Standard REINFORCE and PPO objectives suppress rare but clinically relevant terms.

Frequency-Aware Policy Optimization (FAPO) reduces frequency bias by 34% and improves medical terminology recall by 28%.

The theoretical framework shows that the bias magnitude scales with the inverse of token frequency and with the variance of the advantage function.
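
The frequency dependence in these findings can be illustrated with a minimal toy sketch. It assumes a single-step softmax policy over three tokens, with made-up frequencies and rewards. It computes the exact expected REINFORCE gradient, pi_k * (r_k - mean reward), without sampling. The helper `inverse_frequency_reweight` is hypothetical and simply divides each logit's gradient by the token's probability. It shows one naive way to cancel the frequency factor and is not a description of how FAPO works.

```python
import math


def softmax(logits):
    # Subtract the max logit for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_reinforce_grad(logits, rewards):
    """Exact expected REINFORCE gradient of J = E_{a~pi}[r_a] w.r.t. each logit.

    For a single-step softmax policy this equals pi_k * (r_k - r_bar),
    where r_bar is the expected reward under pi.
    """
    probs = softmax(logits)
    r_bar = sum(p * r for p, r in zip(probs, rewards))
    return [p * (r - r_bar) for p, r in zip(probs, rewards)]


def inverse_frequency_reweight(grads, probs):
    # Hypothetical reweighting for illustration only: dividing by pi_k
    # leaves just the advantage r_k - r_bar for every token.
    return [g / p for g, p in zip(grads, probs)]


# Toy vocabulary (assumed values): a common clinical token, a function word,
# and a rare medical term. The common and rare tokens get the same reward.
freqs = [0.60, 0.38, 0.02]
rewards = [1.0, 0.0, 1.0]
logits = [math.log(f) for f in freqs]  # softmax(logits) recovers freqs

probs = softmax(logits)
grads = expected_reinforce_grad(logits, rewards)
reweighted = inverse_frequency_reweight(grads, probs)

# Same advantage, but the rare token's expected update is 30x smaller
# (the ratio of their probabilities, 0.60 / 0.02).
print(grads[0] / grads[2])
print(reweighted)
```

Because the expected gradient is computed in closed form, the frequency effect is isolated from sampling noise. A real sequence-level objective would also involve per-step advantages and estimator variance, which this toy deliberately leaves out.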

Limitations & open questions

The study focuses on clinical text generation and may not generalize to other domains.

Further research is needed to evaluate FAPO's effectiveness across different language models and datasets.
