Conservative Q-Learning with Adaptive Uncertainty Quantif...

ABSTRACT

This paper introduces Conservative Q-Learning with Adaptive Uncertainty Quantification (CQL-AUQ) to address the non-stationarity that fading channels cause in wireless communication systems. CQL-AUQ uses deep ensembles to disentangle aleatoric and epistemic uncertainty, and it introduces an adaptive conservative penalty that scales with the estimated epistemic uncertainty, ensuring safe policy improvement with bounded regret.

Key findings

CQL-AUQ addresses non-stationarity in wireless channels through principled uncertainty estimation and adaptive conservative value learning.

The approach disentangles aleatoric and epistemic uncertainties, enabling the agent to distinguish between inherent environmental stochasticity and knowledge gaps.

An adaptive conservative penalty scales with the estimated epistemic uncertainty, so the agent is more conservative in channel states where its value estimates are less certain.

Theoretical analysis shows CQL-AUQ achieves safe policy improvement with bounded regret under non-stationary fading dynamics.
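
The summary above does not give the paper's equations, so the following is only a minimal sketch of how the two mechanisms in the findings could fit together, under stated assumptions. It assumes each ensemble member outputs a predictive mean and variance, takes aleatoric uncertainty as the average predicted variance and epistemic uncertainty as the variance of the member means (the usual deep-ensemble decomposition), and weights a standard discrete-action CQL regularizer by a hypothetical factor alpha0 * (1 + beta * epistemic). The function names and the parameters alpha0 and beta are illustrative and are not taken from the paper.

```python
import numpy as np


def decompose_uncertainty(means, variances):
    """Split ensemble predictions into aleatoric and epistemic parts.

    means, variances: arrays of shape (M, ...) holding the predictive mean
    and variance of each of the M ensemble members (assumed Gaussian heads).
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    aleatoric = variances.mean(axis=0)  # average noise predicted by the members
    epistemic = means.var(axis=0)       # disagreement between the member means
    return aleatoric, epistemic


def adaptive_cql_penalty(q_values, data_actions, epistemic, alpha0=1.0, beta=1.0):
    """Uncertainty-weighted CQL regularizer for discrete actions.

    q_values: (B, A) Q-values for a batch of B states and A actions.
    data_actions: (B,) integer actions taken in the offline dataset.
    epistemic: (B,) non-negative epistemic uncertainty per state.
    alpha0, beta: hypothetical base weight and scaling factor (assumptions).
    """
    q_values = np.asarray(q_values, dtype=float)
    data_actions = np.asarray(data_actions, dtype=int)
    # Numerically stable log-sum-exp over the action dimension.
    q_max = q_values.max(axis=1, keepdims=True)
    lse = q_max[:, 0] + np.log(np.exp(q_values - q_max).sum(axis=1))
    q_data = q_values[np.arange(q_values.shape[0]), data_actions]
    # Assumed form: the penalty weight grows linearly with epistemic uncertainty.
    alpha = alpha0 * (1.0 + beta * np.asarray(epistemic, dtype=float))
    return float(np.mean(alpha * (lse - q_data)))


# Two ensemble members, two states: the members disagree only on the first state.
aleatoric, epistemic = decompose_uncertainty(
    means=[[1.0, 2.0], [3.0, 2.0]],
    variances=[[0.5, 0.5], [1.5, 0.5]],
)
penalty = adaptive_cql_penalty(
    q_values=[[0.0, 0.0], [0.0, 0.0]],
    data_actions=[0, 1],
    epistemic=epistemic,
)
```

In this sketch, the state on which the ensemble members disagree receives twice the penalty weight of the state on which they agree, which is one simple way to make conservatism track knowledge gaps rather than channel noise.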

Limitations & open questions

The paper does not discuss the computational complexity of the proposed CQL-AUQ framework.

The effectiveness of CQL-AUQ has yet to be empirically validated on real-world wireless systems.

Conservative Q-Learning with Adaptive Uncertainty Quantification for Fading Channels
