ABSTRACT
This paper proposes BANDIT-E2, a framework that dynamically adjusts inference depth and feature compression in response to real-time bandwidth fluctuations, optimizing end-to-end quality-of-inference.
Key findings
BANDIT-E2 achieves up to 43% reduction in end-to-end latency.
Maintains less than 0.5% accuracy degradation compared to full-model inference.
Integrates multi-exit neural architectures, learned bandwidth prediction, and a decision-theoretic controller.
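As a rough illustration of how a controller might trade inference depth against feature compression for a given bandwidth estimate, the Python sketch below scores every (exit, codec) pair and picks the most accurate one that fits a latency budget. It is not BANDIT-E2's actual controller, which this summary does not specify. The names `Exit`, `Codec`, `choose_config`, `EXITS`, and `CODECS`, the latency budget, and all numeric values are hypothetical and chosen only for the example.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Exit:
    name: str
    compute_ms: float   # assumed latency to reach this exit
    feature_kb: float   # assumed uncompressed feature size, in kilobytes
    accuracy: float     # assumed accuracy when exiting here


@dataclass(frozen=True)
class Codec:
    ratio: float          # assumed compression ratio (1.0 = no compression)
    accuracy_loss: float  # assumed accuracy penalty from compression


# Hypothetical profile; real values would come from offline measurement.
EXITS = [
    Exit("early", compute_ms=5.0, feature_kb=200.0, accuracy=0.90),
    Exit("mid", compute_ms=12.0, feature_kb=100.0, accuracy=0.94),
    Exit("late", compute_ms=25.0, feature_kb=40.0, accuracy=0.96),
]
CODECS = [Codec(1.0, 0.0), Codec(4.0, 0.003), Codec(16.0, 0.01)]


def choose_config(exits, codecs, bandwidth_kbps, latency_budget_ms):
    """Return (exit, codec, latency_ms, accuracy) for the chosen pair.

    Picks the most accurate pair whose estimated latency fits the budget,
    breaking ties by lower latency. If nothing fits, falls back to the
    pair with the lowest estimated latency.
    """
    candidates = []
    for e, c in product(exits, codecs):
        # kilobytes -> kilobits, shrink by the codec, divide by kbit/s.
        transfer_ms = (e.feature_kb * 8.0 / c.ratio) / bandwidth_kbps * 1000.0
        latency_ms = e.compute_ms + transfer_ms
        accuracy = e.accuracy - c.accuracy_loss
        candidates.append((e, c, latency_ms, accuracy))

    feasible = [x for x in candidates if x[2] <= latency_budget_ms]
    if feasible:
        return max(feasible, key=lambda x: (x[3], -x[2]))
    return min(candidates, key=lambda x: x[2])
```

With these made-up numbers, a fast link lets the controller send uncompressed features from the deepest exit, while a slow link pushes it toward heavier compression. A decision-theoretic controller would likely weigh expected utility under uncertainty rather than apply a hard budget; the hard budget is a simplification made here.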
Limitations & open questions
Assumes that a real-time bandwidth prediction is available.
Does not account for inaccuracies in that bandwidth prediction.
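One simple way to explore the second limitation, offered here only as an assumption and not as anything the paper proposes, is to feed the controller a deliberately pessimistic bandwidth estimate. The sketch below keeps an exponentially weighted moving average of measured bandwidth and of its own absolute prediction error, then subtracts a multiple of that error. The class name `ConservativeBandwidthEstimator` and the parameters `alpha` and `safety` are hypothetical.

```python
class ConservativeBandwidthEstimator:
    """EWMA bandwidth estimate discounted by its recent absolute error."""

    def __init__(self, alpha=0.3, safety=1.0):
        self.alpha = alpha      # smoothing factor in (0, 1]
        self.safety = safety    # how many error units to subtract
        self.mean = None        # smoothed bandwidth, in kbit/s
        self.abs_err = 0.0      # smoothed absolute prediction error

    def update(self, measured_kbps):
        """Fold one bandwidth measurement into the running statistics."""
        if self.mean is None:
            self.mean = float(measured_kbps)
            return
        # Error of the previous prediction against the new measurement.
        err = abs(measured_kbps - self.mean)
        self.abs_err = (1 - self.alpha) * self.abs_err + self.alpha * err
        self.mean = (1 - self.alpha) * self.mean + self.alpha * measured_kbps

    def estimate(self):
        """Return a pessimistic bandwidth estimate, never below zero."""
        if self.mean is None:
            raise ValueError("no measurements yet")
        return max(self.mean - self.safety * self.abs_err, 0.0)
```

On a stable link the error term decays toward zero and the estimate tracks the average; on a volatile link the estimate drops, which would steer a controller toward smaller transfers. Whether such a margin preserves the reported latency and accuracy figures is an open question.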