Adaptive Cache Policies for Heterogeneous LLM Workload Mi...

ABSTRACT

This paper addresses the challenge of managing key-value (KV) caches in large language model (LLM) serving systems across diverse workloads. It presents AdaptCache, an adaptive cache policy framework that dynamically adjusts its strategy based on real-time workload characteristics to improve cache hit rates, reduce latency, and increase throughput.

Key findings

AdaptCache improves cache hit rates by up to 2.3x, reduces p99 latency by 47%, and increases throughput by 1.8x compared to state-of-the-art policies.

The framework introduces a workload-aware policy engine that predicts cache utility and selects optimal strategies for different workload segments.

A lightweight reinforcement learning-based policy selector achieves near-optimal performance with minimal overhead.
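
To make the idea of a per-segment learned policy selector concrete, here is a minimal sketch. It is not the paper's method: this summary does not describe AdaptCache's selector, so the sketch swaps in a plain epsilon-greedy bandit as the simplest reinforcement-learning stand-in. The policy names (`lru`, `lfu`, `prefix_aware`), the segment labels (`chat`, `batch`), the reward definition (a hit counts as 1, a miss as 0), and the simulated hit rates are all assumptions made up for illustration.

```python
import random
from collections import defaultdict

# Hypothetical policy names; the paper's actual policy set is not listed here.
POLICIES = ["lru", "lfu", "prefix_aware"]


class EpsilonGreedySelector:
    """Per-segment epsilon-greedy bandit over cache policies (illustrative only)."""

    def __init__(self, policies, epsilon=0.1, seed=0):
        self.policies = list(policies)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        # Running mean reward and pull count for each (segment, policy) pair.
        self.value = defaultdict(float)
        self.count = defaultdict(int)

    def select(self, segment):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.policies)
        return max(self.policies, key=lambda p: self.value[(segment, p)])

    def update(self, segment, policy, reward):
        # Incremental mean update; the reward here stands for an observed hit (1) or miss (0).
        key = (segment, policy)
        self.count[key] += 1
        self.value[key] += (reward - self.value[key]) / self.count[key]


def simulate(steps=2000, seed=0):
    # Made-up hit rates per segment and policy, used only to drive the demo.
    true_hit_rate = {
        "chat": {"lru": 0.3, "lfu": 0.4, "prefix_aware": 0.7},
        "batch": {"lru": 0.6, "lfu": 0.3, "prefix_aware": 0.2},
    }
    env = random.Random(seed + 1)
    selector = EpsilonGreedySelector(POLICIES, epsilon=0.1, seed=seed)
    for _ in range(steps):
        segment = env.choice(list(true_hit_rate))
        policy = selector.select(segment)
        reward = 1.0 if env.random() < true_hit_rate[segment][policy] else 0.0
        selector.update(segment, policy, reward)
    return selector
```

Calling `simulate()` returns a selector whose `value` table holds the learned mean reward for each segment and policy pair. Keeping one small table per segment is what makes this kind of selector cheap: each decision is a lookup over a handful of policies, and each update is a single running-mean step.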

Limitations & open questions

The study focuses on production-like traces; further real-world validation may be required.

The framework's scalability and performance in other types of heterogeneous environments need to be explored.

Adaptive Cache Policies for Heterogeneous LLM Workload Mixtures
