ABSTRACT
This paper presents StreamingLookahead, a novel framework that extends LookaheadKV to handle streaming long-context scenarios with online cache management. It introduces a Streaming Importance Predictor, an Online Eviction Scheduler, and a Future-Aware Importance Propagation mechanism, enabling efficient processing of unbounded context streams with a fixed-size KV cache.
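The abstract's core idea — a fixed-size KV cache whose entries are evicted online by predicted importance — can be illustrated with a minimal sketch. This is a hypothetical toy, not the paper's implementation: the names `FixedSizeKVCache` and `importance` are assumptions standing in for the Streaming Importance Predictor and Online Eviction Scheduler, whose actual mechanics are not given here.

```python
class FixedSizeKVCache:
    """Toy fixed-size KV cache with importance-based online eviction.

    Hypothetical sketch: `importance` stands in for a score from a
    predictor like the paper's Streaming Importance Predictor.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = {}  # token position -> (importance, kv pair)

    def add(self, position, kv, importance):
        """Insert a token's KV entry; evict if over capacity."""
        self.entries[position] = (importance, kv)
        if len(self.entries) > self.capacity:
            # Online eviction: drop the lowest-importance entry so the
            # cache never exceeds its fixed budget.
            victim = min(self.entries, key=lambda p: self.entries[p][0])
            del self.entries[victim]

    def positions(self):
        """Positions currently retained, in order."""
        return sorted(self.entries)


cache = FixedSizeKVCache(capacity=4)
stream = [(0, 5.0), (1, 1.0), (2, 4.0), (3, 2.0), (4, 3.0), (5, 6.0)]
for pos, imp in stream:
    cache.add(pos, f"kv{pos}", imp)
print(cache.positions())  # low-importance positions 1 and 3 were evicted
```

Because eviction happens per token against a fixed budget, the cache size — and hence per-token attention cost — stays constant regardless of stream length, which is the property behind the constant-latency claim below.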
Key findings
StreamingLookahead achieves 94.5% of full-cache performance with only 5% of the cache size.
Outperforms StreamingLLM by 18.3% and H2O by 12.7% on streaming tasks.
Maintains constant per-token latency, achieving up to 47× speedup over full-cache baselines at 1M token contexts.
Limitations & open questions
The framework is not evaluated under extremely high token arrival rates, so it is unclear whether the Online Eviction Scheduler keeps pace when tokens arrive faster than importance scores can be predicted.