ABSTRACT
PolyLingua is a proposed framework for near-zero-latency multilingual user interaction through on-device language model inference. It combines adaptive model compression, language-specific routing, and hardware-aware optimization, with the goal of supporting over 50 languages at sub-100ms inference latency on smartphones.
Key findings
PolyLingua is projected to reduce average memory usage by 60% compared to static multilingual models.
The framework targets a 10x model size reduction with less than 3% accuracy degradation.
The design aims to support over 50 languages with sub-100ms inference latency on modern smartphones.
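The memory claim rests on language-specific routing: only the modules for languages in active use need to stay resident, rather than one static model covering every language. The sketch below is one illustrative way such a router could work, not PolyLingua's actual design. The class `LanguageRouter`, the `fake_loader` function, and the least-recently-used eviction policy are all assumptions introduced here for illustration.

```python
from collections import OrderedDict


class LanguageRouter:
    """Hypothetical router: keeps at most `capacity` language modules in memory."""

    def __init__(self, loader, capacity=2):
        self._loader = loader            # callable that loads a module for a language code
        self._capacity = capacity        # maximum number of resident modules
        self._resident = OrderedDict()   # language code -> module, oldest first

    def route(self, lang):
        """Return the module for `lang`, loading it (and evicting another) if needed."""
        if lang in self._resident:
            # Mark as most recently used.
            self._resident.move_to_end(lang)
        else:
            if len(self._resident) >= self._capacity:
                # Evict the least recently used module to bound memory.
                self._resident.popitem(last=False)
            self._resident[lang] = self._loader(lang)
        return self._resident[lang]

    def resident_languages(self):
        """Language codes currently held in memory, least recently used first."""
        return list(self._resident)


def fake_loader(lang):
    # Stand-in for loading a compressed, language-specific module from storage.
    return f"module:{lang}"


router = LanguageRouter(fake_loader, capacity=2)
for code in ["en", "fr", "en", "ja"]:
    router.route(code)

print(router.resident_languages())  # ['en', 'ja']
```

With a capacity of two, requesting Japanese evicts French (the least recently used language) while English stays loaded. A real system would also have to weigh module load time against the sub-100ms latency target, which this sketch does not model.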
Limitations & open questions
The research proposal does not yet include empirical results or user study outcomes.