This paper introduces JWAQ-GAS, a framework for joint weight and activation quantization in neural networks. Its core contribution is a unified, gradient-based sensitivity metric that captures the combined impact of quantization error in both weights and activations; the metric is grounded in second-order optimization theory and drives an efficient bit-allocation algorithm that assigns per-layer precision levels. Experiments report state-of-the-art results on ImageNet classification.
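The paper's exact formulation is not reproduced in this summary, but a gradient-aware sensitivity score in its spirit can be sketched: weight a tensor's quantization perturbation by a diagonal second-order proxy (squared gradients, as in the empirical Fisher). The function names, the symmetric uniform quantizer, and the squared-gradient proxy below are illustrative assumptions, not the paper's definitions.

```python
import torch

def uniform_quantize(x: torch.Tensor, bits: int) -> torch.Tensor:
    """Symmetric uniform quantizer to `bits` bits (illustrative)."""
    qmax = 2 ** (bits - 1) - 1
    scale = x.abs().max().clamp(min=1e-8) / qmax
    return torch.round(x / scale).clamp(-qmax - 1, qmax) * scale

def gradient_aware_sensitivity(param: torch.Tensor, bits: int) -> float:
    """Hypothetical gradient-aware sensitivity score for one tensor.

    Scores the damage of quantizing `param` to `bits` bits as
    sum_i g_i^2 * (q(w)_i - w_i)^2, i.e. the squared quantization
    perturbation weighted by a diagonal second-order proxy (squared
    gradients). This is one plausible reading of a gradient-based
    joint sensitivity metric; the paper's definition may differ.
    """
    assert param.grad is not None, "run a backward pass first"
    err = uniform_quantize(param.detach(), bits) - param.detach()
    return (param.grad.detach() ** 2 * err ** 2).sum().item()
```

To make the metric joint over weights and activations, the simplest scheme is to accumulate the same score over a layer's weight tensors and its cached activations and sum the two terms per layer.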
Key findings
Proposes a unified theoretical framework for analyzing joint sensitivity to weight and activation quantization.
Introduces a Gradient-Aware Sensitivity (GAS) metric that captures the combined error from weight and activation quantization.
Develops an efficient algorithm that solves the constrained bit-allocation problem across layers (see the allocation sketch after this list).
Adds a quantization-aware training procedure that improves convergence at ultra-low bit-widths (a training-step sketch also follows the list).
Achieves 76.8% top-1 accuracy on ImageNet with ResNet-50 at W4A4.
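The paper's allocation algorithm is not spelled out in this summary; the sketch below shows one standard way to solve such a constrained allocation, a greedy marginal-gain upgrade under a total bit budget. All names (`allocate_bits`, the `sensitivity` and `sizes` inputs) are hypothetical.

```python
def allocate_bits(sensitivity, sizes, budget_bits, choices=(2, 4, 8)):
    """Greedy per-layer bit allocation under a total bit budget (sketch).

    sensitivity: dict layer -> {bits: score}; lower score means less
                 quantization damage at that bit-width.
    sizes:       dict layer -> number of parameters in the layer.
    budget_bits: total bit budget over all layers.

    Starts every layer at the lowest precision, then repeatedly
    upgrades the layer with the largest sensitivity reduction per
    extra bit until the budget is exhausted. A greedy scheme like
    this is one common solver for constrained bit allocation; the
    paper's algorithm may differ.
    """
    alloc = {layer: min(choices) for layer in sensitivity}
    while True:
        best, best_gain = None, 0.0
        used = sum(alloc[l] * sizes[l] for l in alloc)
        for layer, bits in alloc.items():
            higher = [b for b in choices if b > bits]
            if not higher:
                continue
            nxt = min(higher)
            extra = (nxt - bits) * sizes[layer]
            if used + extra > budget_bits:
                continue
            gain = (sensitivity[layer][bits] - sensitivity[layer][nxt]) / extra
            if gain > best_gain:
                best, best_gain = (layer, nxt), gain
        if best is None:
            return alloc
        alloc[best[0]] = best[1]
```

Called with, for example, `budget_bits = 4 * sum(sizes.values())`, the budget targets an average of four bits per weight, matching the W4A4 operating point reported above.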
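The quantization-aware training procedure is likewise unspecified here; a common building block for training through a non-differentiable quantizer is the straight-through estimator (STE), sketched below with the same symmetric uniform quantizer as above. This is a generic QAT ingredient, not necessarily the paper's method.

```python
import torch

class STEQuantize(torch.autograd.Function):
    """Fake-quantize on the forward pass, identity gradient on the
    backward pass (straight-through estimator)."""

    @staticmethod
    def forward(ctx, x, bits):
        qmax = 2 ** (bits - 1) - 1
        scale = x.abs().max().clamp(min=1e-8) / qmax
        return torch.round(x / scale).clamp(-qmax - 1, qmax) * scale

    @staticmethod
    def backward(ctx, grad_output):
        # Pass gradients straight through the rounding step; the
        # second return value is the (non-)gradient for `bits`.
        return grad_output, None
```

A layer would call `STEQuantize.apply(weight, bits)` in its forward pass, so gradients flow unchanged to the full-precision weights while the loss is computed on quantized values.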
Limitations & open questions
The reported results cover ImageNet classification only; extending the framework to other network architectures and datasets remains an open question.