ABSTRACT
The paper extends infinite-depth Bayesian neural network theory to modern vision transformers: it derives infinite-depth Gaussian process (GP) limits for attention-based architectures and develops practical approximation algorithms for working with those limits at scale.
Key findings
Derives infinite-depth GP limits for attention-based architectures, extending NNGP theory to transformers.
Develops Nesterov-accelerated fixed-point iteration for faster convergence to the infinite-depth limit.
Proposes scalable approximation algorithms using spectral truncation and inducing point methods.
Demonstrates superior uncertainty calibration on UCI benchmarks and strong OOD detection on CIFAR-10/100.
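To make the accelerated fixed-point finding above concrete, here is a minimal sketch of Nesterov-style extrapolation applied to a depth-wise kernel recursion. It uses the standard ReLU NNGP correlation map (the arc-cosine kernel's angular map) rather than the paper's attention kernel, which is not reproduced here; `relu_corr_map`, the momentum parameter `beta`, and the clipping step are illustrative assumptions, not the authors' algorithm.

```python
import math

def relu_corr_map(rho):
    """One depth step of the ReLU NNGP correlation recursion
    (arc-cosine kernel angular map); its fixed point is rho = 1."""
    rho = max(-1.0, min(1.0, rho))
    return (math.sqrt(1.0 - rho * rho)
            + (math.pi - math.acos(rho)) * rho) / math.pi

def plain_fixed_point(f, x0, n_iters):
    """Vanilla depth recursion: x_{l+1} = f(x_l)."""
    x = x0
    for _ in range(n_iters):
        x = f(x)
    return x

def nesterov_fixed_point(f, x0, n_iters, beta=0.9):
    """Extrapolated iteration: y_l = x_l + beta*(x_l - x_{l-1}),
    x_{l+1} = f(y_l)."""
    x_prev, x = x0, f(x0)
    for _ in range(n_iters - 1):
        y = x + beta * (x - x_prev)               # momentum extrapolation
        x_prev, x = x, f(max(-1.0, min(1.0, y)))  # clip: correlations lie in [-1, 1]
    return x

# The plain recursion approaches the infinite-depth limit rho* = 1 only
# polynomially fast; the extrapolated variant gets closer at the same depth.
rho_plain = plain_fixed_point(relu_corr_map, 0.2, 50)
rho_acc = nesterov_fixed_point(relu_corr_map, 0.2, 50)
```

Because the ReLU map has derivative 1 at its fixed point, the vanilla recursion converges only sublinearly, which is exactly the regime where momentum-style extrapolation pays off.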
Limitations & open questions
The paper does not discuss the limitations of the proposed methods in detail.