This paper proposes a research framework to understand the implicit biases governing symmetry learning in transformers, using a multi-level probing methodology.
Key findings
Transformers exhibit an inductive bias toward permutation-symmetric functions, particularly over sequence inputs.
Initial weights significantly influence the inductive biases of transformer architectures.
Current understanding of how transformers represent symmetry remains limited, and the representations themselves are opaque.
The proposed SymProbe framework combines sparse autoencoders, circuit tracing, and controlled interventions to analyze symmetry biases.
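One concrete symmetry underlying these findings: self-attention without positional encodings is permutation-equivariant, so permuting the input tokens permutes the outputs identically. A minimal NumPy sketch of the kind of controlled check such a probe might run (toy single-head attention with random weights; the function names are illustrative, not from the paper):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # Single-head self-attention, no positional encodings.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    A = softmax(Q @ K.T / np.sqrt(K.shape[-1]))
    return A @ V

rng = np.random.default_rng(0)
n_tokens, d = 5, 8
X = rng.normal(size=(n_tokens, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

perm = rng.permutation(n_tokens)
out = self_attention(X, Wq, Wk, Wv)
out_perm = self_attention(X[perm], Wq, Wk, Wv)

# Equivariance check: permuting inputs permutes outputs the same way.
equivariant = np.allclose(out[perm], out_perm)
print(equivariant)
```

Positional encodings break this exact equivariance, which is one reason the degree of symmetry bias depends on architectural choices.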
Limitations & open questions
Limited mechanistic understanding of how symmetry representations emerge in transformers.
Existing probing methodologies are insufficient for detecting symmetry-related representations.
The relationship between architectural design choices and symmetry learning biases remains uncertain.