ABSTRACT
This paper presents a theoretical analysis of why code-based semantic representations exhibit lower hallucination rates than natural language captions in multimodal grounding. It identifies four mechanisms (formal unambiguity, executable verification, structured grounding, and deterministic composition) and formalizes them within a probabilistic framework.
Key findings
Formal unambiguity is identified as one reason code-based representations exhibit lower hallucination rates than natural language captions.
Executable verification allows direct validation of semantic correctness.
Structured grounding provides explicit referential anchors to visual entities.
Deterministic composition makes the meaning of a composed representation predictable, which natural language does not guarantee.
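To make two of these mechanisms concrete, the following is a minimal Python sketch of structured grounding and executable verification. It is an illustration under stated assumptions, not the paper's formalization: the `Entity` and `LeftOf` types, the `verify` function, and the bounding-box values are all hypothetical and invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """A detected visual entity (hypothetical type).

    `entity_id` plays the role of an explicit referential anchor.
    """
    entity_id: str
    label: str
    box: tuple  # (x_min, y_min, x_max, y_max), illustrative pixel values


@dataclass(frozen=True)
class LeftOf:
    """Claim: the subject's box lies entirely to the left of the object's box."""
    subject: str
    obj: str


def verify(claims, entities):
    """Return the claims that fail when checked against the detections."""
    by_id = {e.entity_id: e for e in entities}
    failures = []
    for claim in claims:
        s = by_id.get(claim.subject)
        o = by_id.get(claim.obj)
        # A claim that names an entity with no detection is ungrounded,
        # so it is rejected rather than silently accepted.
        if s is None or o is None or not (s.box[2] <= o.box[0]):
            failures.append(claim)
    return failures


# Made-up detections for illustration only.
entities = [
    Entity("e1", "dog", (10, 40, 120, 200)),
    Entity("e2", "ball", (150, 160, 190, 200)),
]
claims = [
    LeftOf("e1", "e2"),  # consistent with the boxes
    LeftOf("e2", "e1"),  # contradicts the boxes
    LeftOf("e1", "e3"),  # refers to an entity that was never detected
]
failed = verify(claims, entities)
```

In this sketch only the first claim survives the check. A free-text caption such as "a ball to the left of a dog" offers no comparable hook for mechanical checking, which is the contrast the findings above draw.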
Limitations & open questions
Further empirical studies are needed to validate the theoretical claims.