This paper presents TemporalViper, a framework that extends code-as-perception to video-based STEM reasoning through temporal code generation. It introduces a temporal program representation, a modular video understanding architecture, and an adaptive temporal memory mechanism, achieving state-of-the-art performance on compositional spatio-temporal reasoning tasks while preserving the interpretability and compositional-generalization benefits of the code-as-perception paradigm.
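To make the temporal-program idea concrete, the sketch below shows the kind of executable program such a framework might generate for a simple physics question ("does the ball reach its peak before the cart crosses the line?"). Everything in it is an illustrative assumption: the toy frame data and the operator names (first_frame_where, before, is_peak) are invented for this sketch, not taken from the paper.

```python
# Illustrative only: the toy frame data and operator names
# (first_frame_where, before, is_peak) are assumptions, not the paper's API.
from typing import Callable

# Toy "video": per-frame object states from a projectile-and-cart demo.
FRAMES = [
    {"ball_y": 0.0, "cart_x": 0.0},
    {"ball_y": 4.0, "cart_x": 1.0},
    {"ball_y": 6.0, "cart_x": 2.0},  # ball reaches peak height here
    {"ball_y": 4.0, "cart_x": 3.5},  # cart crosses x >= 3.0 here
    {"ball_y": 0.0, "cart_x": 5.0},
]

def first_frame_where(pred: Callable[[int], bool]) -> int | None:
    """Temporal operator: earliest frame index satisfying pred, else None."""
    return next((t for t in range(len(FRAMES)) if pred(t)), None)

def before(t1: int | None, t2: int | None) -> bool:
    """Temporal operator: the first event strictly precedes the second."""
    return t1 is not None and t2 is not None and t1 < t2

def is_peak(t: int) -> bool:
    """Stateful predicate: the ball's height stops increasing at frame t."""
    ys = [f["ball_y"] for f in FRAMES]
    return 0 < t < len(ys) - 1 and ys[t - 1] < ys[t] >= ys[t + 1]

peak_t = first_frame_where(is_peak)                              # -> 2
cross_t = first_frame_where(lambda t: FRAMES[t]["cart_x"] >= 3)  # -> 3
print(before(peak_t, cross_t))  # True: the ball peaks before the cart crosses
```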
Key findings
TemporalViper extends code-as-perception to video-based STEM reasoning by generating temporal programs rather than single-image programs.
The framework introduces explicit temporal operators and stateful variable tracking; the sketch after this list illustrates how these primitives might combine with the temporal memory.
It integrates spatio-temporal perception modules with domain-specific scientific reasoning operators.
TemporalViper achieves state-of-the-art performance on compositional spatio-temporal reasoning tasks.
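As a companion to the bullets above, here is a minimal, hypothetical sketch of how stateful variable tracking, a temporal memory, and a domain-specific reasoning operator could compose. The names (TemporalMemory, update, series, average_velocity) and the bounded-eviction policy are assumptions for illustration, not the paper's actual design.

```python
# Hypothetical sketch: class/function names and the eviction policy are
# assumptions in the spirit of the paper's description, not its real API.
from collections import defaultdict

class TemporalMemory:
    """Toy 'adaptive' temporal memory: keeps a bounded per-variable history,
    evicting the oldest values once the budget is exceeded."""

    def __init__(self, budget: int = 64):
        self.budget = budget
        self.history: dict[str, list[tuple[int, float]]] = defaultdict(list)

    def update(self, t: int, name: str, value: float) -> None:
        """Stateful variable tracking: record (frame, value) for a variable."""
        trace = self.history[name]
        trace.append((t, value))
        if len(trace) > self.budget:  # bounded history keeps memory adaptive
            del trace[0]

    def series(self, name: str) -> list[tuple[int, float]]:
        return self.history[name]

def average_velocity(series: list[tuple[int, float]], fps: float) -> float:
    """Domain-specific reasoning operator: mean velocity from a position trace."""
    (t0, x0), (t1, x1) = series[0], series[-1]
    return (x1 - x0) / ((t1 - t0) / fps)

# Perception stub: per-frame cart positions (meters), as a spatio-temporal
# perception module might emit them.
mem = TemporalMemory(budget=8)
for t, x in enumerate([0.0, 0.4, 0.9, 1.5, 2.2]):
    mem.update(t, "cart_x", x)

print(average_velocity(mem.series("cart_x"), fps=2.0))  # -> 1.1 (m/s)
```

A bounded history is one plausible reading of "adaptive": it caps memory use regardless of video length, which also bears on the open scalability question noted below.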
Limitations & open questions
The paper does not discuss the computational complexity of TemporalViper.
The scalability of the framework to longer video sequences is not addressed.