ABSTRACT
This research proposes adapting ICC-1M-style interleaved pretraining to medical imaging and scientific diagram understanding. It addresses gaps in medical image-text-code corpora and introduces MedCode-Percept, a framework for domain-specific enhancements.
Key findings
The ICC-1M dataset enhances visual understanding in structured scientific domains.
Proposed MedCode-Percept extends code-grounded perception to medical imaging.
Code-grounded pretraining may improve medical VLM performance by 8-15% on reasoning tasks.
Limitations & open questions
Scarcity of interleaved medical image-text-code corpora.
Challenges in understanding medical image structure.
Need for domain-specific code representations of anatomical and pathological concepts.