This research proposes a probing framework that analyzes the functional embeddings of JEPA-DNA and extracts interpretable regulatory motifs from them. Its focus is the interpretability of genomic foundation models trained with Joint-Embedding Predictive Architectures (JEPA).
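A standard way to test whether frozen embeddings encode a functional property is a linear probe. You train a logistic-regression classifier on the embeddings alone and check how well it recovers an annotation. The section does not say which probes the framework uses, so the following is only a minimal sketch under assumptions. `train_linear_probe` and `probe_accuracy` are hypothetical helpers. Embeddings are plain lists of floats (for example, vectors exported from JEPA-DNA), and labels are binary functional annotations such as promoter vs. non-promoter.

```python
import math


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_linear_probe(embeddings, labels, lr=0.1, epochs=200):
    """Fit a logistic-regression probe on frozen embeddings.

    HYPOTHETICAL helper: `embeddings` is a list of equal-length float lists
    (one per sequence), `labels` a list of 0/1 functional annotations.
    The encoder is never updated; only the probe weights are learned.
    """
    n, dim = len(embeddings), len(embeddings[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(embeddings, labels):
            # Gradient of the mean binary cross-entropy w.r.t. the logit is (p - y).
            err = _sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        # Full-batch gradient descent step.
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def probe_accuracy(w, b, embeddings, labels):
    """Fraction of sequences whose predicted label (logit > 0) matches the annotation."""
    hits = sum(
        int((sum(wi * xi for wi, xi in zip(w, x)) + b > 0) == bool(y))
        for x, y in zip(embeddings, labels)
    )
    return hits / len(labels)
```

In practice, accuracy should be measured on held-out sequences. It is also worth comparing against the same probe trained on a baseline encoder, such as a randomly initialized one. This helps separate what the embeddings encode from what the probe itself can fit.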
Key findings
JEPA-DNA's embedding spaces capture higher-order functional semantics of DNA sequences.
The work introduces a novel motif extraction pipeline that discovers transcription factor (TF) binding motifs from the embeddings without supervised training.
The validation strategy combines benchmarking of the extracted motifs against known motif databases, functional enrichment analysis, and ablation studies.
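Benchmarking against motif databases usually means matching each extracted motif to known TF motifs with a column-wise similarity score. Tomtom, from the MEME Suite, performs this kind of comparison against databases such as JASPAR. The plain-Python sketch that follows illustrates the idea and is not the proposal's actual pipeline. It makes several assumptions. The extraction step has already produced equal-length, aligned binding sites. Those sites are summarized as a position probability matrix. Similarity is the best mean Pearson correlation between matching columns. `position_probability_matrix` and `motif_similarity` are hypothetical helpers.

```python
import math

ALPHABET = "ACGT"


def position_probability_matrix(sites, pseudocount=0.25):
    """Build a position probability matrix (one {base: prob} dict per column).

    HYPOTHETICAL helper: `sites` are equal-length, ungapped subsequences that
    the extraction step has already aligned (e.g. high-scoring windows).
    """
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    ppm = []
    for i in range(length):
        # Pseudocounts keep every probability non-zero.
        counts = {base: pseudocount for base in ALPHABET}
        for s in sites:
            base = s[i].upper()
            if base in counts:  # ignore N and other ambiguity codes
                counts[base] += 1
        total = sum(counts.values())
        ppm.append({base: counts[base] / total for base in ALPHABET})
    return ppm


def _pearson(u, v):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return 0.0 if su == 0 or sv == 0 else cov / (su * sv)


def motif_similarity(ppm_a, ppm_b):
    """Best mean column-wise Pearson correlation over ungapped offsets.

    Simplification: the shorter motif slides along the longer one on a single
    strand; reverse complements and partial overlaps are not considered.
    """
    short, long_ = sorted((ppm_a, ppm_b), key=len)
    best = -1.0
    for offset in range(len(long_) - len(short) + 1):
        cols = [
            _pearson([short[i][b] for b in ALPHABET],
                     [long_[offset + i][b] for b in ALPHABET])
            for i in range(len(short))
        ]
        best = max(best, sum(cols) / len(cols))
    return best
```

A real benchmark would go further, as dedicated tools do. It would scan reverse complements, allow partial overlaps, and convert raw similarity scores into p-values against a null model.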
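Functional enrichment analysis asks whether regions flagged through the embeddings, such as sequences containing an extracted motif, overlap a functional annotation more often than chance would predict. The section does not name a test, so this sketch assumes a common choice: the one-sided hypergeometric test, which is equivalent to a one-sided Fisher's exact test on the 2x2 contingency table. `hypergeometric_enrichment_p` is a hypothetical helper that works on region counts.

```python
from math import comb


def hypergeometric_enrichment_p(population, annotated, selected, overlap):
    """One-sided enrichment p-value P(X >= overlap).

    X ~ Hypergeometric: draw `selected` regions without replacement from
    `population` regions, of which `annotated` carry the functional label.
    HYPOTHETICAL helper; all arguments are plain integer counts.
    """
    if not (0 <= annotated <= population and 0 <= selected <= population):
        raise ValueError("annotated and selected must lie in [0, population]")
    if overlap < 0:
        raise ValueError("overlap must be non-negative")
    upper = min(annotated, selected)
    # Sum the probability mass of every overlap at least as large as observed;
    # math.comb returns 0 for impossible draws, so those terms vanish.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / comb(population, selected)
```

When many motifs or annotations are tested, the resulting p-values need a multiple-testing correction, for example Benjamini-Hochberg.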
Limitations & open questions
The interpretability of JEPA-DNA's learned representations remains largely unexplored.
Further research is needed to establish causal relationships between embedding structure and biological function.