This paper proposes a methodological framework for fine-tuning encoder-based language models for causal discovery in scientific literature. The approach combines domain-adaptive pretraining on scientific corpora with task-specific contrastive learning objectives to learn robust causal representations. The paper presents a comprehensive validation plan including benchmark datasets, evaluation metrics, baseline comparisons, and ablation studies.
Key findings
Causal discovery from scientific literature presents unique challenges due to the complexity and domain specificity of causal claims.
Large language models achieve near-random performance on causal reasoning tasks, particularly with implicit causal relationships in scientific texts.
Proposed framework combines domain-adaptive pretraining with task-specific contrastive learning for robust causal representations.
Addresses critical gaps by focusing on fine-grained causal relation extraction, handling implicit causal statements, and ensuring domain generalization.
Includes intrinsic evaluation on causal extraction benchmarks and extrinsic evaluation through integration into downstream causal discovery pipelines.
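One way the task-specific contrastive objective could be instantiated is an InfoNCE-style in-batch loss that pulls an encoded causal statement toward its paired positive (e.g. a paraphrase of the same causal relation) and pushes it away from other statements in the batch. The sketch below is a minimal NumPy illustration under that assumption, not the paper's implementation; the function name and temperature value are illustrative.

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.07):
    """InfoNCE-style contrastive loss over sentence embeddings.

    anchors, positives: (batch, dim) arrays; row i of `positives` is the
    positive pair for row i of `anchors`, and every other row in the
    batch serves as an in-batch negative.
    """
    # L2-normalize so the dot product is cosine similarity
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                 # (batch, batch) similarities
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # cross-entropy with the diagonal (the matched pair) as the target class
    return -np.mean(np.diag(log_probs))
```

Perfectly aligned anchor/positive pairs drive the loss toward zero, while mismatched pairs raise it, which is the behavior a contrastive fine-tuning loop would optimize.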
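For the intrinsic evaluation, causal extraction benchmarks are typically scored with precision, recall, and F1 over predicted versus gold (cause, effect) pairs. A small sketch of that scoring, with hypothetical example pairs, might look like:

```python
def causal_extraction_f1(predicted, gold):
    """Micro precision/recall/F1 over (cause, effect) pairs.

    predicted, gold: iterables of (cause, effect) tuples; duplicates
    are collapsed before scoring.
    """
    pred, ref = set(predicted), set(gold)
    tp = len(pred & ref)  # pairs that match the gold annotation exactly
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Exact-match scoring like this is strict; benchmark-specific variants often relax it to partial span overlap.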
Limitations & open questions
Evaluation relies on existing benchmarks, which may not fully capture the complexity of real-world scientific texts.
The framework's effectiveness across diverse scientific disciplines needs further validation.