This paper proposes a methodological framework for evaluating LLM agent reasoning on multi-hop biomedical knowledge graph (KG) traversals, focusing on reasoning fidelity, path faithfulness, and clinical safety. It introduces a taxonomy of reasoning patterns, a multi-dimensional evaluation protocol, and a benchmark suite for systematic assessment.
Key findings
Proposes a comprehensive framework for evaluating LLM agents on biomedical KG traversal tasks.
Introduces a taxonomy of multi-hop reasoning patterns specific to biomedical KGs.
Develops a benchmark suite with clinically validated questions requiring 2-5 hop reasoning.
Presents novel metrics for hallucination detection and an analysis correlating graph topological features (e.g., node degree, path length) with reasoning difficulty.
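The core idea behind path faithfulness and hallucination detection can be illustrated with a minimal sketch: check each hop an agent cites against the KG's triples. The entity and relation names below are hypothetical examples, not drawn from the paper, and the paper's actual metrics are more elaborate than this binary check.

```python
# Minimal sketch of path-faithfulness checking against a biomedical KG.
# The KG is represented as a set of (head, relation, tail) triples;
# all names here are illustrative placeholders.
kg = {
    ("metformin", "targets", "AMPK"),
    ("AMPK", "regulates", "glucose_metabolism"),
    ("metformin", "treats", "type_2_diabetes"),
}

def path_faithful(chain, kg):
    """True iff every hop the agent asserts exists in the KG."""
    return all(triple in kg for triple in chain)

def hallucinated_hops(chain, kg):
    """Return the asserted hops that are absent from the KG."""
    return [t for t in chain if t not in kg]

# A 2-hop chain linking a drug to a downstream process.
good = [("metformin", "targets", "AMPK"),
        ("AMPK", "regulates", "glucose_metabolism")]
bad = good + [("AMPK", "treats", "obesity")]  # fabricated edge

print(path_faithful(good, kg))     # True
print(hallucinated_hops(bad, kg))  # [('AMPK', 'treats', 'obesity')]
```

A per-hop check like this also yields a graded score (fraction of faithful hops), which is closer in spirit to a continuous hallucination metric than the all-or-nothing `path_faithful` test.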
Limitations & open questions
The framework's effectiveness is contingent on the quality and coverage of the biomedical KGs used.
The evaluation metrics may need further refinement as more complex reasoning tasks are identified.