This paper proposes a diagnostic diversity sampling framework for pathology report generation, integrating uncertainty quantification with semantic diversity measures to select informative training samples, aiming to reduce annotation costs while maintaining diagnostic reliability.
Key findings
Proposes a diagnostic diversity sampling (DDS) framework for pathology report generation from whole slide images.
Integrates diagnostic-aware uncertainty quantification with semantic diversity measures.
Aims to achieve comparable performance to fully supervised approaches using 40-60% of labeled data.
Establishes a foundation for extending active learning principles to other low-resource medical NLP applications.
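The summary does not say how the uncertainty and diversity signals are combined, so the following is only a minimal sketch of one common way to do it, not the paper's method. It assumes each unlabeled sample has a predicted class distribution and a nonzero embedding vector, and it greedily picks samples by a weighted sum of normalized entropy and cosine distance to the samples already chosen. The names `select_samples`, `alpha`, and `budget` are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (natural log) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def cosine_distance(a, b):
    """1 - cosine similarity; assumes both vectors are nonzero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def select_samples(probs, embeddings, budget, alpha=0.5):
    """Greedily pick up to `budget` sample indices.

    Hypothetical scoring rule (an assumption, not taken from the paper):
        score = alpha * uncertainty + (1 - alpha) * diversity
    where uncertainty is entropy scaled to [0, 1] and diversity is the
    minimum cosine distance to any already selected sample.
    """
    max_entropy = math.log(len(probs[0]))
    uncertainty = [entropy(p) / max_entropy for p in probs]

    selected = []
    remaining = set(range(len(probs)))
    while remaining and len(selected) < budget:

        def score(i):
            if selected:
                diversity = min(
                    cosine_distance(embeddings[i], embeddings[j])
                    for j in selected
                )
            else:
                diversity = 1.0  # nothing selected yet, so no redundancy
            return alpha * uncertainty[i] + (1 - alpha) * diversity

        # Iterate in index order so ties resolve to the lowest index.
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: samples 0 and 2 are equally uncertain and have identical embeddings.
toy_probs = [[0.5, 0.5], [0.9, 0.1], [0.5, 0.5], [0.6, 0.4]]
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.7, 0.7]]
picked = select_samples(toy_probs, toy_embeddings, budget=2)
```

On this toy data, the diversity term steers the second pick away from sample 2, which duplicates sample 0, whereas setting `alpha=1.0` reduces the rule to plain uncertainty sampling and selects both duplicates.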
Limitations & open questions
The proposed framework's effectiveness has yet to be empirically validated.
The study's scope is limited to pathology report generation tasks.