This research proposes a framework that uses eye-tracking metrics to estimate annotator cognitive load during implicit discourse relation recognition (IDRR) annotation, enabling real-time identification of potentially unreliable labels. The method combines pupillometry, fixation duration, and saccade patterns with discourse-specific features to predict cognitive load; the predicted load is then used to weight annotator labels and improve aggregation quality.
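As a rough illustration of how the eye-tracking signals might be combined into a single load estimate, the sketch below passes a weighted sum of standardized features through a logistic function. This is not the proposal's model: the class `GazeSample`, the function `estimate_load`, the feature names, and all coefficients are hypothetical placeholders, and a real predictor would be fitted to data and would also include the discourse-specific features.

```python
import math
from dataclasses import dataclass


@dataclass
class GazeSample:
    """Per-item gaze features, assumed z-scored against the annotator's baseline."""
    pupil_dilation: float       # pupillometry signal
    fixation_duration: float    # mean fixation duration
    saccade_regressions: float  # count of regressive saccades


# Hypothetical coefficients for illustration only; not taken from the proposal.
WEIGHTS = {
    "pupil_dilation": 1.0,
    "fixation_duration": 0.5,
    "saccade_regressions": 0.5,
}
BIAS = 0.0


def estimate_load(sample: GazeSample) -> float:
    """Map gaze features to a cognitive-load score in (0, 1) via a logistic function."""
    z = BIAS + sum(w * getattr(sample, name) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


baseline = estimate_load(GazeSample(0.0, 0.0, 0.0))   # features at the annotator's baseline
strained = estimate_load(GazeSample(2.0, 1.0, 1.0))   # all features above baseline
```

Under these assumed coefficients, a sample at the annotator's baseline maps to the midpoint of the scale, and samples with above-baseline features map to higher scores.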
Key findings
High annotator disagreement in IDRR often stems from cognitive complexity rather than genuine ambiguity.
Cognitive load serves as a proxy for annotation reliability: higher load correlates with less reliable judgments.
The proposed CL-ET framework monitors cognitive load in real time using eye-tracking and predicts label reliability scores.
Annotations are weighted according to estimated cognitive load, so that labels produced under high load contribute less, improving label aggregation quality.
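The weighting idea in the findings above can be sketched as a load-weighted vote. This is a minimal sketch, not the proposal's aggregation method: it assumes each annotation arrives with a load estimate in [0, 1] and uses reliability = 1 - load as the vote weight, which is an illustrative choice. The function name `aggregate_labels` and the example relation labels are hypothetical.

```python
from collections import defaultdict


def aggregate_labels(annotations):
    """Return the label with the highest total reliability weight.

    annotations: iterable of (label, load) pairs, with load in [0, 1].
    Assumption for illustration: reliability weight = 1 - load.
    """
    scores = defaultdict(float)
    for label, load in annotations:
        if not 0.0 <= load <= 1.0:
            raise ValueError(f"load must be in [0, 1], got {load}")
        scores[label] += 1.0 - load
    if not scores:
        raise ValueError("no annotations to aggregate")
    # Ties resolve to the label that was seen first.
    return max(scores, key=scores.get)


# Two high-load annotators agree; one low-load annotator disagrees.
votes = [("Contingency", 0.9), ("Contingency", 0.8), ("Comparison", 0.1)]
winner = aggregate_labels(votes)
```

In this example an unweighted majority vote would pick "Contingency", while the load-weighted vote favors the single low-load annotation, since its weight (0.9) exceeds the combined weight of the two high-load annotations (about 0.3).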
Limitations & open questions
The framework's effectiveness in diverse annotator populations and across different discourse types needs further validation.
Long-term cognitive load effects on annotation consistency are not addressed in this proposal.