This paper proposes DemographicMOS, a framework that extends demographic-aware speech quality (mean opinion score, MOS) prediction to cover listener age groups and cultural backgrounds. It analyzes perceptual differences across these demographic dimensions, identifies the resulting biases in quality perception, and proposes a multi-task learning architecture to capture how the dimensions interact.
Key findings
Older listeners score more leniently, a pattern the paper attributes to age-related hearing changes.
Cultural background significantly influences quality perception, especially for speech naturalness and artifact sensitivity.
A multi-task learning architecture with hierarchical demographic embeddings captures interactions between demographic factors while maintaining data efficiency.
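To make the last finding more concrete, the following is a minimal sketch of how hierarchical demographic embeddings could feed several task heads. It is not the paper's implementation: the summary does not specify the architecture, so the class names, the age bins, the culture labels, the three task heads, and the embedding width are all hypothetical. The sketch runs only an untrained forward pass in plain Python; it illustrates the structure, not the learning procedure.

```python
import math
import random

# All names and values below are illustrative assumptions, not taken from the paper.
DIM = 4                                          # embedding width
AGE_GROUPS = ["18-30", "31-50", "51+"]           # hypothetical age bins
CULTURES = ["culture_a", "culture_b"]            # hypothetical culture labels
TASKS = ["overall", "naturalness", "artifacts"]  # hypothetical task heads


def random_vector(rng, dim, scale=0.1):
    """Return a small random vector as a list of floats."""
    return [rng.uniform(-scale, scale) for _ in range(dim)]


class HierarchicalDemographicEmbedding:
    """Sum of a shared vector, one vector per factor, and an interaction vector."""

    def __init__(self, rng):
        self.shared = random_vector(rng, DIM)
        self.age = {a: random_vector(rng, DIM) for a in AGE_GROUPS}
        self.culture = {c: random_vector(rng, DIM) for c in CULTURES}
        # Interaction vectors start at zero, so an (age, culture) pair with
        # little data initially relies on its parent levels only.
        self.interaction = {
            (a, c): [0.0] * DIM for a in AGE_GROUPS for c in CULTURES
        }

    def embed(self, age, culture):
        parts = [
            self.shared,
            self.age[age],
            self.culture[culture],
            self.interaction[(age, culture)],
        ]
        # Element-wise sum across the four levels of the hierarchy.
        return [sum(values) for values in zip(*parts)]


class MultiTaskMOSPredictor:
    """One linear head per task on top of [audio features ; demographic embedding]."""

    def __init__(self, feature_dim, seed=0):
        rng = random.Random(seed)
        self.demo = HierarchicalDemographicEmbedding(rng)
        self.heads = {t: random_vector(rng, feature_dim + DIM) for t in TASKS}

    def predict(self, features, age, culture):
        x = list(features) + self.demo.embed(age, culture)
        scores = {}
        for task, weights in self.heads.items():
            raw = sum(w * xi for w, xi in zip(weights, x))
            # Squash the raw score into the open interval (1, 5), the MOS range.
            scores[task] = 1.0 + 4.0 / (1.0 + math.exp(-raw))
        return scores


if __name__ == "__main__":
    model = MultiTaskMOSPredictor(feature_dim=3)
    print(model.predict([0.2, -0.1, 0.5], age="51+", culture="culture_a"))
```

In this sketch the heads share both the audio features and the demographic embedding, which is the multi-task aspect. The additive hierarchy is one plausible reading of the data-efficiency claim: a sparsely observed age and culture combination can still borrow from the shared, age-level, and culture-level vectors.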
Limitations & open questions
The study primarily focuses on Western, English-speaking populations, limiting the generalizability of the findings.
Further research is needed to understand the long-term impact of demographic biases on speech quality assessment.