This research proposes a framework for analyzing phoneme-specific degradation in MRI-to-clean speech transfer and for developing targeted recovery mechanisms. It introduces a phoneme-aware degradation analysis module, an adaptive multi-branch recovery network, and a phoneme-scale intelligibility evaluation protocol. The study finds that plosives and fricatives degrade most severely, with plosive burst characteristics particularly affected, and addresses this degradation through articulatory-informed attention mechanisms and perceptual loss functions.
Key findings
Plosives and fricatives show the most severe degradation in MRI-to-clean speech transfer.
Plosive burst characteristics are particularly affected by MRI-based synthesis limitations.
Articulatory-informed attention mechanisms and perceptual loss functions improve phoneme quality.
Experimental validation on the USC-TIMIT MRI corpus shows significant improvements in phoneme error rate and perceptual quality.
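The summary does not describe how phoneme error rate is computed or broken down by phoneme category. As a minimal sketch, assuming a standard Levenshtein alignment between reference and recognized phoneme sequences, the per-category breakdown could look like the following. The `MANNER` table, the function names, and the rule of attributing substitutions and deletions to the reference phoneme's category are illustrative assumptions, not the study's protocol.

```python
from collections import defaultdict

# Hypothetical manner-class table (ARPAbet-style labels), for illustration only.
MANNER = {
    "p": "plosive", "t": "plosive", "k": "plosive",
    "b": "plosive", "d": "plosive", "g": "plosive",
    "s": "fricative", "z": "fricative", "f": "fricative", "sh": "fricative",
    "aa": "vowel", "iy": "vowel", "ae": "vowel",
    "m": "nasal", "n": "nasal",
}


def align(ref, hyp):
    """Levenshtein-align two phoneme sequences.

    Returns (distance, ops), where ops is a list of
    (op, ref_phone_or_None, hyp_phone_or_None) and op is one of
    "ok", "sub", "del", "ins".
    """
    n, m = len(ref), len(hyp)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i
    for j in range(1, m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j - 1] + cost,  # match or substitution
                          d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1)         # insertion

    # Backtrace from the bottom-right corner to recover one optimal alignment.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            op = "ok" if ref[i - 1] == hyp[j - 1] else "sub"
            ops.append((op, ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            ops.append(("del", ref[i - 1], None))
            i -= 1
        else:
            ops.append(("ins", None, hyp[j - 1]))
            j -= 1
    ops.reverse()
    return d[n][m], ops


def phoneme_error_rate(ref, hyp):
    """Edit distance divided by the reference length."""
    distance, _ = align(ref, hyp)
    return distance / len(ref)


def per_class_error_rate(ref, hyp, manner=MANNER):
    """Fraction of reference phonemes in each class that were substituted
    or deleted. Insertions have no reference phoneme and are not counted here."""
    _, ops = align(ref, hyp)
    total, errors = defaultdict(int), defaultdict(int)
    for op, ref_phone, _ in ops:
        if ref_phone is None:
            continue
        cls = manner.get(ref_phone, "other")
        total[cls] += 1
        if op != "ok":
            errors[cls] += 1
    return {cls: errors[cls] / total[cls] for cls in total}
```

Calling `per_class_error_rate(reference, recognized)` on aligned transcripts gives one error rate per manner class, which is the kind of breakdown needed to compare plosives and fricatives against other categories. A fuller protocol would also have to decide how to attribute insertions.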
Limitations & open questions
Further research is needed to generalize the findings across different speaker demographics.
The proposed recovery framework requires extensive training data for each phoneme category.