ABSTRACT
This paper proposes integrating Reconfigurable Intelligent Surfaces (RIS) into audio-visual speaker diarization. The RIS controls acoustic signals arriving from specific directions so that individual speakers can be isolated, which improves speaker separability.
Key findings
Proposes a new paradigm integrating RIS into audio-visual diarization.
Develops a joint RIS-Diarization optimization framework to maximize speaker separability.
Includes a multi-modal fusion network combining RIS-enhanced audio, visual features, and location cues.
Plans evaluations on synthetic RIS-augmented datasets, together with a real-world feasibility analysis.
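
The idea of controlling acoustic signals from specific directions can be illustrated with a minimal sketch. The summary gives no signal model, so everything below is an assumption: a textbook far-field model of a uniform linear array of reflecting elements, a single 1 kHz tone, and hypothetical helper names (ris_phases, array_gain). It is not the paper's optimization framework. It only shows how per-element phase shifts could favor one speaker direction over another.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at room temperature


def ris_phases(n_elements, spacing, wavelength, target_deg, mic_deg):
    """Phase shifts (radians) that align reflections from target_deg toward mic_deg.

    Assumes far-field plane waves and a uniform linear array (illustrative model).
    """
    k = 2 * np.pi / wavelength  # wavenumber
    n = np.arange(n_elements)
    # Extra path length at element n for the incoming and outgoing directions.
    path = spacing * n * (np.sin(np.radians(target_deg)) + np.sin(np.radians(mic_deg)))
    # Cancel the propagation phase so all reflections add coherently.
    return -k * path


def array_gain(phases, spacing, wavelength, source_deg, mic_deg):
    """Normalized magnitude (0 to 1) of the reflected sum for a source at source_deg."""
    k = 2 * np.pi / wavelength
    n = np.arange(len(phases))
    path = spacing * n * (np.sin(np.radians(source_deg)) + np.sin(np.radians(mic_deg)))
    return np.abs(np.sum(np.exp(1j * (k * path + phases)))) / len(phases)


if __name__ == "__main__":
    wavelength = SPEED_OF_SOUND / 1000.0  # 1 kHz tone (assumed)
    spacing = wavelength / 2              # half-wavelength element spacing (assumed)
    phases = ris_phases(16, spacing, wavelength, target_deg=20.0, mic_deg=0.0)

    # Target speaker at 20 degrees, interfering speaker at -40 degrees.
    print("target gain:    ", array_gain(phases, spacing, wavelength, 20.0, 0.0))
    print("interferer gain:", array_gain(phases, spacing, wavelength, -40.0, 0.0))
```

In this toy setup the gain toward the target direction is at its maximum while the interfering direction is strongly attenuated. A joint RIS-diarization optimization, as described in the key findings, would presumably choose the phases to maximize a separability objective rather than steer toward a single known angle.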
Limitations & open questions
Potential hardware constraints of RIS
Computational complexity of joint optimization
Generalization to unseen room geometries