Real-Time Gaze-directed speech enhancement for audio-visual hearing-aids

Arif Reza Anway, Bryony Buck, Mandar Gogate, Kia Dashtipour, Michael Akeroyd, Amir Hussain

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Abstract

This study introduces a novel real-time, gaze-directed audio-visual speech enhancement (AVSE) framework for hearing aids, designed to improve speech intelligibility for people with hearing loss (pHL) in noisy environments. Existing gaze estimation methods often rely solely on eye angle, leading to reduced accuracy. Our approach addresses this limitation by combining eye angle and nose position with head pose estimation to enhance target speaker identification and facilitate noise reduction within the AVSE framework. We utilize a novel eye gaze estimation algorithm that leverages the listener's nose position for improved accuracy. Head pose estimation is also used to capture the overall direction of attention. This combined information is used in real time to steer a beamformer towards the target speaker, enhancing their voice and suppressing background noise. Pilot trials with pHL users demonstrated high accuracy (99.55% to 99.88%) in estimating target speaker direction using the proposed algorithm. This research presents a promising approach for improving communication accessibility and social interaction for pHL users by potentially enhancing speech recognition in challenging listening situations. Future studies will quantify the improvement in speech intelligibility achieved by the gaze-directed AVSE framework.
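The abstract does not say how the gaze and head-pose estimates are fused, or which beamformer is steered. As a rough illustration only, the sketch below assumes the simplest possible choices: the gaze azimuth is the sum of head yaw and eye-in-head angle, and the beamformer is a far-field delay-and-sum design on microphones placed along one axis. The function names, the microphone positions, and the fusion rule are all hypothetical and are not taken from the paper.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air at room temperature


def gaze_azimuth_deg(head_yaw_deg, eye_in_head_deg):
    """Hypothetical fusion rule (an assumption, not the paper's algorithm):
    overall gaze azimuth = head yaw + eye angle relative to the head."""
    return head_yaw_deg + eye_in_head_deg


def steering_delays(mic_x_m, azimuth_deg):
    """Per-microphone delays in seconds for a delay-and-sum beamformer.

    Assumes a far-field plane wave arriving from azimuth_deg, where
    0 is straight ahead and positive angles point toward +x, and
    microphones located at positions mic_x_m (metres) along the x axis.
    """
    s = math.sin(math.radians(azimuth_deg))
    # A source toward +x reaches microphones with larger x earlier.
    arrival = [-x * s / SPEED_OF_SOUND for x in mic_x_m]
    # Delay every channel so that it lines up with the latest arrival.
    latest = max(arrival)
    return [latest - t for t in arrival]


if __name__ == "__main__":
    mics = [-0.075, 0.075]  # hypothetical two-microphone layout, 15 cm apart
    azimuth = gaze_azimuth_deg(head_yaw_deg=20.0, eye_in_head_deg=10.0)
    print(azimuth, steering_delays(mics, azimuth))
```

With this layout, a target directly to the side (azimuth 90 degrees) gives the nearer microphone a delay of 0.15 / 343 seconds, roughly 437 microseconds, and a target straight ahead gives zero delay on both channels. A real system would apply these delays to the audio frames and sum the channels; that step is omitted here.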

Original language: English
Title of host publication: Proceedings of Interspeech 2024
Publisher: International Speech Communication Association (ISCA)
Pages: 2034-2035
Number of pages: 2
Publication status: Published - 2024
Event: 25th Interspeech Conference 2024: Speech and Beyond - Kos Island, Greece
Duration: 1 Sept 2024 to 5 Sept 2024
Conference number: 25th
https://interspeech2024.org/ (Interspeech 2024 conference website)

Publication series

Name: Interspeech 2024
Publisher: International Speech Communication Association (ISCA)
ISSN (Electronic): 2958-1796

Conference

Conference: 25th Interspeech Conference 2024
Abbreviated title: Interspeech 2024
Country/Territory: Greece
City: Kos Island
Period: 1/09/24 to 5/09/24

Keywords

  • communication
  • multimodal hearing-aid
  • speech enhancement

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modelling and Simulation
