Inter-frame contextual modelling for visual speech recognition

Adrian Pass, Ji Ming, Philip Hanna, Jianguo Zhang, Darryl Stewart

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    1 Citation (Scopus)

    Abstract

    In this paper, we present a new approach to visual speech recognition which improves contextual modelling by combining Inter-Frame Dependent and Hidden Markov Models. This approach captures contextual information in visual speech that may be lost using a Hidden Markov Model alone. We apply contextual modelling to a large speaker-independent isolated digit recognition task, and compare our approach to two commonly adopted feature-based techniques for incorporating speech dynamics. Results are presented from baseline feature-based systems and the combined modelling technique. We show that both of these techniques achieve similar levels of performance when used independently. However, significant improvements in performance can be achieved through a combination of the two. In particular, we report a relative Word Error Rate improvement in excess of 17% over our best baseline system. © 2010 IEEE.
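    A common feature-based technique for incorporating speech dynamics of the kind the abstract refers to is appending delta (regression) coefficients to the static feature vectors. The abstract does not specify the exact formulation used in the paper, so the sketch below is only an illustration of the standard delta-coefficient regression formula; the function name and the half-window size are assumptions.

    ```python
    import numpy as np

    def delta_features(frames, window=2):
        """Append delta (dynamic) coefficients to static feature vectors.

        Illustrative only: uses the standard regression formula
        d_t = sum_n n * (c_{t+n} - c_{t-n}) / (2 * sum_n n^2),
        not necessarily the paper's exact formulation.

        frames : (T, D) array of per-frame static features.
        window : regression half-width N (assumed value).
        """
        T, _ = frames.shape
        # Repeat edge frames so every frame has a full regression context.
        padded = np.pad(frames, ((window, window), (0, 0)), mode="edge")
        denom = 2 * sum(n * n for n in range(1, window + 1))
        deltas = np.zeros_like(frames, dtype=float)
        for n in range(1, window + 1):
            deltas += n * (padded[window + n : window + n + T]
                           - padded[window - n : window - n + T])
        deltas /= denom
        # Concatenate static and dynamic coefficients per frame.
        return np.hstack([frames, deltas])
    ```

    For features that change linearly over time, the interior delta coefficients recover the per-frame slope, which is why such coefficients capture local temporal dynamics that a frame-independent model would miss.
    
    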
    Original language: English
    Title of host publication: 2010 17th IEEE International Conference on Image Processing, ICIP 2010
    Subtitle of host publication: Proceedings
    Place of publication: Piscataway, NJ
    Publisher: IEEE
    Pages: 93-96
    Number of pages: 4
    ISBN (Print): 9781424479948
    DOIs
    Publication status: Published - 2010
    Event: 17th IEEE International Conference on Image Processing - Hong Kong Convention and Exhibition Centre, Hong Kong, Hong Kong
    Duration: 26 Sep 2010 – 29 Sep 2010
    http://www.icip2010.org/

    Conference

    Conference: 17th IEEE International Conference on Image Processing
    Abbreviated title: ICIP 2010
    Country: Hong Kong
    City: Hong Kong
    Period: 26/09/10 – 29/09/10
    Internet address

