Synthesizing emotions in speech

is it time to get excited?

Iain R. Murray, John L. Arnott

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

    24 Citations (Scopus)

    Abstract

    Modern speech synthesis systems with very high intelligibility are readily available in a number of languages. However, the output from all present systems is still readily identifiable as being machine-generated - the output does not sound 'natural'. One aspect of naturalness is the variability introduced by the emotional state of the speaker, and related pragmatic effects; no current commercial systems include such variation. Comparatively little work has been done to investigate how a speaker's emotional state creates variation in the speech signal, and this work has traditionally been performed by psychologists and has remained distinct from mainstream speech science. Current research suggests that there will be considerable effort involved in producing any accurate description of pragmatic variations in speech, but there has recently been increasing interest in this area due to potential applications in many branches of speech technology. This paper describes a prototype system which has been constructed to simulate emotion in speech synthesized by rule. The system is based on emotion information from the literature, and it simulates a range of emotions using a commercial synthesizer. The use of emotion models and their applicability in the area of speech technology is discussed. The limitations of our current knowledge in the area of vocal emotion are discussed, and suggestions are presented for future research in this area.
    Original language: English
    Title of host publication: Proceedings of the Fourth International Conference on Spoken Language, ICSLP 96
    Place of Publication: Piscataway, N.J.
    Publisher: IEEE
    Pages: 1816-1819
    Number of pages: 4
    Volume: 3
    DOIs: https://doi.org/10.1109/ICSLP.1996.607983
    Publication status: Published - 1996
    Event: Fourth International Conference on Spoken Language, 1996. ICSLP 96. - Philadelphia, P.A., United States
    Duration: 3 Oct 1996 to 6 Oct 1996
    http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=606911

    Conference

    Conference: Fourth International Conference on Spoken Language, 1996. ICSLP 96.
    Country: United States
    City: Philadelphia, P.A.
    Period: 3/10/96 to 6/10/96

    Fingerprint

    Speech intelligibility
    Speech synthesis
    Acoustic waves

    Cite this

    Murray, I. R., & Arnott, J. L. (1996). Synthesizing emotions in speech: is it time to get excited? In Proceedings of the Fourth International Conference on Spoken Language, ICSLP 96 (Vol. 3, pp. 1816-1819). Piscataway, N.J.: IEEE. https://doi.org/10.1109/ICSLP.1996.607983
    @inproceedings{694d8201128343fea9e6acddd522e502,
    title = "Synthesizing emotions in speech: is it time to get excited?",
    author = "Murray, {Iain R.} and Arnott, {John L.}",
    note = "Copyright 2004 Elsevier Science B.V., Amsterdam. All rights reserved.",
    year = "1996",
    doi = "10.1109/ICSLP.1996.607983",
    language = "English",
    volume = "3",
    pages = "1816--1819",
    booktitle = "Proceedings of the Fourth International Conference on Spoken Language, ICSLP 96",
    publisher = "IEEE",

    }

    TY - GEN

    T1 - Synthesizing emotions in speech

    T2 - is it time to get excited?

    AU - Murray, Iain R.

    AU - Arnott, John L.

    N1 - Copyright 2004 Elsevier Science B.V., Amsterdam. All rights reserved.

    PY - 1996

    Y1 - 1996

    UR - http://www.scopus.com/inward/record.url?scp=0030371808&partnerID=8YFLogxK

    U2 - 10.1109/ICSLP.1996.607983

    DO - 10.1109/ICSLP.1996.607983

    M3 - Conference contribution

    VL - 3

    SP - 1816

    EP - 1819

    BT - Proceedings of the Fourth International Conference on Spoken Language, ICSLP 96

    PB - IEEE

    CY - Piscataway, N.J.

    ER -
