Prosodic Representations of Prominence Classification Neural Networks and Autoencoders Using Bottleneck Features

dc.contributor.author Kakouros, Sofoklis
dc.contributor.author Suni, Antti
dc.contributor.author Šimko, Juraj
dc.contributor.author Vainio, Martti
dc.date.accessioned 2020-02-20T11:47:02Z
dc.date.available 2020-02-20T11:47:02Z
dc.date.issued 2019
dc.identifier.citation Kakouros, S, Suni, A, Šimko, J & Vainio, M 2019, Prosodic Representations of Prominence Classification Neural Networks and Autoencoders Using Bottleneck Features. In 20th Annual Conference of the International Speech Communication Association (INTERSPEECH 2019): Crossroads of Speech and Language. Interspeech, ISCA, Baixas, pp. 1946-1950, Annual Conference of the International-Speech-Communication-Association, Graz, Austria, 15/09/2019. https://doi.org/10.21437/Interspeech.2019-2984
dc.identifier.citation conference
dc.identifier.other PURE: 132387639
dc.identifier.other PURE UUID: f750fbed-458d-4cf7-ae35-39c16a4ede9f
dc.identifier.other Scopus: 85074682797
dc.identifier.other ORCID: /0000-0003-2570-0196/work/70947222
dc.identifier.other ORCID: /0000-0001-8996-0793/work/70953389
dc.identifier.uri http://hdl.handle.net/10138/312019
dc.description.abstract Prominence perception has been known to correlate with a complex interplay of the acoustic features of energy, fundamental frequency, spectral tilt, and duration. The contribution and importance of each of these features in distinguishing between prominent and non-prominent units in speech is not always easy to determine and, moreover, the prosodic representations that humans and automatic classifiers learn have been difficult to interpret. This work focuses on examining the acoustic prosodic representations that binary prominence classification neural networks and autoencoders learn for prominence. We investigate the complex features learned at different layers of the network as well as the 10-dimensional bottleneck features (BNFs), for the standard acoustic prosodic correlates of prominence separately and in combination. We analyze and visualize the BNFs obtained from the prominence classification neural networks as well as their network activations. The experiments are conducted on a corpus of Dutch continuous speech with manually annotated prominence labels. Our results show that the prosodic representations obtained from the BNFs and higher-dimensional non-BNFs provide good separation of the two prominence categories, although the BNF space is partitioned differently for the distinct features, and the best overall separation is obtained for F0. en
dc.format.extent 5
dc.language.iso eng
dc.publisher ISCA
dc.relation.ispartof 20th Annual Conference of the International Speech Communication Association (INTERSPEECH 2019)
dc.relation.ispartofseries Interspeech
dc.rights unspecified
dc.rights.uri info:eu-repo/semantics/openAccess
dc.subject 6121 Languages
dc.subject 6161 Phonetics
dc.title Prosodic Representations of Prominence Classification Neural Networks and Autoencoders Using Bottleneck Features en
dc.type Conference contribution
dc.contributor.organization Department of Digital Humanities
dc.contributor.organization Phonetics and Speech Synthesis
dc.contributor.organization Phonetics
dc.contributor.organization Mind and Matter
dc.description.reviewstatus Peer reviewed
dc.relation.doi https://doi.org/10.21437/Interspeech.2019-2984
dc.relation.issn 2308-457X
dc.rights.accesslevel openAccess
dc.type.version publishedVersion

Files in this item

Files Size Format View
2984.pdf 1.850Mb PDF View/Open
