DATA AUGMENTATION STRATEGIES FOR NEURAL NETWORK F0 ESTIMATION

dc.contributor.author Airaksinen, Manu
dc.contributor.author Juvela, Lauri
dc.contributor.author Alku, Paavo
dc.contributor.author Rasanen, Okko
dc.date.accessioned 2019-10-29T12:00:01Z
dc.date.available 2019-10-29T12:00:01Z
dc.date.issued 2019
dc.identifier.citation Airaksinen, M, Juvela, L, Alku, P & Rasanen, O 2019, DATA AUGMENTATION STRATEGIES FOR NEURAL NETWORK F0 ESTIMATION. in 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP). International Conference on Acoustics Speech and Signal Processing ICASSP, IEEE, pp. 6485-6489, 44th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, 12/05/2019. https://doi.org/10.1109/icassp.2019.8683041
dc.identifier.citation conference
dc.identifier.other PURE: 127724954
dc.identifier.other PURE UUID: 74e050df-48ea-4490-a6df-a21d3e835ea4
dc.identifier.other WOS: 000482554006143
dc.identifier.uri http://hdl.handle.net/10138/306497
dc.description.abstract This study explores various speech data augmentation methods for the task of noise-robust fundamental frequency (F0) estimation with neural networks. The explored augmentation strategies are split into additive noise and channel-based augmentation and into vocoder-based augmentation methods. In vocoder-based augmentation, a glottal vocoder is used to enhance the accuracy of ground truth F0 used for training of the neural network, as well as to expand the training data diversity in terms of F0 patterns and vocal tract lengths of the talkers. Evaluations on the PTDB-TUG corpus indicate that noise and channel augmentation can be used to greatly increase the noise robustness of trained models, and that vocoder-based ground truth enhancement further increases model performance. For smaller datasets, vocoder-based diversity augmentation can also be used to increase performance. The best-performing proposed method greatly outperformed the compared F0 estimation methods in terms of noise robustness. en
dc.format.extent 5
dc.language.iso eng
dc.publisher IEEE
dc.relation.ispartof 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP)
dc.relation.ispartofseries International Conference on Acoustics Speech and Signal Processing ICASSP
dc.relation.isversionof 978-1-4799-8131-1
dc.rights.uri info:eu-repo/semantics/openAccess
dc.subject Speech analysis
dc.subject F0 estimation
dc.subject noise robustness
dc.subject data augmentation
dc.subject deep learning
dc.subject SPEECH RECOGNITION
dc.subject 3124 Neurology and psychiatry
dc.subject 6121 Languages
dc.title DATA AUGMENTATION STRATEGIES FOR NEURAL NETWORK F0 ESTIMATION en
dc.type Conference contribution
dc.contributor.organization Department of Neurosciences
dc.contributor.organization Kliinisen neurofysiologian yksikkö
dc.contributor.organization University of Helsinki
dc.contributor.organization HUS Neurocenter
dc.description.reviewstatus Peer reviewed
dc.relation.doi https://doi.org/10.1109/icassp.2019.8683041
dc.relation.issn 1520-6149
dc.rights.accesslevel openAccess
dc.type.version publishedVersion
dc.identifier.url https://ieeexplore.ieee.org/search/searchresult.jsp?newsearch=true&queryText=DATA%20AUGMENTATION%20STRATEGIES%20FOR%20NEURAL%20NETWORK%20F0%20ESTIMATION

Files in this item

Files Size Format View
untitled.pdf 263.6Kb PDF View/Open
