DATA AUGMENTATION STRATEGIES FOR NEURAL NETWORK F0 ESTIMATION



Permanent link

http://hdl.handle.net/10138/306497

Citation

Airaksinen, M, Juvela, L, Alku, P & Rasanen, O 2019, DATA AUGMENTATION STRATEGIES FOR NEURAL NETWORK F0 ESTIMATION. in 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP). International Conference on Acoustics Speech and Signal Processing ICASSP, IEEE, pp. 6485-6489, 44th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, 12/05/2019. https://doi.org/10.1109/icassp.2019.8683041

Title: DATA AUGMENTATION STRATEGIES FOR NEURAL NETWORK F0 ESTIMATION
Author: Airaksinen, Manu; Juvela, Lauri; Alku, Paavo; Rasanen, Okko
Author's organization: Department of Neurosciences
Kliinisen neurofysiologian yksikkö (Clinical Neurophysiology Unit)
University of Helsinki
HUS Neurocenter
Publisher: IEEE
Date: 2019
Language: eng
Number of pages: 5
Belongs to series: 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP)
Belongs to series: International Conference on Acoustics Speech and Signal Processing ICASSP
ISBN: 978-1-4799-8131-1
ISSN: 1520-6149
DOI: https://doi.org/10.1109/icassp.2019.8683041
URI: http://hdl.handle.net/10138/306497
Abstract: This study explores various speech data augmentation methods for the task of noise-robust fundamental frequency (F0) estimation with neural networks. The explored augmentation strategies are split into additive noise and channel-based augmentation and into vocoder-based augmentation methods. In vocoder-based augmentation, a glottal vocoder is used to enhance the accuracy of ground truth F0 used for training of the neural network, as well as to expand the training data diversity in terms of F0 patterns and vocal tract lengths of the talkers. Evaluations on the PTDB-TUG corpus indicate that noise and channel augmentation can be used to greatly increase the noise robustness of trained models, and that vocoder-based ground truth enhancement further increases model performance. For smaller datasets, vocoder-based diversity augmentation can also be used to increase performance. The best-performing proposed method greatly outperformed the compared F0 estimation methods in terms of noise robustness.
Keywords: Speech analysis
F0 estimation
noise robustness
data augmentation
deep learning
SPEECH RECOGNITION
3124 Neurology and psychiatry
6121 Languages
Peer reviewed: Yes
Access restrictions: openAccess
Self-archived version: publishedVersion


Files

File(s) Size Format
untitled.pdf 263.6KB PDF
