Browsing by Subject "SPEECH RECOGNITION"

Now showing items 1-3 of 3
  • Airaksinen, Manu; Juvela, Lauri; Alku, Paavo; Rasanen, Okko (IEEE, 2019)
    International Conference on Acoustics Speech and Signal Processing ICASSP
    This study explores various speech data augmentation methods for the task of noise-robust fundamental frequency (F0) estimation with neural networks. The explored augmentation strategies fall into two groups: additive-noise and channel-based augmentation, and vocoder-based augmentation. In vocoder-based augmentation, a glottal vocoder is used to enhance the accuracy of the ground-truth F0 used for training the neural network, as well as to expand the diversity of the training data in terms of F0 patterns and vocal tract lengths of the talkers. Evaluations on the PTDB-TUG corpus indicate that noise and channel augmentation can be used to greatly increase the noise robustness of trained models, and that vocoder-based ground-truth enhancement further increases model performance. For smaller datasets, vocoder-based diversity augmentation can also be used to increase performance. The best-performing proposed method greatly outperformed the compared F0 estimation methods in terms of noise robustness.
  • Marsh, John E.; Ljung, Robert; Nostl, Anatole; Threadgold, Emma; Campbell, Tom A. (2015)
    A dynamic interplay is known to exist between auditory processing and human cognition. For example, prior investigations of speech-in-noise have revealed that there is more to learning than just listening: even if all words within a spoken list are correctly heard in noise, later memory for those words is typically impoverished. These investigations supported a view that there is a "gap" between the intelligibility of speech and memory for that speech. Here, the notion was that this gap between speech intelligibility and memorability is a function of the extent to which the spoken message seizes limited immediate memory resources (e.g., Kjellberg et al., 2008). Accordingly, the more difficult the processing of the spoken message, the fewer resources are available for elaboration, storage, and recall of that spoken material. However, it was not previously known how increasing that difficulty affected the memory processing of semantically rich spoken material. This investigation showed that noise impairs higher levels of cognitive analysis. A variant of the Deese-Roediger-McDermott procedure that encourages semantic elaborative processes was deployed. On each trial, participants listened to a 36-item list comprising 12 words blocked by each of 3 different themes. Each of those 12 words (e.g., bed, tired, snore...) was associated with a "critical" lure theme word that was not presented (e.g., sleep). Word lists were presented either without noise or at a signal-to-noise ratio of 5 decibels (A-weighted). Noise reduced false recall of the critical words and decreased the semantic clustering of recall. Theoretical and practical implications are discussed.
  • Näätänen, Risto; Petersen, Bjorn; Torppa, Ritva; Lonka, Eila; Vuust, Peter (2017)
    In the present article, we review the studies on the use of the mismatch negativity (MMN) as a tool for an objective assessment of cochlear-implant (CI) functioning after implantation and as a function of time of CI use. The MMN indexes discrimination of different sound stimuli with a precision matching that of behavioral discrimination and can therefore be used as its objective index. Importantly, these measurements can be reliably carried out even in the absence of attention and behavioral responses, and they can therefore be extended to populations that are not capable of behaviorally reporting their perception, such as infants and different clinical patient groups. In infants and small children with CI, the MMN provides the only means for assessing the adequacy of the CI functioning, its improvement as a function of time of CI use, and the efficiency of different rehabilitation procedures. Therefore, the MMN can also be used as a tool in developing and testing different novel rehabilitation procedures. Importantly, the recently developed multi-feature MMN paradigms permit the objective assessment of discrimination accuracy for all the different auditory dimensions (such as frequency, intensity, and duration) in a short recording time of about 30 min. Most recently, such stimulus paradigms have been successfully developed for an objective assessment of music perception, too. © 2017 Elsevier B.V. All rights reserved.
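
Two of the abstracts above rely on mixing noise into speech at a controlled level: additive-noise augmentation in the first entry, and word lists presented at a signal-to-noise ratio of 5 decibels in the second. The following is a minimal Python sketch of that general idea only, not the procedure used in either study. The function names (mean_power, mix_at_snr, measure_snr_db) and the synthetic sine-wave "speech" are assumptions made for illustration, and the A-weighting mentioned in the second abstract is not modeled.

```python
import math
import random


def mean_power(samples):
    """Average power of a sequence of samples."""
    return sum(v * v for v in samples) / len(samples)


def mix_at_snr(signal, noise, snr_db):
    """Scale `noise` so the signal-to-noise power ratio equals `snr_db`, then add it.

    Returns the noisy mixture and the scaled noise (hypothetical helper).
    """
    if len(noise) < len(signal):
        raise ValueError("noise must be at least as long as the signal")
    noise = noise[: len(signal)]
    noise_power = mean_power(noise)
    if noise_power == 0:
        raise ValueError("noise must not be silent")
    # SNR_dB = 10 * log10(P_signal / P_noise), solved for the noise gain.
    gain = math.sqrt(mean_power(signal) / (noise_power * 10 ** (snr_db / 10)))
    scaled = [gain * n for n in noise]
    mixed = [s + n for s, n in zip(signal, scaled)]
    return mixed, scaled


def measure_snr_db(signal, noise):
    """Signal-to-noise ratio in decibels, computed from mean powers."""
    return 10 * math.log10(mean_power(signal) / mean_power(noise))


# Synthetic stand-ins: a 120 Hz tone in place of speech, Gaussian noise.
rate = 16000
signal = [math.sin(2 * math.pi * 120 * i / rate) for i in range(rate)]
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(rate)]

mixed, scaled = mix_at_snr(signal, noise, snr_db=5.0)
achieved = measure_snr_db(signal, scaled)
```

Here `achieved` recovers the requested 5 dB up to floating-point error, because the noise gain is derived directly from the definition of SNR. A real augmentation pipeline would typically draw the noise type and the target SNR at random for each training utterance; those choices are not specified in the abstracts above.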