  • Suviranta, Rosa (Helsingin yliopisto, 2021)
    This is a preliminary study verifying how well a Conditioned Convolutional Variational Autoencoder (CCVAE) learns the prosodic characteristics of the interaction between the Lombard effect and different focus conditions. Lombard speech is an adaptation to ambient noise, manifested by increased vocal intensity, fundamental frequency, and duration. Focus marks new propositional information and is signalled by making the focused word more prominent relative to the other words. A CCVAE was trained on the f0 contours and speech envelopes of a Lombard speech corpus of Finnish utterances. The model's capability to reconstruct the prosodic characteristics was evaluated statistically on the basis of the bottleneck representations alone. The following questions were addressed: the appropriate size of the bottleneck layer for the task, the ability of the bottleneck representations to capture the prosodic characteristics, and the encoding of the bottleneck representations. The results are promising: the method can elicit representations that quantify the prosodic effects of the underlying influences and their interactions. Even low-dimensional bottlenecks can conceptualise and consistently typologize the prosodic events of interest, although finding the optimal bottleneck dimension requires further research. Subsequently, the model's ability to capture the prosodic characteristics was verified by examining generated samples. Based on the results, the CCVAE can capture prosodic events, and the quality of the reconstruction is positively correlated with the bottleneck dimension. Finally, the encoding of the bottlenecks was examined. The CCVAE encodes the bottleneck representations similarly regardless of the training instance or the bottleneck dimension. The Lombard effect was captured most efficiently, followed by the focus conditions.
  • Kallio, Heini; Suni, Antti; Šimko, Juraj; Vainio, Martti (2020)
    Prosodic characteristics, such as lexical and phrasal stress, are among the most challenging features for second language (L2) speakers to learn. The ability to quantify language learners' proficiency in terms of prosody can be useful to language teachers and can improve the assessment of L2 speaking skills. Automatic assessment, however, requires reliable automatic analyses of prosodic features that allow L2 productions to be compared with reference samples. In this paper we investigate whether signal-based syllable prominence can be used to predict the prosodic competence of Finnish learners of Swedish. Syllable-level prominence was estimated for 180 L2 and 45 native (L1) utterances by a continuous wavelet transform analysis using combinations of f0, energy, and duration. The L2 utterances were graded by four expert assessors using the revised CEFR scale for prosodic features. Correlations of the prominence estimates for L2 utterances with the estimates for L1 utterances and with linguistic stress patterns were used as a measure of the prosodic proficiency of the L2 speakers. The results show that the level of agreement conceptualized in this way correlates significantly with the assessments of expert raters, providing strong support for the use of wavelet-based prominence estimation techniques in computer-assisted assessment of L2 speaking skills.
  • Lindstrom, R.; Lepistö-Paisley, T.; Makkonen, T.; Reinvall, O.; Nieminen-von Wendt, T.; Alen, R.; Kujala, T. (2018)
    Objective: The present study explored the processing of emotional speech prosody in school-aged children with autism spectrum disorders (ASD) but without marked language impairments (children with ASD [no LI]). Methods: The mismatch negativity (MMN)/late discriminative negativity (LDN), reflecting pre-attentive auditory discrimination processes, and the P3a, indexing involuntary orienting to attention-catching changes, were recorded to natural word stimuli uttered with different emotional connotations (neutral, sad, scornful and commanding). Perceptual prosody discrimination was assessed with a behavioral sound-discrimination test. Results: Overall, children with ASD (no LI) were slower than typically developed control children in behaviorally discriminating the prosodic features of the speech stimuli. Further, smaller standard-stimulus event-related potentials (ERPs) and MMN/LDNs were found in children with ASD (no LI) than in controls. In addition, the P3a was smaller in amplitude and differently distributed on the scalp in children with ASD (no LI) compared with control children. Conclusions: Processing of words and of changes in emotional speech prosody is impaired at various levels of information processing in school-aged children with ASD (no LI). Significance: The results suggest that low-level speech sound discrimination and orienting deficits might contribute to the emotional speech prosody processing impairments observed in ASD. (C) 2018 International Federation of Clinical Neurophysiology. Published by Elsevier B.V. All rights reserved.
  • Lindstrom, R.; Lepistö-Paisley, T.; Vanhala, R.; Alen, R.; Kujala, T. (2016)
    Autism spectrum disorders (ASD) are characterized by deficient social and communication skills, including difficulties in perceiving speech prosody. The present study addressed the processing of emotional prosodic changes (sad, scornful and commanding) in natural word stimuli in typically developed school-aged children and in children with ASD and language impairment. We found that the responses to a repetitive word were diminished in amplitude in the children with ASD, reflecting impaired speech encoding. Furthermore, the amplitude of the MMN/LDN component, reflecting cortical discrimination of sound changes, was diminished in the children with ASD for the scornful deviant. In addition, the amplitude of the P3a, reflecting involuntary orienting to attention-catching changes, was diminished in the children with ASD for the scornful deviant and tended to be smaller for the sad deviant. These results suggest that prosody processing in ASD is impaired at various levels of neural processing, including deficient pre-attentive discrimination and involuntary orientation to speech prosody. (C) 2016 Elsevier Ireland Ltd. All rights reserved.
  • Torppa, Ritva; Faulkner, Andrew; Laasonen, Marja; Lipsanen, Jari; Sammler, Daniela (2020)
    Objectives: A major issue in the rehabilitation of children with cochlear implants (CIs) is unexplained variance in their language skills, with many of them lagging behind children with normal hearing (NH). Here we assess links between generative language skills and the perception of prosodic stress, and between generative language skills and musical and parental activities, in children with CIs and NH. Understanding these links is expected to guide future research towards supporting language development in children with a CI. Method: Twenty-one unilaterally and early-implanted children and 31 children with NH, aged 5 to 13, were classified as musically active or non-active using a questionnaire recording the regularity of musical activities, in particular singing, as well as reading and other activities shared with parents. Perception of word and sentence stress, performance in word finding, verbal intelligence (VIQ; WISC vocabulary) and phonological awareness (PA; production of rhymes) were measured in all children. Comparisons between children with a CI and NH were made against a subset of 21 of the children with NH who were matched to the children with CIs by age, gender, socio-economic background and musical activity. Regression analyses, run separately for children with CIs and NH, assessed how much variance in each language task was shared with each of prosodic perception, the child's own musical activity, and activities with parents, including singing and reading. All statistical analyses were conducted both with and without control for age and maternal education. Results: Musically active children with CIs performed similarly to NH controls in all language tasks, while those who were not musically active performed more poorly. Only musically non-active children with CIs made more phonological and semantic errors in word finding than NH controls, and word finding correlated with other language skills. Regression analysis results for word finding and VIQ were similar for children with CIs and NH. These language skills shared considerable variance with the perception of prosodic stress and with musical activities. When age and maternal education were controlled for, strong links remained between perception of prosodic stress and VIQ (shared variance: CI, 32%/NH, 16%) and between musical activities and word finding (shared variance: CI, 53%/NH, 20%). Links were always stronger for children with CIs, for whom better phonological awareness was also linked to improved stress perception and more musical activity, and parental activities together shared significant variance with word finding and VIQ. Conclusions: For children with CIs and NH, better perception of prosodic stress and musical activities with singing are associated with improved generative language skills. Additionally, for children with CIs, parental singing has a stronger positive association with word finding and VIQ than parental reading. These results cannot address causality, but they suggest that good perception of prosodic stress, musical activities involving singing, and parental singing and reading may all be beneficial for word finding and other generative language skills in implanted children.
  • Zora, Hatice; Riad, Tomas; Ylinen, Sari (2019)
    Swedish morphemes are classified as prosodically specified or prosodically unspecified, depending on lexical or phonological stress, respectively. Here, we investigate the allomorphy of the suffix -(i)sk, which indicates the distinction between lexical and phonological stress; if attached to a lexically stressed morpheme, it takes a non-syllabic form (-sk), whereas if attached to a phonologically stressed morpheme, an epenthetic vowel is inserted (-isk). Using mismatch negativity (MMN), we explored the neural processing of this allomorphy across lexically stressed and phonologically stressed morphemes. In an oddball paradigm, participants were occasionally presented with congruent and incongruent derivations, created by the suffix -(i)sk, within the repetitive presentation of their monomorphemic stems. The results indicated that the congruent derivation of the lexically stressed stem elicited a larger MMN than the incongruent sequences of the same stem and the derivational suffix, whereas after the phonologically stressed stem a non-significant tendency towards an opposite pattern was observed. We argue that the significant MMN response to the congruent derivation in the lexical stress condition is in line with lexical MMN, indicating a holistic processing of the sequence of lexically stressed stem and derivational suffix. The enhanced MMN response to the incongruent derivation in the phonological stress condition, on the other hand, is suggested to reflect combinatorial processing of the sequence of phonologically stressed stem and derivational suffix. These findings bring a new aspect to the dual-system approach to neural processing of morphologically complex words, namely the specification of word stress.
  • Cole, Jennifer; Hualde, José Ignacio; Smith, Caroline L.; Eager, Christopher; Mahrt, Timothy; Napoleão de Souza, Ricardo (2019)
    This study tests the influence of acoustic cues and non-acoustic contextual factors on listeners’ perception of prominence in three languages whose prominence systems differ in the phonological patterning of prominence and in the association of prominence with information structure—English, French and Spanish. Native speakers of each language performed an auditory rating task to mark prominent words in samples of conversational speech under two instructions: with prominence defined in terms of acoustic or meaning-related criteria. Logistic regression models tested the role of task instruction, acoustic cues and non-acoustic contextual factors in predicting binary prominence ratings of individual listeners. In all three languages we find similar effects of prosodic phrase structure and acoustic cues (F0, intensity, phone-rate) on prominence ratings, and differences in the effect of word frequency and instruction. In English, where phrasal prominence is used to convey meaning related to information structure, acoustic and meaning criteria converge on very similar prominence ratings. In French and Spanish, where prominence plays a lesser role in signaling information structure, phrasal prominence is perceived more narrowly on structural and acoustic grounds. Prominence ratings from untrained listeners correspond with ToBI pitch accent labels for each language. Distinctions in ToBI pitch accent status (nuclear, prenuclear, unaccented) are reflected in empirical and model-predicted prominence ratings. In addition, words with a ToBI pitch accent type that is typically associated with contrastive focus are more likely to be rated as prominent in Spanish and English, but no such effect is found for French. These findings are discussed in relation to probabilistic models of prominence production and perception.
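The Kallio et al. (2020) abstract measures prosodic proficiency as the correlation of L2 syllable prominence estimates with L1 estimates and with linguistic stress patterns. The following is a minimal sketch of that comparison step only, assuming prominence has already been estimated per syllable (the wavelet-based estimation itself is not shown) and that each L2 utterance is aligned syllable by syllable with its L1 references. The function names and the simple averaging over L1 references are illustrative assumptions, not the authors' implementation.

```python
import math
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def l1_agreement(l2_prominence, l1_references):
    """Hypothetical aggregate: mean correlation of one L2 utterance's
    syllable prominences with each aligned L1 reference utterance."""
    return mean(pearson(l2_prominence, ref) for ref in l1_references)


def stress_agreement(l2_prominence, stress_pattern):
    """Correlation with a binary linguistic stress pattern
    (1 = stressed syllable, 0 = unstressed), i.e. a point-biserial r."""
    return pearson(l2_prominence, stress_pattern)


if __name__ == "__main__":
    # Illustrative, made-up prominence values for a four-syllable utterance.
    l2 = [0.9, 0.2, 0.4, 0.1]
    refs = [[1.0, 0.1, 0.5, 0.2], [0.8, 0.3, 0.3, 0.1]]
    stress = [1, 0, 1, 0]
    print(round(l1_agreement(l2, refs), 3), round(stress_agreement(l2, stress), 3))
```

Higher values indicate L2 prominence patterns closer to the native references; in the study, such agreement scores were then compared with expert CEFR-based ratings.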