Browsing by Subject "6161 Phonetics"

Now showing items 1-20 of 48
  • Hiovain, Katri; Jokinen, Päivi Kristiina (European Language Resources Association (ELRA), 2016)
  • Šimko, Juraj; Vainio, Martti; Suni, Antti (ISCA, 2020)
    Speech prosody
    We present a novel methodology for speech prosody research based on the analysis of embeddings used to condition a convolutional WaveNet speech synthesis system. The methodology is evaluated using a corpus of Lombard speech, pre-processed in order to preserve only prosodic characteristics of the original recordings. The conditioning embeddings are trained to represent the combined influences of three sources of prosodic variation present in the corpus: the level and type of ambient noise, and the sentence focus type. We show that the resulting representations can be used to quantify the prosodic effects of the underlying influences, as well as interactions among them, in a statistically robust way. Comparing the results of our analysis with the results of a more traditional examination indicates that the presented methodology can be used as an alternative method of phonetic analysis of prosodic phenomena.
  • Kallio, Heini; Suni, Antti; Šimko, Juraj; Vainio, Martti (2020)
    Prosodic characteristics, such as lexical and phrasal stress, are among the most challenging features for second language (L2) speakers to learn. The ability to quantify language learners' proficiency in terms of prosody can be of use to language teachers and improve the assessment of L2 speaking skills. Automatic assessment, however, requires reliable automatic analyses of prosodic features that allow L2 productions to be compared with reference samples. In this paper we investigate whether signal-based syllable prominence can be used to predict the prosodic competence of Finnish learners of Swedish. Syllable-level prominence was estimated for 180 L2 and 45 native (L1) utterances by a continuous wavelet transform analysis using combinations of f0, energy, and duration. The L2 utterances were graded by four expert assessors using the revised CEFR scale for prosodic features. Correlations of prominence estimates for L2 utterances with estimates for L1 utterances and with linguistic stress patterns were used as a measure of the prosodic proficiency of the L2 speakers. The results show that the level of agreement conceptualized in this way correlates significantly with the assessments of expert raters, providing strong support for the use of wavelet-based prominence estimation techniques in computer-assisted assessment of L2 speaking skills.
  • Suni, Antti Santeri; Simko, Juraj; Vainio, Martti Tapani (International Speech Communications Association, 2016)
    Speech prosody
  • Włodarczak, Marcin; Simko, Juraj; Suni, Antti Santeri; Vainio, Martti Tapani (International Speech Communications Association, 2018)
    SProSIG
  • Simko, Juraj; Suni, Antti; Hiovain, Katri; Vainio, Martti (ISCA, 2017)
    Interspeech
  • Vainio, Lari; Tiainen, Mikko; Tiippana, Kaisa; Vainio, Martti (2019)
    It has recently been shown that when participants are required to pronounce a vowel simultaneously with a hand movement, the vocal and manual responses are facilitated when a front vowel is produced with forward-directed hand movements and a back vowel with backward-directed hand movements. This finding suggests a coupling between the spatial programming of articulatory tongue movements and hand movements. The present study revealed that the same effect can also be observed in relation to directional leg movements. The study suggests that the effect operates within common directional processes of movement planning, including at least tongue, hand and leg movements, and that these processes might contribute sound-to-meaning mappings to the semantic concepts of 'forward' and 'backward'.
  • Tiainen, Mikko; Lukavsky, Jiri; Tiippana, Kaisa; Vainio, Martti; Šimko, Juraj; Felisberti, Fatima; Vainio, Lari (2017)
    We have recently shown in Finnish speakers that articulation of certain vowels and consonants has a systematic influence on simultaneous grasp actions as well as on forward and backward hand movements. Here we studied whether these effects generalize to another language, namely Czech. We reasoned that if the results generalized to another language environment, this would suggest that the effects arise through processes other than language-dependent semantic associations. Rather, the effects would be likely to arise through language-independent interactions between processes that plan articulatory gestures and hand movements. Participants were presented with visual stimuli specifying articulations to be uttered (e.g., A or I), and they were required to produce a manual response concurrently with the articulation. In Experiment 1 they responded with a precision or a power grip, whereas in Experiment 2 they responded with a forward or a backward hand movement. The grip congruency effect was fully replicated: the consonant [k] and the vowel [ɑ] were associated with power grip responses, while the consonant [t] and the vowel [i] were associated with precision grip responses. The forward/backward congruency effect was replicated with the vowels [ɑ] and [o], which were associated with backward movement, and with [i], which was associated with forward movement, but not with the consonants [k] and [t]. These findings suggest that the congruency effects mostly reflect interaction between processes that plan articulatory gestures and hand movements, with the exception that the forward/backward congruency effect might only work with vowel articulation.
  • Türk, Helen; Lippus, Pärtel; Simko, Juraj (2017)
    The three-way quantity system is a well-known phonological feature of Estonian. A number of studies have shown that quantity is realized in a disyllabic foot by the stressed-to-unstressed syllable rhyme duration ratio, with pitch movement as a secondary cue. The stressed syllable rhyme duration is achieved by combining the length of the vowel and the coda consonant, which enables minimal septets of CVCV sequences based on segmental duration. In this study we analyze articulatory (EMA) recordings from four native Estonian speakers producing all possible quantity combinations of intervocalic bilabial stops in two vocalic contexts (/ɑ-i/ vs. /i-ɑ/). The analysis shows that kinematic characteristics (gesture duration, spatial extent, and peak velocity) are primarily affected by quantity on the segmental level: phonologically longer segments are produced with longer and larger lip closing gestures and, conversely, shorter and smaller lip opening movements. The tongue transition gesture is consistently lengthened and slowed down by increasing consonant quantity. In general, both kinematic characteristics and intergestural coordination are influenced by non-linear interactions between segmental quantity levels as well as by vocalic context.
  • Vainio, Martti; Suni, Antti; Aalto, Daniel (LPL - Laboratoire Parole et Langage, 2013)
    Wavelet-based time-frequency representations of various signals are shown to reliably represent perceptually relevant patterns at various spatial and temporal scales in a noise-robust way. Here we present a wavelet-based visualization and analysis tool for prosodic patterns, in particular intonation. The suitability of the method is assessed by comparing its predictions of word prominence against manual labels in a corpus of 900 sentences. In addition, the method's potential for visualization is demonstrated with a few example sentences, which are compared to more traditional visualization methods. Finally, some further applications are suggested and the limitations of the method are discussed.
  • Kakouros, Sofoklis; Hiovain, Katri; Vainio, Martti; Šimko, Juraj (ISCA, 2020)
    Speech prosody
    This work explores the application of various supervised classification approaches using prosodic information to identify spoken North Sámi language varieties. Dialects are language varieties that encompass characteristics specific to a given region or community. These characteristics reflect segmental and suprasegmental (prosodic) differences but also high-level properties such as lexical and morphosyntactic ones. One aspect of particular interest that has not been studied extensively is how differences in prosody may underpin the potential differences among dialects. To address this, this work investigates the standard acoustic prosodic features of energy, fundamental frequency, spectral tilt, duration, and their combinations, using sequential and context-independent supervised classification methods, evaluated separately over two different units of speech: words and syllables. The primary aim of this work is to gain a better understanding of the role of prosody in distinguishing among the different language varieties. Our results show that prosodic information plays an important role in distinguishing between the five areal varieties of North Sámi, and that including contextual information for all acoustic prosodic features is critical for identifying dialects at both the word and the syllable level.
  • Hiovain, Katri; Šimko, Juraj; Vainio, Martti (2020)
    Ternary length contrast is a rare phonological feature, investigated here both in terms of its realization and in terms of possible ongoing changes. In North Sami, a phonetically under-documented and endangered Fenno-Ugric language spoken by indigenous people in Northern Europe, the ternary quantity contrast is assumed to be signalled by a progressive lengthening of a consonant and a compensatory shortening of the preceding vowel. This study evaluates this assumption and compares the realization of the length contrasts in two dialects, Western and Eastern Finnmark North Sami. The results show that while the contrast between the short and the two longer quantities is robustly signalled regardless of dialect, the durational differences between the two longer quantities are maintained only in the Eastern dialect. On the other hand, a vowel quantity contrast independent of the quantity of the following consonant is present in the Western but not in the Eastern dialect. Further, comparing the phonetic realization of the ternary quantity contrast across speakers of different ages provides evidence of language change: the results indicate an ongoing neutralization of the ternary contrast in younger speakers, which points to a possible disappearance of this rare typological feature in Finnmark North Sami.
  • Hiovain, Katri; Šimko, Juraj (Australasian Speech Science and Technology Association Inc., 2019)
  • Vainio, Lari; Schulman, Mirjam; Tiippana, Kaisa; Vainio, Martti (2013)
  • Šimko, Juraj; O'Dell, Michael; Vainio, Martti (2014)
    Embodied Task Dynamics is a modeling platform combining task dynamical implementation of articulatory phonology with an optimization approach based on adjustable trade-offs between production efficiency and perception efficacy. Within this platform we model a consonantal quantity contrast in bilabial stops as emerging from local adjustment of demands on relative prominence of the consonantal gesture conceptualized in terms of closure duration. The contrast is manifested in the form of two distinct, stable inter-gestural coordination patterns characterized by quantitative differences in relative phasing between the consonant and the coproduced vocalic gesture. Furthermore, the model generates a set of qualitative predictions regarding dependence of kinematic characteristics and inter-gestural coordination on consonant quantity and gestural context. To evaluate these predictions, we collected articulatory data for Finnish speakers uttering singletons and geminates in the same context as explored by the model. Statistical analysis of the data shows strong agreement with model predictions. This result provides support for the hypothesis that speech articulation is guided by efficiency principles that underlie many other types of embodied skilled action.
  • Formants 
    Aalto, Daniel; Malinen, Jarmo; Vainio, Martti Tapani (Oxford University Press, 2018)
    Oxford Research Encyclopedias
  • Suni, Antti; Simko, Juraj; Aalto, Daniel; Vainio, Martti (2017)
    Prominences and boundaries are the essential constituents of prosodic structure in speech. They provide a means to chunk the speech stream into linguistically relevant units by giving them relative saliences and demarcating them within utterance structures. Prominences and boundaries have been widely used both in basic research on prosody and in text-to-speech synthesis. However, there are no representation schemes that provide for both estimating and modelling them in a unified fashion. Here we present an unsupervised unified account for estimating and representing prosodic prominences and boundaries using a scale-space analysis based on the continuous wavelet transform. The methods are evaluated and compared to earlier work using the Boston University Radio News corpus. The results show that the proposed method is comparable with the best published supervised annotation methods.
  • Aalto, Daniel; Huhtala, Antti; Kivelä, Atle; Malinen, Jarmo; Palo, Pertti; Saunavaara, Jani; Vainio, Martti (2012)
    We compare numerically computed resonances of the human vocal tract with formants extracted from speech during vowel pronunciation. The geometry of the vocal tract was obtained by MRI from a male subject, and the corresponding speech was recorded simultaneously. The resonances are computed by solving the Helmholtz partial differential equation with the Finite Element Method (FEM). Despite a rudimentary exterior space acoustics model, i.e., the Dirichlet boundary condition at the mouth opening, the computed resonance structure differs from the measured formant structure by ≈ 0.7 semitones for [i] and [u], which have a small mouth opening area, and by ≈ 3 semitones for the vowels [a] and [æ], which have a larger mouth opening. The contribution of the possibly open velar port has not been taken into consideration at all, which adds to the discrepancy for [a] in the present data set. We conclude that by improving the exterior space model and properly treating the velar port opening, it is possible to computationally attain the four lowest vowel formants with an error of less than a semitone. The corresponding wave equation model on MRI-produced vocal tract geometries is expected to have comparable accuracy.
  • Tiainen, Mikko; Tiippana, Kaisa; Vainio, Martti; Komeilipoor, Naeem; Vainio, Lari (2017)
  • Aalto, Daniel; Simko, Juraj; Vainio, Martti (ISCA, 2013)
    The fundamental frequency of a complex sound modulates its perceived duration. Higher-pitched sounds are perceived as longer than lower-pitched sounds, as shown by several independent studies since 1973. In this paper, the effect of language background is studied: native speakers of Finnish and German participated in a two-alternative forced-choice duration discrimination experiment in which the duration and frequency of two sounds were randomly varied. Overall duration discrimination sensitivity was similar for both groups, but the speakers of Finnish were influenced more by pitch in their judgements. In addition, the difference in the two sounds' pitch periods explained the response data better than the difference in pitch frequencies or the pitch interval. As the Finnish quantity system is known to employ both duration and pitch cues, the present results suggest that speakers are shaped by their language environment even when the task is purely non-linguistic.
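The wavelet-based prominence entries in this listing (Vainio, Suni & Aalto 2013; Suni, Simko, Aalto & Vainio 2017) rest on a continuous wavelet transform of a prosodic contour. The following is a minimal sketch, assuming a Mexican hat (Ricker) mother wavelet, an already normalized contour (e.g. a z-scored f0 track) and plain direct convolution. It is not the published tool's code; the function names `ricker`, `convolve_same` and `cwt` are hypothetical.

```python
import math

def ricker(points, a):
    """Mexican hat (Ricker) wavelet of width `a`, sampled at `points` positions."""
    norm = 2.0 / (math.sqrt(3.0 * a) * math.pi ** 0.25)
    centre = (points - 1) / 2.0
    out = []
    for i in range(points):
        t = (i - centre) / a
        out.append(norm * (1.0 - t * t) * math.exp(-t * t / 2.0))
    return out

def convolve_same(signal, kernel):
    """Discrete convolution aligned with `signal` (odd-length kernel, zero padding)."""
    half = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + half - k
            if 0 <= j < n:
                acc += signal[j] * w
        out.append(acc)
    return out

def cwt(signal, scales):
    """Return one row of wavelet coefficients per scale (hypothetical helper)."""
    longest_odd = len(signal) if len(signal) % 2 else len(signal) - 1
    rows = []
    for a in scales:
        # Kernel spans about +/- 5 widths, capped at the signal length and kept odd.
        points = min(2 * int(5 * a) + 1, longest_odd)
        rows.append(convolve_same(signal, ricker(points, a)))
    return rows
```

Each row holds the coefficients for one scale: small scales pick out short, local events in the contour, while large scales summarize slower movements. For an isolated bump in the contour, every row peaks at the bump's centre, which is the kind of cross-scale agreement a prominence estimate can exploit.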
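The Kallio, Suni, Šimko & Vainio (2020) entry uses correlations of L2 prominence estimates with L1 estimates and with linguistic stress patterns as a proficiency measure. A minimal sketch of that comparison, assuming syllable-aligned prominence profiles of equal length, a reference built by averaging the L1 profiles, and a 0/1 lexical stress vector. The averaging step and all function names are assumptions for illustration, not the paper's exact procedure.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)

def mean_profile(profiles):
    """Syllable-by-syllable mean of several L1 prominence profiles (assumed aligned)."""
    return [sum(values) / len(profiles) for values in zip(*profiles)]

def prosodic_agreement(l2_profile, l1_profiles, stress_pattern):
    """Return (r with the mean L1 reference, r with the 0/1 stress pattern)."""
    reference = mean_profile(l1_profiles)
    return pearson(l2_profile, reference), pearson(l2_profile, stress_pattern)
```

Higher correlations mean that the learner places prominence on the same syllables as native speakers and as the lexical stress pattern predicts; in the study, this kind of agreement was then related to the expert CEFR ratings.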
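The Kakouros, Hiovain, Vainio & Šimko (2020) entry reports that contextual information is critical for identifying North Sámi varieties from prosody. One common way to give a context-independent classifier access to context is to stack each unit's features with those of its neighbours. The sketch below assumes per-unit feature vectors (for example energy, f0, spectral tilt and duration for each syllable or word) and zero padding at utterance edges. It illustrates the general idea only; the function name and window scheme are hypothetical and not taken from the paper.

```python
def add_context(units, width=1):
    """Concatenate each unit's feature vector with those of `width` neighbours
    on each side; positions beyond the utterance edges are zero-padded."""
    if not units:
        return []
    pad = [0.0] * len(units[0])
    stacked = []
    for i in range(len(units)):
        vec = []
        for j in range(i - width, i + width + 1):
            vec.extend(units[j] if 0 <= j < len(units) else pad)
        stacked.append(vec)
    return stacked
```

With `width=1`, three units of two features each become three vectors of six features, so a classifier that looks at one vector at a time still sees the preceding and following unit.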
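The Šimko, O'Dell & Vainio (2014) entry builds on the task dynamical implementation of articulatory phonology, in which an individual gesture is commonly modelled as a critically damped second-order point attractor. The sketch below simulates only that basic gesture, assuming unit mass, damping b = 2√k and semi-implicit Euler integration. It does not model the optimization of efficiency and perceptual trade-offs that defines Embodied Task Dynamics, and `simulate_gesture` is a hypothetical name.

```python
import math

def simulate_gesture(x0, target, stiffness, duration, dt=0.001):
    """Critically damped point attractor x'' = -k (x - target) - b x', b = 2*sqrt(k).
    Returns the position trajectory, starting at rest at x0."""
    if stiffness <= 0:
        raise ValueError("stiffness must be positive")
    damping = 2.0 * math.sqrt(stiffness)
    x, v = x0, 0.0
    trajectory = [x]
    for _ in range(int(round(duration / dt))):
        accel = -stiffness * (x - target) - damping * v
        v += accel * dt  # semi-implicit Euler: update velocity first
        x += v * dt      # then position, using the new velocity
        trajectory.append(x)
    return trajectory
```

Stiffness sets how quickly the tract variable (e.g. lip aperture) approaches its target without overshooting; in task dynamics, timing differences such as longer closures arise from how long gestures are active and how they overlap, which this single-gesture sketch leaves out.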
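The Aalto et al. (2012) entry reports resonance errors in semitones and uses a Dirichlet (zero-pressure) condition at the mouth opening. As a rough analytic counterpart to the FEM computation, the textbook uniform tube closed at the glottis and open at the mouth has resonances f_k = (2k − 1)c / (4L). The sketch below pairs that approximation with the semitone conversion 12·log2(f1/f2). The uniform tube is a swapped-in simplification, not the paper's MRI-based geometry, and the default speed of sound and the function names are assumptions.

```python
import math

def uniform_tube_resonances(length_m, n=4, c=350.0):
    """First n resonances (Hz) of a uniform tube closed at the glottis and open
    (pressure = 0) at the mouth: f_k = (2k - 1) * c / (4 * L). c is assumed in m/s."""
    return [(2 * k - 1) * c / (4.0 * length_m) for k in range(1, n + 1)]

def semitone_error(computed_hz, measured_hz):
    """Signed difference in semitones between two frequencies (12 per octave)."""
    return 12.0 * math.log2(computed_hz / measured_hz)
```

With an assumed tract length of 0.175 m and c = 350 m/s, the sketch gives 500, 1500, 2500 and 3500 Hz. On the semitone scale, an error of 0.7 semitones corresponds to a frequency ratio of about 1.04, and 3 semitones to about 1.19.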
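The Aalto, Simko & Vainio (2013) entry compares three ways of describing the pitch difference between two sounds: the difference in pitch periods, the difference in frequencies, and the pitch interval. A small sketch computing all three for a pair of tones, assuming each is signed as first minus second; the function name and dictionary keys are hypothetical.

```python
import math

def pitch_predictors(f1_hz, f2_hz):
    """Three candidate pitch predictors for a pair of tones (first minus second)."""
    return {
        "period_difference_s": 1.0 / f1_hz - 1.0 / f2_hz,
        "frequency_difference_hz": f1_hz - f2_hz,
        "interval_semitones": 12.0 * math.log2(f1_hz / f2_hz),
    }
```

The three measures are not linearly related, which is why they can be told apart in response data. An octave is always 12 semitones, but it corresponds to a 5 ms period difference between 100 and 200 Hz and only 2.5 ms between 200 and 400 Hz.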