Browsing by Subject "6161 Phonetics"

Now showing items 1-20 of 56
  • Šimko, Juraj; Adigwe, Adaeze; Suni, Antti; Vainio, Martti (ISCA - International Speech Communication Association, 2022)
    Speech prosody
    Prosodic patterns, and linguistic structures in general, are hierarchical in nature, providing for efficient means for encoding information in temporally constrained situations where communicative events occur. However, there are no theoretical frameworks that are capable of representing the full extent of linguistic behaviour in a cohesive way that could capture the paradigmatic and syntagmatic links between the organizational levels present in everyday speech. Here we propose a novel theoretical and modelling account of perception and production of prosodic patterns in speech communication, derived from the influential Predictive Processing theory of neural implementation of perception and action based on a hierarchical system of generative models producing progressively more detailed probabilistic predictions of future events. The framework provides a conceptualization of the hierarchical organization of speech prosody as well as a principled way of unifying speech perception and production by postulating a single processing hierarchy shared by both modalities. We discuss the possible implications of the theory for prosodic analysis of speech communication, including conversational setting. In addition, we outline a viable computational implementation in the form of a machine learning architecture that can be used as a testbed for generating and evaluating predictions brought forth by the theory.
  • Hiovain, Katri; Jokinen, Päivi Kristiina (European Language Resources Association (ELRA), 2016)
  • Šimko, Juraj; Vainio, Martti; Suni, Antti (ISCA, 2020)
    Speech prosody
    We present a novel methodology for speech prosody research based on the analysis of embeddings used to condition a convolutional WaveNet speech synthesis system. The methodology is evaluated using a corpus of Lombard speech, pre-processed in order to preserve only prosodic characteristics of the original recordings. The conditioning embeddings are trained to represent the combined influences of three sources of prosodic variation present in the corpus: the level and type of ambient noise, and the sentence focus type. We show that the resulting representations can be used to quantify the prosodic effects of the underlying influences, as well as interactions among them, in a statistically robust way. Comparing the results of our analysis with the results of a more traditional examination indicates that the presented methodology can be used as an alternative method of phonetic analysis of prosodic phenomena.
  • Kallio, Heini; Suni, Antti; Šimko, Juraj; Vainio, Martti (2020)
    Prosodic characteristics, such as lexical and phrasal stress, are among the most challenging features for second language (L2) speakers to learn. The ability to quantify language learners' proficiency in terms of prosody can be of use to language teachers and improve the assessment of L2 speaking skills. Automatic assessment, however, requires reliable automatic analyses of prosodic features that allow for the comparison between the productions of L2 speech and reference samples. In this paper we investigate whether signal-based syllable prominence can be used to predict the prosodic competence of Finnish learners of Swedish. Syllable-level prominence was estimated for 180 L2 and 45 native (L1) utterances by a continuous wavelet transform analysis using combinations of f0, energy, and duration. The L2 utterances were graded by four expert assessors using the revised CEFR scale for prosodic features. Correlations of prominence estimates for L2 utterances with estimates for L1 utterances and linguistic stress patterns were used as a measure of prosodic proficiency of the L2 speakers. The results show that the level of agreement conceptualized in this way correlates significantly with the assessments of expert raters, providing strong support for the use of the wavelet-based prominence estimation techniques in computer-assisted assessment of L2 speaking skills.
  • Suni, Antti Santeri; Simko, Juraj; Vainio, Martti Tapani (International Speech Communications Association, 2016)
    Speech prosody
  • Włodarczak, Marcin; Simko, Juraj; Suni, Antti Santeri; Vainio, Martti Tapani (International Speech Communications Association, 2018)
  • Simko, Juraj; Suni, Antti; Hiovain, Katri; Vainio, Martti (ISCA, 2017)
  • Lennes, Mietta; Stevanovic, Melisa; Aalto, Daniel; Palo, Pertti (2015)
    Pitch analysis tools are used widely in order to measure and to visualize the melodic aspects of speech. The resulting pitch contours can serve various research interests linked with speech prosody, such as intonational phonology, interaction in conversation, emotion analysis, language learning and singing. Due to physiological differences and individual habits, speakers tend to differ in their typical pitch ranges. As a consequence, pitch analysis results are not always easy to interpret and to compare among speakers. In this study, we use the Praat program (Boersma & Weenink 2015) for analyzing pitch in samples of conversational Finnish speech and we use the R statistical programming environment (R Core Team, 2014) for further analysis and visualization. We first describe the general shapes of the speaker-specific pitch distributions and see whether and how the distributions vary between individuals. A bootstrapping method is applied to discover the minimal amount of speech that is necessary in order to reliably determine the pitch mean, median and mode for an individual speaker. The scripts and code written for the Praat program and for the R statistical programming environment are made available under an open license for experimenting with other speech samples. The datasets produced with the Praat script will also be made available for further studies.
  • Vainio, Lari; Tiainen, Mikko; Tiippana, Kaisa; Vainio, Martti (2019)
    It has been shown recently that when participants are required to pronounce a vowel at the same time as a hand movement, the vocal and manual responses are facilitated when a front vowel is produced with forward-directed hand movements and a back vowel is produced with backward-directed hand movements. This finding suggests a coupling between spatial programming of articulatory tongue movements and hand movements. The present study revealed that the same effect can also be observed in relation to directional leg movements. The study suggests that the effect operates within the common directional processes of movement planning including at least tongue, hand and leg movements, and these processes might contribute sound-to-meaning mappings to the semantic concepts of 'forward' and 'backward'.
  • Tiainen, Mikko; Lukavsky, Jiri; Tiippana, Kaisa; Vainio, Martti; Šimko, Juraj; Felisberti, Fatima; Vainio, Lari (2017)
    We have recently shown in Finnish speakers that articulation of certain vowels and consonants has a systematic influence on simultaneous grasp actions as well as on forward and backward hand movements. Here we studied whether these effects generalize to another language, namely Czech. We reasoned that if the results generalized to another language environment, it would suggest that the effects arise through other processes than language-dependent semantic associations. Rather, the effects would be likely to arise through language-independent interactions between processes that plan articulatory gestures and hand movements. Participants were presented with visual stimuli specifying articulations to be uttered (e.g., A or I), and they were required to produce a manual response concurrently with the articulation. In Experiment 1 they responded with a precision or a power grip, whereas in Experiment 2 they responded with a forward or a backward hand movement. The grip congruency effect was fully replicated: the consonant [k] and the vowel [ɑ] were associated with power grip responses, while the consonant [t] and the vowel [i] were associated with precision grip responses. The forward/backward congruency effect was replicated with vowels [ɑ], [o], which were associated with backward movement and with [i], which was associated with forward movement, but not with consonants [k] and [t]. These findings suggest that the congruency effects mostly reflect interaction between processes that plan articulatory gestures and hand movements with an exception that the forward/backward congruency effect might only work with vowel articulation.
  • Türk, Helen; Lippus, Pärtel; Simko, Juraj (2017)
    The three-way quantity system is a well-known phonological feature of Estonian. In a number of studies it has been shown that quantity is realized in a disyllabic foot by the stressed-to-unstressed syllable rhyme duration ratio and also by pitch movement as the secondary cue. The stressed syllable rhyme duration is achieved by combining the length of the vowel and the coda consonant, which enables minimal septets of CVCV-sequences based on segmental duration. In this study we analyze articulatory (EMA) recordings from four native Estonian speakers producing all possible quantity combinations of intervocalic bilabial stops in two vocalic contexts (/ɑ-i/ vs. /i-ɑ/). The analysis shows that kinematic characteristics (gesture duration, spatial extent, and peak velocity) are primarily affected by quantity on the segmental level: Phonologically longer segments are produced by longer and larger lip closing gestures and, in reverse, shorter and smaller lip opening movements. Tongue transition gesture is consistently lengthened and slowed down by increasing consonant quantity. In general, both kinematic characteristics and intergestural coordination are influenced by non-linear interactions between segmental quantity levels as well as vocalic context.
  • Vainio, Martti; Suni, Antti; Aalto, Daniel (LPL - Laboratoire Parole et Langage, 2013)
    Wavelet-based time-frequency representations of various signals are shown to reliably represent perceptually relevant patterns at various spatial and temporal scales in a noise-robust way. Here we present a wavelet-based visualization and analysis tool for prosodic patterns, in particular intonation. The suitability of the method is assessed by comparing its predictions for word prominences against manual labels in a corpus of 900 sentences. In addition, the method’s potential for visualization is demonstrated by a few example sentences which are compared to more traditional visualization methods. Finally, some further applications are suggested and the limitations of the method are discussed.
  • Kallio, Heini; Suviranta, Rosa; Kuronen, Mikko; von Zansen, Anna (ISCA - International Speech Communication Association, 2022)
    Speech prosody
    While utterance fluency measures are often studied in relation to perceived L2 fluency and proficiency, the effect of creaky voice remains ignored. However, creaky voice is frequent in a number of languages, including Finnish, where it serves as a cue for phrase-boundaries and turn-taking. In this study we investigate the roles of creaky voice and utterance fluency measures in predicting fluency and proficiency ratings of spontaneous L2 Finnish (F2) speech. In so doing, 16 expert raters participated in assessing narrative spontaneous speech samples from 160 learners of Finnish. The effect of creaky voice and utterance fluency measures on proficiency and fluency ratings was studied using linear regression models. The results indicate that creaky voice can contribute to both oral proficiency and fluency alongside utterance fluency measures. Furthermore, average duration of composite breaks – a measure combining breakdown and repair phenomena – proved to be the most significant predictor of fluency. Based on these findings we recommend further investigation of the effect of creaky voice on the assessment of L2 speech as well as reconsideration of the utterance fluency measures used in predicting L2 fluency or proficiency.
  • Kakouros, Sofoklis; Hiovain, Katri; Vainio, Martti; Šimko, Juraj (ISCA, 2020)
    Speech prosody
    This work explores the application of various supervised classification approaches using prosodic information for the identification of spoken North Sámi language varieties. Dialects are language varieties that enclose characteristics specific for a given region or community. These characteristics reflect segmental and suprasegmental (prosodic) differences but also high-level properties such as lexical and morphosyntactic ones. One aspect that is of particular interest and that has not been studied extensively is how the differences in prosody may underpin the potential differences among different dialects. To address this, this work focuses on investigating the standard acoustic prosodic features of energy, fundamental frequency, spectral tilt, duration, and their combinations, using sequential and context-independent supervised classification methods, and evaluated separately over two different units in speech: words and syllables. The primary aim of this work is to gain a better understanding of the role of prosody in distinguishing among the different language varieties. Our results show that prosodic information holds an important role in distinguishing between the five areal varieties of North Sámi where the inclusion of contextual information for all acoustic prosodic features is critical for the identification of dialects for words and syllables.
  • Hiovain, Katri; Šimko, Juraj; Vainio, Martti (2020)
    Ternary length contrast is a rare phonological feature, investigated here both in terms of its realization and possible ongoing changes. In North Sami, a phonetically under-documented and endangered Fenno-Ugric language spoken by indigenous people in Northern Europe, the ternary quantity contrast is assumed to be signalled by a progressive lengthening of a consonant and a compensatory shortening of the previous vowel. This study evaluates this assumption and compares the realization of the length contrasts in two dialects, the Western and Eastern Finnmark North Sami. The results show that while the contrast between the short and the two longer quantities is robustly signalled regardless of the dialect, the durational differences between the two longer quantities are maintained only in the Eastern dialect. On the other hand, a vowel quantity contrast independent of the quantity of the following consonant is present in the Western but not in the Eastern dialect. Further, comparing the phonetic realization of the ternary quantity contrast for speakers of different ages presents evidence of a language change: the results indicate an ongoing neutralization of the ternary contrast in younger speakers, which points to a possible disappearance of this rare typological feature in Finnmark North Sami.
  • Hiovain, Katri; Šimko, Juraj (Australasian Speech Science and Technology Association Inc., 2019)
  • Vainio, Lari; Schulman, Mirjam; Tiippana, Kaisa; Vainio, Martti (2013)
  • Šimko, Juraj; O'Dell, Michael; Vainio, Martti (2014)
    Embodied Task Dynamics is a modeling platform combining task dynamical implementation of articulatory phonology with an optimization approach based on adjustable trade-offs between production efficiency and perception efficacy. Within this platform we model a consonantal quantity contrast in bilabial stops as emerging from local adjustment of demands on relative prominence of the consonantal gesture conceptualized in terms of closure duration. The contrast is manifested in the form of two distinct, stable inter-gestural coordination patterns characterized by quantitative differences in relative phasing between the consonant and the coproduced vocalic gesture. Furthermore, the model generates a set of qualitative predictions regarding dependence of kinematic characteristics and inter-gestural coordination on consonant quantity and gestural context. To evaluate these predictions, we collected articulatory data for Finnish speakers uttering singletons and geminates in the same context as explored by the model. Statistical analysis of the data shows strong agreement with model predictions. This result provides support for the hypothesis that speech articulation is guided by efficiency principles that underlie many other types of embodied skilled action.
  • Formants 
    Aalto, Daniel; Malinen, Jarmo; Vainio, Martti Tapani (Oxford University Press, 2018)
    Oxford Research Encyclopedias