Browsing by Author "Aalto, Daniel"

Now showing items 1-6 of 6
  • Vainio, Martti; Suni, Antti; Aalto, Daniel (2013)
    Wavelet-based time-frequency representations of various signals have been shown to reliably represent perceptually relevant patterns at various spatial and temporal scales in a noise-robust way. Here we present a wavelet-based visualization and analysis tool for prosodic patterns, in particular intonation. The suitability of the method is assessed by comparing its predictions of word prominence against manual labels in a corpus of 900 sentences. In addition, the method's potential for visualization is demonstrated with a few example sentences, which are compared to more traditional visualization methods. Finally, some further applications are suggested and the limitations of the method are discussed.
  • Aalto, Daniel; Huhtala, Antti; Kivelä, Atle; Malinen, Jarmo; Palo, Pertti; Saunavaara, Jani; Vainio, Martti (Cornell University, 2012)
    We compare numerically computed resonances of the human vocal tract with formants that have been extracted from speech during vowel pronunciation. The geometry of the vocal tract has been obtained by MRI from a male subject, and the corresponding speech has been recorded simultaneously. The resonances are computed by solving the Helmholtz partial differential equation with the Finite Element Method (FEM). Despite a rudimentary exterior space acoustics model, i.e., the Dirichlet boundary condition at the mouth opening, the computed resonance structure differs from the measured formant structure by ≈ 0.7 semitones for [i] and [u], which have a small mouth opening area, and by ≈ 3 semitones for the vowels [a] and [ae], which have a larger mouth opening. The contribution of the possibly open velar port has not been taken into consideration at all, which adds to the discrepancy for [a] in the present data set. We conclude that by improving the exterior space model and properly treating the velar port opening, it is possible to computationally attain the four lowest vowel formants with an error of less than a semitone. The corresponding wave equation model on MRI-produced vocal tract geometries is expected to have comparable accuracy.
  • Aalto, Daniel; Simko, Juraj; Vainio, Martti (2013)
    The fundamental frequency of a complex sound modulates the perceived duration of the sound. Higher-pitched sounds are perceived as longer than lower-pitched sounds, as shown by several independent studies since 1973. In this paper, the effect of language background is studied: native speakers of Finnish and German participated in a two-alternative forced-choice duration discrimination experiment in which the duration and frequency of two sounds were randomly varied. The overall duration discrimination sensitivity was similar for both groups, but the speakers of Finnish were influenced more by pitch in their judgements. In addition, the difference in the two sounds' pitch periods explained the response data better than the difference in pitch frequencies or the pitch interval. As the Finnish quantity system is known to employ both duration and pitch cues, the present results suggest that speakers are shaped by their language environment even when the task is purely non-linguistic.
  • Vainio, Martti; Järvikivi, Juhani; Aalto, Daniel; Suni, Antti (American Institute of Physics for the Acoustical Society of America, 2010)
    Many languages exploit suprasegmental devices in signaling word meaning. Tone languages exploit fundamental frequency, whereas quantity languages rely on segmental durations to distinguish otherwise similar words. Traditionally, duration and tone have been taken as mutually exclusive. However, some evidence suggests that, in addition to durational cues, phonological quantity is associated with and co-signaled by changes in fundamental frequency in quantity languages such as Finnish, Estonian, and Serbo-Croat. The results from the present experiment show that the structure of disyllabic word stems in Finnish is indeed signaled tonally and that the phonological length of the stressed syllable is further tonally distinguished within the disyllabic sequence. The results further indicate that the observed association of tone and duration in perception is systematically exploited in speech production in Finnish.
  • Vainio, Martti; Aalto, Daniel; Järvikivi, Juhani; Suni, Antti (2006)
    This paper presents results from a study on the tonal aspects of quantity in Finnish lexically stressed syllables. Fourteen speakers produced a set of 66 utterances in which the quantity and structure of the lexically stressed syllable were systematically varied. The tonal aspects of the syllable nucleus, and of the nucleus and coda in the case of closed syllables, were studied in the framework of the Target Approximation theory as formulated by Yi Xu. The results show a clear tendency for the quantity distinction, and bimoraicity in general, to be signalled tonally in Finnish by a dynamic falling tone, as opposed to a static high tone in short (one-mora) nuclei.
  • Suni, Antti Santeri; Aalto, Daniel; Raitio, Tuomo; Alku, Paavo; Vainio, Martti (2013)
    The pitch contour in speech contains information about different linguistic units at several distinct temporal scales. At the finest level, the microprosodic cues are purely segmental in nature, whereas at coarser time scales, lexical tones, word accents, and phrase accents appear with both linguistic and paralinguistic functions. Consequently, pitch movements happen on different temporal scales: segmental perturbations are faster than typical pitch accents, and so forth. In the HMM-based speech synthesis paradigm, slower intonation patterns are not easy to model. The statistical procedure of decision tree clustering favors instances that are more common, resulting in good reproduction of microprosody and declination, but with less variation at the word and phrase levels than in human speech. Here we present a system that uses wavelets to decompose the pitch contour into five temporal scales ranging from microprosody to the utterance level. Each component is then trained individually within the HMM framework, and the components are combined in a superpositional manner at the synthesis stage. The resulting system is compared to a baseline in which only one decision tree is trained to generate the pitch contour.
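Two of the abstracts above express pitch-related quantities in different units: formant discrepancies in semitones, and duration judgements that are explained better by pitch-period differences than by pitch-frequency differences. The minimal Python sketch below illustrates these conversions, assuming the standard equal-tempered definition (12 semitones per octave) and that the pitch period is simply the reciprocal of F0. The helper names and the example frequencies are hypothetical illustrations, not code or data from the papers.

```python
import math


def semitone_difference(f_a: float, f_b: float) -> float:
    """Signed distance from f_b to f_a in equal-tempered semitones (12 per octave)."""
    if f_a <= 0 or f_b <= 0:
        raise ValueError("frequencies must be positive")
    return 12.0 * math.log2(f_a / f_b)


def semitone_ratio(semitones: float) -> float:
    """Frequency ratio corresponding to a given number of semitones."""
    return 2.0 ** (semitones / 12.0)


def period_difference_ms(f_a: float, f_b: float) -> float:
    """Difference in pitch period (milliseconds) between tones at f_a and f_b Hz.

    Assumption: the pitch period is taken to be 1 / F0.
    """
    if f_a <= 0 or f_b <= 0:
        raise ValueError("frequencies must be positive")
    return 1000.0 / f_a - 1000.0 / f_b


if __name__ == "__main__":
    # One octave (a doubling of frequency) is 12 semitones.
    print(semitone_difference(880.0, 440.0))  # 12.0
    # An error of 0.7 semitones corresponds to a frequency ratio of about 1.041.
    print(round(semitone_ratio(0.7), 4))
    # Equal 10 Hz steps give unequal period differences (hypothetical values).
    print(round(period_difference_ms(100.0, 110.0), 3))  # 0.909
    print(round(period_difference_ms(200.0, 210.0), 3))  # 0.238
```

Because the period is the reciprocal of frequency, a fixed frequency difference corresponds to a smaller period difference at higher F0; this is one way in which the two predictors can account for response data differently.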