Browsing by Subject "Fonetik" (Phonetics)

  • Suviranta, Rosa (Helsingin yliopisto, 2021)
    This preliminary study examines how well a Conditioned Convolutional Variational Autoencoder (CCVAE) learns the prosodic characteristics of the interaction between the Lombard effect and different focus conditions. Lombard speech is an adaptation to ambient noise, manifested as increased vocal intensity, fundamental frequency, and duration. Focus marks new propositional information and is signalled by making the focused word more prominent relative to the others. A CCVAE was trained on the f0 contours and speech envelopes of a Lombard speech corpus of Finnish utterances. The model’s capability to reconstruct the prosodic characteristics was statistically evaluated based on the bottleneck representations alone. The following questions were addressed: the appropriate size of the bottleneck layer for the task, the ability of the bottleneck representations to capture the prosodic characteristics, and the encoding of the bottleneck representations. The study shows promising results. The method can elicit representations that quantify the prosodic effects of the underlying influences and interactions. The study found that even low-dimensional bottlenecks can conceptualise and consistently typologize the prosodic events of interest. However, finding the optimal bottleneck dimension still requires more research. Subsequently, the model’s ability to capture the prosodic characteristics was verified by investigating the generated samples. Based on the results, the CCVAE can capture prosodic events, and the quality of the reconstruction is positively correlated with the bottleneck dimension. Finally, the encoding of the bottlenecks was examined. The CCVAE encodes the bottleneck representations similarly regardless of the training instance or the bottleneck dimension. The Lombard effect was captured most efficiently, followed by the focus conditions.
  • Asikainen, Atte (Helsingin yliopisto, 2021)
    It is common for speech to occur in closed spaces; hence, room acoustics play a significant role in speech communication. Previous studies have found effects of reverberation on speech production, but research in this field is still scarce. Adverse room acoustics have been observed to expose occupational speakers, such as teachers, to voice disorders. It is therefore crucial to study the room-acoustic requirements for economical speaking. The purpose of this study is to examine which speech-acoustic traits change when the speaker is exposed to reverberation, and how. Two approaches are taken: varying the reverberation time and removing the reverberation. The changes in speech are related to the Lombard sign (the rise in speech level in a noisy environment). Additionally, differences related to gender and prosody are examined. A speech production experiment was conducted, followed by acoustic and statistical analyses. Eleven Finnish-speaking volunteers (six females and five males) participated in the experiment, in which 150 short sentences were recorded from each participant. The sentences were produced in five different room-acoustic conditions. In four of the five, digitally simulated reverberation with varying reverberation times was played back on headphones worn by the participant. The fifth condition was (nearly) anechoic. From the recorded sentences, speech rate, creak ratio and harmonics-to-noise ratio were measured, along with the mean, maximum and movement of intensity and pitch. The measurements were then assessed with various statistical methods. The results show a significant decrease in speech rate with increasing reverberation time. Additionally, speech rate was highest in the anechoic condition. 
Moreover, creak ratio decreased greatly when the reverberation time exceeded one second, especially in male speakers and in end-weighted sentences. Monotony was also higher in the reverberant conditions than in the anechoic condition. However, substantial speaker-dependent differences in the effects of reverberation on speech were found. Furthermore, sentence weight was found to influence speech more fundamentally than reverberation. The results suggest that rooms with average reverberation times, rather than particularly long or short ones, are the most beneficial for speaking, which corresponds to previous studies. Further research in the field is required to gain the knowledge needed in the acoustic design of spaces, including classrooms. Designing speaker-friendly spaces helps occupational speakers preserve their voices throughout their careers.
  • Pihanurmi, Seila (Helsingfors universitet, 2014)
    Goals. The purpose of this study was to examine whether the dynamicity of pitch affects duration perception in synthetic auditory stimuli and whether the effect, if observed, depends on first language. Furthermore, it was of interest to see whether mother tongue affects the way static auditory stimuli are perceived. The effect of dynamic pitch on duration perception has received little research, and the results obtained so far are contradictory, which makes this thesis relevant. The duration discrimination abilities of Finnish and Chinese speakers have not been compared before, so this thesis offers new information about the perception of duration. Method. The research consisted of behavioral tests measuring the ability to discriminate differences between two auditory stimuli. A two-alternative forced-choice method was used in all experiments. In the first experiment, discrimination ability was measured with stimuli that differed only in duration. In the second experiment, the stimuli were dynamic, and in the third experiment, the stimuli differed only in pitch. There were 30 subjects altogether: 15 Finnish and 15 Mandarin Chinese speakers. The subjects' answers were analyzed with multivariable logistic regression models. Results and conclusions. According to the results, mother tongue does affect the answers given, and the dynamicity of pitch does lengthen the perceived duration. The effect of language background was also apparent with static stimuli, although its significance was marginal. It is nevertheless possible to conclude that duration perception differs between Finnish and Mandarin Chinese speakers. Research on the effect of dynamic pitch on duration perception needs to be continued. 
Further research is especially needed on how natural auditory stimuli are perceived and on the perception of pitch when tied to a linguistic context.
  • Virkkunen, Päivi (Helsingfors universitet, 2015)
    Goals: The aim of this study was to examine the speech production of Finnish compound words. Finnish prosody and the effect of contrastive focus have been studied widely, but there is no research on the prosody of compound production. This study compared the prosodic features of compound words and phrases and examined the effect of sentence stress on the production of compound words. Methods and materials: This quantitative research consisted of a speech production test and statistical analysis. In the speech production test, sentences containing compound words and phrases made of the same words (e.g. kissankello 'harebell' and kissan kello 'cat's bell') were repeated in three focus conditions: broad focus, narrow focus on the first word and narrow focus on the second word. Twenty subjects (14 female, 6 male) read 1200 sentences in total, which were segmented and annotated. Fundamental frequency, intensity and syllable durations were measured and compared between compound words and phrases. The relationships between word type and acoustic features were studied using repeated measures analysis of variance. Results and conclusions: This research revealed new information about the production of compound words in different focus conditions. In the broad focus condition, compound words and phrases were produced differently from each other. Word type had a statistically significant effect (p < 0.001) on the acoustic measurements: speakers treated compound words as one word with one primary stress on the first syllable, while phrases were treated as two words with individual primary stresses. Contrastive focus strongly affected the acoustic parameters, and those changes masked the word type differences found in the broad focus condition. Context has a significant effect on how a listener interprets the message, but there are also acoustic ways to distinguish compound words from phrases in speech. 
These findings give a strong foundation for future research; however, there are limitations. It is important to increase the number of male speakers in this study, and the research could be expanded to different dialects of Finnish and to the speech of non-native speakers. The results can also be used in the development of speech synthesis, such as text-to-speech synthesis.
  • Oppong, Olivia Serwaa (Helsingin yliopisto, 2021)
    This thesis investigates the interaction between lexical tones and pitch reset in Akan, a Kwa language with about 8.1 million native speakers in Ghana (Eberhard et al., 2020). Experimental studies on Akan prosody are limited, although the language has a large number of first- and second-language speakers. This study seeks to increase our knowledge of the tone-intonation structure of the Akan language. In an earlier study on Akan complex declarative sentences, pitch reset occurred at the beginning of the content word that followed the clausal marker of an embedded clause (Kügler, 2016). Following a pilot study, the present study hypothesized that pitch reset in complex declarative utterances in Akan also occurs within the clausal marker of the dependent clause, and not only in the following content word. Focusing on the Asante Twi dialect, a controlled set of 64 complex sentences was created. Five native speakers of Asante Twi were recorded as they produced the 64 sentences and an additional 32 complex sentences used as fillers. Mean f0 values of the syllables of the subordinate conjunction and of the words before and after the conjunction were extracted and analysed in R; the statistical analysis was based on a linear mixed model. As expected, a reset in the pitch contour consistently occurred within the subordinate conjunction, in contrast to the earlier study. The conjunction was phrased prosodically with the dependent clause to signal the syntactic relationship between the two. The degree of pitch register reset also depended on the tonal structure: reset was greater when the initial tone of the conjunction was High and smaller when the conjunction began with a Low tone. Thus, the results show that lexical tones interact to determine the f0 contour of Akan utterances and that the intonational contour of utterances in Akan is complex.
  • Ojala, Tiia (Helsingfors universitet, 2016)
    Goals: The aim of this study was to examine how prosodic features affect the perception of prominence for Finnish participants. More specifically, the study focused on the strength and hierarchy of four acoustic features: length, intensity, fundamental frequency and its dynamic movement. These are examined in the same experiment for the first time. In addition to the acoustic features, the order of the stimuli is also considered to affect perceived prominence. Generally, speech perception phenomena have been explained both with universal theories and with language-specific features, which could affect perception differently for speakers from different language backgrounds. Method: The experiment had two parts. The first part included 200 stimuli, each consisting of three sounds. The stimuli were synthetic and word-like, approximately 300 milliseconds long, and varied randomly in length, intensity, fundamental frequency and its dynamicity. The second part included 200 stimuli with three-sound sequences, which were manipulated from the 300 ms sounds into syllable-like sounds approximately 100 milliseconds long. Altogether 24 informants took part in the experiment (14 in the first part, 15 in the second), and they judged which of the three sounds stood out from the others. The answers were analysed with a linear mixed-effects model. Results and conclusions: Based on this experiment, fundamental frequency and length were the most important features of perceived prominence for Finnish speakers. Pitch dynamicity and intensity were not statistically significant features. It is possible that the influence of pitch height is so great that it overrides any impact the other features may have on perceived prominence.
  • Vanhanen, Annukka (Helsingfors universitet, 2015)
    The aim of the study was to determine which prosodic parameters (duration, pitch, intensity) influence whether a pair of words is perceived as a compound or a phrase. I also investigated how three different focus conditions (sentence stress) interact with the perception. The conditions were broad focus, narrow focus on the first noun and narrow focus on the second noun. Three speakers were recorded reading sentences in which the three focus conditions were produced. The method was a two-alternative forced choice (2AFC) task in which the participants answered whether they had heard a compound or a phrase. A generalized linear mixed-effects model was used in which the participants' answers were compared with the correct answers, also taking the acoustic measurements into account. The study revealed that intensity changes between the two words affected the decision, as did pitch changes between the two words. These findings were statistically significant. The narrow focus on the first noun was partly produced by changing the duration difference between the vowels. The statistical model also revealed that the word pairs produced by one of the speakers were perceived differently from those of the other two speakers, suggesting that individuals may have different strategies for producing prosodic phenomena.
  • Altarriba, Laura (Helsingfors universitet, 2015)
    Goals: According to the acoustic theory of speech production, vowels are defined by their unique formant patterns, which depend on the configuration of the vocal tract. Put simply, F1 frequency is inversely related to the height of the tongue, and F2 is high when the tongue is at the front of the oral cavity. Thus, F1 corresponds to the vertical and F2 to the horizontal position of the tongue. The aim of this study was to examine the articulation of the Finnish vowels [i] and [ɑ] and to compare it with the two lowest formants: do the speakers produce the vowels with the same manner of articulation? Methods: Four subjects scanned with MRI, two men and two women, were included in this study. Half of them were normal speakers and the other half orthognathic patients. The MR images were 3D images (DICOM), and measurements were made with the Osirix program using nine articulatory measurement points. The five lowest formants of the speech signal were also measured with the Praat program. Regressions between articulatory variables were examined statistically, as were the connections between the articulatory and acoustic data; a linear mixed-effects model was used for the latter. The articulatory data were normalized for the comparison because of the speakers' individual anatomy. Results and conclusions: Differences between speakers were observed. Basically, the articulation of the normal speakers was stable, whereas the orthognathic patients had extreme or variable articulatory positions. Statistical testing revealed positive and negative correlations between articulatory measurement points. Speaker- and vowel-dependent differences as well as clear synergism were found. An important observation was that the tongue root distinguished between the vowels. A considerable connection was also found between horizontal tongue position and F2. However, the acoustic environment of MR imaging may have induced the Lombard effect, so the results must be interpreted with caution. 
Comparing them to normal speech is not recommended. In future, it is suggested that the articulatory and acoustic measurement methods be made more accurate, and that it be studied whether the differences between normal speakers and orthognathic patients can be generalized. It would also be advisable to gather more data on the other Finnish vowels and their articulatory positions.
  • Anttila, Hanna (Helsingfors universitet, 2008)
    Goals: This study aims to map the effect of interrogative function on the intonation of spontaneous and read Finnish. Earlier research shows that the most prominent feature of Finnish question intonation is an appeal to the listener. Question word questions typically start with a high peak followed by falling intonation. In yes/no questions, F0 remains at a high level until the word carrying sentence stress and then falls. Final rises are mainly found in intonation clichés such as "Ai mitä?" ("What?"). These earlier results are based on read speech and enacted dialogues. In this study, questions and statements found in spontaneous dialogues were compared, and these utterances were also compared with read versions of the same utterances. Fundamental frequency values were compared using a mixed model. Contours were also grouped by auditory and visual inspection, making it possible to compare the frequencies of contour types by utterance type and speech style. The position of questions in the F0 distribution of the whole material was also investigated. Method: The material consisted of four spontaneous dialogues and their read versions. The speakers were young adults from the Helsinki metropolitan area, four females and four males. The whole material was first divided into broad dialogue function categories arising from the material, and F0 curves were calculated for each category. After this, 277 questions and 244 statements were selected for closer inspection. Values reflecting F0 distribution and contour shape were measured from the F0 contours of these utterances. A mixed model was used to analyse the differences, with utterance type, question type, speech style and speaker gender as fixed effects. The frequencies of F0 contour types were compared using a chi-square test. Additional material came from eight young female speakers in central Finland. 
Results and conclusions: In the mixed model analysis, significant differences were found both between questions and statements and between spontaneous and read speech. Generally, utterance type affected the variables reflecting contour type, while speech style affected the variables reflecting F0 distribution. The effect of question type was not clearly visible. In read speech, the contours resembled earlier results more closely. Speakers had different strategies for differentiating between questions and statements. In the whole material, F0 was slightly higher in questions than in statements. The effect of dialectal background could be seen in the contour types. The results show that interrogative function affects intonation in both spontaneous and read Finnish.
  • Kallio, Heini (Helsingfors universitet, 2012)
    The main aim of this study was to examine the effects of different acoustic features on the perception of clear news speech. An additional goal was to increase knowledge of informational clear speech, more specifically Finnish plain-language radio news speech. Plain-language news is produced by the national Finnish broadcaster for listeners with Finnish as a second language. Clear speech in news reading had not previously been studied in Finnish, and therefore the theoretical background was drawn from several studies on clear speech, intelligibility and prosody. Clear speech research has revealed many acoustic-phonetic changes made by speakers attempting to clarify their speech. Features such as a slower speech rate, wider F0 range, higher mean F0 and increased intensity are said to be characteristic of clear speech. In plain-language news, a slower speech rate and appropriate phrasing are important. The study consisted of two experiments: a listening experiment and an acoustic analysis. The purpose of the listening experiment was to study how the speech rate, clarity, pleasantness and intelligibility of different news readers were perceived by listeners with different linguistic backgrounds. Ratings of news speech from professional plain-language news readers were obtained from 15 non-native learners of Finnish and 15 native Finnish listeners. The factors varied were speech rate (normal versus slow) and speaker (two males and two females). The acoustic analysis studied differences between news readers in speech rate, articulation rate, fundamental frequency, prosodic phrasing and voice quality. Measurements were made from news samples read at two target speaking rates. The relations between the rating results of the two subject groups and the acoustic features were studied using an ordinal logistic regression model. The intelligibility ratings of the non-native listeners were affected by linguistic content and were therefore not reliable for statistical testing. 
However, the results showed that speech rate and articulation rate affected the perception of clarity, and voice quality had an effect on perceived pleasantness. Methods for measuring intelligibility should be studied further, as should the relations between acoustic features and perceived qualities of speech. This study can be seen as a preliminary study for upcoming research on Finnish clear news speech. Its focus was on fundamental acoustic features; in addition to the suggested improvements, a broader analysis of segmental-level acoustics is recommended.
  • Ripatti, Minttu (Helsingfors universitet, 2016)
    Speech is the sum of complex, multifunctional neurological and motor actions. Changing the articulatory setting changes the resonance properties of the vocal tract, creating a new sound. Speech can be described as a continuum of articulatory manoeuvres; each manoeuvre has its own function, and they are added together to achieve the target articulation. Ventriloquism is speech without visible speech manoeuvres. Only a few studies on ventriloquism have been published previously. These studies have focused on articulation, expiratory air pressure, fundamental frequency, laryngeal action, perceptual voice quality and the simulation of a ventriloquist's compensatory sounds. This study investigated the articulatory strategies of ventriloquists. Nasality, fundamental frequency, duration and ventriloquism itself as a speech technique were examined; the writer learned the art of ventriloquism during the research. The results show higher fundamental frequency, more nasality and longer duration compared with normal speech. However, differences between the participants were found. Based on the results, ventriloquism can also be renamed the velar speech technique. The results suggest that the velar speech technique may have a potential role in helping people with structurally impaired articulators, e.g. oral and throat cancer patients during post-operative speech therapy.