Browsing by Subject "semantics"

Now showing items 1-5 of 5
  • McKenzie, Emma (Helsingin yliopisto, 2020)
    This project is a corpus-based study on numeral + noun phrases in Scottish Gaelic. The typical pattern in Scottish Gaelic is to use a singular noun after numerals one and two and a plural noun after numerals three through ten. However, there are some nouns that do not follow this expected pattern. These exceptions are called numeratives, and there are three different categories of numeratives in Scottish Gaelic: duals, numeratives identical in form to a singular, and numeratives with a form that differs from both singular and plural and is used only with numerals. This study aims to find which nouns have numerative forms and how their use varies diachronically and between dialects. While numeratives have been researched more extensively in Welsh and Irish, there is not much research on numeratives in Scottish Gaelic. Ó Maolalaigh (2013) did a more restricted corpus study to find which nouns use the singular after numerals three through ten. The past research provides a good comparison for my results and gives me a good foundation to expand on. From the past research, there seems to be a semantic relationship between the kinds of nouns that have numerative forms, so I sort my results into semantic categories as well. I also look at numeratives from the perspective of linguistic complexity, since Scottish Gaelic is a minority language with a large proportion of L2 speakers. This project uses Corpas na Gàidhlig (the Corpus of Scottish Gaelic), which is part of the University of Glasgow’s Digital Archive of Scottish Gaelic. I search the corpus for numerals two through four to see which nouns use numeratives and how consistently they use them. I also look at how frequently numeratives are used diachronically and how usage varies across dialects. I focus especially on nouns that have a high number of numerative tokens to see if there is a pattern in their usage. In my results, I found 47 nouns that use a dual form and 105 nouns that use a numerative identical in form to a singular.
    The overall findings for numerative use are that dual use is decreasing, while use of numeratives identical in form to the singular has been increasing since 1900-1949. The semantic category with the most dual tokens is natural pairs. The nouns with numeratives identical in form to the singular tend to be nouns frequently used with numerals, such as measurement words.
  • Kittila, Seppo (2020)
    Folklore refers to information that we have learnt as a part of the history of our own people and that has been passed on from generation to generation for hundreds, or even thousands, of years. This paper shows that, as an information source, folklore has features in common with other information sources, most notably hearsay, but it nevertheless constitutes an information source of its own, characterized as [-personal] [-direct] and [+internalized]. In addition, the paper proposes a formal-functional typology based on the element used for folklore coding. It is also shown that the semantic similarity of the coded element with the proposed definition of folklore corresponds to its frequency. Finally, the paper discusses the central theoretical implications this study has for our understanding of evidentiality.
  • Tissari, Heli; Vanhatalo, Ulla; Siiroinen, Mari (2019)
    NSM researchers have not used corpus data very systematically thus far. One could talk about corpus-assisted rather than corpus-based or corpus-driven research. This article suggests a way not only to base research on corpus data, but also to let the data guide us in defining words in terms of NSM. It presents a new method that we have developed. Our data come from the Suomi24 Sentences Corpus and concern the Finnish emotion words viha ('anger, hate'), vihata ('to hate') and vihainen ('angry').
  • Saalasti, Satu; Alho, Jussi; Bar, Moshe; Glerean, Enrico; Honkela, Timo; Kauppila, Minna; Sams, Mikko; Jääskeläinen, Iiro P. (2019)
    Introduction: When listening to a narrative, the verbal expressions translate into meanings and a flow of mental imagery. However, the same narrative can be heard quite differently based on differences in listeners' previous experiences and knowledge. We capitalized on such differences to disclose brain regions that support transformation of narrative into individualized propositional meanings and associated mental imagery by analyzing brain activity associated with behaviorally assessed individual meanings elicited by a narrative. Methods: Sixteen right-handed female subjects were instructed to list words that best described what had come to their minds while listening to an eight-minute narrative during functional magnetic resonance imaging (fMRI). The fMRI data were analyzed by calculating voxel-wise intersubject correlation (ISC) values. We used latent semantic analysis (LSA) enhanced with WordNet knowledge to measure semantic similarity of the produced words between subjects. Finally, we predicted the ISC with the semantic similarity using representational similarity analysis. Results: We found that semantic similarity in these word listings between subjects, estimated using LSA combined with WordNet knowledge, predicted similarities in brain hemodynamic activity. Subject pairs whose individual semantics were similar also exhibited similar brain activity in the bilateral supramarginal and angular gyrus of the inferior parietal lobe, and in the occipital pole. Conclusions: Using a novel method to measure interindividual differences in semantics, our results demonstrate brain mechanisms giving rise to semantics and associated imagery during narrative listening. During listening to a captivating narrative, the inferior parietal lobe and early visual cortical areas thus seem to support elicitation of individual meanings and flow of mental imagery.
  • Venekoski, Viljami (Helsingfors universitet, 2016)
    Advances in computational linguistics have made analyzing large quantities of text data a more feasible task than ever before. In particular, the recent distributional language models hold promise of effective semantic analysis at a low computational cost. Semantics, however, is a multifaceted phenomenon, and although various language model architectures have been presented, there is relatively little research evaluating the semantic validity of such models. The aim of this research is to evaluate the semantic validity of different distributional language models, particularly as tools for representing Finnish language online text data. The models and methods are evaluated based on their performance in three empirical studies, each estimating a different aspect of semantic representation. The language models in the studies were built using the word2vec architecture. The models were trained on approximately 2.6 billion tokens from the Suomi24 corpus of Finnish language social media discussions. 18 models were built in total, each with a different combination of feature processing methods. The models were evaluated in three studies. For Study I, a resource consisting of 300 similarity ratings for word pairs from 55 human annotators was collected. This resource was used as an evaluation task by comparing model-estimated similarity scores to the human-rated similarity judgments. Study II investigated relational semantics as an evaluation method, operationalized in the form of an analogy task, for which a Finnish language resource is presented. In Study III, the language models were evaluated based on their performance in document classification of Suomi24 messages to their respective topics. The results of the studies indicate that each presented evaluation task is a sufficiently reliable method for estimating language model semantic validity.
    In turn, distributed language models are reported to be able to represent semantics given morphologically rich yet fragmentary Finnish language social media data. Feature processing methods are shown to increase the semantic accuracy of language models in most cases, but to a limited extent. If evaluated as valid, semantic language technologies are proposed to hold widespread applicability across scientific as well as commercial fields.
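
The kind of evaluation described for Study I in the last entry (comparing model-estimated similarity scores with human similarity ratings) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the abstract does not say which agreement statistic was used, so Spearman rank correlation is a stand-in; the two-dimensional word vectors and the human ratings are invented toy values, not data from the thesis; and the function names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def ranks(values):
    """1-based ranks of the values; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Toy word vectors (invented; a real model would supply these).
vectors = {
    "kissa": [1.0, 0.1],  # 'cat'
    "koira": [0.9, 0.2],  # 'dog'
    "auto": [0.1, 1.0],   # 'car'
    "juna": [0.3, 0.9],   # 'train'
}

# Word pairs with invented mean human similarity ratings.
rated_pairs = [
    ("kissa", "koira", 8.5),
    ("auto", "juna", 7.0),
    ("kissa", "auto", 1.5),
    ("koira", "juna", 2.0),
]

model_scores = [cosine(vectors[a], vectors[b]) for a, b, _ in rated_pairs]
human_scores = [rating for _, _, rating in rated_pairs]
rho = spearman(model_scores, human_scores)
print(f"Spearman correlation between model and human scores: {rho:.3f}")
```

A rank-based statistic is chosen in this sketch because cosine similarities and human rating scales are not on comparable units; only the ordering of the word pairs is compared.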