Browsing by Subject "universal dependencies"

  • Sandberg, Kirsi; Andrushchenko, Mykola; Turunen, Risto; Marjanen, Jani; Kurunmäki, Jussi; Peltonen, Jaakko; Nummenmaa, Timo; Nummenmaa, Jyrki (CEUR-WS.org, 2022)
    CEUR Workshop Proceedings
    The temporal aspects of politics have been discussed extensively by political theorists, but have not been explored using grammatically parsed textual datasets. This paper explores the ways in which future, present and past are projected and referred to in speeches in the Finnish parliament that talk about ideologies. Ideologies are crucial categories of thinking about the political past and future and therefore serve as a case in which temporality is expressed in a variety of ways. We use a dataset drawn from Finnish parliamentary records from 1980 to 2021 and operationalize morpho-syntactic information on clause structures and grammatical tense system to explore the different temporal profiles of ideologies. We show how some isms, like communism and fascism, are much more likely to appear in the context of the past, whereas others, like capitalism and racism, tend to appear in the present tense. We further develop a framework for analyzing temporality based on clause structures and grammatical tense and relate that to how the study of politics has approached time in parliamentary speaking.
  • Yli-Jyrä, Anssi Mikael (The Association for Computational Linguistics, 2017)
    A recently proposed encoding for non-crossing digraphs can be used to implement generic inference over families of these digraphs and to carry out first-order factored dependency parsing. It is now shown that the recent proposal can be substantially streamlined without information loss. The improved encoding is less dependent on hierarchical processing, and it gives rise to a high-coverage bounded-depth approximation of the space of non-crossing digraphs. This subset is presented elegantly by a finite-state machine that recognises an infinite set of encoded graphs. The set includes more than 99.99% of the 0.6 million non-crossing graphs obtained from the UDv2 treebanks through planarisation. Rather than taking the low probability of the residual as a flat rate, it can be modelled with a joint probability distribution that is factorised into two underlying stochastic processes: the sentence length distribution and the related conditional distribution for deep nesting. This model points out that deep nesting in the streamlined code requires extreme sentence lengths. High depth is categorically ruled out at common sentence lengths but emerges slowly at infrequent lengths that prompt further inquiry.
  • Rueter, Jack; Partanen, Niko; Pirinen, Tommi A (The Association for Computational Linguistics, 2021)
    This study discusses the way different numerals and related expressions are currently annotated in the Universal Dependencies project, with a specific focus on the Uralic language family and only occasional references to the other language groups. We analyse different annotation conventions between individual treebanks, and aim to highlight some areas where further development work and systematization could prove beneficial. At the same time, the Universal Dependencies project already offers a wide range of conventions to mark nuanced variation in numerals and counting expressions, and the harmonization of conventions between different languages could be the next step to take. The discussion here makes specific reference to Universal Dependencies version 2.8, and some differences found may already have been harmonized in version 2.9. Regardless of whether this takes place or not, we believe that the study still forms an important documentation of this period in the project.
  • Rueter, Jack; Hämäläinen, Mika (Ижевск: Институт компьютерных исследований [Izhevsk: Institute of Computer Science], 2020)
    This paper presents the current lexical, morphological, syntactic and rule-based machine translation work for Erzya and Moksha that can and should be used in the development of a roadmap for Mordvin linguistic research. We seek to illustrate and outline initial problem types to be encountered in the construction of an Apertium-based shallow-transfer machine translation system for the Mordvin language forms. We indicate reference points within Mordvin Studies and other parts of Uralic studies as a point of departure for outlining a linguistic study with a means of measuring its own progress and developing a roadmap for further studies. Keywords: Erzya, Moksha, Uralic, Shallow-transfer machine translation, Measurable language research, Measurable language distance, Finite-State Morphology, Universal Dependencies
  • Rueter, Jack; Partanen, Niko (The Association for Computational Linguistics, 2019)
    This paper attempts to evaluate some of the systematic differences in Uralic Universal Dependencies treebanks from a perspective that would help to introduce reasonable improvements in treebank annotation consistency within this language family. The study finds that the coverage of Uralic languages in the project is already relatively high, and the majority of typically Uralic features are already present and can be discussed on the basis of existing treebanks. Some of the idiosyncrasies found in individual treebanks stem from language-internal grammar traditions, and could be a target for harmonization in later phases.
  • Sinnemäki, Kaius; Haakana, Viljami Lauri Juhana (The Association for Computational Linguistics, 2020)
    In this paper we present a method for identifying and analyzing adnominal possessive constructions in 66 Universal Dependencies treebanks. We classify adpossessive constructions in terms of their morphological type (locus of marking) and present a workflow for detecting and analyzing them typologically. Based on a preliminary evaluation, the algorithm works fairly reliably on adpossessive constructions that are morphologically marked. However, it performs rather poorly on adpossessive constructions that are not marked morphologically (so-called zero-marked constructions), because these constructions are difficult to identify with the current annotation. We also discuss different types of variation in annotation in different treebanks for the same language and for treebanks of closely related languages. The research focuses on one well-circumscribed and universal construction in the hope of generating more interest in using UD for cross-linguistic comparison and of contributing towards yet more consistent annotation of constructions in the UD annotation scheme.
  • Rueter, Jack (Издательский центр Историко-социологического института [Publishing Center of the Historical and Sociological Institute], 2020)
    This paper addresses the issue of a national corpus for language documentation of the Moksha and Erzya literary languages in coordination with dialect archives comprising over 80 years of fieldwork (including Shoksha and Karatai). It outlines the necessary development of computer-assisted research tools and ongoing research aligned with a consistent and systematic open research project.
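The Yli-Jyrä (2017) abstract concerns non-crossing digraphs and the nesting depth of their encodings. As a minimal sketch, and explicitly not the encoding proposed in that paper, the Python below checks whether a set of dependency arcs is non-crossing and computes one simple depth measure: the largest number of arcs spanning any gap between adjacent words. The function names, the choice to ignore arc direction, and the omission of an artificial root node are assumptions made for illustration.

```python
from __future__ import annotations

from itertools import combinations


def arcs_cross(a: tuple[int, int], b: tuple[int, int]) -> bool:
    """Two arcs (head, dependent), given as 1-based word positions, cross
    when exactly one endpoint of one lies strictly inside the span of the other."""
    i, j = sorted(a)
    k, l = sorted(b)
    return i < k < j < l or k < i < l < j


def is_noncrossing(arcs: list[tuple[int, int]]) -> bool:
    """True if no pair of arcs crosses; arc direction is ignored."""
    return not any(arcs_cross(a, b) for a, b in combinations(arcs, 2))


def max_gap_depth(arcs: list[tuple[int, int]], n: int) -> int:
    """Largest number of arcs spanning the gap between words p and p+1, for p in 1..n-1.

    In a non-crossing graph the arcs spanning one gap are nested inside each
    other, so this behaves like the maximum depth of a bracket encoding.
    """
    best = 0
    for p in range(1, n):
        # Arc (i, j) spans the gap p|p+1 when min(i, j) <= p < max(i, j).
        covering = sum(1 for a in arcs if min(a) <= p < max(a))
        best = max(best, covering)
    return best
```

For a four-word sentence with arcs `[(2, 1), (2, 4), (4, 3)]`, the graph is non-crossing and the gap between words 3 and 4 is spanned by two arcs, so the depth is 2; adding an arc such as `(1, 3)` together with `(2, 4)` would make the set crossing.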
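The Sandberg et al. (2022) abstract describes using clause structure and grammatical tense from parsed text to build temporal profiles of ideology words. The sketch below is a hypothetical and much simplified version of that idea, not the authors' pipeline. It reads CoNLL-U text, selects tokens whose lemma ends in the Finnish suffix `-ismi` (an assumed selection rule), climbs the dependency tree to the nearest token that carries a UD `Tense` feature either itself or on an `aux`/`cop` dependent, and counts (lemma, tense) pairs. Negation, compound tenses and other constructions get no special treatment.

```python
from collections import Counter


def parse_conllu(text):
    """Yield each sentence as {token_id: token} from CoNLL-U text.

    Comment lines, multiword-token ranges (e.g. 1-2) and empty nodes (e.g. 1.1) are skipped.
    """
    sent = {}
    for line in text.splitlines():
        if not line.strip():
            if sent:
                yield sent
                sent = {}
            continue
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue
        feats = {} if cols[5] == "_" else dict(f.split("=", 1) for f in cols[5].split("|"))
        sent[int(cols[0])] = {"lemma": cols[2], "feats": feats,
                              "head": int(cols[6]), "deprel": cols[7]}
    if sent:
        yield sent


def clause_tense(sent, idx):
    """Climb from token idx towards the root and return the first Tense value
    found on a token or on one of its aux/cop dependents (None if there is none)."""
    cur, seen = idx, set()
    while cur != 0 and cur not in seen:
        seen.add(cur)
        tok = sent[cur]
        if "Tense" in tok["feats"]:
            return tok["feats"]["Tense"]
        for child in sent.values():
            # Compare the universal part of the relation, so aux:pass also matches aux.
            if (child["head"] == cur and child["deprel"].split(":")[0] in ("aux", "cop")
                    and "Tense" in child["feats"]):
                return child["feats"]["Tense"]
        cur = tok["head"]
    return None


def tense_profile(conllu_text, suffix="ismi"):
    """Count (lemma, tense) pairs for lemmas ending in `suffix` (assumed '-ism' selector)."""
    counts = Counter()
    for sent in parse_conllu(conllu_text):
        for idx, tok in sent.items():
            if tok["lemma"].endswith(suffix):
                counts[(tok["lemma"], clause_tense(sent, idx))] += 1
    return counts
```

Looking at `cop` dependents matters for Finnish copular clauses: in a sentence like "Rasismi on ongelma", the ideology word attaches to the nominal predicate, and the tense sits on the copula "on".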
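Sinnemäki and Haakana (2020) classify adnominal possessive constructions by locus of marking. The heuristic below is an assumed illustration, not their published workflow. It treats tokens attached with the `nmod:poss` relation as possessors, counts the possessor as dependent-marked if it carries `Case=Gen` or has a `case` dependent, counts the possessum as head-marked if it carries the layered features `Person[psor]` or `Number[psor]`, and labels each construction as dependent-, head-, double- or zero-marked. Sentences are passed in as already-parsed dictionaries, and all function names are hypothetical.

```python
from __future__ import annotations


def is_dependent_marked(sentence: dict[int, dict], possessor_id: int) -> bool:
    """Possessor counts as marked if it has Case=Gen or a `case` dependent (e.g. English 's)."""
    if sentence[possessor_id]["feats"].get("Case") == "Gen":
        return True
    return any(t["head"] == possessor_id and t["deprel"] == "case" for t in sentence.values())


def is_head_marked(possessum: dict) -> bool:
    """Possessum counts as marked if it indexes the possessor via [psor] layered features."""
    return any(k in possessum["feats"] for k in ("Person[psor]", "Number[psor]"))


def classify_adpossessives(sentence: dict[int, dict]) -> list[tuple[int, int, str]]:
    """Return (possessor_id, possessum_id, locus) for every nmod:poss attachment."""
    results = []
    for pid, tok in sentence.items():
        if tok["deprel"] != "nmod:poss":
            continue
        hid = tok["head"]
        dep = is_dependent_marked(sentence, pid)
        head = is_head_marked(sentence[hid])
        if dep and head:
            label = "double"
        elif dep:
            label = "dependent"
        elif head:
            label = "head"
        else:
            label = "zero"
        results.append((pid, hid, label))
    return results
```

For Finnish "minun taloni" ('my house'), a genitive possessor plus a possessive suffix on the noun would come out as `double`; for English "John's house", the `case` dependent "'s" yields `dependent`. Because the heuristic only inspects `nmod:poss`, it can find zero-marked constructions only where a treebank annotates them with that relation, which fits the difficulty the abstract reports.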