Browsing by Subject "finite-state methods"

Now showing items 1-12 of 12
  • Yli-Jyrä, Anssi Mikael (Northern European Association for Language Technology, 2011)
    NEALT Proceedings Series
    The paper conceptualizes Constraint Grammar parsing in a new way, as a process with two important representations. One representation contains the local ambiguity, and the other summarizes the properties of the local ambiguity classes. Both representations are manipulated with pure finite-state methods, but their interconnection is an ad hoc application of rational power series. The new interpretation of the parsing system has several practical advantages, including the inward deterministic way of computing, representing, and recomputing all potential applications of the rules in the sentence.
  • Yli-Jyrä, Anssi Mikael (The Association for Computational Linguistics, 2017)
    A recently proposed encoding for noncrossing digraphs can be used to implement generic inference over families of these digraphs and to carry out first-order factored dependency parsing. It is now shown that the recent proposal can be substantially streamlined without information loss. The improved encoding is less dependent on hierarchical processing and it gives rise to a high-coverage bounded-depth approximation of the space of noncrossing digraphs. This subset is presented elegantly by a finite-state machine that recognizes an infinite set of encoded graphs. The set includes more than 99.99% of the 0.6 million noncrossing graphs obtained from the UDv2 treebanks through planarisation. Rather than taking the low probability of the residual as a flat rate, it can be modelled with a joint probability distribution that is factorised into two underlying stochastic processes: the sentence length distribution and the related conditional distribution for deep nesting. This model points out that deep nesting in the streamlined code requires extreme sentence lengths. High depth is categorically out in common sentence lengths but emerges slowly at infrequent lengths that prompt further inquiry.
  • Yli-Jyrä, Anssi Mikael (The Association for Computational Linguistics, 2011)
    The paper describes an unconventional method for compiling (phonological and morpho-syntactic) context restriction rules into nondeterministic automata in finite-state tools and surface parsing systems. The method reduces any context restriction to a simple restriction that constrains the occurrences of the empty string, and it represents the right-hand contexts by means of backward-deterministic states. In cases where a fully deterministic representation would be exponentially larger, this kind of inward deterministic representation of the contexts can be more advantageous than the various De Morgan approaches, in which full determinization is necessary. With the method, every accepted string gets an unambiguous path that is a projection of a ladder-like structure in the context recognizer automaton. This projection can be computed (for the whole restriction) in time that is polynomial in the number of context states. However, it may be difficult to benefit from the method if it is used in a finite-state library that forces intermediate results into canonical automata and whose intersection operation requires deterministic automata as operands.
  • Yli-Jyrä, Anssi Mikael (Northern European Association for Language Technology, 2011)
    NEALT Proceedings Series
    The paper proposes morphophonemic markers called positionwise flags. These flags are inspired by the techniques used in the compilation of two-level rules. The technique compiles virtually all rules in parallel, but in an efficient way. It handles morphophonemic processes without a separate morphophonemic representation. The occurrences of allomorphophonemes in latent phonological strings are tracked through a dynamic data structure in which the most prominent (i.e., best-ranked) flags are collected. The application of the technique is expected to give advantages when describing the morphology of Bantu languages and dialects.
  • Koskenniemi, Kimmo Matti (The Association for Computational Linguistics, 2018)
    A practical method for interactive guessing of LEXC lexicon entries is presented. The method is based on describing groups of similarly inflected words using regular expressions. The patterns are compiled into a finite-state transducer (FST) which maps any word form into the possible LEXC lexicon entries which could generate it. The same FST can be used (1) for converting conventional headword lists into LEXC entries, (2) for interactive guessing of entries, (3) for corpus-assisted interactive guessing and (4) for guessing entries from corpora. A method of representing affixes as a table is presented, as well as how the tables can be converted into LEXC format for several different purposes including morphological analysis and entry guessing. The method has been implemented using the HFST finite-state transducer tools and its Python embedding plus a number of small Python scripts for conversions. The method is tested with a near-complete implementation of Finnish verbs. An experiment in generating Finnish verb entries out of corpus data is also described, as well as the creation of a full-scale analyzer for Finnish verbs using the conversion patterns.
  • Linden, Krister; Silfverberg, Miikka; Axelson, Erik; Hardwick, Sam; Pirinen, Tommi (Springer, 2011)
    Communications in Computer and Information Science
    HFST–Helsinki Finite-State Technology (hfst.sf.net) is a framework for compiling and applying linguistic descriptions with finite-state methods. HFST currently connects some of the most important finite-state tools for creating morphologies and spellers into one open-source platform and supports extending and improving the descriptions with weights to accommodate the modeling of statistical information. HFST offers a path from language descriptions to efficient language applications in key environments and operating systems. HFST also provides an opportunity to exchange transducers between different software providers in order to get the best out of each finite-state library.
  • Drobac, Senka; Silfverberg, Miikka; Yli-Jyrä, Anssi Mikael (The Association for Computational Linguistics, 2012)
    We explain the implementation of replace rules with the .r-glc. operator and preference relations. Our modular approach combines various preference constraints to form different replace rules. In addition to describing the method, we present illustrative examples.
  • Yli-Jyrä, Anssi Mikael (Springer-Verlag, 2012)
    Arc contractions in syntactic dependency graphs can be used to decide which graphs are trees. The paper observes that these contractions can be expressed with weighted finite-state transducers (weighted FST) that operate on string-encoded trees. The observation gives rise to a finite-state parsing algorithm that computes the parse forest and extracts the best parses from it. The algorithm is customizable to functional and bilexical dependency parsing, and it can be extended to non-projective parsing via a multi-planar encoding with prior results on high recall. Our experiments support an analysis of projective parsing according to which the worst-case time complexity of the algorithm is quadratic in the sentence length, and linear in the overlapping arcs and the number of functional categories of the arcs. The results suggest several interesting directions towards efficient and high-precision dependency parsing that takes advantage of the flexibility and the demonstrated ambiguity-packing capacity of such a parser.
  • Koskenniemi, Kimmo (CSLI publications, 2019)
    CSLI Lecture Notes
  • Yli-Jyrä, Anssi (2005)
    Most of the world’s languages lack electronic word form dictionaries. The linguists who gather such dictionaries could be helped with an efficient morphology workbench that adapts to different environments and uses. A widely usable workbench could be characterized, ideally, as generally applicable, extensible, and freely available (GEA). It seems that such a solution could be implemented in the framework of finite-state methods. The current work defines the GEA desiderata and starts a series of articles concerning these desiderata in finite-state morphology. Subsequent parts will review the state of the art and present an action plan toward creating a widely usable finite-state morphology workbench.
  • Yli-Jyrä, Anssi (The Linguistic Association of Finland, 2006)
    The trees in the Penn Treebank have a standard representation that involves complete balanced bracketing. In this article, an alternative to this standard representation of the treebank is proposed. The proposed representation for the trees is lossless, but it reduces the total number of brackets by 28%. This is possible by omitting the redundant pairs of special brackets that encode initial and final embedding, using a technique proposed by Krauwer and des Tombe (1981). In terms of the paired brackets, the maximum nesting depth in sentences decreases by 78%. A coverage of 99.9% is achieved with only five non-top levels of paired brackets. The observed shallowness of the reduced bracketing suggests that finite-state based methods for parsing and searching could be a feasible option for treebank processing.
  • Yli-Jyrä, Anssi Mikael (The Linguistic Association of Finland, 2006)
    SKY journal of linguistics, special supplement
    The trees in the Penn Treebank have a standard representation that involves complete balanced bracketing. In this article, an alternative to this standard representation of the treebank is proposed. The proposed representation for the trees is lossless, but it reduces the total number of brackets by 28%. This is possible by omitting the redundant pairs of special brackets that encode initial and final embedding, using a technique proposed by Krauwer and des Tombe (1981). In terms of the paired brackets, the maximum nesting depth in sentences decreases by 78%. A coverage of 99.9% is achieved with only five non-top levels of paired brackets. The observed shallowness of the reduced bracketing suggests that finite-state based methods for parsing and searching could be a feasible option for treebank processing.