Guessing lexicon entries using finite-state methods

Permalink

http://hdl.handle.net/10138/233853

Citation

Koskenniemi, K M 2018, Guessing lexicon entries using finite-state methods. In T Pirinen, M Rießler, J Rueter, T Trosterud & F M Tyers (eds), Proceedings of the Fourth International Workshop on Computational Linguistics for Uralic Languages. The Association for Computational Linguistics, Stroudsburg, pp. 59-77, International Workshop for Computational Linguistics of Uralic Languages, Helsinki, Finland, 08/01/2018. <http://aclweb.org/anthology/W18-0206>

Title: Guessing lexicon entries using finite-state methods
Author: Koskenniemi, Kimmo Matti
Editor: Pirinen, Tommi; Rießler, Michael; Rueter, Jack; Trosterud, Trond; Tyers, Francis M.
Contributor: University of Helsinki, Department of Modern Languages 2010-2017
Publisher: The Association for Computational Linguistics
Date: 2018-01
Language: eng
Number of pages: 19
Belongs to series: Proceedings of the Fourth International Workshop on Computational Linguistics for Uralic Languages
URI: http://hdl.handle.net/10138/233853
Abstract: A practical method for interactive guessing of LEXC lexicon entries is presented. The method is based on describing groups of similarly inflected words using regular expressions. The patterns are compiled into a finite-state transducer (FST) which maps any word form into the possible LEXC lexicon entries which could generate it. The same FST can be used (1) for converting conventional headword lists into LEXC entries, (2) for interactive guessing of entries, (3) for corpus-assisted interactive guessing and (4) for guessing entries from corpora. A method of representing affixes as a table is presented, as well as how the tables can be converted into LEXC format for several different purposes, including morphological analysis and entry guessing. The method has been implemented using the HFST finite-state transducer tools and its Python embedding plus a number of small Python scripts for conversions. The method is tested with a near-complete implementation of Finnish verbs. An experiment of generating Finnish verb entries out of corpus data is also described, as well as the creation of a full-scale analyzer for Finnish verbs using the conversion patterns.
Subject: 6121 Languages
computational linguistics
language technology
finite-state methods
lexicon
113 Computer and information sciences
natural language processing
finite-state methods

Files in this item

Files Size Format
W18_0206.pdf 211.0Kb PDF
