Analysing Finnish with word lists: The DDI approach to morphology revisited

Permalink

http://hdl.handle.net/10138/233894

Citation

Voutilainen, A T & Palolahti, M J 2018, Analysing Finnish with word lists: The DDI approach to morphology revisited. In Proceedings of the 4th International Workshop for Computational Linguistics for Uralic Languages. The Association for Computational Linguistics, Stroudsburg, pp. 171-180, International Workshop on Computational Linguistics for Uralic Languages, Helsinki, Finland, 08/01/2018. <http://www.aclweb.org/anthology/W18-0214>

Title: Analysing Finnish with word lists: The DDI approach to morphology revisited
Author: Voutilainen, Atro Tapio; Palolahti, Maria Johanna
Contributor organization: Department of Digital Humanities; Language Technology
Publisher: The Association for Computational Linguistics
Date: 2018
Language: eng
Number of pages: 10
Belongs to series: Proceedings of the 4th International Workshop for Computational Linguistics for Uralic Languages
URI: http://hdl.handle.net/10138/233894
Abstract: Morphological lexicons for morphologically complex languages provide good text coverage at the cost of overgeneration, difficulty of modification, and sometimes performance issues. Use of simple, manageable lexicon forms – especially lists – for morphologically complex languages may appear unviable because the number of possible word-forms in a morphologically complex language can be prohibitively high. We created and experimented with a list-based lexicon for a morphologically complex language (Finnish), and compared its coverage with that of a mature morphological analyser on new text in two experimental settings. The observed smallish difference in coverage suggests the viability of using simple and easy-to-modify list-based lexicons as an initial part of morphological analysis, to increase developer control on the vast majority of input tokens.
Subject: 6121 Languages
Usage restriction: openAccess
Self-archived version: publishedVersion
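
The coverage comparison mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not describe the lexicon format or the experimental setup, so the function names (load_word_list, token_coverage), the lower-casing step, and the toy Finnish data below are all assumptions made purely for illustration.

```python
from collections import Counter


def load_word_list(lines):
    """Build a lookup set from an iterable of word-forms, one per line.

    Lower-casing is an assumption of this sketch, not something the
    record states about the paper's lexicon.
    """
    return {line.strip().lower() for line in lines if line.strip()}


def token_coverage(tokens, lexicon):
    """Return (coverage, unknown), where coverage is the fraction of
    tokens found in the lexicon and unknown counts the tokens that
    were not found."""
    unknown = Counter()
    known = 0
    total = 0
    for tok in tokens:
        total += 1
        form = tok.lower()
        if form in lexicon:
            known += 1
        else:
            unknown[form] += 1
    coverage = known / total if total else 0.0
    return coverage, unknown


# Toy data (hypothetical): a tiny word list and a short test sentence.
word_forms = ["talo", "talossa", "talon", "on", "iso"]
lexicon = load_word_list(word_forms)
tokens = "Talo on iso ja talossa on sauna".split()

coverage, unknown = token_coverage(tokens, lexicon)
print(f"coverage: {coverage:.2%}")
print("unknown tokens:", dict(unknown))
```

In a pipeline of the kind the abstract suggests, the tokens left in unknown would be the ones handed on to a full morphological analyser, while list hits stay under direct developer control.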


Files in this item

File: W18_0214.pdf
Size: 129.6Kb
Format: PDF
