Kettunen, K T 2019, FiST – towards a Free Semantic Tagger of Modern Standard Finnish. in The fifth International Workshop on Computational Linguistics for Uralic Languages Proceedings of the Workshop. The Association for Computational Linguistics, Stroudsburg, pp. 66-76, International Workshop on Computational Linguistics for Uralic Languages, Tartu, Estonia, 07/01/2019. https://doi.org/10.18653/v1/w19-0306
Title: | FiST – towards a Free Semantic Tagger of Modern Standard Finnish |
Author: | Kettunen, Kimmo Tapio |
Contributor organization: | The National Library of Finland, Research Library |
Publisher: | The Association for Computational Linguistics |
Date: | 2019-01-30 |
Language: | eng |
Number of pages: | 11 |
Belongs to series: | The fifth International Workshop on Computational Linguistics for Uralic Languages Proceedings of the Workshop |
ISBN: | 978-1-948087-92-6 |
DOI: | https://doi.org/10.18653/v1/w19-0306 |
URI: | http://hdl.handle.net/10138/306801 |
Abstract: | This paper introduces work in progress on a free full-text semantic tagger for Finnish, FiST. The tagger is based on a 46,226-lexeme semantic lexicon of Finnish that was published in 2016. The basis of the semantic lexicon was developed in the early 2000s in the EU-funded project Benedict (Löfberg et al., 2005). Löfberg (2017) describes the compilation of the lexicon and evaluates a proprietary version of the Finnish Semantic Tagger, the FST. The FST and its lexicon were developed using the English Semantic Tagger (the EST) of Lancaster University as a model. This semantic tagger was developed at the University Centre for Corpus Research on Language (UCREL) at Lancaster University as part of the UCREL Semantic Analysis System (USAS) framework. The semantic lexicon of the USAS framework is based on modified and enriched categories of the Longman Lexicon of Contemporary English (McArthur, 1981). We have implemented a basic working version of a new full-text semantic tagger for Finnish based on freely available components. The implementation uses Omorfi and FinnPos for morphological analysis of Finnish words. After the morphological recognition phase, words from the 46K semantic lexicon are matched against the morphologically unambiguous base forms. In our comprehensive tests, the lexical tagging coverage of the current implementation is around 82–90% across different text types. The present version still needs some enhancements: at least processing of the semantic ambiguity of words and analysis of compounds, and perhaps also treatment of multiword expressions. In addition, a semantically marked ground-truth collection should be established for evaluating the tagger. |
Subject: | 113 Computer and information sciences; 6121 Languages |
Peer reviewed: | Yes |
Rights: | cc_by |
Usage restriction: | openAccess |
Self-archived version: | publishedVersion |
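The abstract describes a pipeline in which Omorfi and FinnPos first produce morphologically unambiguous base forms, after which entries of the semantic lexicon are matched against those base forms and lexical coverage is reported. The Python sketch below illustrates only the lookup and coverage steps, under several assumptions: morphological analysis is taken as already done (the (surface form, base form) pairs are supplied by hand rather than obtained by calling Omorfi or FinnPos), the toy lexicon and its USAS-style tags are illustrative rather than taken from the FiST lexicon, the function names `tag_tokens` and `coverage` are hypothetical, and coverage is assumed to mean the share of word tokens whose base form is found in the lexicon, which may differ from the measure used in the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon: base form -> candidate USAS-style semantic tags.
# The real FiST lexicon has about 46,000 entries; these tags are illustrative only.
LEXICON: Dict[str, List[str]] = {
    "koira": ["L2"],  # living creatures: animals
    "syödä": ["F1"],  # food
    "ja": ["Z5"],     # grammatical words
    "talo": ["H1"],   # buildings
}

# USAS convention for words that are not found in the lexicon.
UNMATCHED = "Z99"


def tag_tokens(analysed: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Tag (surface form, base form) pairs by looking up the base form.

    Assumes morphological disambiguation has already produced one base
    form per token. No semantic disambiguation is attempted: the first
    candidate tag is used, a simplifying assumption for this sketch.
    """
    tagged = []
    for surface, base_form in analysed:
        tags = LEXICON.get(base_form.lower())
        tagged.append((surface, tags[0] if tags else UNMATCHED))
    return tagged


def coverage(tagged: List[Tuple[str, str]]) -> float:
    """Share of tokens that received a tag other than the unmatched tag."""
    if not tagged:
        return 0.0
    matched = sum(1 for _, tag in tagged if tag != UNMATCHED)
    return matched / len(tagged)


if __name__ == "__main__":
    # "Koira söi ja juoksi" ("The dog ate and ran"), with base forms given by hand.
    sentence = [("Koira", "koira"), ("söi", "syödä"), ("ja", "ja"), ("juoksi", "juosta")]
    result = tag_tokens(sentence)
    print(result)
    print(f"coverage: {coverage(result):.0%}")
```

In this example "juosta" is missing from the toy lexicon, so "juoksi" receives Z99 and coverage is 3 of 4 tokens. Keeping the lookup keyed on base forms mirrors the abstract's point that matching happens only after morphological recognition, which is why inflected forms such as "söi" can still be matched.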
Files | Size |
---|---|
W19_0306.pdf | 467.4Kb |