Is It Possible to Create a Very Large WordNet in 100 days? -- an Evaluation




Linden, K & Niemi, J 2013, 'Is It Possible to Create a Very Large WordNet in 100 days? -- an Evaluation', Language Resources and Evaluation, vol. 48, no. 2, pp. 191-201.

Title: Is It Possible to Create a Very Large WordNet in 100 days? -- an Evaluation
Author: Linden, Krister; Niemi, Jyrki
Contributor organization: Department of Modern Languages 2010-2017
Krister Linden / Research Group
Date: 2013
Language: eng
Number of pages: 10
Belongs to series: Language Resources and Evaluation
ISSN: 1574-020X
Abstract: Wordnets are large-scale lexical databases of related words and concepts, useful for language-aware software applications. They have recently been built for many languages using various approaches. The Finnish wordnet, FinnWordNet (FiWN), was created in 100 days by translating the more than 200,000 word senses in the English Princeton WordNet (PWN) 3.0. To ensure quality, the word senses were translated by professional translators. The direct translation approach was based on the assumption that most synsets in PWN represent language-independent real-world concepts. Thus, the semantic relations between synsets were also assumed to be mostly language-independent, so the structure of PWN could be reused as well. This approach allowed the creation of an extensive Finnish wordnet directly aligned with PWN. It also provided us with a translation relation, and thus a bilingual wordnet usable as a dictionary. In this paper, we address together several concerns that have been raised about our approach, many of them for the first time. We evaluate the craftsmanship of the translators by checking spelling and translation quality, the viability of the approach by assessing synonym quality at both the lexeme and the concept level, and the usefulness of the resulting lexical resource both for humans and in a language-technological task. We discovered no new problems compared with those already known in PWN. As a whole, the paper contributes to the scientific discourse on what it takes to create a very large wordnet. As a side effect of the evaluation, we extended FiWN to contain 208,645 word senses in 120,449 synsets, effectively making version 2.0 of FiWN the currently largest wordnet in the world by these statistics.
Subject: 6121 Languages
Bilingual lexicon
quality assessment
knowledge representation
word-sense disambiguation
Peer reviewed: Yes
Usage restriction: restrictedAccess
Self-archived version: submittedVersion

Files in this item


Files Size Format
lrev_fiwn_2012.pdf 91.15Kb PDF

