SenseDefs : a multilingual corpus of semantically annotated textual definitions


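dc.contributor University of Helsinki, Department of Digital Humanities en
dc.contributor.author Camacho-Collados, Jose
dc.contributor.author Delli Bovi, Claudio
dc.contributor.author Raganato, Alessandro
dc.contributor.author Navigli, Roberto
dc.date.accessioned 2019-08-01T08:52:01Z
dc.date.available 2019-08-01T08:52:01Z
dc.date.issued 2019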
dc.identifier.citation Camacho-Collados, J, Delli Bovi, C, Raganato, A & Navigli, R 2019, 'SenseDefs : a multilingual corpus of semantically annotated textual definitions', Language Resources and Evaluation, vol. 53, no. 2, pp. 251–278. en
dc.identifier.issn 1574-0218
dc.identifier.other PURE: 114860304
dc.identifier.other PURE UUID: c4ad3565-ddfb-4221-bb22-a0f45fb58915
dc.identifier.other RIS: urn:0EE05B708DA544D0A3E171CB25C25DF5
dc.identifier.other RIS: Camacho-Collados2018
dc.identifier.other Scopus: 85050537625
dc.identifier.other WOS: 000471044900003
dc.description.abstract Definitional knowledge has proved to be essential in various Natural Language Processing tasks and applications, especially when information at the level of word senses is exploited. However, the few sense-annotated corpora of textual definitions available to date are of limited size: this is mainly due to the expensive and time-consuming process of annotating a wide variety of word senses and entity mentions at a reasonably high scale. In this paper we present SenseDefs, a large-scale high-quality corpus of disambiguated definitions (or glosses) in multiple languages, comprising sense annotations of both concepts and named entities from a wide-coverage unified sense inventory. Our approach for the construction and disambiguation of this corpus builds upon the structure of a large multilingual semantic network and a state-of-the-art disambiguation system: first, we gather complementary information of equivalent definitions across different languages to provide context for disambiguation; then we refine the disambiguation output with a distributional approach based on semantic similarity. As a result, we obtain a multilingual corpus of textual definitions featuring over 38 million definitions in 263 languages, and we publicly release it to the research community. We assess the quality of SenseDefs’s sense annotations both intrinsically and extrinsically on Open Information Extraction and Sense Clustering tasks. en
dc.format.extent 28
dc.language.iso eng
dc.relation.ispartof Language Resources and Evaluation
dc.rights en
dc.subject 6121 Languages en
dc.subject 113 Computer and information sciences en
dc.title SenseDefs : a multilingual corpus of semantically annotated textual definitions en
dc.type Article
dc.description.version Peer reviewed
dc.type.uri info:eu-repo/semantics/other
dc.type.uri info:eu-repo/semantics/publishedVersion

Files in this item


Files Size Format View
Camacho_Collado ... sAMultilingualCorpusOf.pdf 986.6Kb PDF View/Open
