The MuCoW Test Suite at WMT 2019: Automatically Harvested Multilingual Contrastive Word Sense Disambiguation Test Sets for Machine Translation

Permalink

http://hdl.handle.net/10138/305137

Citation

Raganato, A, Scherrer, Y & Tiedemann, J 2019, The MuCoW Test Suite at WMT 2019: Automatically Harvested Multilingual Contrastive Word Sense Disambiguation Test Sets for Machine Translation. in O Bojar, R Chatterjee, C Federmann et al. (eds), Fourth Conference on Machine Translation: Proceedings of the Conference (Volume 2: Shared Task Papers, Day 1). The Association for Computational Linguistics, Stroudsburg, pp. 470-480, Conference on Machine Translation, Florence, Italy, 01/08/2019. <https://www.aclweb.org/anthology/W19-5354>

Title: The MuCoW Test Suite at WMT 2019: Automatically Harvested Multilingual Contrastive Word Sense Disambiguation Test Sets for Machine Translation
Author: Raganato, Alessandro; Scherrer, Yves; Tiedemann, Jörg
Editor: Bojar, Ondřej; Chatterjee, Rajen; Federmann, Christian; et al.
Contributor: University of Helsinki, Department of Digital Humanities
Publisher: The Association for Computational Linguistics
Date: 2019-08-01
Language: eng
Number of pages: 11
Belongs to series: Fourth Conference on Machine Translation: Proceedings of the Conference (Volume 2: Shared Task Papers, Day 1)
ISBN: 978-1-950737-27-7
URI: http://hdl.handle.net/10138/305137
Abstract: Supervised Neural Machine Translation (NMT) systems currently achieve impressive translation quality for many language pairs. One of the key features of a correct translation is the ability to perform word sense disambiguation (WSD), i.e., to translate an ambiguous word with its correct sense. Existing evaluation benchmarks on WSD capabilities of translation systems rely heavily on manual work and cover only a few language pairs and a few word types. We present MuCoW, a multilingual contrastive test suite that covers 16 language pairs with more than 200 thousand contrastive sentence pairs, automatically built from word-aligned parallel corpora and the wide-coverage multilingual sense inventory of BabelNet. We evaluate the quality of the ambiguity lexicons and of the resulting test suite on all submissions from 9 language pairs presented in the WMT19 news shared translation task, plus 5 other language pairs using pretrained NMT models. The MuCoW test suite is available at http://github.com/Helsinki-NLP/MuCoW.
Subject: 6121 Languages
113 Computer and information sciences


Files in this item

File: W19_5354.pdf
Size: 403.3 KB
Format: PDF
