The University of Helsinki submission to the WMT19 Parallel Corpus Filtering Task

Permalink

http://hdl.handle.net/10138/305139

Citation

Vazquez, R, Sulubacak, U & Tiedemann, J 2019, The University of Helsinki submission to the WMT19 Parallel Corpus Filtering Task. in O Bojar, R Chatterjee, C Federmann & E A (eds), Fourth Conference on Machine Translation: Proceedings of the Conference: Volume 3: Shared Task Papers, Day 2. The Association for Computational Linguistics, Stroudsburg, pp. 294-300, Conference on Machine Translation, Florence, Italy, 01/08/2019.

Title: The University of Helsinki submission to the WMT19 Parallel Corpus Filtering Task
Author: Vazquez, Raul; Sulubacak, Umut; Tiedemann, Jörg
Other contributor: University of Helsinki, Department of Digital Humanities
University of Helsinki, Language Technology
Bojar, Ondřej
Chatterjee, Rajen
Federmann, Christian
et al.

Publisher: The Association for Computational Linguistics
Date: 2019-07-29
Language: eng
Number of pages: 7
Belongs to series: Fourth Conference on Machine Translation: Proceedings of the Conference: Volume 3: Shared Task Papers, Day 2
ISBN: 978-1-950737-27-7
URI: http://hdl.handle.net/10138/305139
Abstract: This paper describes the University of Helsinki Language Technology group’s participation in the WMT 2019 parallel corpus filtering task. Our scores were produced using a two-step strategy. First, we individually applied a series of filters to remove the ‘bad’ quality sentences. Then, we produced scores for each sentence by weighting these features with a classification model. This methodology allowed us to build a simple and reliable system that is easily adaptable to other language pairs.
Subject: 113 Computer and information sciences
6121 Languages


Files in this item

File: W19_5441.pdf
Size: 277.4Kb
Format: PDF
