Multilingual NMT with a language-independent attention bridge

Permalink: http://hdl.handle.net/10138/304660

Citation: Vazquez Carrillo, J. R., Raganato, A., Tiedemann, J. & Creutz, M. 2019, 'Multilingual NMT with a language-independent attention bridge', in I. Augenstein, S. Gella, S. Ruder, K. Kann, B. Can, J. Welbl, A. Conneau, X. Ren & M. Rei (eds), The 4th Workshop on Representation Learning for NLP (RepL4NLP-2019): Proceedings of the Workshop, The Association for Computational Linguistics, Stroudsburg, pp. 33-39, Workshop on Representation Learning for NLP, Florence, Italy, 02/08/2019.

Title: Multilingual NMT with a language-independent attention bridge
Author: Vazquez Carrillo, Juan Raul; Raganato, Alessandro; Tiedemann, Jörg; Creutz, Mathias
Other contributor: Augenstein, Isabelle; Gella, Spandana; Ruder, Sebastian; Kann, Katharina; Can, Burcu; Welbl, Johannes; Conneau, Alexis; Ren, Xiang; Rei, Marek
Contributor organization: Department of Digital Humanities; Language Technology; Mind and Matter
Publisher: The Association for Computational Linguistics
Date: 2019
Language: eng
Number of pages: 7
Belongs to series: The 4th Workshop on Representation Learning for NLP (RepL4NLP-2019)
ISBN: 978-1-950737-35-2
URI: http://hdl.handle.net/10138/304660
Abstract: In this paper, we propose a multilingual encoder-decoder architecture that obtains multilingual sentence representations by incorporating an intermediate attention bridge shared across all languages. That is, we train the model with language-specific encoders and decoders that are connected through a shared self-attention layer that we call the attention bridge. This layer exploits the semantics of each language to perform translation and develops into a language-independent meaning representation that can efficiently be used for transfer learning. We present a new framework for the efficient development of multilingual NMT using this model and scheduled training. We test the approach systematically on a multi-parallel data set. We show that the model achieves substantial improvements over strong bilingual models and that it also works well for zero-shot translation, demonstrating its capacity for abstraction and transfer learning.
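As an illustration of the shared layer described in the abstract, below is a minimal PyTorch sketch of such an attention bridge, assuming a structured self-attention form in which a fixed number of attention heads pool variable-length encoder states into a fixed-size sentence matrix. The class name, dimensions, and hyperparameters here are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn
from typing import Optional

class AttentionBridge(nn.Module):
    """Shared, language-independent bridge: pools a variable-length
    sequence of encoder states into a fixed number of attentive heads.
    Illustrative sketch; names and sizes are assumptions."""

    def __init__(self, hidden_dim: int, attn_dim: int = 512, n_heads: int = 10):
        super().__init__()
        self.w1 = nn.Linear(hidden_dim, attn_dim, bias=False)  # hidden -> attention space
        self.w2 = nn.Linear(attn_dim, n_heads, bias=False)     # one score column per head

    def forward(self, enc_states: torch.Tensor,
                mask: Optional[torch.Tensor] = None) -> torch.Tensor:
        # enc_states: (batch, seq_len, hidden_dim) from any language-specific encoder
        scores = self.w2(torch.tanh(self.w1(enc_states)))      # (batch, seq_len, n_heads)
        if mask is not None:                                   # mask out padded positions
            scores = scores.masked_fill(~mask.unsqueeze(-1), float("-inf"))
        attn = torch.softmax(scores, dim=1)                    # normalize over time steps
        # Fixed-size sentence matrix: (batch, n_heads, hidden_dim), independent of seq_len
        return attn.transpose(1, 2) @ enc_states
```

Because the output shape does not depend on input length or source language, every language-specific decoder can attend over the same (n_heads, hidden_dim) matrix, which is what makes the bridge shareable across language pairs:

```python
bridge = AttentionBridge(hidden_dim=512)
states = torch.randn(2, 17, 512)   # two sentences, 17 encoder states each
m = bridge(states)                 # torch.Size([2, 10, 512])
```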
Subject: 6121 Languages; 113 Computer and information sciences; Natural language processing; Multilingual machine translation
Peer reviewed: Yes
Rights: CC BY
Usage restriction: open access
Self-archived version: published version
Funder: European Commission; Academy of Finland (Suomen Akatemia)
Grant number: 771113


Files in this item


W19_4305.pdf (375.2 KB, PDF)
