An Evaluation of Language-Agnostic Inner-Attention-Based Representations in Machine Translation




Permanent link

http://hdl.handle.net/10138/305136

Citation

Raganato, A., Vázquez, R., Creutz, M. & Tiedemann, J. 2019, 'An Evaluation of Language-Agnostic Inner-Attention-Based Representations in Machine Translation', in I. Augenstein, S. Gella, S. Ruder, K. Kann, B. Can, J. Welbl, A. Conneau, X. Ren & M. Rei (eds), The 4th Workshop on Representation Learning for NLP (RepL4NLP-2019): Proceedings of the Workshop, The Association for Computational Linguistics, Stroudsburg, pp. 27-32, Workshop on Representation Learning for NLP, Florence, Italy, 02/08/2019. <https://www.aclweb.org/anthology/W19-4304>

Title: An Evaluation of Language-Agnostic Inner-Attention-Based Representations in Machine Translation
Author: Raganato, Alessandro; Vázquez, Raúl; Creutz, Mathias; Tiedemann, Jörg
Other contributor: Augenstein, Isabelle
Gella, Spandana
Ruder, Sebastian
Kann, Katharina
Can, Burcu
Welbl, Johannes
Conneau, Alexis
Ren, Xiang
Rei, Marek
Author's organisation: Department of Digital Humanities
Language Technology
Publisher: The Association for Computational Linguistics
Date: 2019-08-01
Language: eng
Number of pages: 6
Part of series: The 4th Workshop on Representation Learning for NLP (RepL4NLP-2019)
ISBN: 978-1-950737-35-2
URI: http://hdl.handle.net/10138/305136
Abstract: In this paper, we explore a multilingual translation model with a cross-lingually shared layer that can be used as a fixed-size sentence representation in different downstream tasks. We systematically study the impact of the size of the shared layer and the effect of including additional languages in the model. In contrast to related previous work, we demonstrate that translation performance does correlate with performance on trainable downstream tasks. In particular, we show that larger intermediate layers not only improve translation quality, especially for long sentences, but also push the accuracy of trainable classification tasks. On the other hand, shorter representations lead to increased compression that is beneficial in non-trainable similarity tasks. We hypothesize that the training procedure on the downstream task enables the model to identify the encoded information that is useful for the specific task, whereas non-trainable benchmarks can be confused by other types of information also encoded in the representation of a sentence.
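Note on the shared layer (illustrative): the fixed-size representation described in the abstract is produced by an inner-attention ("attention bridge") layer that pools the variable-length encoder output with a fixed number of attention heads, in the style of the structured self-attention of Lin et al. (2017). The sketch below is a minimal NumPy illustration under that assumption; all names and dimensions are hypothetical and not taken from the authors' implementation.

import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def inner_attention_pool(H, W1, W2):
    """Compress variable-length encoder states H (n x d) into a
    fixed-size matrix M (r x d): each of the r attention heads
    produces one weighted average of the encoder states."""
    # A: (r x n), one attention distribution over positions per head
    A = softmax(W2 @ np.tanh(W1 @ H.T), axis=-1)
    # M: (r x d), independent of the sentence length n
    return A @ H

# Illustrative dimensions (hypothetical, not from the paper):
# n = sentence length, d = hidden size, d_a = attention size, r = heads
n, d, d_a, r = 12, 512, 256, 10
rng = np.random.default_rng(0)
H = rng.standard_normal((n, d))
W1 = rng.standard_normal((d_a, d)) * 0.01
W2 = rng.standard_normal((r, d_a)) * 0.01
M = inner_attention_pool(H, W1, W2)
print(M.shape)  # (10, 512): shape set by r and d, not by n

Because the pooled matrix M has the same shape for any sentence length, varying the number of heads r is one way to vary the "size of the shared layer" whose effect the abstract studies.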
Keywords: 6121 Languages
113 Computer and information sciences
Peer reviewed: Yes
Copyright information: cc_by
Access restrictions: openAccess
Self-archived version: publishedVersion
Funder: European Commission
Academy of Finland
Funding number: 771113


Files


File(s) Size Format
W19_4304.pdf 371.9 KB PDF
