Unsupervised Learning of Cross-Lingual Symbol Embeddings Without Parallel Data

Permalink

http://hdl.handle.net/10138/304870

Citation

Granroth-Wilding, M & Toivonen, H 2019, Unsupervised Learning of Cross-Lingual Symbol Embeddings Without Parallel Data. In Second Annual Meeting of the Society for Computation in Linguistics (SCiL 2019), 4, The Association for Computational Linguistics, pp. 19-28, Society for Computation in Linguistics, New York, New York, United States, 03/01/2019. https://doi.org/10.7275/wx64-ea83

Title: Unsupervised Learning of Cross-Lingual Symbol Embeddings Without Parallel Data
Author: Granroth-Wilding, Mark; Toivonen, Hannu
Contributor organization: Department of Computer Science
Discovery Research Group/Prof. Hannu Toivonen
Helsinki Institute for Information Technology
Publisher: The Association for Computational Linguistics
Date: 2019-01-03
Language: eng
Number of pages: 10
Belongs to series: Second Annual Meeting of the Society for Computation in Linguistics (SCiL 2019)
ISBN: 978-1-5108-7753-5
DOI: https://doi.org/10.7275/wx64-ea83
URI: http://hdl.handle.net/10138/304870
Abstract: We present a new method for unsupervised learning of multilingual symbol (e.g. character) embeddings, without any parallel data or prior knowledge about correspondences between languages. It is able to exploit similarities across languages between the distributions over symbols' contexts of use within their language, even in the absence of any symbols in common to the two languages. In experiments with an artificially corrupted text corpus, we show that the method can retrieve character correspondences obscured by noise. We then present encouraging results of applying the method to real linguistic data, including for low-resourced languages. The learned representations open the possibility of fully unsupervised comparative studies of text or speech corpora in low-resourced languages with no prior knowledge regarding their symbol sets.
Subject: 113 Computer and information sciences
6121 Languages
Peer reviewed: Yes
Rights: unspecified
Usage restriction: openAccess
Self-archived version: publishedVersion
Funder: Academy of Finland
Files in this item

File: mgwht2019.pdf
Size: 977.7 KB
Format: PDF