(Re)lexicalization of auto-written news with contextual and cross-lingual word embeddings

Show full item record


Title: (Re)lexicalization of auto-written news with contextual and cross-lingual word embeddings
Author: Rämö, Miia
Other contributor: Helsingin yliopisto, Matemaattis-luonnontieteellinen tiedekunta
University of Helsinki, Faculty of Science
Helsingfors universitet, Matematisk-naturvetenskapliga fakulteten
Publisher: Helsingin yliopisto
Date: 2020
Language: eng
URI: http://urn.fi/URN:NBN:fi:hulib-202011254592
Thesis level: master's thesis
Degree program: Datatieteen maisteriohjelma
Master's Programme in Data Science
Magisterprogrammet i data science
Specialisation: ei opintosuuntaa
no specialization
ingen studieinriktning
Discipline: none
Abstract: In news agencies, there is a growing interest towards automated journalism. Majority of the systems applied are template- or rule-based, as they are expected to produce accurate and fluent output transparently. However, this approach often leads to output that lacks variety. To overcome this issue, I propose two approaches. In the lexicalization approach new words are included in the sentences, and in relexicalization approach some existing words are replaced with synonyms. Both of the approaches utilize contextual word embeddings for finding suitable words. Furthermore, the above approaches require linguistic resources, which are only available for high- resource languages. Thus, I present variants of the (re)lexicalization approaches that allow their utilization for low-resource languages. These variants utilize cross-lingual word embeddings to access linguistic resources of a high-resource language. The high-resource variants achieved promising results. However, the sampling of words should be further enhanced to improve reliability. The low-resource variants did show some promising results, but the quality suffered from complex morphology of the example language. This is a clear next issue to address and resolving it is expected to significantly improve the results.
Subject: natural language generation
word embeddings
automated journalism

Files in this item

Total number of downloads: Loading...

Files Size Format View
Master_s_thesis_Miia_Final.pdf 1.069Mb PDF View/Open

This item appears in the following Collection(s)

Show full item record