Minimum Description Length Models for Unsupervised Learning of Morphology

Permalink

http://urn.fi/URN:NBN:fi-fe2017112252174
Title: Minimum Description Length Models for Unsupervised Learning of Morphology
Author: Nouri, Javad
Contributor: University of Helsinki, Faculty of Science, Department of Computer Science
Thesis level: Master's thesis
Abstract: This thesis introduces an approach to unsupervised learning of the morphological structure of human languages. We focus on morphologically rich languages, and the goal is to construct a knowledge-free, language-independent model. The model receives a long list of words in a language and is expected to learn to segment them so that the resulting segments correspond to morphemes of the target language. Several improvements, inspired by well-motivated linguistic principles of morphology, are introduced to the proposed MDL-based learning algorithm. In addition to the learning algorithm, a new evaluation method and corresponding resources are introduced. Evaluating morphological segmentations is challenging because of the inherent ambiguity of natural languages and because underlying morphological processes such as fusion make it difficult to identify a unique 'correct' segmentation for each word. Our evaluation method addresses this problem by focusing on the consistency of segmentations. Our approach is tested on data from Finnish, Turkish, and Russian. Evaluation shows a gain in performance over the state of the art.
URI: URN:NBN:fi-fe2017112252174
http://hdl.handle.net/10138/165924
Date: 2016
Subject: Algorithms and Machine Learning


Files

File(s) Size Format View
javadthesis05final.pdf 1.033MB PDF Open file
javad-thesis-05-final.pdf 1.033MB PDF Open file
