Minimum Description Length Models for Unsupervised Learning of Morphology

Title: Minimum Description Length Models for Unsupervised Learning of Morphology
Author: Nouri, Javad
Contributor: University of Helsinki, Faculty of Science, Department of Computer Science
Thesis level:
Abstract: This thesis introduces an approach to unsupervised learning of the morphological structure of human languages. We focus on morphologically rich languages, and the goal is to construct a knowledge-free, language-independent model. The model receives a long list of words in a language and is expected to learn to segment them so that the resulting segments correspond to morphemes of the target language. We introduce several improvements to the proposed MDL-based learning algorithm, inspired by well-motivated linguistic principles of morphology. In addition to the learning algorithm, we introduce a new evaluation method and corresponding resources. Evaluating morphological segmentations is challenging because of the inherent ambiguity of natural languages and because underlying morphological processes such as fusion make it hard to identify a unique “correct” segmentation for each word. Our evaluation method addresses this problem by focusing on the consistency of segmentations. The approach is tested on data from Finnish, Turkish, and Russian. The evaluation shows a gain in performance over the state of the art.
Date: 2016-08-17
Discipline: Algorithms and Machine Learning

Files in this item

File                      Size      Format
javadthesis05final.pdf    1.033Mb   PDF
