Projecting named entity recognizers from resource-rich to resource-poor languages without annotated or parallel corpora

Permalink

http://hdl.handle.net/10138/337133

Citation

Hou, J 2020, 'Projecting named entity recognizers from resource-rich to resource-poor languages without annotated or parallel corpora', Department of Computer Science, Helsinki, Finland. <http://urn.fi/URN:NBN:fi:hulib-202001211120>

Title: Projecting named entity recognizers from resource-rich to resource-poor languages without annotated or parallel corpora
Author: Hou, Jue
Other contributor: Yangarber, Roman
Contributor organization: Department of Computer Science; Department of Digital Humanities
Publisher: University of Helsinki
Date: 2020-01-21
Language: eng
Number of pages: 60
URI: http://hdl.handle.net/10138/337133
Abstract: Named entity recognition (NER) is a challenging task in the field of NLP. Like other machine learning problems, it requires a large amount of data to train a workable model. This remains a problem for languages such as Finnish, for which annotated data and linguistic resources are scarce. In this thesis, I propose an approach to automatic annotation of Finnish text that uses a limited set of linguistic rules and data from a resource-rich language, English, as a reference. With a BiLSTM-CRF model trained on the automatically annotated data, preliminary results show that automatic annotation can produce annotated instances with high accuracy and that the model can achieve good performance for Finnish. In addition to automatic annotation and NER model training, two related experiments are conducted and discussed at the end of the thesis to show practical applications of my Finnish NER model.
Subject: 113 Computer and information sciences
Usage restriction: openAccess
Self-archived version: publishedVersion


Files in this item

File: Jue_Hou_Master_s_Thesis_v2.1.pdf
Size: 1.067 Mb
Format: PDF

