Finite-state Relations Between Two Historically Closely Related Languages



Permalink

http://hdl.handle.net/10138/42176

Citation

Koskenniemi, K 2013, Finite-state Relations Between Two Historically Closely Related Languages. in Þ Eyþórsson, L Borin, D Haug & E Rögnvaldsson (eds), Proceedings of the workshop on computational historical linguistics at NODALIDA 2013. NEALT Proceedings Series, vol. 18, Northern European Association for Language Technology, Linköping, pp. 43-53, Workshop on Computational Historical Linguistics, NODALIDA 2013, Oslo, Norway, 22/05/2013. <http://www.ep.liu.se/ecp/087/ecp13087.pdf>

Title: Finite-state Relations Between Two Historically Closely Related Languages
Author: Koskenniemi, Kimmo
Other contributor: University of Helsinki, Department of Modern Languages 2010-2017
Eyþórsson, Þórhallur
Borin, Lars
Haug, Dag
Rögnvaldsson, Eirikur
Publisher: Northern European Association for Language Technology
Date: 2013
Language: eng
Number of pages: 11
Belongs to series: Proceedings of the workshop on computational historical linguistics at NODALIDA 2013
Belongs to series: NEALT Proceedings Series
ISBN: 978-91-7519-587-2
URI: http://hdl.handle.net/10138/42176
Abstract: Regular correspondences between historically related languages can be modelled using finite-state transducers (FSTs). A new method is presented and demonstrated with a bidirectional experiment between Finnish and Estonian. An artificial representation (resembling a proto-language) is established between the two related languages. This representation, AFE (Aligned Finnish-Estonian), is based on the letter-by-letter alignment of the two languages and uses mechanically constructed morphophonemes which represent the corresponding characters. By describing the constraints of AFE using two-level rules, one may construct useful mappings between the languages. In this way, the badly ambiguous FSTs from Finnish and Estonian to AFE can be composed into a practically unambiguous transducer from Finnish to Estonian. The inverse mapping from Estonian to Finnish is mildly ambiguous. The steps of the proposed method could be repeated as such with dialectal or older written texts: choosing a set of model words, aligning them, recording the mechanical correspondences and designing rules for the constraints could be done with limited effort. For the purposes of indexing and searching, the mild ambiguity may be tolerable as such. The ambiguity can be further reduced by composing the resulting FST with a speller or morphological analyser of the standard language.
Subject: 6121 Languages
finite-state transducers
historical linguistics
HFST
two-level morphology
FOMA
Rights:
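
The pipeline described in the abstract (align model words letter by letter, record the resulting character pairs as morphophonemes, then constrain the mapping with rules) can be sketched in plain Python. This is only an illustration under stated assumptions, not the paper's implementation, which relies on finite-state tools and two-level rules. The three word pairs, the zero symbol "Ø", the hand-made alignments, and the single constraint `zero_only_final` are all hypothetical choices made for this sketch, and it only handles zeros on the Estonian side.

```python
from itertools import product

ZERO = "Ø"  # assumed placeholder for "no letter" in an alignment

# Hypothetical toy data: Finnish/Estonian word pairs, aligned by hand so
# that both strings have equal length (ZERO pads the Estonian side).
ALIGNED = [
    ("kala", "kala"),
    ("jalka", "jalg" + ZERO),
    ("silmä", "silm" + ZERO),
]


def morphophonemes(fin, est):
    """Turn one aligned pair into a sequence of (Finnish, Estonian) letter pairs."""
    assert len(fin) == len(est), "aligned strings must have equal length"
    return list(zip(fin, est))


def correspondence_set(pairs):
    """Collect every letter pair observed in the aligned model words."""
    return {mp for fin, est in pairs for mp in morphophonemes(fin, est)}


def afe_candidates(word, corr):
    """All pair sequences whose Finnish side spells `word` (unconstrained)."""
    options = []
    for ch in word:
        mps = sorted(mp for mp in corr if mp[0] == ch)
        if not mps:
            return []  # letter never seen in the model words
        options.append(mps)
    return [list(seq) for seq in product(*options)]


def zero_only_final(seq):
    """Toy constraint (an assumption): a letter may map to zero only word-finally."""
    last = len(seq) - 1
    return all(e != ZERO or i == last for i, (_, e) in enumerate(seq))


def fin_to_est(word, corr, constraint=lambda seq: True):
    """Project the accepted pair sequences onto their Estonian side."""
    return {
        "".join(e for _, e in seq if e != ZERO)
        for seq in afe_candidates(word, corr)
        if constraint(seq)
    }


CORR = correspondence_set(ALIGNED)

if __name__ == "__main__":
    print(sorted(fin_to_est("jalka", CORR)))
    print(sorted(fin_to_est("jalka", CORR, zero_only_final)))
```

Without a constraint the toy mapping is heavily ambiguous; adding even one rule-like filter shrinks the candidate set, which mirrors in miniature the role the abstract assigns to two-level rules. A real system would compile such rules into transducers and compose them rather than enumerate candidates.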


Files in this item

File: ecp1387004.pdf
Size: 149.3Kb
Format: PDF
