OCR and post-correction of historical Finnish texts

Permalink

http://hdl.handle.net/10138/229864

Citation

Drobac, S, Kauppinen, P S & Linden, B K J 2017, OCR and post-correction of historical Finnish texts. In J Tiedemann (ed.), Proceedings of the 21st Nordic Conference on Computational Linguistics, NoDaLiDa, 22-24 May 2017, Gothenburg, Sweden. Linköping Electronic Conference Proceedings, no. 131, Linköping University Electronic Press, Linköping, pp. 70-76, Nordic Conference of Computational Linguistics, Gothenburg, Sweden, 22/05/2017. <http://www.ep.liu.se/ecp/131/ecp17131.pdf>

Title: OCR and post-correction of historical Finnish texts
Author: Drobac, Senka; Kauppinen, Pekka Sakari; Linden, Bo Krister Johan
Editor: Tiedemann, Jörg
Contributor: University of Helsinki, Department of Modern Languages 2010-2017
Publisher: Linköping University Electronic Press
Date: 2017
Language: eng
Number of pages: 7
Belongs to series: Proceedings of the 21st Nordic Conference on Computational Linguistics, NoDaLiDa, 22-24 May 2017, Gothenburg, Sweden
Belongs to series: Linköping Electronic Conference Proceedings
ISBN: 978-91-7685-601-7
URI: http://hdl.handle.net/10138/229864
Abstract: This paper presents experiments on optical character recognition (OCR) that combine the Ocropy software with data-driven spelling correction based on weighted finite-state methods. Both model training and testing were done on Finnish corpora of historical newspaper text, and the best combination of OCR and post-processing models gives 95.21% character recognition accuracy.
Subject: 6121 Languages


Files in this item

Files Size Format
W17_0209.pdf 205.9Kb PDF
