Drobac, S, Kauppinen, P S & Linden, B K J 2017, OCR and post-correction of historical Finnish texts. in J Tiedemann (ed.), Proceedings of the 21st Nordic Conference on Computational Linguistics, NoDaLiDa, 22-24 May 2017, Gothenburg, Sweden. Linköping Electronic Conference Proceedings, no. 131, Linköping University Electronic Press, Linköping, pp. 70-76, Nordic Conference of Computational Linguistics, Gothenburg, Sweden, 22/05/2017. <http://www.ep.liu.se/ecp/131/ecp17131.pdf>
Title: | OCR and post-correction of historical Finnish texts |
Author: | Drobac, Senka; Kauppinen, Pekka Sakari; Linden, Bo Krister Johan |
Other contributor: | Tiedemann, Jörg |
Contributor organization: | Department of Modern Languages 2010-2017; Language Technology |
Publisher: | Linköping University Electronic Press |
Date: | 2017 |
Language: | eng |
Number of pages: | 7 |
Belongs to series: | Proceedings of the 21st Nordic Conference on Computational Linguistics, NoDaLiDa, 22-24 May 2017, Gothenburg, Sweden |
Belongs to series: | Linköping Electronic Conference Proceedings |
ISBN: | 978-91-7685-601-7 |
ISSN: | 1650-3686 |
URI: | http://hdl.handle.net/10138/229864 |
Abstract: | This paper presents experiments on optical character recognition (OCR) that combine the Ocropy software with data-driven spelling correction based on weighted finite-state methods. Both model training and testing were done on Finnish corpora of historical newspaper text, and the best combination of OCR and post-processing models gives 95.21% character recognition accuracy. |
Subject: | 6121 Languages |
Peer reviewed: | Yes |
Usage restriction: | openAccess |
Self-archived version: | publishedVersion |
Files | Size | Format |
---|---|---|
W17_0209.pdf | 205.9Kb | PDF |