TY -
T1 - OCR and post-correction of historical Finnish texts
SN - /
UR - http://hdl.handle.net/10138/229864
T3 - Linköping Electronic Conference Proceedings
A1 - Drobac, Senka; Kauppinen, Pekka Sakari; Linden, Bo Krister Johan
A2 - Tiedemann, Jörg
PB - Linköping University Electronic Press
Y1 - 2017
LA - eng
AB - This paper presents experiments on optical character recognition (OCR) as a combination of Ocropy software and data-driven spelling correction that uses weighted finite-state methods. Both model training and testing were done on Finnish corpora of historical newspaper text, and the best combination of OCR and post-processing models gives 95.21% character recognition accuracy....
VO -
IS -
SP -
OP -
KW - 6121 Languages
N1 -
PP -
ER -