Creating an Aligned Russian Text Simplification Dataset from Language Learner Data

Permalink

http://hdl.handle.net/10138/334002

Citation

Dmitrieva, A & Tiedemann, J 2021, Creating an Aligned Russian Text Simplification Dataset from Language Learner Data. in B Babych [et al.] (ed.), Proceedings of the 8th Workshop on Balto-Slavic Natural Language Processing. ACL Anthology, Stroudsburg, pp. 73-79, Workshop on Balto-Slavic Natural Language Processing, 20/04/2021. <https://aclanthology.org/2021.bsnlp-1.8>

Title: Creating an Aligned Russian Text Simplification Dataset from Language Learner Data
Author: Dmitrieva, Anna; Tiedemann, Jörg
Editor: Babych, Bogdan [et al.]
Contributor: University of Helsinki, Department of Digital Humanities
University of Helsinki, Mind and Matter
Publisher: ACL Anthology
Date: 2021-04
Language: eng
Belongs to series: Proceedings of the 8th Workshop on Balto-Slavic Natural Language Processing
ISBN: 978-1-954085-14-5
URI: http://hdl.handle.net/10138/334002
Abstract: Parallel language corpora where regular texts are aligned with their simplified versions can be used in both natural language processing and theoretical linguistic studies. They are essential for the task of automatic text simplification, but can also provide valuable insights into the characteristics that make texts more accessible and reveal strategies that human experts use to simplify texts. Today, there exist a few parallel datasets for English and Simple English, but many other languages lack such data. In this paper we describe our work on creating an aligned Russian-Simple Russian dataset composed of Russian literature texts adapted for learners of Russian as a foreign language. This will be the first parallel dataset in this domain, and one of the first Simple Russian datasets in general.
Subject: 113 Computer and information sciences
6121 Languages

Files in this item

File: 2021.bsnlp_1.8.pdf
Size: 161.0Kb
Format: PDF
