Toward automatic improvement of language produced by non-native language learners





Citation

Creutz, M. & Sjöblom, E. I. 2019, 'Toward automatic improvement of language produced by non-native language learners', in D. Alfter, E. Volodina, L. Borin, I. Pilán & H. Lange (eds), Proceedings of the 8th Workshop on Natural Language Processing for Computer Assisted Language Learning (NLP4CALL 2019), Linköping Electronic Conference Proceedings, no. 164, NEALT Proceedings Series, no. 39, Linköping University Electronic Press, Linköping, pp. 20-30, Workshop on Natural Language Processing for Computer Assisted Language Learning, Turku, Finland, 30/09/2019.

Title: Toward automatic improvement of language produced by non-native language learners
Author: Creutz, Mathias; Sjöblom, Eetu Ilari
Other contributor: University of Helsinki, Department of Digital Humanities
Alfter, David
Volodina, Elena
Borin, Lars
Pilán, Ildikó
Lange, Herbert

Publisher: Linköping University Electronic Press
Date: 2019-09-30
Language: eng
Number of pages: 11
Belongs to series: Proceedings of the 8th Workshop on Natural Language Processing for Computer Assisted Language Learning (NLP4CALL 2019)
Belongs to series: Linköping Electronic Conference Proceedings - NEALT Proceedings Series
ISBN: 978-91-7929-998-9
URI: http://hdl.handle.net/10138/306947
Abstract: It is important for language learners to practice speaking and writing in realistic scenarios. The learners also need feedback on how to express themselves better in the new language. In this paper, we perform automatic paraphrase generation on language-learner texts. Our goal is to devise tools that can help language learners write more correct and natural-sounding sentences. We use a pivoting method with a character-based neural machine translation system trained on subtitle data to paraphrase and improve learner texts that contain grammatical errors and other types of noise. We perform experiments in three languages: Finnish, Swedish and English. During training, we experiment with monolingual data as well as error-augmented monolingual and bilingual data in addition to the parallel subtitle data. Our results show that our baseline model trained only on parallel bilingual data sets is surprisingly robust to different types of noise in the source sentence, but introducing artificial errors can improve performance. In addition to error correction, the results show promise for using the models to improve fluency and make language-learner texts more idiomatic.
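
The pivoting idea described in the abstract can be illustrated with round-trip translation: a noisy learner sentence is translated into a pivot language and back, and the round trip tends to yield a more correct and idiomatic paraphrase. The sketch below shows only that general idea; it is not the authors' character-based, subtitle-trained system. It assumes the Hugging Face transformers library and publicly available OPUS-MT models (Helsinki-NLP/opus-mt-fi-en, Helsinki-NLP/opus-mt-en-fi) as stand-ins, and the example sentence is hypothetical.

    # Round-trip ("pivot") paraphrasing sketch -- NOT the paper's system.
    from transformers import MarianMTModel, MarianTokenizer

    def load(name):
        # Load a translation model and its tokenizer for one direction.
        return MarianTokenizer.from_pretrained(name), MarianMTModel.from_pretrained(name)

    def translate(sentences, tok, model):
        # Translate a batch of sentences and decode the generated tokens.
        batch = tok(sentences, return_tensors="pt", padding=True)
        out = model.generate(**batch)
        return [tok.decode(t, skip_special_tokens=True) for t in out]

    # Finnish -> English -> Finnish round trip.
    fi_en = load("Helsinki-NLP/opus-mt-fi-en")
    en_fi = load("Helsinki-NLP/opus-mt-en-fi")

    learner = ["Minä olen menossa kouluun eilen."]   # hypothetical noisy learner input
    paraphrase = translate(translate(learner, *fi_en), *en_fi)
    print(paraphrase)

The round trip through the pivot language acts as a normaliser: errors in the source sentence rarely survive translation and back-translation, which is why the paper can repurpose translation models for correction and fluency improvement.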
Subject: 113 Computer and information sciences
6121 Languages
Rights:


Files in this item


File: W19_6303.pdf (173.3 KB, PDF)

