Assessing text readability and quality with language models

dc.contributor Helsingin yliopisto, Matemaattis-luonnontieteellinen tiedekunta fi
dc.contributor University of Helsinki, Faculty of Science en
dc.contributor Helsingfors universitet, Matematisk-naturvetenskapliga fakulteten sv
dc.contributor.author Liu, Yang
dc.date.issued 2020
dc.identifier.uri URN:NBN:fi:hulib-202003191584
dc.identifier.uri http://hdl.handle.net/10138/313475
dc.description.abstract Automatic readability assessment is considered a challenging task in NLP due to its high degree of subjectivity. Most prior work on assessing readability has focused on identifying the level of education necessary for comprehension, without considering text quality, i.e., how naturally the text flows from the perspective of a native speaker. In this thesis, we therefore use language models trained on well-written prose to measure not only text readability in terms of comprehension but also text quality. We develop two word-level metrics, based on the concordance of article text with predictions made by language models, to assess text readability and quality. We evaluate both metrics on a set of corpora used for readability assessment or automated essay scoring (AES) by measuring the correlation between the scores assigned by our metrics and those assigned by human raters. According to the experimental results, our metrics are strongly correlated with text quality, achieving correlations of 0.4-0.6 on 7 out of 9 datasets. We demonstrate that GPT-2 surpasses other language models, including the bigram model, LSTM, and bidirectional LSTM, at estimating text quality in a zero-shot setting, and that a GPT-2 perplexity-based measure is a reasonable indicator of text quality. en
dc.language.iso eng
dc.publisher Helsingin yliopisto fi
dc.publisher University of Helsinki en
dc.publisher Helsingfors universitet sv
dc.title Assessing text readability and quality with language models en
dc.type.ontasot pro gradu -tutkielmat fi
dc.type.ontasot master's thesis en
dc.type.ontasot pro gradu-avhandlingar sv
dc.subject.discipline none und
dct.identifier.urn URN:NBN:fi:hulib-202003191584
dc.subject.specialization Algoritmit fi
dc.subject.specialization Algorithms en
dc.subject.specialization Algoritmer sv
dc.subject.degreeprogram Tietojenkäsittelytieteen maisteriohjelma fi
dc.subject.degreeprogram Master's Programme in Computer Science en
dc.subject.degreeprogram Magisterprogrammet i datavetenskap sv

Files in this item

Files Size Format
Yang_s_MSc.pdf 1.179Mb PDF
