TY  - 
T1  - Unsupervised zero-shot classification of Finnish documents using pre-trained language models
SN  - /
UR  - URN:NBN:fi:hulib-202012155147; http://hdl.handle.net/10138/323019
T3  - 
A1  - Leal, Rafael
A2  - 
PB  - Helsingin yliopisto
Y1  - 2020
LA  - eng
AB  - In modern Natural Language Processing, document categorisation tasks can achieve success rates of over 95% using fine-tuned neural network models. However, so-called "zero-shot" situations, where specific training data is not available, are researched much less frequently. The objective of this thesis is to investigate how pre-trained Finnish language models fare when classifying documents in a completely unsupervised way: by relying only on their general "knowledge of the world" obtained during...
VO  - 
IS  - 
SP  - 
OP  - 
KW  - NLP; space vector model; zero-shot classification; Finnish language; pre-trained language models; Kieliteknologia; Language Technology; Språkteknologi; Kielellisen diversiteetin ja digitaalisten menetelmien maisteriohjelma; Master's Programme Linguistic Diversity in the Digital Age; Magisterprogrammet i språklig diversitet och digitala metoder
N1  - 
PP  - 
ER  - 