Digitising Swiss German : how to process and study a polycentric spoken language

Permalink

http://hdl.handle.net/10138/307669

Citation

Scherrer, Y, Samardžić, T & Glaser, E 2019, 'Digitising Swiss German : how to process and study a polycentric spoken language', Language Resources and Evaluation, vol. 53, no. 4, pp. 735-769. https://doi.org/10.1007/s10579-019-09457-5

Title: Digitising Swiss German : how to process and study a polycentric spoken language
Author: Scherrer, Yves; Samardžić, Tanja; Glaser, Elvira
Contributor: University of Helsinki, Department of Digital Humanities
Date: 2019-11-29
Language: eng
Number of pages: 35
Belongs to series: Language Resources and Evaluation
ISSN: 1574-020X
URI: http://hdl.handle.net/10138/307669
Abstract: Swiss dialects of German are, unlike many dialects of other standardised languages, widely used in everyday communication. Despite this fact, automatic processing of Swiss German is still a considerable challenge due to the fact that it is mostly a spoken variety and that it is subject to considerable regional variation. This paper presents the ArchiMob corpus, a freely available general-purpose corpus of spoken Swiss German based on oral history interviews. The corpus is a result of a long design process, intensive manual work and specially adapted computational processing. We first present the modalities of access of the corpus for linguistic, historic and computational research. We then describe how the documents were transcribed, segmented and aligned with the sound source. This work involved a series of experiments that have led to automatically annotated normalisation and part-of-speech tagging layers. Finally, we present several case studies to motivate the use of the corpus for digital humanities in general and for dialectology in particular.
Subject: 6121 Languages

Files in this item

File: Scherrer2019_Ar ... ngSwissGermanHowToProc.pdf
Size: 1.443Mb
Format: PDF
