Crowdsourcing Regional Variation Data and Automatic Geolocalisation of Speakers of European French


Permalink

http://hdl.handle.net/10138/235429

Citation

Goldman, J-P, Scherrer, Y, Glikman, J, Avanzi, M, Benzitoun, C & Boula de Mareüil, P 2019, Crowdsourcing Regional Variation Data and Automatic Geolocalisation of Speakers of European French. in N Calzolari, K Choukri, C Cieri, T Declerck, S Goggi, K Hasida, H Isahara, B Maegaard, J Mariani, H Mazo, A Moreno, J Odijk, S Piperidis & T Tokunaga (eds), Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018). European Language Resources Association (ELRA), Paris, pp. 3336-3342, International Conference on Language Resources and Evaluation, Miyazaki, Japan, 07/05/2018. <http://www.lrec-conf.org/proceedings/lrec2018/summaries/517.html>

Title: Crowdsourcing Regional Variation Data and Automatic Geolocalisation of Speakers of European French
Author: Goldman, Jean-Philippe; Scherrer, Yves; Glikman, Julie; Avanzi, Mathieu; Benzitoun, Christophe; Boula de Mareüil, Philippe
Editor: Calzolari, Nicoletta; Choukri, Khalid; Cieri, Christopher; Declerck, Thierry; Goggi, Sara; Hasida, Koiti; Isahara, Hitoshi; Maegaard, Bente; Mariani, Joseph; Mazo, Hélène; Moreno, Asuncion; Odijk, Jan; Piperidis, Stelios; Tokunaga, Takenobu
Contributor: University of Helsinki, Department of Digital Humanities
Publisher: European Language Resources Association (ELRA)
Date: 2019
Language: eng
Number of pages: 7
Belongs to series: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
ISBN: 979-10-95546-00-9
URI: http://hdl.handle.net/10138/235429
Abstract: We present the crowdsourcing platform Donnez Votre Français à la Science (DFS, or “Give your French to Science”), which aims to collect linguistic data and document language use, with a special focus on regional variation in European French. The activities not only gather data that is useful for scientific studies, but they also provide feedback to the general public; this is important in order to reward participants, to encourage them to follow future surveys, and to foster interaction with the scientific community. The two main activities described here are 1) a linguistic survey on lexical variation with immediate feedback and 2) a speaker geolocalisation system; i.e., a quiz that guesses the linguistic origin of the participant by comparing their answers with previously gathered linguistic data. For the geolocalisation activity, we set up a simulation framework to optimise predictions. Three classification algorithms are compared: the first one uses clustering and shibboleth detection, whereas the other two rely on feature elimination techniques with Support Vector Machines and Maximum Entropy models as underlying base classifiers. The best-performing system uses a selection of 17 questions and reaches a localisation accuracy of 66% when the prediction is extended from the one-best area (one among 109 base areas) to its first-order and second-order neighbouring areas.
Subject: 113 Computer and information sciences
6121 Languages
language variation
regionalism
crowdsourcing
geolocalisation
linguistic geography
cartography
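
The neighbour-extended accuracy mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the adjacency map, the area names, and the function names `areas_within` and `extended_accuracy` are all hypothetical, and the sketch assumes that a prediction counts as correct when the speaker's true area lies within a given number of neighbour hops of the predicted area.

```python
from collections import deque


def areas_within(adjacency, start, max_order):
    """Return all areas reachable from `start` in at most `max_order` hops."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        area, dist = frontier.popleft()
        if dist == max_order:
            continue  # do not expand beyond the requested neighbour order
        for neighbour in adjacency.get(area, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                frontier.append((neighbour, dist + 1))
    return seen


def extended_accuracy(predictions, truths, adjacency, max_order=2):
    """Share of speakers whose true area is within `max_order` hops of the prediction."""
    hits = sum(
        true in areas_within(adjacency, pred, max_order)
        for pred, true in zip(predictions, truths)
    )
    return hits / len(truths)


# Toy data (hypothetical): five areas arranged in a chain A-B-C-D-E.
ADJ = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"]}
PRED = ["A", "A", "A", "A"]   # one-best predicted area per speaker
TRUE = ["A", "B", "C", "D"]   # actual area per speaker

for order in (0, 1, 2):
    print(order, extended_accuracy(PRED, TRUE, ADJ, max_order=order))
```

With `max_order=0` only exact matches count; raising it to 1 and then 2 also accepts first-order and second-order neighbours, so the score can only stay the same or increase.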


Files in this item

File: 517.pdf
Size: 398.0 Kb
Format: PDF
