SIGMORPHON 2021 Shared Task on Morphological Reinflection: Generalization Across Languages


Permalink

http://hdl.handle.net/10138/340752

Citation

Pimentel, T, Ryskina, M, Mielke, S J, Wu, S, Chodroff, E, Leonard, B, Nicolai, G, Ate, Y G, Khalifa, S, Habash, N, El-Khaissi, C, Goldman, O, Gasser, M, Lane, W, Coler, M, Oncevay, A, Montoya Samame, J R, Silva Villegas, G C, Ek, A, Bernardy, J-P, Shcherbakov, A, Bayyr-ool, A, Sheifer, K, Ganieva, S, Plugaryov, M, Klyachko, E, Salehi, A, Krizhanovsky, A, Krizhanovsky, N, Vania, C, Ivanova, S, Salchak, A, Straughn, C, Liu, Z, Washington, J, Ataman, D, Kieraś, W, Woliński, M, Suhardijanto, T, Stoehr, N, Nuriah, Z, Ratan, S, Tyers, F M, Ponti, E M, Aiton, G, Hatcher, R J, Prud'hommeaux, E, Kumar, R, Hulden, M, Barta, B, Lakatos, D, Szolnok, G, Ács, J, Raj, M, Yarowsky, D, Cotterell, R, Ambridge, B & Vylomova, E 2021, SIGMORPHON 2021 Shared Task on Morphological Reinflection: Generalization Across Languages. in Proceedings of the 18th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology. The Association for Computational Linguistics, pp. 229–259, SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology, 05/08/2021. https://doi.org/10.18653/v1/2021.sigmorphon-1.25

Title: SIGMORPHON 2021 Shared Task on Morphological Reinflection: Generalization Across Languages
Author: Pimentel, Tiago; Ryskina, Maria; Mielke, Sabrina J.; Wu, Shijie; Chodroff, Eleanor; Leonard, Brian; Nicolai, Garett; Ate, Yustinus Ghanggo; Khalifa, Salam; Habash, Nizar; El-Khaissi, Charbel; Goldman, Omer; Gasser, Michael; Lane, William; Coler, Matt; Oncevay, Arturo; Montoya Samame, Jaime Rafael; Silva Villegas, Gema Celeste; Ek, Adam; Bernardy, Jean-Philippe; Shcherbakov, Andrey; Bayyr-ool, Aziyana; Sheifer, Karina; Ganieva, Sofya; Plugaryov, Matvey; Klyachko, Elena; Salehi, Ali; Krizhanovsky, Andrew; Krizhanovsky, Natalia; Vania, Clara; Ivanova, Sardana; Salchak, Aelita; Straughn, Christopher; Liu, Zoey; Washington, Jonathan; Ataman, Duygu; Kieraś, Witold; Woliński, Marcin; Suhardijanto, Totok; Stoehr, Niklas; Nuriah, Zahroh; Ratan, Shyam; Tyers, Francis M.; Ponti, Edoardo M.; Aiton, Grant; Hatcher, Richard J.; Prud'hommeaux, Emily; Kumar, Ritesh; Hulden, Mans; Barta, Botond; Lakatos, Dorina; Szolnok, Gábor; Ács, Judit; Raj, Mohit; Yarowsky, David; Cotterell, Ryan; Ambridge, Ben; Vylomova, Ekaterina
Contributor organization: Department of Computer Science
Publisher: The Association for Computational Linguistics
Date: 2021-08
Language: eng
Belongs to series: Proceedings of the 18th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology
ISBN: 978-1-954085-62-6
DOI: https://doi.org/10.18653/v1/2021.sigmorphon-1.25
URI: http://hdl.handle.net/10138/340752
Abstract: This year's iteration of the SIGMORPHON Shared Task on morphological reinflection focuses on typological diversity and cross-lingual variation of morphosyntactic features. In terms of the task, we enrich UniMorph with new data for 32 languages from 13 language families, with most of them being under-resourced: Kunwinjku, Classical Syriac, Arabic (Modern Standard, Egyptian, Gulf), Hebrew, Amharic, Aymara, Magahi, Braj, Kurdish (Central, Northern, Southern), Polish, Karelian, Livvi, Ludic, Veps, Võro, Evenki, Xibe, Tuvan, Sakha, Turkish, Indonesian, Kodi, Seneca, Asháninka, Yanesha, Chukchi, Itelmen, Eibela. We evaluate six systems on the new data and conduct an extensive error analysis of the systems' predictions. Transformer-based models generally demonstrate superior performance on the majority of languages, achieving >90% accuracy on 65% of them. The languages on which systems yielded low accuracy are mainly under-resourced, with a limited amount of data. Most errors made by the systems are due to allomorphy, honorificity, and form variation. In addition, we observe that systems especially struggle to inflect multiword lemmas. The systems also produce misspelled forms or end up in repetitive loops (e.g., RNN-based models). Finally, we report a large drop in systems' performance on previously unseen lemmas.
Subject: 113 Computer and information sciences
Peer reviewed: Yes
Rights: cc_by
Usage restriction: openAccess
Self-archived version: publishedVersion


Files in this item

File: shared_task.pdf
Size: 358.3Kb
Format: PDF