Authors: Scherrer, Yves; Miletic Haddad, Aleksandra; Kuparinen, Olli
Editors: Sawaf, Hassan; El-Beltagy, Samhaa; Zaghouani, Wajdi; Magdy, Walid; Abdelali, Ahmed; Tomeh, Nadi; Abu Farha, Ibrahim; Habash, Nizar; Khalifa, Salam; Keleg, Amr; Haddad, Hatem; Zitouni, Imed; Mrini, Khalil; Almatham, Rawan
Date accessioned: 2023-12-21
Date available: 2023-12-21
Date issued: 2023-12-01
Citation: Scherrer, Y, Miletic Haddad, A & Kuparinen, O 2023, The Helsinki-NLP Submissions at NADI 2023 Shared Task: Walking the Baseline. In H Sawaf, S El-Beltagy, W Zaghouani, W Magdy, A Abdelali, N Tomeh, I Abu Farha, N Habash, S Khalifa, A Keleg, H Haddad, I Zitouni, K Mrini & R Almatham (eds), Proceedings of the First Arabic Natural Language Processing Conference (ArabicNLP 2023), The Association for Computational Linguistics, Stroudsburg, pp. 670-677. Arabic Natural Language Processing Conference, Singapore, 07/12/2023.
DOI: https://doi.org/10.18653/v1/2023.arabicnlp-1.73
BibTeX key: scherrer-etal-2023-helsinki
ORCID: /0000-0001-9468-7111/work/150702206
ORCID: /0000-0001-5247-5073/work/150762236
Handle: http://hdl.handle.net/10138/568978
Abstract: The Helsinki-NLP team participated in the NADI 2023 shared tasks on Arabic dialect translation with seven submissions. We used statistical (SMT) and neural machine translation (NMT) methods and explored character- and subword-based data preprocessing. Our submissions placed second in both tracks. In the open track, our winning submission is a character-level SMT system with additional Modern Standard Arabic language models. In the closed track, our best BLEU scores were obtained with the leave-as-is baseline, a simple copy of the input, narrowly followed by SMT systems. In both tracks, fine-tuning existing multilingual models such as AraT5 or ByT5 did not yield superior performance compared to SMT.
Pages: 8
Language: eng
Rights: cc_by; info:eu-repo/semantics/openAccess
Subjects: Computer and information sciences; Languages
Title: The Helsinki-NLP Submissions at NADI 2023 Shared Task: Walking the Baseline
Type: Conference contribution
Access: openAccess