Projecting named entity recognizers without annotated or parallel corpora

Permanent link

http://hdl.handle.net/10138/306000

Citation

Hou, J, Koppatz, M, Hoya Quecedo, J M & Yangarber, R 2019, Projecting named entity recognizers without annotated or parallel corpora. in M Hartmann & B Plank (eds), 22nd Nordic Conference on Computational Linguistics (NoDaLiDa): Proceedings of the Conference. Linköping Electronic Conference Proceedings, no. 67, NEALT Proceedings Series, no. 42, Linköping University Electronic Press, Linköping, pp. 232-241, Nordic Conference on Computational Linguistics, Turku, Finland, 30/09/2019.

Title: Projecting named entity recognizers without annotated or parallel corpora
Author: Hou, Jue; Koppatz, Maximilian; Hoya Quecedo, Jose María; Yangarber, Roman
Other contributor: Hartmann, Mareike
Plank, Barbara
Author's organization: Department of Computer Science
Department of Digital Humanities
Helsinki Inequality Initiative (INEQ)
Publisher: Linköping University Electronic Press
Date: 2019-10
Language: eng
Number of pages: 10
Series: 22nd Nordic Conference on Computational Linguistics (NoDaLiDa)
Series: Linköping Electronic Conference Proceedings - NEALT Proceedings Series
ISBN: 978-91-7929-995-8
ISSN: 1650-3686
URI: http://hdl.handle.net/10138/306000
Abstract: Named entity recognition (NER) is a well-researched task in the field of NLP, which typically requires large annotated corpora for training usable models. This is a problem for languages which lack large annotated corpora, such as Finnish. We propose an approach to create a named entity recognizer with no annotated or parallel documents, by leveraging strong NER models that exist for English. We automatically gather a large amount of chronologically matched data in two languages, then project named entity annotations from the English documents onto the Finnish ones, by resolving the matches with limited linguistic rules. We use this “artificially” annotated data to train a BiLSTM-CRF model. Our results show that this method can produce annotated instances with high precision, and the resulting model achieves state-of-the-art performance.
Keywords: 113 Computer and information sciences
6121 Languages
Peer reviewed: Yes
Rights: cc_by
Access restrictions: openAccess
Self-archived version: publishedVersion


Files

File(s) Size Format View
W19_6124.pdf 184.6KB PDF Open file
