Projecting named entity recognizers without annotated or parallel corpora

dc.contributor.author Hou, Jue
dc.contributor.author Koppatz, Maximilian
dc.contributor.author Hoya Quecedo, Jose María
dc.contributor.author Yangarber, Roman
dc.contributor.editor Hartmann, Mareike
dc.contributor.editor Plank, Barbara
dc.date.accessioned 2019-10-14T13:30:02Z
dc.date.available 2019-10-14T13:30:02Z
dc.date.issued 2019-10
dc.identifier.citation Hou, J, Koppatz, M, Hoya Quecedo, J M & Yangarber, R 2019, Projecting named entity recognizers without annotated or parallel corpora. in M Hartmann & B Plank (eds), 22nd Nordic Conference on Computational Linguistics (NoDaLiDa): Proceedings of the Conference. Linköping Electronic Conference Proceedings, no. 67, NEALT Proceedings Series, no. 42, Linköping University Electronic Press, Linköping, pp. 232-241, Nordic Conference on Computational Linguistics, Turku, Finland, 30/09/2019.
dc.identifier.citation conference
dc.identifier.other PURE: 126118974
dc.identifier.other PURE UUID: 7806aa58-16c8-4b4d-9d67-a1349caf5361
dc.identifier.other ORCID: /0000-0001-9404-2022/work/68617687
dc.identifier.other ORCID: /0000-0001-5264-9870/work/68618674
dc.description.abstract Named entity recognition (NER) is a well-researched task in the field of NLP, which typically requires large annotated corpora for training usable models. This is a problem for languages which lack large annotated corpora, such as Finnish. We propose an approach to create a named entity recognizer with no annotated or parallel documents, by leveraging strong NER models that exist for English. We automatically gather a large amount of chronologically matched data in two languages, then project named entity annotations from the English documents onto the Finnish ones, by resolving the matches with limited linguistic rules. We use this “artificially” annotated data to train a BiLSTM-CRF model. Our results show that this method can produce annotated instances with high precision, and the resulting model achieves state-of-the-art performance. en
dc.format.extent 10
dc.language.iso eng
dc.publisher Linköping University Electronic Press
dc.relation.ispartof 22nd Nordic Conference on Computational Linguistics (NoDaLiDa)
dc.relation.ispartofseries Linköping Electronic Conference Proceedings
dc.relation.ispartofseries NEALT Proceedings Series
dc.relation.isversionof 978-91-7929-995-8
dc.rights cc_by
dc.rights.uri info:eu-repo/semantics/openAccess
dc.subject 113 Computer and information sciences
dc.subject 6121 Languages
dc.title Projecting named entity recognizers without annotated or parallel corpora en
dc.type Conference contribution
dc.contributor.organization Department of Computer Science
dc.contributor.organization Department of Digital Humanities
dc.contributor.organization Helsinki Inequality Initiative (INEQ)
dc.description.reviewstatus Peer reviewed
dc.relation.issn 1650-3686
dc.rights.accesslevel openAccess
dc.type.version publishedVersion

Files in this item


File Size Format
W19_6124.pdf 184.6Kb PDF