Browsing by Subject "Named entity recognition"

Now showing items 1-3 of 3
  • Ruokolainen, Teemu; Kauppinen, Pekka; Silfverberg, Miikka; Lindén, Krister (2020)
    We present a corpus of Finnish news articles with manually prepared named entity annotation. The corpus consists of 953 articles (193,742 word tokens) annotated with six named entity classes (organization, location, person, product, event, and date). The articles are extracted from the archives of Digitoday, a Finnish online technology news source. The corpus is available for research purposes. We present baseline experiments on the corpus using one rule-based system and two deep learning systems, evaluated on two test sets, one in-domain and one out-of-domain.
  • La Mela, Matti; Tamper, Minna; Kettunen, Kimmo (CEUR-WS.org, 2019)
    CEUR Workshop Proceedings
    The paper studies and improves methods of named entity recognition (NER) and named entity linking (NEL) to facilitate historical research that uses digitized newspaper texts. The specific focus is a study of the historical process of commodification. The named entity detection pipeline is discussed in three steps. First, the paper presents the corpus, which consists of newspaper articles on wild berry picking from the late nineteenth century. Second, the paper compares two named entity recognition tools: the trainable Stanford NER and the rule-based FiNER. Third, the linking and disambiguation of the recognized places is explored. In the linking process, information about the newspaper's place of publication is used to improve the identification of small places. The paper concludes that the pipeline performs well for mapping the commodification, and that, among named entities, the specific problems relate to the recognition of place names. Stanford NER performs better in the task (F-score of 0.83) than FiNER (F-score of 0.68). For the linking of places, newspaper metadata appears useful for disambiguating between small places. However, the historical language (with its OCR errors) recognized by the Stanford model poses challenges for the linking tool. The paper proposes that other information, for instance about the reuse of newspaper articles, could be used to further improve recognition and linking quality.
  • Leal, Rafael; Rantala, Heikki; Koho, Mikko; Ikkala, Esko; Tamper, Minna; Merenmies, Markus; Hyvönen, Eero (CEUR-WS.org, 2022)
    CEUR Workshop Proceedings
    This paper presents WarMemoirSampo, a portal that provides semantic search and navigation of video interviews with Finnish World War II veterans. The portal associates video fragments with contextual data extracted from the video transcriptions, enabling users to find suitable video segments via faceted search and highlighting relevant content in the video being watched. This is done by processing the natural language texts to extract named entities, keywords, and lemmas. The result is a Linked Data Knowledge Graph that underpins the portal. We describe the collaboration between Natural Language Processing and Semantic Web technologies used to produce these results.
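
The F-scores quoted in the second abstract above (0.83 for Stanford NER, 0.68 for FiNER) summarize precision and recall in one number. As a rough illustration only, the sketch below computes a balanced F-score (F1) under the assumption of exact-match scoring over `(start, end, label)` entity spans. The function name `f_score` and the toy spans are hypothetical and are not taken from any of the listed papers, which may use a different matching scheme.

```python
def f_score(gold, predicted):
    """Balanced F-score (F1) over exactly matching entity spans.

    Both arguments are iterables of (start, end, label) tuples.
    Exact-match scoring is an assumption made for this sketch.
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    if true_positives == 0:
        # Covers empty inputs and the case of no overlap at all.
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Toy data (hypothetical): three gold entities, four predictions.
gold = {(0, 1, "LOC"), (5, 7, "PER"), (10, 11, "LOC")}
pred = {(0, 1, "LOC"), (5, 7, "ORG"), (10, 11, "LOC"), (14, 15, "LOC")}

score = f_score(gold, pred)
print(f"F-score: {score:.2f}")
```

In the toy data, two of the four predictions match a gold span exactly, so precision is 2/4 and recall is 2/3. The mislabelled span `(5, 7, "ORG")` counts as both a false positive and a missed entity, which is the usual consequence of exact-match scoring.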