Grouping business news stories based on salience of named entities

Permalink

http://hdl.handle.net/10138/212995

Citation

Escoter, L, Pivovarova, L, Du, M, Katinskaia, A & Yangarber, R 2017, Grouping business news stories based on salience of named entities. In 15th Conference of the European Chapter of the Association for Computational Linguistics: Proceedings of Conference, Volume 1: Long Papers. The Association for Computational Linguistics, Stroudsburg, PA, pp. 1096-1106, Conference of the European Chapter of the Association for Computational Linguistics, Valencia, Spain, 03/04/2017. https://doi.org/10.18653/v1/e17-1103

Title: Grouping business news stories based on salience of named entities
Author: Escoter, Llorenc; Pivovarova, Lidia; Du, Mian; Katinskaia, Anisia; Yangarber, Roman
Contributor: University of Helsinki, Department of Computer Science
Publisher: The Association for Computational Linguistics
Date: 2017
Language: eng
Number of pages: 11
Belongs to series: 15th Conference of the European Chapter of the Association for Computational Linguistics: Proceedings of Conference, Volume 1: Long Papers
ISBN: 978-1-945626-34-0
URI: http://hdl.handle.net/10138/212995
Abstract: In news aggregation systems focused on broad news domains, certain stories may appear in multiple articles. Depending on the relative importance of the story, the number of versions can reach dozens or hundreds within a day. The text in these versions may be nearly identical or quite different. Linking multiple versions of a story into a single group brings several important benefits to the end-user—reducing the cognitive load on the reader, as well as signaling the relative importance of the story. We present a grouping algorithm, and explore several vector-based representations of input documents: from a baseline using keywords, to a method using salience—a measure of importance of named entities in the text. We demonstrate that features beyond keywords yield substantial improvements, verified on a manually-annotated corpus of business news stories.
Subject: 113 Computer and information sciences


Files in this item

File: 2017_eacl_grouping.pdf
Size: 381.6Kb
Format: PDF
