Low-rank approximations of second-order document representations

dc.contributor.author Lagus, Jarkko
dc.contributor.author Sinkkonen, Janne
dc.contributor.author Klami, Arto
dc.contributor.editor Bansal, Mohit
dc.contributor.editor Villavicencio, Aline
dc.date.accessioned 2020-01-13T16:29:01Z
dc.date.available 2020-01-13T16:29:01Z
dc.date.issued 2019-11
dc.identifier.citation Lagus, J, Sinkkonen, J & Klami, A 2019, Low-rank approximations of second-order document representations. in M Bansal & A Villavicencio (eds), Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL). ACL, Stroudsburg, PA, pp. 634-644, Conference on Computational Natural Language Learning, Hong Kong, Hong Kong, 03/11/2019. https://doi.org/10.18653/v1/K19-1059
dc.identifier.citation conference
dc.identifier.other PURE: 129104788
dc.identifier.other PURE UUID: 946718e0-197a-47db-96b7-5051857a6628
dc.identifier.other ORCID: /0000-0002-7950-1355/work/68616179
dc.identifier.uri http://hdl.handle.net/10138/309458
dc.description.abstract Document embeddings, created with methods ranging from simple heuristics to statistical and deep models, are widely applicable. Bag-of-vectors models for documents include the mean and quadratic approaches (Torki, 2018). We present evidence that quadratic statistics alone, without the mean information, can offer superior accuracy, fast document comparison, and compact document representations. In matching news articles to their comment threads, low-rank representations only 3-4 times the size of the mean vector give the most accurate matching, and in standard sentence comparison tasks, results are state of the art despite faster computation. Similarity measures are discussed, and the Frobenius product implicit in the proposed method is contrasted with the Wasserstein or Bures metric from transportation theory. We also briefly demonstrate matching of unordered word lists to documents, to measure the topicality or sentiment of documents. en
dc.format.extent 11
dc.language.iso eng
dc.publisher ACL
dc.relation.ispartof Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)
dc.relation.isversionof 978-1-950737-72-7
dc.rights cc_by
dc.rights.uri info:eu-repo/semantics/openAccess
dc.subject 113 Computer and information sciences
dc.title Low-rank approximations of second-order document representations en
dc.type Conference contribution
dc.contributor.organization Department of Computer Science
dc.contributor.organization Helsinki Institute for Information Technology
dc.description.reviewstatus Peer reviewed
dc.relation.doi https://doi.org/10.18653/v1/K19-1059
dc.rights.accesslevel openAccess
dc.type.version publishedVersion

Files in this item

Files Size Format
K19_1059.pdf 2.350Mb PDF
