Minimum Coresets for Maxima Representation of Multidimensional Data

Show full item record



Permalink

http://hdl.handle.net/10138/333090

Citation

Wang , Y , Mathioudakis , M , Li , Y & Tan , K-L 2021 , Minimum Coresets for Maxima Representation of Multidimensional Data . in Proceedings of the 40th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems (PODS '21) . ACM , pp. 138–152 , 2021 ACM SIGMOD/PODS International Conference on Management of Data , Xi'an , China , 20/06/2021 . https://doi.org/10.1145/3452021.3458322

Title: Minimum Coresets for Maxima Representation of Multidimensional Data
Author: Wang, Yanhao; Mathioudakis, Michael; Li, Yuchen; Tan, Kian-Lee
Other contributor: University of Helsinki, Department of Computer Science
University of Helsinki, Department of Computer Science

Publisher: ACM
Date: 2021
Language: eng
Belongs to series: Proceedings of the 40th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems (PODS '21)
ISBN: 978-1-4503-8381-3
978-1-4503-8381-3
DOI: https://doi.org/10.1145/3452021.3458322
URI: http://hdl.handle.net/10138/333090
Abstract: Coresets are succinct summaries of large datasets such that, for a given problem, the solution obtained from a coreset is provably competitive with the solution obtained from the full dataset. As such, coreset-based data summarization techniques have been successfully applied to various problems, e.g., geometric optimization, clustering, and approximate query processing, for scaling them up to massive data. In this paper, we study coresets for the maxima representation of multidimensional data: Given a set P of points in R^d , where d is a small constant, and an error parameter ε ∈ (0, 1), a subset Q ⊆ P is an ε-coreset for the maxima representation of P iff the maximum of Q is an ε-approximation of the maximum of P for any vector u ∈ R^d , where the maximum is taken over the inner products between the set of points (P or Q) and u. We define a novel minimum ε-coreset problem that asks for an ε-coreset of the smallest size for the maxima representation of a point set. For the two-dimensional case, we develop an optimal polynomial-time algorithm for the minimum ε-coreset problem by transforming it into the shortest-cycle problem in a directed graph. Then, we prove that this problem is NP-hard in three or higher dimensions and present polynomial-time approximation algorithms in an arbitrary fixed dimension. Finally, we provide extensive experimental results on both real and synthetic datasets to demonstrate the superior performance of our proposed algorithms.
Subject: 113 Computer and information sciences
Rights:


Files in this item

Total number of downloads: Loading...

Files Size Format View
pods043_wangA1.pdf 971.3Kb PDF View/Open

This item appears in the following Collection(s)

Show full item record