One size does not fit all : accelerating OLAP workloads with GPUs

Zhang, Y, Zhang, Y, Lu, J, Wang, S, Liu, Z & Han, R 2020, 'One size does not fit all : accelerating OLAP workloads with GPUs', Distributed and Parallel Databases, vol. 38, pp. 995-1037.

Title: One size does not fit all : accelerating OLAP workloads with GPUs
Author: Zhang, Yansong; Zhang, Yu; Lu, Jiaheng; Wang, Shan; Liu, Zhuan; Han, Ruichen
Contributor organization: Department of Computer Science
Unified DataBase Management System research group / Jiaheng Lu
Date: 2020-07-31
Language: eng
Number of pages: 43
Belongs to series: Distributed and Parallel Databases
ISSN: 0926-8782
Abstract: GPUs have been considered one of the next-generation platforms for real-time query processing in databases. In this paper we empirically demonstrate that representative GPU databases [e.g., OmniSci (Open Source Analytical Database & SQL Engine, 2019)] may be slower than representative in-memory databases [e.g., Hyper (Neumann and Leis, IEEE Data Eng Bull 37(1):3-11, 2014)] on typical OLAP workloads (the Star Schema Benchmark), even if the actual dataset of each query fits completely in GPU memory. Therefore, we argue that GPU database designs should not be one-size-fits-all; a general-purpose GPU database engine may not be well suited to OLAP workloads without carefully designed GPU memory assignment and GPU computing locality. To achieve better performance for GPU OLAP, we need to reorganize the OLAP operators and re-optimize the OLAP model. In particular, we propose a 3-layer OLAP model to match heterogeneous computing platforms. The core idea is to maximize data and computing locality on the specified hardware. We design a vector grouping algorithm for the data-intensive workload, which is shown to be well suited to the CPU platform. We design a TOP-DOWN query plan tree strategy that guarantees the optimal operation in the final stage and pushes the respective optimizations down to the lower layers to achieve global optimization gains. With this strategy, we design a 3-stage processing model (the OLAP acceleration engine) for hybrid CPU-GPU platforms, where the computing-intensive star-join stage is accelerated by the GPU and the data-intensive grouping & aggregation stage is accelerated by the CPU. This design maximizes the locality of the different workloads and simplifies the GPU acceleration implementation. Our experimental results show that, with vector grouping and a GPU-accelerated star-join implementation, the OLAP acceleration engine runs 1.9x, 3.05x and 3.92x faster than Hyper, OmniSci GPU and OmniSci CPU, respectively, in the SSB evaluation with a dataset of SF = 100.
Subject: GPU
Layered OLAP
Vector grouping
3-layer OLAP model
113 Computer and information sciences
Peer reviewed: Yes
Usage restriction: openAccess
Self-archived version: acceptedVersion

Files in this item

Files Size Format
One_Size_Cannot_Fit_All.pdf 1.334Mb PDF
