ACM Conference on Management of Data (SIGMOD)
Yaoshu Wang¹, Chuan Xiao², Jianbin Qin¹, Xin Cao³, Yifang Sun³, Wei Wang³, Makoto Onizuka²
¹Shenzhen University, ²Osaka University & Nagoya, ³The University of New South Wales
Abstract
In this paper, we investigate the possibility of using deep learning to estimate the cardinality of similarity selection. Answering this problem accurately and efficiently is essential to many data management applications, especially query optimization. Moreover, some applications require the estimated cardinality to be consistent and interpretable, so an estimate that is monotonic with respect to the query threshold is preferred. We propose a novel and generic method that can be applied to any data type and distance function. Our method consists of a feature extraction model and a regression model. The feature extraction model transforms the original data and threshold into a Hamming space, in which a deep learning-based regression model exploits the incremental property of cardinality with respect to the threshold to achieve both accuracy and monotonicity. We develop a training strategy tailored to our model, as well as techniques for fast estimation. We also discuss how to handle updates. We demonstrate the accuracy and efficiency of our method through experiments, and show how it improves the performance of a query optimizer.
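The incremental property mentioned in the abstract can be illustrated with a minimal sketch. It assumes data objects have already been mapped to fixed-length binary codes (stored here as Python ints) and that the query threshold has already been mapped to a Hamming threshold `tau`. The cardinality at `tau` is then the sum of the per-distance counts for distances 0 through `tau`, so an estimator that predicts a non-negative increment for each distance and sums them can never decrease as the threshold grows. The helper names are hypothetical, and the clamped increments only stand in for the outputs of the paper's learned regression model, which is not reproduced here.

```python
from typing import List, Sequence


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two binary codes stored as ints."""
    return bin(a ^ b).count("1")


def per_distance_counts(codes: Sequence[int], query: int, max_tau: int) -> List[int]:
    """Exact counts: n[d] = number of data codes at exactly Hamming distance d from query."""
    counts = [0] * (max_tau + 1)
    for code in codes:
        d = hamming_distance(code, query)
        if d <= max_tau:
            counts[d] += 1
    return counts


def cumulative_cardinalities(increments: Sequence[float]) -> List[float]:
    """Estimate for threshold tau = sum of increments for d = 0..tau.

    Each increment is clamped at zero (as a ReLU output would be), which
    guarantees the returned sequence is non-decreasing in tau.
    """
    total = 0.0
    out: List[float] = []
    for inc in increments:
        total += max(0.0, inc)
        out.append(total)
    return out


if __name__ == "__main__":
    # Toy 4-bit codes at distances 0, 1, 2, 3, 4 from the all-zero query.
    codes = [0b0000, 0b0001, 0b0011, 0b0111, 0b1111]
    exact = per_distance_counts(codes, query=0b0000, max_tau=4)
    print(exact)                            # per-distance counts
    print(cumulative_cardinalities(exact))  # exact cardinality for tau = 0..4

    # Hypothetical model outputs, including a negative one: the clamped sum stays monotonic.
    print(cumulative_cardinalities([2.0, -0.5, 1.5]))
```

Summing clamped increments trades a small amount of flexibility (the estimator cannot "take back" an overestimate at a larger threshold) for a guarantee that estimates are consistent across thresholds, which is the monotonicity property the abstract motivates.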
Figure 1: Cardinality distribution on ImageNet.
Figure 2: The regression model.
Figure 3: Example of encoder Ψ.
Figure 4: Φ′ in the accelerated regression model.
Figure 5: Accuracy vs. threshold.
Figure 6: Accuracy vs. training data size.
Figure 7: Evaluation of updates.
Figure 8: Evaluation of long-tail queries.
Figure 10: Query processing time.
Figure 11: Query planning precision.
Acknowledgements
This work was supported by JSPS 16H01722, 17H06099, 18H04093, and 19K11979, NSFC 61702409, CCF DBIR2019001A, NKRDP of China 2018YFB1003201, ARC DE190100663, DP170103710, and DP180103411, and D2D CRC DC25002 and DC25003. The Titan V GPU was donated by Nvidia. We thank Rui Zhang (the University of Melbourne) for his valuable comments.
Bibtex
@inproceedings{DBLP:conf/sigmod/WangXQ0SWO20,
  author    = {Yaoshu Wang and Chuan Xiao and Jianbin Qin and Xin Cao and Yifang Sun and Wei Wang and Makoto Onizuka},
  editor    = {David Maier and Rachel Pottinger and AnHai Doan and Wang{-}Chiew Tan and Abdussalam Alawini and Hung Q. Ngo},
  title     = {Monotonic Cardinality Estimation of Similarity Selection: {A} Deep Learning Approach},
  booktitle = {Proceedings of the 2020 International Conference on Management of Data, {SIGMOD} Conference 2020, online conference [Portland, OR, USA], June 14-19, 2020},
  pages     = {1197--1212},
  publisher = {{ACM}},
  year      = {2020}
}