Monotonic Cardinality Estimation of Similarity Selection: A Deep Learning Approach - Shenzhen Branch of the National High Performance Computing Center

ACM Conference on Management of Data (SIGMOD)

Published: 22-09-17 21:29

Yaoshu Wang¹, Chuan Xiao², Jianbin Qin¹, Xin Cao³, Yifang Sun³, Wei Wang³, Makoto Onizuka²

¹Shenzhen University, ²Osaka University & Nagoya, ³The University of New South Wales

Abstract

In this paper, we investigate the possibilities of utilizing deep learning for cardinality estimation of similarity selection. Answering this problem accurately and efficiently is essential to many data management applications, especially for query optimization. Moreover, in some applications the estimated cardinality is supposed to be consistent and interpretable. Hence a monotonic estimation w.r.t. the query threshold is preferred. We propose a novel and generic method that can be applied to any data type and distance function. Our method consists of a feature extraction model and a regression model. The feature extraction model transforms original data and threshold to a Hamming space, in which a deep learning-based regression model is utilized to exploit the incremental property of cardinality w.r.t. the threshold for both accuracy and monotonicity. We develop a training strategy tailored to our model as well as techniques for fast estimation. We also discuss how to handle updates. We demonstrate the accuracy and the efficiency of our method through experiments, and show how it improves the performance of a query optimizer.
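The incremental property mentioned in the abstract can be illustrated with a small sketch. This is not the authors' model: it only shows, under simplifying assumptions, why summing non-negative per-distance contributions yields an estimate that never decreases as the threshold grows. The function names (`hamming`, `exact_increments`, `monotonic_estimates`), the toy data, and the use of zero-clamping as a stand-in for a non-negative activation are all hypothetical choices made for illustration.

```python
from itertools import accumulate
from typing import List, Sequence


def hamming(a: int, b: int) -> int:
    """Hamming distance between two bit vectors packed into Python ints."""
    return bin(a ^ b).count("1")


def exact_increments(query: int, data: Sequence[int], max_tau: int) -> List[int]:
    """increments[i] = number of records at Hamming distance exactly i."""
    inc = [0] * (max_tau + 1)
    for rec in data:
        d = hamming(query, rec)
        if d <= max_tau:
            inc[d] += 1
    return inc


def monotonic_estimates(raw_outputs: Sequence[float]) -> List[float]:
    """Turn raw per-distance outputs into cumulative estimates.

    Assumption: clamping at zero stands in for whatever non-negative
    activation a learned regressor would use. Because every summand is
    non-negative, the prefix sums cannot decrease with the threshold.
    """
    nonneg = [max(0.0, v) for v in raw_outputs]
    return list(accumulate(nonneg))


# Toy data set in a 4-bit Hamming space (hypothetical values).
data = [0b0000, 0b0001, 0b0011, 0b0111, 0b1111]
query = 0b0000

# Ground truth: cardinality at threshold tau is the prefix sum of increments.
inc = exact_increments(query, data, max_tau=4)
exact = list(accumulate(inc))

# Hypothetical noisy per-distance predictions, one of them negative.
noisy = [1.2, 0.8, -0.3, 1.1, 0.9]
est = monotonic_estimates(noisy)

for tau, (truth, guess) in enumerate(zip(exact, est)):
    print(f"tau={tau} exact={truth} estimate={guess:.2f}")
```

The design point is that monotonicity is enforced structurally, by construction of the output, rather than hoped for as a side effect of training.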

Figure 1: Cardinality distribution on ImageNet.

Figure 2: The regression model.

Figure 3: Example of encoder Ψ.

Figure 4: Φ′ in the accelerated regression model.

Figure 5: Accuracy vs. threshold.

Figure 6: Accuracy vs. training data size.

Figure 7: Evaluation of updates.

Figure 8: Evaluation of long-tail queries.

Figure 10: Query processing time.

Figure 11: Query planning precision.

Acknowledgements

This work was supported by JSPS 16H01722, 17H06099, 18H04093, and 19K11979, NSFC 61702409, CCF DBIR2019001A, NKRDP of China 2018YFB1003201, ARC DE190100663, DP170103710, and DP180103411, and D2D CRC DC25002 and DC25003. The Titan V was donated by Nvidia. We thank Rui Zhang (the University of Melbourne) for his precious comments.

Bibtex

@inproceedings{DBLP:conf/sigmod/WangXQ0SWO20,
  author    = {Yaoshu Wang and Chuan Xiao and Jianbin Qin and Xin Cao and Yifang Sun and Wei Wang and Makoto Onizuka},
  editor    = {David Maier and Rachel Pottinger and AnHai Doan and Wang{-}Chiew Tan and Abdussalam Alawini and Hung Q. Ngo},
  title     = {Monotonic Cardinality Estimation of Similarity Selection: {A} Deep Learning Approach},
  booktitle = {Proceedings of the 2020 International Conference on Management of Data, {SIGMOD} Conference 2020, online conference [Portland, OR, USA], June 14-19, 2020},
  pages     = {1197--1212},
  publisher = {{ACM}},
  year      = {2020},
}
