TY - GEN
T1 - CS2
T2 - 2013 ACM SIGMOD Conference on Management of Data, SIGMOD 2013
AU - Yu, Feng
AU - Hou, Wen Chi
AU - Luo, Cheng
AU - Che, Dunren
AU - Zhu, Mengxia
PY - 2013
Y1 - 2013
N2 - Fast and accurate estimations for complex queries are profoundly beneficial for large databases with heavy workloads. In this research, we propose a statistical summary for a database, called CS2 (Correlated Sample Synopsis), to provide rapid and accurate result size estimations for all queries with joins and arbitrary selections. Unlike the state-of-the-art techniques, CS2 does not completely rely on simple random samples, but mainly consists of correlated sample tuples that retain join relationships with less storage. We introduce a statistical technique, called reverse sample, and design a powerful estimator, called reverse estimator, to fully utilize correlated sample tuples for query estimation. We prove both theoretically and empirically that the reverse estimator is unbiased and accurate using CS2. Extensive experiments on multiple datasets show that CS2 is fast to construct and derives more accurate estimations than existing methods with the same space budget.
KW - Database Synopsis
KW - Query Optimization
KW - Selectivity Estimation
UR - http://www.scopus.com/inward/record.url?scp=84880534257&partnerID=8YFLogxK
U2 - 10.1145/2463676.2463701
DO - 10.1145/2463676.2463701
M3 - Conference contribution
AN - SCOPUS:84880534257
SN - 9781450320375
T3 - Proceedings of the ACM SIGMOD International Conference on Management of Data
SP - 469
EP - 480
BT - SIGMOD 2013 - International Conference on Management of Data
Y2 - 22 June 2013 through 27 June 2013
ER -