CS2: A new database synopsis for query estimation

Feng Yu, Wen Chi Hou, Cheng Luo, Dunren Che, Michelle Zhu

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > Research > peer-review

12 Citations (Scopus)

Abstract

Fast and accurate estimations for complex queries are profoundly beneficial for large databases with heavy workloads. In this research, we propose a statistical summary for a database, called CS2 (Correlated Sample Synopsis), to provide rapid and accurate result size estimations for all queries with joins and arbitrary selections. Unlike the state-of-the-art techniques, CS2 does not completely rely on simple random samples, but mainly consists of correlated sample tuples that retain join relationships with less storage. We introduce a statistical technique, called reverse sample, and design a powerful estimator, called reverse estimator, to fully utilize correlated sample tuples for query estimation. We prove both theoretically and empirically that the reverse estimator is unbiased and accurate using CS2. Extensive experiments on multiple datasets show that CS2 is fast to construct and derives more accurate estimations than existing methods with the same space budget.
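As an illustration of the general idea of a correlated sample, the following is a minimal Python sketch, not the paper's method: the abstract does not specify how CS2 is built or how the reverse estimator works, so neither is reproduced here. The sketch assumes a two-table join on a single key, draws a simple random sample from a source relation, keeps only the target tuples that join with it, and scales the observed join count by the sampling fraction. The function names `build_correlated_sample` and `estimate_join_size`, the row layout, and all parameters are hypothetical.

```python
import random
from collections import defaultdict


def build_correlated_sample(source, target, key, n, seed=0):
    """Draw n source rows uniformly without replacement and keep every
    target row that joins with one of them on `key` (hypothetical helper)."""
    rng = random.Random(seed)
    sample = rng.sample(source, n)
    wanted = {row[key] for row in sample}
    correlated = defaultdict(list)
    for row in target:
        if row[key] in wanted:
            correlated[row[key]].append(row)
    return sample, correlated


def estimate_join_size(sample, correlated, key, source_size,
                       source_pred=lambda r: True,
                       target_pred=lambda r: True):
    """Count qualifying join results inside the synopsis, then scale the
    count by source_size / sample_size (hypothetical helper)."""
    hits = 0
    for s in sample:
        if source_pred(s):
            hits += sum(1 for t in correlated.get(s[key], [])
                        if target_pred(t))
    return source_size / len(sample) * hits


if __name__ == "__main__":
    # Toy data: 10 source rows; 20 target rows referencing ids 0..4.
    source = [{"id": i} for i in range(10)]
    target = [{"id": i % 5, "v": i} for i in range(20)]
    sample, correlated = build_correlated_sample(source, target, "id", n=5)
    print(estimate_join_size(sample, correlated, "id", len(source),
                             target_pred=lambda t: t["v"] < 10))
```

When `n` equals the size of the source relation, the synopsis contains every joining tuple and the estimate equals the exact join size. For smaller `n`, only the target tuples reachable from the sampled source rows are stored, which is the sense in which such a sample retains join relationships without sampling each relation independently.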

Original language: English
Title of host publication: SIGMOD 2013 - International Conference on Management of Data
Pages: 469-480
Number of pages: 12
DOI: https://doi.org/10.1145/2463676.2463701
State: Published - 29 Jul 2013
Event: 2013 ACM SIGMOD Conference on Management of Data, SIGMOD 2013 - New York, NY, United States
Duration: 22 Jun 2013 to 27 Jun 2013

Other

Other: 2013 ACM SIGMOD Conference on Management of Data, SIGMOD 2013
Country: United States
City: New York, NY
Period: 22/06/13 to 27/06/13

Keywords

  • Database Synopsis
  • Query Optimization
  • Selectivity Estimation

Cite this

Yu, F., Hou, W. C., Luo, C., Che, D., & Zhu, M. (2013). CS2: A new database synopsis for query estimation. In SIGMOD 2013 - International Conference on Management of Data (pp. 469-480). https://doi.org/10.1145/2463676.2463701
@inproceedings{e00172cf724d4b0eacbc2d3a5a6014fc,
title = "CS2: A new database synopsis for query estimation",
abstract = "Fast and accurate estimations for complex queries are profoundly beneficial for large databases with heavy workloads. In this research, we propose a statistical summary for a database, called CS2 (Correlated Sample Synopsis), to provide rapid and accurate result size estimations for all queries with joins and arbitrary selections. Unlike the state-of-the-art techniques, CS2 does not completely rely on simple random samples, but mainly consists of correlated sample tuples that retain join relationships with less storage. We introduce a statistical technique, called reverse sample, and design a powerful estimator, called reverse estimator, to fully utilize correlated sample tuples for query estimation. We prove both theoretically and empirically that the reverse estimator is unbiased and accurate using CS2. Extensive experiments on multiple datasets show that CS2 is fast to construct and derives more accurate estimations than existing methods with the same space budget.",
keywords = "Database Synopsis, Query Optimization, Selectivity Estimation",
author = "Feng Yu and Hou, {Wen Chi} and Cheng Luo and Dunren Che and Michelle Zhu",
year = "2013",
month = "7",
day = "29",
doi = "10.1145/2463676.2463701",
language = "English",
isbn = "9781450320375",
pages = "469--480",
booktitle = "SIGMOD 2013 - International Conference on Management of Data",

}

TY - GEN
T1 - CS2
T2 - A new database synopsis for query estimation
AU - Yu, Feng
AU - Hou, Wen Chi
AU - Luo, Cheng
AU - Che, Dunren
AU - Zhu, Michelle
PY - 2013/7/29
Y1 - 2013/7/29
AB - Fast and accurate estimations for complex queries are profoundly beneficial for large databases with heavy workloads. In this research, we propose a statistical summary for a database, called CS2 (Correlated Sample Synopsis), to provide rapid and accurate result size estimations for all queries with joins and arbitrary selections. Unlike the state-of-the-art techniques, CS2 does not completely rely on simple random samples, but mainly consists of correlated sample tuples that retain join relationships with less storage. We introduce a statistical technique, called reverse sample, and design a powerful estimator, called reverse estimator, to fully utilize correlated sample tuples for query estimation. We prove both theoretically and empirically that the reverse estimator is unbiased and accurate using CS2. Extensive experiments on multiple datasets show that CS2 is fast to construct and derives more accurate estimations than existing methods with the same space budget.
KW - Database Synopsis
KW - Query Optimization
KW - Selectivity Estimation
UR - http://www.scopus.com/inward/record.url?scp=84880534257&partnerID=8YFLogxK
U2 - 10.1145/2463676.2463701
DO - 10.1145/2463676.2463701
M3 - Conference contribution
SN - 9781450320375
SP - 469
EP - 480
BT - SIGMOD 2013 - International Conference on Management of Data
ER -