TY - GEN
T1 - A Hive and SQL Case Study in Cloud Data Analytics
AU - Chandra, Shireesha
AU - Varde, Aparna S.
AU - Wang, Jiayin
N1 - Publisher Copyright:
© 2019 IEEE.
PY - 2019/10
Y1 - 2019/10
AB - The digital universe is expanding at a very fast pace, generating massive datasets. To keep up with the processing and storage needs of this big data, and to discover knowledge from it, we need scalable infrastructure and technologies that can access data from multiple disks simultaneously. Cloud computing provides paradigms for data analytics over such huge datasets. While SQL continues to be popular among database and data mining professionals, in recent years Hive has established itself as a rapidly advancing technology for big data, which makes it highly suitable for use over the cloud. In this paper, we present investigatory research on Hive and SQL with a detailed case study comparing them in the context of cloud data management and mining. Our work constitutes a thorough examination that focuses on processing Hive queries on cloud infrastructure using three different approaches, and also delves into SQL processing on the cloud with similar approaches. Real datasets are used to conduct various operations with Hive and SQL. This paper presents performance comparisons of the two technologies and explains the environments in which one is preferred over the other for processing and analyzing data. It provides recommendations for cloud data analytics based on the case study.
KW - Big Data
KW - Cloud Computing
KW - Hadoop and Hive
KW - Performance Comparisons
KW - RDBMS and SQL
UR - http://www.scopus.com/inward/record.url?scp=85080147457&partnerID=8YFLogxK
U2 - 10.1109/UEMCON47517.2019.8992925
DO - 10.1109/UEMCON47517.2019.8992925
M3 - Conference contribution
AN - SCOPUS:85080147457
T3 - 2019 IEEE 10th Annual Ubiquitous Computing, Electronics and Mobile Communication Conference, UEMCON 2019
SP - 112
EP - 118
BT - 2019 IEEE 10th Annual Ubiquitous Computing, Electronics and Mobile Communication Conference, UEMCON 2019
A2 - Chakrabarti, Satyajit
A2 - Saha, Himadri Nath
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 10th IEEE Annual Ubiquitous Computing, Electronics and Mobile Communication Conference, UEMCON 2019
Y2 - 10 October 2019 through 12 October 2019
ER -