A Hive and SQL Case Study in Cloud Data Analytics

Shireesha Chandra, Aparna S. Varde, Jiayin Wang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

4 Scopus citations

Abstract

The digital universe is expanding at a very fast pace, generating massive datasets. To keep up with the processing and storage needs of this big data, and to discover knowledge from it, we need scalable infrastructure and technologies that can access data from multiple disks simultaneously. Cloud computing provides paradigms for data analytics over such huge datasets. While SQL continues to be popular among database and data mining professionals, in recent years Hive has established itself as a rapidly advancing technology for big data, which makes it highly suitable for use over the cloud. In this paper, we present investigatory research on Hive and SQL, with a detailed case study comparing the two in the context of cloud data management and mining. Our work constitutes a thorough scrutiny: it focuses on processing Hive queries on cloud infrastructure using three different approaches, and it also examines SQL processing on the cloud using similar approaches. Real datasets are used to conduct various operations with Hive and SQL. The paper compares the performance of the two technologies and explains the environments in which one is preferred over the other for processing and analyzing data. Based on the case study, it provides recommendations for cloud data analytics.

Original language: English
Title of host publication: 2019 IEEE 10th Annual Ubiquitous Computing, Electronics and Mobile Communication Conference, UEMCON 2019
Editors: Satyajit Chakrabarti, Himadri Nath Saha
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 112-118
Number of pages: 7
ISBN (Electronic): 9781728138855
State: Published, Oct 2019
Event: 10th IEEE Annual Ubiquitous Computing, Electronics and Mobile Communication Conference, UEMCON 2019, New York City, United States
Duration: 10 Oct 2019 to 12 Oct 2019

Publication series

Name: 2019 IEEE 10th Annual Ubiquitous Computing, Electronics and Mobile Communication Conference, UEMCON 2019

Conference

Conference: 10th IEEE Annual Ubiquitous Computing, Electronics and Mobile Communication Conference, UEMCON 2019
Country/Territory: United States
City: New York City
Period: 10/10/19 to 12/10/19

Keywords

  • Big Data
  • Cloud Computing
  • Hadoop and Hive
  • Performance Comparisons
  • RDBMS and SQL
