HiSAT: Hierarchical Framework for Sentiment Analysis on Twitter Data

Amrutha Kommu, Snehal Patel, Sebastian Derosa, Jiayin Wang, Aparna S. Varde

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Scopus citation

Abstract

Social media websites such as Twitter have become so indispensable that people use them almost daily to share their emotions, opinions, suggestions and thoughts. Motivated by this behavior, the purpose of this study is to define an approach that automatically classifies tweets into two main classes, namely, hate speech and non-hate speech. Such classification provides a valuable source of information for analyzing and understanding target audiences and spotting marketing trends. We thus propose HiSAT, a Hierarchical framework for Sentiment Analysis on Twitter data. Sentiments and opinions in tweets are highly unstructured and do not follow a well-defined sequence. They constitute heterogeneous data from many sources in different formats, and express positive, negative, or neutral sentiment. Hence, in HiSAT we conduct Natural Language Processing encompassing tokenization, stemming and lemmatization techniques that convert text to tokens, as well as Bag-of-Words (BoW) and Term Frequency-Inverse Document Frequency (TF-IDF) techniques that convert text sentences into numeric vectors. These vectors are then fed as inputs to machine learning algorithms within the HiSAT framework; more specifically, Random Forest, Logistic Regression and Naïve Bayes are used as binary text classifiers to distinguish hate speech from non-hate speech in the tweets. Experiments performed with the HiSAT framework show that Random Forest outperforms the other two classifiers in predicting the correct labels, with accuracy above 95%. We present the HiSAT approach, its implementation and experiments, along with related work and ongoing research.
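
The vectorization step described in the abstract (tokens, then BoW counts, then TF-IDF weights) can be illustrated with a minimal, standard-library sketch. This is not the authors' implementation: the function names, the regular-expression tokenizer, the example tweets, and the specific TF-IDF variant (term frequency normalized by document length, idf = log(N / df) without smoothing) are all assumptions made for illustration, and stemming and lemmatization are omitted.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a tweet and split it into word tokens.

    Assumption: a simple regex tokenizer; no stemming or lemmatization.
    """
    return re.findall(r"[a-z0-9']+", text.lower())


def bag_of_words(docs):
    """Return (vocabulary, count_vectors) with one raw count vector per document."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


def tf_idf(count_vectors):
    """Weight counts as tf * idf.

    Assumed variant: tf = count / document length, idf = log(N / df).
    """
    n_docs = len(count_vectors)
    n_terms = len(count_vectors[0]) if count_vectors else 0
    # Document frequency: number of documents containing each term.
    df = [sum(1 for vec in count_vectors if vec[j] > 0) for j in range(n_terms)]
    weighted = []
    for vec in count_vectors:
        length = sum(vec) or 1  # avoid division by zero for empty documents
        weighted.append([
            (c / length) * math.log(n_docs / df[j]) if c else 0.0
            for j, c in enumerate(vec)
        ])
    return weighted


# Hypothetical example tweets (not taken from the paper's dataset).
tweets = ["I love this phone", "I hate this phone", "love love love"]
vocab, counts = bag_of_words(tweets)
weights = tf_idf(counts)
```

The resulting `weights` rows are the kind of numeric vectors that a classifier such as Random Forest, Logistic Regression or Naïve Bayes would take as input; library implementations typically use smoothed idf and vector normalization, so their values would differ from this sketch.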

Original language: English
Title of host publication: Intelligent Systems and Applications - Proceedings of the 2022 Intelligent Systems Conference IntelliSys Volume 1
Editors: Kohei Arai
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 376-392
Number of pages: 17
ISBN (Print): 9783031160714
State: Published - 2023
Event: Intelligent Systems Conference, IntelliSys 2022 - Virtual, Online
Duration: 1 Sep 2022 → 2 Sep 2022

Publication series

Name: Lecture Notes in Networks and Systems
Volume: 542 LNNS
ISSN (Print): 2367-3370
ISSN (Electronic): 2367-3389

Conference

Conference: Intelligent Systems Conference, IntelliSys 2022
City: Virtual, Online
Period: 1/09/22 → 2/09/22

Keywords

  • Bayesian models
  • Knowledge discovery
  • Logistic Regression
  • NLP
  • Opinion mining
  • Random Forest
  • Social media
  • Text mining
