TY - JOUR
T1 - The wisdom of the lexicon crowds
T2 - leveraging on decades of lexicon-based sentiment analysis for improved results
AU - Hill, Chelsey H.
AU - Fresneda, Jorge E.
AU - Anandarajan, Murugan
N1 - Publisher Copyright: © The Author(s) 2025.
PY - 2025/12
Y1 - 2025/12
AB - The “wisdom of the crowd” (WoC) refers to the notion that collective human knowledge is capable of outperforming even individual expert knowledge. This study investigates the application of this phenomenon to lexicon-based sentiment analysis of text data. Lexicons are frequently used to classify the sentiment of text data, particularly in the absence of sentiment class label information. We propose leveraging some of the most popular, publicly-available lexicons created in the last half century to improve sentiment analysis performance. Specifically, this research argues that the collective information provided by the thirteen lexicons included in the crowd constitutes a WoC situation that can more accurately predict the sentiment in the majority of example cases when compared to individual lexicons, lexicon ensembles, and machine learning methods. Thirteen popular sentiment-labeled text datasets, comprised of different types of text data and covering a variety of domains, are used to test this research proposition. We show that the WoC sentiment analysis achieves greater performance than individual lexicons, which are considered to be ‘experts’, and a lexicon ensemble approach. In comparing our novel approach to sentiment analysis against popular machine learning approaches, the proposed WoC method achieves superior results in the majority of examples. By overcoming many of the limitations of other approaches with high accuracy, the WoC method can provide organizations with real-time, reliable, and accurate sentiment analysis.
KW - Lexicon-based sentiment analysis
KW - Natural language processing
KW - Opinion mining
KW - Sentiment analysis
KW - Text analytics
KW - Wisdom of crowds
UR - http://www.scopus.com/inward/record.url?scp=105006651744&partnerID=8YFLogxK
U2 - 10.1186/s40537-025-01186-7
DO - 10.1186/s40537-025-01186-7
M3 - Article
AN - SCOPUS:105006651744
SN - 2196-1115
VL - 12
JO - Journal of Big Data
JF - Journal of Big Data
IS - 1
M1 - 129
ER -