TY - GEN
T1 - Controversy and sentiment
T2 - 10th Hellenic Conference on Artificial Intelligence, SETN 2018
AU - Kaplun, Kateryna
AU - Leberknight, Christopher
AU - Feldman, Anna
N1 - Publisher Copyright: © 2018 Association for Computing Machinery.
PY - 2018/7/9
Y1 - 2018/7/9
AB - Automatic keyword analysis is often performed around the world to limit individual access to online content. To enable citizens to freely and openly communicate on the Internet, research is required to study the predictive quality of single words to detect controversial content. This paper extends our previous work with a larger topic-diverse dataset of 1,068,621 words collected from 23 RSS feeds over a two-month period. The reliability of prior results and the relationship between controversy and sentiment are examined by reproducing a crowd-sourced experiment. Results from the experiment suggest that human annotators classify controversial and non-controversial words with a high degree of reliability, but unlike previous research we determine that single words are not useful for detecting controversy. In addition, while we cannot conclude that sentiment alone can be used to predict controversy, we find that the variance of sentiment may be a useful metric for partitioning data into distinct clusters. Specifically, we find that higher sentiment variance provides greater discrimination quality than using positive and negative sentiment to classify controversial documents.
KW - Classification
KW - Controversy
KW - Internet censorship
KW - Sentiment analysis
UR - http://www.scopus.com/inward/record.url?scp=85052020515&partnerID=8YFLogxK
U2 - 10.1145/3200947.3201016
DO - 10.1145/3200947.3201016
M3 - Conference contribution
AN - SCOPUS:85052020515
T3 - ACM International Conference Proceeding Series
BT - Proceedings - 10th Hellenic Conference on Artificial Intelligence, SETN 2018
PB - Association for Computing Machinery
Y2 - 9 July 2018 through 12 July 2018
ER -