Abstract
Automatic keyword analysis is often performed around the world to limit individual access to online content. To enable citizens to communicate freely and openly on the Internet, research is needed on how well single words predict controversial content. This paper extends our previous work with a larger, topic-diverse dataset of 1,068,621 words collected from 23 RSS feeds over a two-month period. The reliability of prior results and the relationship between controversy and sentiment are examined by reproducing a crowd-sourced experiment. Results from the experiment suggest that human annotators classify controversial and non-controversial words with a high degree of reliability, but, unlike previous research, we determine that single words are not useful for detecting controversy. In addition, while we cannot conclude that sentiment alone can be used to predict controversy, we find that the variance of sentiment may be a useful metric for partitioning data into distinct clusters. Specifically, we find that higher sentiment variance provides greater discrimination quality than using positive and negative sentiment to classify controversial documents.
Original language | English |
---|---|
Title of host publication | Proceedings - 10th Hellenic Conference on Artificial Intelligence, SETN 2018 |
Publisher | Association for Computing Machinery |
ISBN (Electronic) | 9781450364331 |
DOIs | 10.1145/3200947.3201016 |
State | Published - 9 Jul 2018 |
Event | 10th Hellenic Conference on Artificial Intelligence, SETN 2018 - Patras, Greece; Duration: 9 Jul 2018 → 12 Jul 2018 |
Publication series
Name | ACM International Conference Proceeding Series |
---|---|
Other
Other | 10th Hellenic Conference on Artificial Intelligence, SETN 2018 |
---|---|
Country | Greece |
City | Patras |
Period | 9/07/18 → 12/07/18 |
Keywords
- Classification
- Controversy
- Internet censorship
- Sentiment analysis
Cite this
Controversy and sentiment: An exploratory study. / Kaplun, Kateryna; Leberknight, Christopher; Feldman, Anna.
Proceedings - 10th Hellenic Conference on Artificial Intelligence, SETN 2018. Association for Computing Machinery, 2018. (ACM International Conference Proceeding Series).
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution
TY - GEN
T1 - Controversy and sentiment
T2 - An exploratory study
AU - Kaplun, Kateryna
AU - Leberknight, Christopher
AU - Feldman, Anna
PY - 2018/7/9
Y1 - 2018/7/9
KW - Classification
KW - Controversy
KW - Internet censorship
KW - Sentiment analysis
UR - http://www.scopus.com/inward/record.url?scp=85052020515&partnerID=8YFLogxK
U2 - 10.1145/3200947.3201016
DO - 10.1145/3200947.3201016
M3 - Conference contribution
AN - SCOPUS:85052020515
T3 - ACM International Conference Proceeding Series
BT - Proceedings - 10th Hellenic Conference on Artificial Intelligence, SETN 2018
PB - Association for Computing Machinery
ER -