TY - GEN
T1 - Detecting censorable content on Sina Weibo
T2 - 10th Hellenic Conference on Artificial Intelligence, SETN 2018
AU - Ng, Kei Yin
AU - Feldman, Anna
AU - Leberknight, Chris
N1 - Publisher Copyright: © 2018 Association for Computing Machinery.
PY - 2018/7/9
Y1 - 2018/7/9
N2 - This study provides preliminary insights into the linguistic features that contribute to Internet censorship in mainland China. We collected a corpus of 344 censored and uncensored microblog posts published on Sina Weibo and built a Naive Bayes classifier based on linguistic, topic-independent features. The classifier achieves 79.34% accuracy in predicting whether a blog post would be censored on Sina Weibo.
AB - This study provides preliminary insights into the linguistic features that contribute to Internet censorship in mainland China. We collected a corpus of 344 censored and uncensored microblog posts published on Sina Weibo and built a Naive Bayes classifier based on linguistic, topic-independent features. The classifier achieves 79.34% accuracy in predicting whether a blog post would be censored on Sina Weibo.
KW - Chinese social media
KW - censorship detection
UR - http://www.scopus.com/inward/record.url?scp=85052017902&partnerID=8YFLogxK
U2 - 10.1145/3200947.3201037
DO - 10.1145/3200947.3201037
M3 - Conference contribution
AN - SCOPUS:85052017902
T3 - ACM International Conference Proceeding Series
BT - Proceedings - 10th Hellenic Conference on Artificial Intelligence, SETN 2018
PB - Association for Computing Machinery
Y2 - 9 July 2018 through 12 July 2018
ER -