Detecting censorable content on sina weibo: A pilot study

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Scopus citation

Abstract

This study provides preliminary insights into the linguistic features that contribute to Internet censorship in mainland China. We collected a corpus of 344 censored and uncensored microblog posts published on Sina Weibo and built a Naive Bayes classifier based on linguistic, topic-independent features. The classifier achieves 79.34% accuracy in predicting whether a blog post would be censored on Sina Weibo.
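
The abstract does not say which Naive Bayes variant or which linguistic features were used, so the following is only a minimal sketch of the general technique, not the authors' implementation. It assumes a Gaussian Naive Bayes over numeric features; the two feature names (mean word length, punctuation ratio), the toy rows, and the function names fit_gaussian_nb and predict are all hypothetical and are not taken from the study's corpus.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate per-class priors, feature means, and feature variances."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)

    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small constant keeps every variance strictly positive.
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(X), means, variances)
    return model


def predict(model, row):
    """Return the label with the highest log posterior for one feature row."""
    def log_posterior(label):
        prior, means, variances = model[label]
        score = math.log(prior)
        for v, m, var in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return score

    return max(model, key=log_posterior)


# Invented toy rows (NOT study data): [mean word length, punctuation ratio].
X = [[1.8, 0.10], [2.0, 0.12], [1.9, 0.11],
     [3.1, 0.30], [3.0, 0.28], [3.2, 0.33]]
y = ["uncensored"] * 3 + ["censored"] * 3

model = fit_gaussian_nb(X, y)
label = predict(model, [3.05, 0.29])
print(label)
```

Because the features here are numeric and unrelated to any particular topic vocabulary, the sketch mirrors the "topic-independent" framing of the abstract; a real replication would need the feature set and evaluation protocol described in the full paper.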

Original language: English
Title of host publication: Proceedings - 10th Hellenic Conference on Artificial Intelligence, SETN 2018
Publisher: Association for Computing Machinery
ISBN (Electronic): 9781450364331
DOIs: https://doi.org/10.1145/3200947.3201037
Publication status: Published - 9 Jul 2018
Event: 10th Hellenic Conference on Artificial Intelligence, SETN 2018 - Patras, Greece
Duration: 9 Jul 2018 → 12 Jul 2018

Publication series

Name: ACM International Conference Proceeding Series

Other

Other: 10th Hellenic Conference on Artificial Intelligence, SETN 2018
Country: Greece
City: Patras
Period: 9/07/18 → 12/07/18

Keywords

  • Chinese social media, censorship detection

Cite this

Ng, K. Y., Feldman, A., & Leberknight, C. (2018). Detecting censorable content on sina weibo: A pilot study. In Proceedings - 10th Hellenic Conference on Artificial Intelligence, SETN 2018 (ACM International Conference Proceeding Series). Association for Computing Machinery. https://doi.org/10.1145/3200947.3201037