Detecting censorable content on Sina Weibo

A pilot study

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; Research; peer-review

Abstract

This study provides preliminary insights into the linguistic features that contribute to Internet censorship in mainland China. We collected a corpus of 344 censored and uncensored microblog posts published on Sina Weibo and built a Naive Bayes classifier based on linguistic, topic-independent features. The classifier achieves 79.34% accuracy in predicting whether a blog post would be censored on Sina Weibo.
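As a rough illustration of the kind of classifier the abstract describes, the sketch below trains a Naive Bayes model on numeric per-post features and predicts a censored/uncensored label. It is not the authors' implementation: the abstract does not name the features or the Naive Bayes variant, so the Gaussian likelihood, the two feature columns, the helper names (`fit_gaussian_nb`, `predict`, `accuracy`), and the toy data are all assumptions made purely for illustration.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y, var_floor=1e-6):
    """Estimate a log prior plus per-feature mean and variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)

    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # Small floor keeps the variance strictly positive.
        variances = [
            sum((v - m) ** 2 for v in col) / n + var_floor
            for col, m in zip(columns, means)
        ]
        model[label] = (math.log(n / len(X)), means, variances)
    return model


def predict(model, row):
    """Return the label with the highest log posterior for one feature row."""
    def log_posterior(params):
        log_prior, means, variances = params
        log_likelihood = sum(
            -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
            for x, m, var in zip(row, means, variances)
        )
        return log_prior + log_likelihood

    return max(model, key=lambda label: log_posterior(model[label]))


def accuracy(model, X, y):
    """Fraction of rows whose predicted label matches the true label."""
    correct = sum(predict(model, row) == label for row, label in zip(X, y))
    return correct / len(y)


# Invented toy data, NOT from the paper: each row is
# [hypothetical_feature_a, hypothetical_feature_b] for one post.
X_train = [[0.9, 3.0], [0.8, 2.5], [0.7, 3.5],
           [0.2, 1.0], [0.1, 0.5], [0.3, 1.5]]
y_train = ["censored", "censored", "censored",
           "uncensored", "uncensored", "uncensored"]

model = fit_gaussian_nb(X_train, y_train)
```

Calling `predict(model, [0.85, 3.0])` classifies a new feature row, and `accuracy(model, X_test, y_test)` computes the kind of accuracy figure reported in the abstract on held-out data. The abstract does not describe how the reported figure was evaluated, so any train/test split used with this sketch is also an assumption.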

Original language: English
Title of host publication: Proceedings - 10th Hellenic Conference on Artificial Intelligence, SETN 2018
Publisher: Association for Computing Machinery
ISBN (Electronic): 9781450364331
DOI: https://doi.org/10.1145/3200947.3201037
State: Published - 9 Jul 2018
Event: 10th Hellenic Conference on Artificial Intelligence, SETN 2018 - Patras, Greece
Duration: 9 Jul 2018 to 12 Jul 2018

Publication series

Name: ACM International Conference Proceeding Series

Other

Other: 10th Hellenic Conference on Artificial Intelligence, SETN 2018
Country: Greece
City: Patras
Period: 9/07/18 to 12/07/18

Fingerprint

Linguistics
Classifiers
Blogs
Internet

Keywords

  • Chinese social media
  • censorship detection

Cite this

Ng, K. Y., Feldman, A., & Leberknight, C. (2018). Detecting censorable content on sina weibo: A pilot study. In Proceedings - 10th Hellenic Conference on Artificial Intelligence, SETN 2018 (ACM International Conference Proceeding Series). Association for Computing Machinery. https://doi.org/10.1145/3200947.3201037
@inproceedings{e4560f4527434e8fa6aaf99bf91d525c,
  title = "Detecting censorable content on {Sina Weibo}: A pilot study",
  abstract = "This study provides preliminary insights into the linguistic features that contribute to Internet censorship in mainland China. We collected a corpus of 344 censored and uncensored microblog posts published on Sina Weibo and built a Naive Bayes classifier based on linguistic, topic-independent features. The classifier achieves 79.34{\%} accuracy in predicting whether a blog post would be censored on Sina Weibo.",
  keywords = "Chinese social media, censorship detection",
  author = "Ng, {Kei Yin} and Anna Feldman and Christopher Leberknight",
  year = "2018",
  month = "7",
  day = "9",
  doi = "10.1145/3200947.3201037",
  language = "English",
  series = "ACM International Conference Proceeding Series",
  publisher = "Association for Computing Machinery",
  booktitle = "Proceedings - 10th Hellenic Conference on Artificial Intelligence, SETN 2018",
}
