Hate Speech Detection on Social Networks

Important dates

  • July 15, 2019: Registration open

  • September 20th to October 4th, 2019: Challenge period.

  • October 5th to October 6th, 2019: Semi-final list. Models are evaluated on private test data to determine the semi-final list.

  • October 7th to October 9th, 2019: Final list. Teams on the semi-final list write up a report to be confirmed as final winners; teams that do not submit a report on time are replaced by the next-ranked teams that did.

  • October 13, 2019: Result announcement (workshop day).

This task aims to solve the problem of classifying hateful content on social networking sites (SNSs). On social networks, the threat of online abuse and harassment means that many people stop expressing themselves and give up on seeking different opinions. SNSs struggle to facilitate conversations effectively, leading many communities to limit or completely shut down user posts/comments. In this shared task, we therefore focus on annotating data to classify hateful content (i.e., content that is rude, disrespectful, or otherwise likely to make someone leave a discussion or feel unpleasant reading it).

In this shared task, participants are challenged to build a multi-class classification model capable of assigning an item to one of three classes (HATE, OFFENSIVE, CLEAN). You'll be working with a dataset of posts and/or comments from Facebook. A good classification model will hopefully help online discussion become more productive and respectful.

  • Hate speech (HATE): an item is identified as hate speech if it (1) targets individuals or groups on the basis of their characteristics; (2) demonstrates a clear intention to incite harm or to promote hatred; and (3) may or may not use offensive or profane words. For example: “Assimilate? No they all need to go back to their own countries. #BanMuslims Sorry if someone disagrees too bad.” See the definition of Zhang et al. [1]. In contrast, “All you perverts (other than me) who posted today, needs to leave the O Board. Dfasdfdasfadfs” is an example of abusive language, which often bears the purpose of insulting individuals or groups, and can include hate speech, derogatory and offensive language.

  • Offensive but not hate speech (OFFENSIVE): an item (post/comment) may contain offensive words but does not target individuals or groups on the basis of their characteristics. E.g., “WTF, tomorrow is Monday already.”

  • Neither offensive nor hate speech (CLEAN): a normal item that contains neither offensive language nor hate speech. E.g., “She learned how to paint very hard when she was young.”
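To make the three-way label scheme concrete, a naive keyword baseline might look like the sketch below. The word and phrase lexicons here are illustrative placeholders invented for this example (they are not part of the shared-task data), and a real submission would of course use a trained model rather than hand-written rules:

```python
import re

# Hypothetical lexicons for illustration only -- not from the task data.
HATE_MARKERS = {"#banmuslims", "go back to their own countries"}
OFFENSIVE_WORDS = {"wtf", "damn"}

def classify(text: str) -> str:
    """Assign one of the three shared-task labels with simple keyword rules."""
    lowered = text.lower()
    # Hate speech: targets a group; matched here by phrase lookup.
    if any(phrase in lowered for phrase in HATE_MARKERS):
        return "HATE"
    # Offensive but untargeted: matched by token lookup in a profanity list.
    tokens = set(re.findall(r"\w+", lowered))
    if tokens & OFFENSIVE_WORDS:
        return "OFFENSIVE"
    return "CLEAN"
```

On the three example items above, this sketch returns HATE, OFFENSIVE, and CLEAN respectively; its main point is to show the expected input/output contract (one string in, one of the three label names out).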


F1-score is used as the evaluation metric for this task. The final ranking of participants is based on the evaluation score on the private test data.
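The task description does not state which averaging mode is used for the multi-class F1-score; assuming macro averaging (per-class F1 averaged with equal weight over the three labels), the metric can be sketched in plain Python as:

```python
LABELS = ["HATE", "OFFENSIVE", "CLEAN"]

def macro_f1(y_true, y_pred):
    """Macro-averaged F1: compute F1 per class, then take the unweighted mean."""
    scores = []
    for label in LABELS:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)
```

Note that macro averaging weights the rare HATE class as heavily as the dominant CLEAN class, so a model that predicts CLEAN for everything scores poorly even if its raw accuracy is high.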


[1] Zhang, Z., Luo, L.: Hate speech detection: A solved problem? the challenging case of long tail on twitter. CoRR abs/1803.03662 (2018), http://arxiv.org/abs/1803.03662