Classifying constructive comments
Keywords: Content moderation, online comments, toxicity, constructiveness, annotation, data creation, machine learning, deep learning
We introduce the Constructive Comments Corpus (C3), consisting of 12,000 annotated news comments and intended to help build new tools that online communities can use to improve the quality of their discussions. We define constructive comments as high-quality comments that make a contribution to the conversation. We explain the crowd worker annotation scheme and define a taxonomy of sub-characteristics of constructiveness. We evaluate the quality of the annotation scheme and the resulting dataset using measurements of inter-annotator agreement, expert assessment of a sample, and the constructiveness sub-characteristics, which we show provide a proxy for the general concept of constructiveness. We provide models for constructiveness trained on C3 using both a feature-based approach and a variety of deep learning approaches, and we demonstrate, through domain adaptation experiments, that these models capture general rather than topic- or domain-specific characteristics of constructiveness. We also examine the role that length plays in our models, since comment length could easily be gamed if models relied heavily on it. By examining the errors made by each model and their distribution by length, we show that the best-performing models are effective independently of comment length. The constructiveness corpus and our experiments pave the way for a moderation tool focused on promoting comments that make a meaningful contribution, rather than only filtering out undesirable content.
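The length analysis described above groups each model's errors by comment length. The following is a minimal sketch of that kind of check, not the authors' code. It assumes binary labels (1 = constructive, 0 = not constructive), length measured as a whitespace-separated word count, and bin edges of 0, 50, 100 and 200 words. The function name `error_rate_by_length` and the bin edges are hypothetical choices made only for illustration.

```python
import bisect
from collections import defaultdict


def error_rate_by_length(comments, gold, predicted, bin_edges=(0, 50, 100, 200)):
    """Hypothetical helper: bin comments by word count and return the
    fraction of misclassified comments in each non-empty bin.

    Assumes binary labels and that bin_edges starts at 0 and is sorted.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for text, y_true, y_pred in zip(comments, gold, predicted):
        n_words = len(text.split())
        # Index of the last bin edge that is <= the word count.
        i = bisect.bisect_right(bin_edges, n_words) - 1
        lower = bin_edges[i]
        if i == len(bin_edges) - 1:
            label = f"{lower}+"  # open-ended final bin
        else:
            label = f"{lower}-{bin_edges[i + 1] - 1}"
        totals[label] += 1
        errors[label] += int(y_true != y_pred)
    return {label: errors[label] / totals[label] for label in totals}


comments = ["Good point.", "word " * 60, "word " * 250]
print(error_rate_by_length(comments, gold=[0, 1, 1], predicted=[0, 0, 1]))
# {'0-49': 0.0, '50-99': 1.0, '200+': 0.0}
```

Suppose a model's error rates stayed roughly similar across bins on a realistically sized test set. That would be consistent with the claim that the model does not simply reward longer comments. A sharp rise in errors for short or long comments would instead suggest a dependence on length.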
Copyright (c) 2023 First Monday
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Authors retain copyright to their work published in First Monday. Please see the footer of each article for details.