Conference on Empirical Methods in Natural Language Processing

Datasets of Slovene and Croatian Moderated News Comments

Abstract

This paper presents two large, newly constructed datasets of moderated news comments from two highly popular online news portals in their respective countries: the Slovene RTV MCC and the Croatian 24sata. The datasets are analyzed through manual annotation of the types of content deleted by moderators and through an investigation of deletion trends across users and threads. Next, initial experiments on automatically detecting the deleted content are presented. Both datasets are published in encrypted form, so that others can run experiments on detecting content to be deleted without the potentially inappropriate content being revealed. Finally, baseline classification models trained on the non-encrypted datasets are also released to enable real-world use.
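The abstract does not say how the comments are encrypted. One common way to release text that still supports bag-of-words classification without exposing the words themselves is keyed, token-level pseudonymization. The sketch below assumes that scheme purely for illustration; it is not necessarily the authors' method, and the key `SECRET_KEY`, the tokenizer regex and the helper names `pseudonymize_token` and `pseudonymize_comment` are hypothetical.

```python
from __future__ import annotations

import hashlib
import hmac
import re

# Hypothetical secret key held by the data publisher and never released.
SECRET_KEY = b"publisher-secret-key"

# Simple word tokenizer (assumption); \w matches Unicode letters such as č, š, ž.
TOKEN_RE = re.compile(r"\w+")


def pseudonymize_token(token: str, key: bytes = SECRET_KEY) -> str:
    """Map a token to a stable, opaque string using HMAC-SHA256."""
    # Lowercase first so that case variants of a word share one pseudonym.
    digest = hmac.new(key, token.lower().encode("utf-8"), hashlib.sha256).hexdigest()
    # Keep a short prefix of the digest; "t" marks it as a pseudonymized token.
    return "t" + digest[:12]


def pseudonymize_comment(text: str, key: bytes = SECRET_KEY) -> list[str]:
    """Tokenize a comment and replace every token with its pseudonym."""
    return [pseudonymize_token(tok, key) for tok in TOKEN_RE.findall(text)]


if __name__ == "__main__":
    a = pseudonymize_comment("To je odličen članek.")
    b = pseudonymize_comment("Odličen komentar!")
    print(a)
    print(b)
    # The shared word "odličen" receives the same pseudonym in both comments.
    print(a[2] == b[0])
```

Because each word maps to a fixed pseudonym, a classifier trained on the pseudonymized comments sees the same token-frequency patterns as one trained on plain text, while a reader without the key cannot hash candidate words to look them up. Word frequencies remain visible, however, so this is pseudonymization rather than strong encryption.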