Venue: Workshop on Language Analysis in Social Media 2013

Really? Well. Apparently Bootstrapping Improves the Performance of Sarcasm and Nastiness Classifiers for Online Dialogue



Abstract

More and more of the information on the web is dialogic, from Facebook newsfeeds, to forum conversations, to comment threads on news articles. In contrast to traditional, monologic Natural Language Processing resources such as news, highly social dialogue is frequent in social media, making it a challenging context for NLP. This paper tests a bootstrapping method, originally proposed in a monologic domain, to train classifiers to identify two different types of subjective language in dialogue: sarcasm and nastiness. We explore two methods of developing linguistic indicators to be used in a first level classifier aimed at maximizing precision at the expense of recall. The best performing classifier for the first phase achieves 54% precision and 38% recall for sarcastic utterances. We then use general syntactic patterns from previous work to create more general sarcasm indicators, improving precision to 62% and recall to 52%. To further test the generality of the method, we then apply it to bootstrapping a classifier for nastiness dialogic acts. Our first phase, using crowdsourced nasty indicators, achieves 58% precision and 49% recall, which increases to 75% precision and 62% recall when we bootstrap over the first level with generalized syntactic patterns.
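The two-level idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the cue phrases, the example utterances, the thresholds `min_count` and `min_ratio`, and the use of word bigrams as a stand-in for syntactic patterns are all assumptions made for illustration. A first-level classifier fires only on known cue phrases (favoring precision over recall), new patterns are then harvested from the utterances it flags, and precision and recall are computed against gold labels.

```python
from collections import Counter

def cue_classifier(utterance, cues):
    """First-level classifier: fire only when a known cue phrase occurs."""
    text = utterance.lower()
    return any(cue in text for cue in cues)

def bigrams(utterance):
    """Word bigrams, used here as a crude stand-in for syntactic patterns."""
    tokens = utterance.lower().split()
    return set(zip(tokens, tokens[1:]))

def bootstrap_patterns(utterances, cues, min_count=2, min_ratio=0.8):
    """Keep bigrams that occur often, and mostly, in cue-flagged utterances."""
    flagged, total = Counter(), Counter()
    for u in utterances:
        hit = cue_classifier(u, cues)
        for bg in bigrams(u):
            total[bg] += 1
            if hit:
                flagged[bg] += 1
    return {bg for bg, n in flagged.items()
            if n >= min_count and n / total[bg] >= min_ratio}

def pattern_classifier(utterance, cues, patterns):
    """Second-level classifier: cue match OR any bootstrapped pattern."""
    return cue_classifier(utterance, cues) or bool(bigrams(utterance) & patterns)

def precision_recall(predicted, gold):
    """Precision and recall for boolean predictions against gold labels."""
    tp = sum(p and g for p, g in zip(predicted, gold))
    fp = sum(p and not g for p, g in zip(predicted, gold))
    fn = sum(g and not p for p, g in zip(predicted, gold))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical cue phrases and unlabeled utterances (not from the paper).
CUES = {"oh really", "yeah right"}
UNLABELED = [
    "oh really what a genius idea",
    "yeah right what a genius you are",
    "what a nice day",
    "i agree with the idea",
]
PATTERNS = bootstrap_patterns(UNLABELED, CUES)
```

In this toy data the bigram "a genius" is learned from the flagged utterances, so `pattern_classifier("you are a genius", CUES, PATTERNS)` fires even though no cue phrase matches. That is the mechanism by which a second level can raise recall beyond the first.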

Bibliographic details

  • Conference location: Atlanta, GA (US)
  • Author affiliations:

    Natural Language and Dialogue Systems, University of California, Santa Cruz, 1156 High Street, Santa Cruz, CA 95064

    Natural Language and Dialogue Systems, University of California, Santa Cruz, 1156 High Street, Santa Cruz, CA 95064

  • Original format: PDF
  • Language of text: English
