Conference on Empirical Methods in Natural Language Processing

Social Media Text Classification under Negative Covariate Shift

Abstract

In a typical social media content analysis task, the user is interested in analyzing posts on a particular topic. Identifying such posts is often formulated as a classification problem, but the problem is challenging. One key issue is covariate shift: the training data is not fully representative of the test data. We observed that the covariate shift mainly occurs in the negative data, because the topics discussed in social media are highly diverse and numerous, whereas the user-labeled negative training data may cover only a small number of topics. This paper proposes a novel technique to solve the problem. The key novelty of the technique is the transformation of the document representation from the traditional n-gram feature space to a center-based similarity (CBS) space. In the CBS space, the covariate shift problem is significantly mitigated, which enables us to build much better classifiers. Experimental results show that the proposed approach markedly improves classification.
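The abstract does not say how the center or the similarity is computed, so the following is only a minimal sketch of the general idea of moving from an n-gram space to a similarity-to-center space. It assumes that the center is the summed n-gram counts of the positive training documents, that similarity is cosine similarity, and that one feature is produced per n-gram order (unigrams and bigrams here). All function names (`ngrams`, `cosine`, `positive_centers`, `to_cbs`) and the toy documents are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def ngrams(text, n):
    """Return a Counter of word n-grams for a lowercased, whitespace-split text."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def positive_centers(positive_docs, orders=(1, 2)):
    """Assumed center: summed n-gram counts of the positive docs, one per order."""
    centers = {}
    for n in orders:
        total = Counter()
        for doc in positive_docs:
            total.update(ngrams(doc, n))
        centers[n] = total
    return centers


def to_cbs(doc, centers):
    """Map a document to similarity-to-center features, one per n-gram order."""
    return [cosine(ngrams(doc, n), centers[n]) for n in sorted(centers)]


# Toy data (hypothetical): two positive posts about one topic.
positive = ["new phone battery lasts long", "phone battery drains fast"]
centers = positive_centers(positive)

on_topic = to_cbs("my phone battery died", centers)
off_topic = to_cbs("great pasta recipe tonight", centers)
```

In this sketch every document, whatever its topic, is described only by how similar it is to the positive center. A negative post on a topic never seen in training still lands near zero similarity, which is one plausible reading of why such a space would be less sensitive to shift in the negative data. A classifier would then be trained on these low-dimensional vectors instead of the raw n-gram features.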
