
Review Spam Detection Using Word Embeddings and Deep Neural Networks



Abstract

Detecting review spam (fake reviews) is increasingly important given the rapid growth of online purchasing, so sophisticated spam filters are needed to tackle the problem. Traditional machine learning algorithms use review content and other features to detect review spam. However, as related studies have demonstrated, the linguistic context of words may be particularly important for text categorization. To improve review spam detection, we propose a novel content-based approach that considers both bag-of-words and word context. More precisely, our approach uses n-grams and the skip-gram word embedding method to build a vector model, which yields a high-dimensional feature representation. In the second step, a deep feed-forward neural network handles this representation and classifies each review as spam or legitimate. To verify our approach, we use two hotel review datasets that include positive and negative reviews. We show that the proposed detection system outperforms other popular algorithms for review spam detection in terms of accuracy and area under the ROC curve. Importantly, the system provides balanced performance on both classes, legitimate and spam, irrespective of review polarity.
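The two-step pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy reviews, the random stand-in embeddings (a real system would train skip-gram vectors on a corpus), the mean-pooling of word vectors, the layer sizes, and all function names are assumptions made for illustration, and the network weights are untrained.

```python
import math
import random
from collections import Counter

def ngrams(tokens, n_max=2):
    """Return all word n-grams of length 1..n_max as space-joined strings."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]

def build_vocab(docs, n_max=2):
    """Map every n-gram seen in the corpus to a column index."""
    grams = sorted({g for d in docs for g in ngrams(d.split(), n_max)})
    return {g: i for i, g in enumerate(grams)}

def featurize(doc, vocab, emb, dim, n_max=2):
    """Concatenate n-gram counts with the mean word embedding of the review."""
    tokens = doc.split()
    bow = [0.0] * len(vocab)
    for gram, count in Counter(ngrams(tokens, n_max)).items():
        if gram in vocab:
            bow[vocab[gram]] = float(count)
    vecs = [emb[t] for t in tokens if t in emb]
    if vecs:
        mean = [sum(v[k] for v in vecs) / len(vecs) for k in range(dim)]
    else:
        mean = [0.0] * dim
    return bow + mean

def init_layers(sizes, seed=0):
    """Create small random (untrained) weights for a feed-forward network."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                   for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers

def forward(x, layers):
    """Forward pass: ReLU on hidden layers, sigmoid on the single output unit."""
    for idx, (weights, biases) in enumerate(layers):
        z = [sum(w * xi for w, xi in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        if idx == len(layers) - 1:
            x = [1.0 / (1.0 + math.exp(-v)) for v in z]
        else:
            x = [max(0.0, v) for v in z]
    return x[0]

# Toy data (hypothetical reviews, not from the paper's hotel datasets).
docs = ["great hotel great staff", "worst hotel ever"]
vocab = build_vocab(docs)

# Stand-in embeddings: random vectors in place of trained skip-gram vectors.
EMB_DIM = 3
rng = random.Random(42)
emb = {w: [rng.uniform(-1, 1) for _ in range(EMB_DIM)]
       for w in sorted({t for d in docs for t in d.split()})}

x = featurize(docs[0], vocab, emb, EMB_DIM)
layers = init_layers([len(x), 8, 4, 1])
p_spam = forward(x, layers)  # untrained score in (0, 1), not a real prediction
```

Concatenating sparse n-gram counts with a dense embedding summary is one simple way to combine bag-of-words and word-context information; the abstract does not say how the two representations are merged, so this choice is only an assumption.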
