Conference: 2014 IEEE 13th International Conference on Trust, Security and Privacy in Computing and Communications

Towards Designing an Email Classification System Using Multi-view Based Semi-supervised Learning

Abstract

Email classification aims to separate a user's emails into spam and legitimate messages. Many supervised learning algorithms have been developed for this task, but they require a large amount of labeled training data. Data labeling, however, is labor-intensive and requires in-depth domain knowledge, so in practice only a very small proportion of the data can be labeled. This bottleneck greatly degrades the effectiveness of supervised email classification systems. To address the problem, we first identify several critical issues in supervised machine learning-based email classification. We then propose an effective classification model based on multi-view, disagreement-based semi-supervised learning. The motivation for combining the two is that multiple views provide richer information for classification, which the literature often ignores, and that semi-supervised learning can exploit both labeled and unlabeled data. In the evaluation, we demonstrate that multi-view data improves email classification compared with single-view data, and that the proposed model, working with our algorithm, achieves better performance than existing similar algorithms.
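
The abstract does not describe the algorithm itself, so the following is only a generic co-training-style sketch of multi-view, disagreement-based semi-supervised learning, not the authors' model. Everything in it is an assumption made for illustration: the two views (subject tokens and body tokens), the tiny naive Bayes learner, the confidence threshold, the toy emails, and all function names. In each round, one classifier per view is trained, and the view that is more confident about an unlabeled email labels it for the other view.

```python
import math
from collections import Counter


class NaiveBayes:
    """Minimal multinomial naive Bayes over token lists, with add-one smoothing."""

    def fit(self, docs, labels):
        self.prior = Counter(labels)
        self.counts = {c: Counter() for c in self.prior}
        for tokens, label in zip(docs, labels):
            self.counts[label].update(tokens)
        self.vocab = set().union(*self.counts.values())
        return self

    def predict_proba(self, tokens):
        total = sum(self.prior.values())
        log_p = {}
        for c, counts in self.counts.items():
            # The extra +1 in the denominator reserves mass for unseen tokens.
            denom = sum(counts.values()) + len(self.vocab) + 1
            log_p[c] = math.log(self.prior[c] / total) + sum(
                math.log((counts[t] + 1) / denom) for t in tokens)
        top = max(log_p.values())
        norm = sum(math.exp(v - top) for v in log_p.values())
        return {c: math.exp(v - top) / norm for c, v in log_p.items()}


def fit_views(train):
    """Train one classifier per view, each on its own training set."""
    return [NaiveBayes().fit([x[v] for x, _ in train[v]],
                             [y for _, y in train[v]]) for v in (0, 1)]


def co_train(labeled, unlabeled, rounds=5, threshold=0.75):
    """labeled: [((view_a, view_b), label)]; unlabeled: [(view_a, view_b)]."""
    train = [list(labeled), list(labeled)]  # both views start from the same seed
    pool = list(unlabeled)
    for _ in range(rounds):
        models = fit_views(train)
        remaining = []
        for x in pool:
            probs = [models[v].predict_proba(x[v]) for v in (0, 1)]
            guess = [max(p, key=p.get) for p in probs]
            conf = [probs[v][guess[v]] for v in (0, 1)]
            teacher = 0 if conf[0] >= conf[1] else 1
            if conf[teacher] >= threshold:
                # The more confident view pseudo-labels the email for the other view.
                train[1 - teacher].append((x, guess[teacher]))
            else:
                remaining.append(x)
        if len(remaining) == len(pool):  # nothing was labeled this round
            break
        pool = remaining
    return fit_views(train)


def predict(models, x):
    """Combine the two views by summing their class probabilities."""
    probs = [models[v].predict_proba(x[v]) for v in (0, 1)]
    return max(probs[0], key=lambda c: probs[0][c] + probs[1][c])


# Hypothetical toy data: view_a = subject tokens, view_b = body tokens.
labeled = [
    ((["win", "prize"], ["click", "link", "free", "money"]), "spam"),
    ((["meeting", "agenda"], ["please", "review", "attached", "notes"]), "ham"),
]
unlabeled = [
    (["win", "lottery"], ["free", "money", "now"]),
    (["project", "meeting"], ["review", "notes", "tomorrow"]),
    (["lottery", "prize"], ["claim", "now", "click"]),
    (["agenda", "update"], ["attached", "schedule", "tomorrow"]),
]
models = co_train(labeled, unlabeled)
```

Calling `predict(models, (subject_tokens, body_tokens))` returns a label for a new email. With only two labeled emails, each view alone is weakly confident; the exchange of pseudo-labels is what lets words seen only in unlabeled emails (such as "lottery") influence later predictions.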