Journal: Electronics and Electrical Engineering

The Impact of Feature Extraction and Selection on SMS Spam Filtering

Abstract

This paper investigates the impact of several feature extraction and feature selection approaches on the filtering of short message service (SMS) spam messages in two languages, Turkish and English. The full feature set of the filtering framework consists of features originating from the bag-of-words (BoW) model together with an ensemble of structural features (SFs) specific to the spam problem. The distinctive BoW features are identified using information-theoretic feature selection methods. Various combinations of the BoW features and SFs are then fed into widely used pattern classification algorithms to classify SMS messages. The filtering framework is evaluated on both Turkish and English SMS message datasets. For this purpose, the first publicly available Turkish SMS message collection was also compiled as part of the study. Comprehensive experimental analysis on the respective datasets revealed that combinations of BoW features and SFs, rather than BoW features alone, provide better classification performance on both datasets. The effectiveness of the feature selection methods, however, differs slightly between the two languages.
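
The pipeline described in the abstract (BoW features filtered by an information-theoretic criterion, then combined with structural features) can be illustrated with a minimal sketch. The abstract names neither the selection methods nor the structural features, so everything concrete here is an assumption: information gain stands in for the selection criterion, the four structural features (length, digit count, uppercase count, URL presence) are hypothetical, and the four toy messages are invented for illustration and are not taken from the paper's datasets.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-word characters (Unicode-aware)."""
    return re.findall(r"\w+", text.lower())


def structural_features(text):
    """Hypothetical structural features; the abstract does not list the real ones."""
    return {
        "length": len(text),
        "num_digits": sum(ch.isdigit() for ch in text),
        "num_upper": sum(ch.isupper() for ch in text),
        "has_url": int(bool(re.search(r"https?://|www\.", text, re.IGNORECASE))),
    }


def entropy(counts):
    """Shannon entropy (in bits) of a list of class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(messages, labels):
    """Score each term by H(class) - H(class | term present or absent)."""
    n = len(messages)
    token_sets = [set(tokenize(m)) for m in messages]
    base = entropy(list(Counter(labels).values()))
    vocab = set().union(*token_sets)
    scores = {}
    for term in vocab:
        present = Counter(lab for toks, lab in zip(token_sets, labels) if term in toks)
        absent = Counter(lab for toks, lab in zip(token_sets, labels) if term not in toks)
        conditional = 0.0
        for part in (present, absent):
            size = sum(part.values())
            if size:
                conditional += (size / n) * entropy(list(part.values()))
        scores[term] = base - conditional
    return scores


def select_top_k(scores, k):
    """Keep the k highest-scoring terms; ties are broken alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


def featurize(text, selected_terms):
    """Binary BoW vector over the selected terms, followed by the structural features."""
    toks = set(tokenize(text))
    bow = [int(term in toks) for term in selected_terms]
    sf = structural_features(text)
    return bow + [sf[name] for name in sorted(sf)]


# Toy data, invented for illustration only.
messages = [
    "WIN a FREE prize now, call 0900123456",
    "FREE entry: visit www.example.com to win",
    "Are we still meeting for lunch today?",
    "I will call you after the meeting",
]
labels = ["spam", "spam", "ham", "ham"]

scores = information_gain(messages, labels)
selected = select_top_k(scores, k=3)
vectors = [featurize(m, selected) for m in messages]
print(selected)
print(vectors)
```

The resulting vectors could then be passed to any standard classifier. Dropping either the BoW part or the structural part of `featurize` gives the kind of feature combinations the abstract compares.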
