Research on an Email Filtering System Based on Category Feature Selection and a Feedback Learning Random Forest Algorithm

Journal: 《计算机应用与软件》 (Computer Applications and Software)

Abstract

To address three problems common in email filtering systems, namely the curse of dimensionality, topic differences between categories, and the lack of feedback information, this paper proposes an email filtering model based on category feature selection and a feedback learning random forest algorithm. The method introduces the latent Dirichlet allocation (LDA) model into the feature selection stage: a separate generative model is built for each category of email, and the features that make up each topic are searched for within each category separately, which effectively reduces the influence of redundant information and noisy data on classification performance. The feedback learning random forest algorithm combines the strengths of decision tree ensembles and feedback learning, allowing the filtering system to adjust itself and to capture changes in spam trends promptly. The method is tested on the public corpora CCERT and Trec06 and compared with typical algorithms; the experimental results demonstrate the feasibility and effectiveness of the proposed algorithm.
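The abstract does not give the details of the category feature selection step, so the following is only a minimal sketch of the general idea: fit one small LDA model per email category and keep the highest-weighted words of each topic as features. Everything specific here is an assumption made for illustration, not the authors' procedure: the collapsed Gibbs sampler, the hyperparameters `alpha` and `beta`, the top-`n_top` selection rule, taking the union of the per-category word lists, the toy corpus, and the hypothetical function names `lda_top_words` and `select_features_by_category`. The random forest and feedback learning stages are not sketched.

```python
import random
from collections import Counter


def lda_top_words(docs, n_topics=2, n_top=3, n_iter=200,
                  alpha=0.1, beta=0.01, seed=0):
    """Fit a tiny LDA by collapsed Gibbs sampling; return top words per topic."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    # Count tables: document-topic, topic-word, and per-topic totals.
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [Counter() for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assign = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(n_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assign[d][i]
                # Remove the current assignment before resampling it.
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Unnormalised conditional probability of each topic.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    # Keep only words that still have a positive count in the topic.
    return [
        [w for w, c in topic_word[k].most_common(n_top) if c > 0]
        for k in range(n_topics)
    ]


def select_features_by_category(corpus, **lda_kwargs):
    """Fit one LDA per category and return the union of the topic words."""
    selected = set()
    for docs in corpus.values():
        for words in lda_top_words(docs, **lda_kwargs):
            selected.update(words)
    return sorted(selected)


# Toy tokenised emails (assumed data, for illustration only).
corpus = {
    "spam": [["free", "prize", "click", "free"],
             ["prize", "click", "offer"],
             ["free", "offer", "click"]],
    "ham": [["meeting", "report", "schedule"],
            ["report", "project", "meeting"],
            ["schedule", "project", "report"]],
}
features = select_features_by_category(corpus, n_topics=2, n_top=2)
print(features)
```

Because each category gets its own model, words that characterise only one category can still be selected even if they are rare in the corpus as a whole. The resulting feature list would then define the reduced input space for a classifier such as a random forest.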
