Conference: International Conference on Computer Design and Applications

Algorithm of Web Page Purification Based on Improved DOM and Statistical Learning

Abstract

To effectively remove noisy information from web pages, such as advertisements and unrelated links, and to improve classification results, we propose a web page purification algorithm based on an improved DOM tree and statistical learning. We first build a block tree model by combining the DOM tree with the visual characteristics of web content; statistical learning methods are then used to classify each sub-block tree and identify the main content of theme-based web pages. Experiments show that the method purifies all kinds of theme-based web pages well and can be applied in the preprocessing stage of web page classification, where it enhances classification accuracy.
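
The abstract does not specify how the block tree is built, which visual features are used, or which learning model classifies the sub-blocks. As a rough illustration of the general idea only, the sketch below groups text under block-level tags with Python's standard `html.parser` and then keeps blocks by text length and link density. The tag set, the class and function names, the two features, and the hand-set thresholds are all assumptions made for this example; the thresholds stand in for a trained statistical classifier and are not the paper's method. The sketch also assumes well-formed markup with explicit closing tags.

```python
from html.parser import HTMLParser

# Tags treated as block boundaries (an illustrative choice, not from the paper).
BLOCK_TAGS = {"body", "div", "td", "ul", "p", "table", "section", "article"}
# Elements whose text is never visible page content.
SKIP_TAGS = {"script", "style"}


class Block:
    """One node of the simplified block tree."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.texts = []     # text found directly in this block
        self.link_len = 0   # characters of that text inside <a> elements

    @property
    def text_len(self):
        return sum(len(t) for t in self.texts)

    def link_density(self):
        # Share of the block's text that is link text; empty blocks count as all links.
        return self.link_len / self.text_len if self.text_len else 1.0


class BlockTreeBuilder(HTMLParser):
    """Builds a block tree from HTML, assuming properly closed tags."""

    def __init__(self):
        super().__init__()
        self.root = Block("root")
        self.current = self.root
        self.anchor_depth = 0
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag == "a":
            self.anchor_depth += 1
        elif tag in BLOCK_TAGS:
            child = Block(tag, self.current)
            self.current.children.append(child)
            self.current = child

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag == "a":
            self.anchor_depth = max(0, self.anchor_depth - 1)
        elif tag in BLOCK_TAGS and self.current.tag == tag and self.current.parent:
            self.current = self.current.parent

    def handle_data(self, data):
        text = data.strip()
        if not text or self.skip_depth:
            return
        self.current.texts.append(text)
        if self.anchor_depth:
            self.current.link_len += len(text)


def looks_like_content(block, min_text=40, max_link_density=0.3):
    """Hand-set rule standing in for a trained classifier (assumed thresholds)."""
    return block.text_len >= min_text and block.link_density() <= max_link_density


def extract_main_text(html):
    """Return the text of every block judged to be main content, in document order."""
    builder = BlockTreeBuilder()
    builder.feed(html)
    builder.close()
    kept, stack = [], [builder.root]
    while stack:
        node = stack.pop()
        if looks_like_content(node):
            kept.append(" ".join(node.texts))
        stack.extend(reversed(node.children))
    return kept


SAMPLE = """
<html><body>
<div id="nav"><a href="/">Home</a> <a href="/news">News</a> <a href="/ads">Sponsored offers</a></div>
<div id="main"><p>The block tree groups DOM nodes into regions, and each region is scored separately to find the main content.</p></div>
<div id="footer"><a href="/about">About</a> Copyright</div>
</body></html>
"""

if __name__ == "__main__":
    for text in extract_main_text(SAMPLE):
        print(text)
```

In a learned variant, `looks_like_content` would be replaced by a model trained on labeled blocks, with per-block features such as these as its inputs.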
