Conference: IEEE Joint International Information Technology and Artificial Intelligence Conference

An automatic approach to extracting review link from Chinese news pages


Abstract

Review links appear on certain kinds of web pages, especially news pages, and are useful inputs for applications such as hot-topic discovery and public-opinion monitoring. Extracting them manually from news pages is time-consuming and error-prone. Although much work has been done on web data extraction, we argue that the problem remains nontrivial because of the diversity of both DOM tree structures and visual presentation. This paper proposes a novel approach for automatically extracting review links from web pages. The approach consists of two steps: first, segment each news page into a set of blocks; then, identify the block(s) containing the review link using a machine learning technique. Experimental results on a large number of Chinese news pages indicate that the approach is highly accurate.
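The two-step pipeline described in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify the segmentation method, the features, or the learning algorithm, so everything below is an assumption made for illustration: blocks are cut at a fixed set of block-level tags (DOM only, no visual cues), the three features and the cue words are hypothetical, and a plain perceptron stands in for the unspecified machine learning technique.

```python
from html.parser import HTMLParser

# Assumption: these tags delimit blocks; the paper's segmentation may differ.
BLOCK_TAGS = {"div", "ul", "table", "p", "section"}
# Hypothetical cue words: "评论" (comment/review), "跟帖" (follow-up post).
CUE_WORDS = ("评论", "跟帖")


class BlockSegmenter(HTMLParser):
    """Step 1 (sketch): split a page into flat blocks at block-level tags."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self.current = {"text": "", "links": []}

    def flush(self):
        # Keep the current block only if it holds text or links.
        if self.current["text"].strip() or self.current["links"]:
            self.blocks.append(self.current)
        self.current = {"text": "", "links": []}

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.flush()
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.current["links"].append(href)

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.flush()

    def handle_data(self, data):
        self.current["text"] += data


def segment(html):
    parser = BlockSegmenter()
    parser.feed(html)
    parser.close()
    parser.flush()  # emit any trailing content outside block tags
    return parser.blocks


def features(block):
    """Hypothetical features: bias, cue word present, 1-2 links, short text."""
    text = block["text"].strip()
    has_cue = 1.0 if any(word in text for word in CUE_WORDS) else 0.0
    few_links = 1.0 if 0 < len(block["links"]) <= 2 else 0.0
    short = 1.0 if len(text) <= 30 else 0.0
    return [1.0, has_cue, few_links, short]


def score(weights, x):
    return sum(w * v for w, v in zip(weights, x))


def train_perceptron(samples, epochs=20):
    """Step 2 (sketch): a perceptron as a stand-in classifier.

    samples is a list of (feature_vector, label) pairs with label 0 or 1.
    """
    weights = [0.0] * len(samples[0][0])
    for _ in range(epochs):
        for x, label in samples:
            predicted = 1 if score(weights, x) > 0 else 0
            for i, value in enumerate(x):
                weights[i] += (label - predicted) * value
    return weights


def extract_review_links(html, weights):
    """Return the links found in every block classified as a review block."""
    found = []
    for block in segment(html):
        if score(weights, features(block)) > 0:
            found.extend(block["links"])
    return found
```

To use the sketch, label the blocks of a few training pages by hand, pass the resulting `(features(block), label)` pairs to `train_perceptron`, and call `extract_review_links` on unseen pages with the learned weights. A real system would need richer features, since the abstract stresses diversity in both DOM structure and visual presentation.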
