Source: Expert Systems with Applications

Mining key information of web pages: A method and its application

Abstract

Web content mining aims to discover useful information and derive knowledge from large collections of web pages. Key information embedded in web pages, such as distinctive menu items and navigation indicators, can help classify the main contents of those pages and reflects taxonomy knowledge. Mining this key information is therefore valuable for acquiring domain knowledge and building catalogue classifiers. Current web content mining methods cannot mine such key information effectively, and "noise information" (such as advertisements) degrades the performance of web mining tasks. This paper proposes a method to extract key information from web pages that contain noisy information. The method has two steps: first extract a list of candidate key information, then apply an entropy measure to filter out the noise and identify the key information. Experimental results show that the method is effective in discovering key information. Using the discovered key information, which reflects taxonomy knowledge, an application is developed to support ontology generation.
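
The abstract does not give the paper's actual entropy formulation, so the following Python sketch is only an illustration of how an entropy-based filter over candidate strings might look. It assumes candidates have already been extracted per page (step one), scores each candidate by the normalized Shannon entropy of its occurrence counts across pages, and keeps candidates below a threshold. The function names, the `threshold` value, the direction of the comparison, and the sample data are all assumptions made for this example, not details from the paper.

```python
import math
from collections import Counter

def normalized_entropy(counts):
    """Shannon entropy of a list of counts, scaled to the range [0, 1]."""
    total = sum(counts)
    if total == 0 or len(counts) < 2:
        return 0.0
    probs = [c / total for c in counts if c > 0]
    h = -sum(p * math.log2(p) for p in probs)
    return h / math.log2(len(counts))

def candidate_entropies(pages):
    """Map each candidate string to its normalized entropy across pages.

    pages: list of lists, one list of candidate strings per page.
    """
    per_page = [Counter(page) for page in pages]
    vocab = set().union(*pages)
    return {c: normalized_entropy([pc[c] for pc in per_page]) for c in vocab}

def filter_candidates(pages, threshold=0.9):
    """Keep candidates whose entropy is below the threshold.

    Assumption: candidates spread almost uniformly over every page
    (entropy near 1) are treated as noise. Flip the comparison if the
    opposite reading fits the data better.
    """
    scores = candidate_entropies(pages)
    return sorted(c for c, h in scores.items() if h < threshold)

# Made-up candidate lists for three pages of one hypothetical site.
pages = [
    ["Banner", "Products", "Promo"],
    ["Banner", "Support", "Promo"],
    ["Banner", "Products", "Promo"],
]
kept = filter_candidates(pages)
```

With this sample input, `Banner` and `Promo` occur once on every page, so their normalized entropy is about 1 and they are dropped, while `Products` and `Support` are kept. Whether high or low entropy marks noise in the paper's method cannot be determined from the abstract alone.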
