Journal: Foundations of Computing and Decision Sciences

JUICER -- A DATA MINING APPROACH TO INFORMATION EXTRACTION FROM THE WWW



Abstract

We present a novel approach to automatic text mining on the World Wide Web. Because the rapid growth of the WWW creates a need for new, more powerful information extraction tools, we designed and implemented a system that adapts techniques originally introduced in the field of data mining. We believe that similar systems, which are usually based on machine learning or natural language processing methods, can prove ineffective when dealing with very large numbers of hypertext documents of differing structure and subject. Moreover, such systems tend to treat HTML documents as plain text, without taking into account the additional information contained in their markup tags.
