European Conference on Research and Advanced Technology for Digital Libraries (ECDL 2005), 18-23 September 2005, Vienna, Austria

Focused Crawling Using Latent Semantic Indexing - An Application for Vertical Search Engines

Abstract

Vertical search engines and web portals are gaining ground over general-purpose engines because of their limited size and their high precision within the domain they cover. The number of vertical portals has increased rapidly in recent years, which makes the need for a topic-driven (focused) crawler evident. In this paper, we develop a latent semantic indexing (LSI) classifier that combines link analysis with text content in order to retrieve and index domain-specific web documents. We compare its efficiency with that of other well-known web information retrieval techniques. Our implementation takes a different approach to focused crawling and aims to overcome the size limitations of the initial training data while maintaining a high recall/precision ratio.
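The abstract does not describe the classifier's internals. The following Python sketch illustrates the general LSI technique only and is not the authors' implementation. It computes a rank-k truncated SVD of a term-document matrix built from on-topic training pages with `numpy.linalg.svd`. It folds each fetched page into that latent space and scores the page by cosine similarity to the training centroid. A best-first frontier then uses those scores to decide which pages to fetch next. Several parts are assumptions. Pages are given as term-frequency vectors over a fixed vocabulary. The names `TopicScorer`, `focused_crawl` and the `fetch` callback are hypothetical. The link component is reduced to letting each outlink inherit its parent page's score, a simple stand-in for the paper's link analysis.

```python
import heapq

import numpy as np


def lsi_space(term_doc: np.ndarray, k: int):
    """Rank-k truncated SVD of a terms x documents matrix: A ~ U_k S_k V_k^T.

    k must not exceed the rank of term_doc, otherwise s_k contains zeros.
    """
    U, s, _ = np.linalg.svd(term_doc, full_matrices=False)
    return U[:, :k], s[:k]


def fold_in(vec: np.ndarray, U_k: np.ndarray, s_k: np.ndarray) -> np.ndarray:
    """Project a term-frequency vector into the latent space: S_k^-1 U_k^T q."""
    return (U_k.T @ vec) / s_k


def cosine(a: np.ndarray, b: np.ndarray, eps: float = 1e-12) -> float:
    """Cosine similarity; near-zero vectors (no overlap with the topic space) score 0."""
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    if na < eps or nb < eps:
        return 0.0
    return float(a @ b / (na * nb))


class TopicScorer:
    """Hypothetical scorer: similarity to the centroid of on-topic training docs in LSI space."""

    def __init__(self, term_doc: np.ndarray, k: int):
        self.U_k, self.s_k = lsi_space(term_doc, k)
        latent = [fold_in(term_doc[:, j], self.U_k, self.s_k)
                  for j in range(term_doc.shape[1])]
        self.centroid = np.mean(latent, axis=0)

    def score(self, vec: np.ndarray) -> float:
        return cosine(fold_in(vec, self.U_k, self.s_k), self.centroid)


def focused_crawl(seeds, fetch, scorer, threshold=0.5, max_pages=100):
    """Best-first crawl. `fetch(url) -> (term_vector, outlinks)` is an assumed callback.

    Assumption: each unseen outlink inherits its parent's relevance as its
    priority, a simple stand-in for a real link-analysis component.
    """
    frontier = [(-1.0, url) for url in seeds]  # heapq is a min-heap, so negate scores
    heapq.heapify(frontier)
    seen, relevant, fetched = set(seeds), [], 0
    while frontier and fetched < max_pages:
        _, url = heapq.heappop(frontier)
        vec, outlinks = fetch(url)
        fetched += 1
        rel = scorer.score(vec)
        if rel >= threshold:
            relevant.append(url)
        for link in outlinks:
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-rel, link))
    return relevant
```

The S_k^-1 scaling in `fold_in` follows the usual LSI convention, under which a folded-in training column coincides with its row of V_k. This lets new pages and training pages be compared in the same coordinates. A page that shares no terms with the training data projects to (near) zero and is scored 0. The sketch therefore never expands the vocabulary beyond the initial training set. That limitation is the kind the abstract says the authors set out to address.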