International Symposium on Computational and Business Intelligence

CrawlPart: Creating Crawl Partitions in Parallel Crawlers


Abstract

With the ever-proliferating size and scale of the WWW [1], efficient ways of exploring its content are of increasing importance. How can we efficiently retrieve information from it through crawling? In this "era of tera" and of multi-core processors, we ought to consider multi-threaded processes as a serving solution. Better still, how can we improve crawling performance by using parallel crawlers that work independently? This paper is devoted to the fundamental advantages of, and challenges arising from, the design of parallel crawlers [4]. It focuses mainly on URL distribution among the various parallel crawling processes. How to distribute URLs from the URL frontier to the concurrently executing crawling threads is an orthogonal problem. The paper addresses this problem by designing a framework that partitions the URL frontier into several URL queues and orders the URLs within each of the distributed sets.
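To make the idea concrete, the following is a minimal Python sketch of one way such a frontier-partitioning framework could work. The hash-on-hostname assignment of URLs to queues and the per-queue priority ordering are illustrative assumptions, not the partitioning and ordering functions defined in the paper.

    # Minimal sketch: partition a URL frontier into ordered per-crawler queues.
    # The hostname-hash assignment and priority ordering are assumed policies.
    import heapq
    from urllib.parse import urlparse

    class CrawlPartitioner:
        """Splits a URL frontier into several ordered URL queues."""

        def __init__(self, num_crawlers):
            self.num_crawlers = num_crawlers
            # One priority queue (min-heap) per crawling process/thread.
            self.queues = [[] for _ in range(num_crawlers)]

        def _partition_of(self, url):
            # Send all URLs of a host to the same partition so each
            # crawling process can work independently (assumed policy).
            host = urlparse(url).netloc
            return hash(host) % self.num_crawlers

        def add(self, url, priority=0):
            # Lower priority value = crawled earlier within its partition.
            part = self._partition_of(url)
            heapq.heappush(self.queues[part], (priority, url))

        def next_url(self, crawler_id):
            # Each crawler pops only from its own ordered queue.
            queue = self.queues[crawler_id]
            return heapq.heappop(queue)[1] if queue else None

    if __name__ == "__main__":
        frontier = CrawlPartitioner(num_crawlers=3)
        for u in ["http://example.com/a", "http://example.org/b",
                  "http://example.com/c"]:
            frontier.add(u)
        print(frontier.next_url(0), frontier.next_url(1), frontier.next_url(2))

In this sketch, assigning every URL of a given host to the same partition keeps the crawling processes independent and avoids duplicate fetches across partitions, at the cost of possible load imbalance between queues; the within-queue ordering then decides which URL each crawler fetches next.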
