Journal: World Wide Web

Fast T-overlap query algorithms using graphics processor units and its applications in web data query



Abstract

Given a collection of sets and a query set, a T-Overlap query identifies all sets that share at least T elements with the query. The T-Overlap query is the foundation of set similarity queries and joins, and it plays an important role in web data querying and processing, such as behavior analysis of web users and near-duplicate detection of web documents. To answer T-Overlap queries efficiently, we design GPU-based algorithms rather than relying on traditional CPU-based ones. We first design an inverted index on the GPU, then choose ScanCount, a straightforward but efficient T-Overlap algorithm, as the underlying algorithm for our GPU-based T-Overlap algorithms. Depending on whether queries are processed serially or in parallel, we propose three new algorithms built on our GPU-based inverted index. Of the three, GS-Parallel-Group processes a group of queries in parallel and supports a high degree of parallelism. Extensive experiments compare our GPU-based algorithms with state-of-the-art CPU-based algorithms. The results show that GS-Parallel-Group significantly outperforms the CPU-based algorithms.
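
To make the query definition concrete, the following is a minimal single-threaded Python sketch of ScanCount over an inverted index. It is not the paper's GPU implementation: the function names `build_inverted_index` and `scan_count`, the list-based counter array, and the toy data are all assumptions chosen for illustration. The idea is to walk the posting list of every query element, increment one counter per candidate set, and report the sets whose counter reaches T.

```python
from collections import defaultdict


def build_inverted_index(sets):
    """Map each element to the list of ids of the sets that contain it.

    Hypothetical helper: ids are positions in the input list.
    """
    index = defaultdict(list)
    for set_id, s in enumerate(sets):
        for element in set(s):  # set() guards against duplicate elements
            index[element].append(set_id)
    return index


def scan_count(index, num_sets, query, t):
    """Return ids of all sets sharing at least t elements with the query."""
    counts = [0] * num_sets  # one overlap counter per indexed set
    for element in set(query):
        # Scan the posting list of this query element, if it has one.
        for set_id in index.get(element, ()):
            counts[set_id] += 1
    return [set_id for set_id, c in enumerate(counts) if c >= t]


# Toy collection (illustrative data, not from the paper).
sets = [{1, 2, 3, 4}, {2, 3, 5}, {6, 7}, {1, 3, 4, 8}]
index = build_inverted_index(sets)

# Sets 0 and 3 each share three elements (1, 3, 4) with the query.
result = scan_count(index, len(sets), {1, 3, 4, 9}, t=2)
print(result)  # [0, 3]
```

The counter increments for different posting lists are independent of one another, which is presumably what makes this algorithm a natural candidate for parallel execution; how the paper's three GPU algorithms actually distribute that work is not described in the abstract.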
