Journal: International Journal of High Performance Computing Applications

Fitness evaluation reuse for accelerating GPU-based evolutionary induction of decision trees


Abstract

Decision trees (DTs) are one of the most popular white-box machine-learning techniques. Traditionally, DTs are induced using a top-down greedy search that may lead to sub-optimal solutions. One emerging alternative is evolutionary induction, inspired by biological evolution. It searches for the tree structure and the tests simultaneously, which results in less complex DTs with at least comparable prediction performance. However, the evolutionary search is computationally expensive, and its effective application to big data mining requires algorithmic and technological progress. In this paper, noting that many trees or their parts reappear during the evolution, we propose a reuse strategy. A fixed number of recently processed individuals (DTs) is stored in a so-called repository. One part of each repository entry (related to fitness calculations) is maintained on the CPU side to limit CPU/GPU memory transactions. The rest of the entry (the tree structure) is located on the GPU side to speed up the search for similar DTs. As the most time-demanding task of the induction is the evaluation of DTs, the GPU first searches the repository for similar DTs to reuse. If the search fails, the GPU has to evaluate the DT from the ground up. Large artificial and real-life datasets and various repository strategies are tested. Results show that reusing information from previous generations can further accelerate the original GPU-based solution. The gain is especially visible for large-scale data. To give an idea of the overall acceleration scale, the proposed solution can process even billions of objects in a few hours on a single GPU workstation.
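The reuse idea in the abstract can be illustrated with a minimal CPU-only sketch. This is not the authors' implementation: the names `FitnessRepository`, `predict`, and `accuracy` are hypothetical, trees are represented as nested tuples, the repository matches only identical trees (the paper searches for similar DTs on the GPU), eviction is assumed to be least-recently-used, and fitness is assumed to be plain accuracy.

```python
from collections import OrderedDict


def predict(tree, x):
    """Walk a tree given as (feature, threshold, left, right); leaves are labels."""
    while isinstance(tree, tuple):
        feature, threshold, left, right = tree
        tree = left if x[feature] <= threshold else right
    return tree


def accuracy(tree, data):
    """Full (expensive) evaluation: fraction of correctly classified objects."""
    correct = sum(1 for x, label in data if predict(tree, x) == label)
    return correct / len(data)


class FitnessRepository:
    """Fixed-size store of recently evaluated trees (hypothetical sketch)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # tree -> fitness, oldest entry first
        self.hits = 0
        self.misses = 0

    def evaluate(self, tree, full_evaluation):
        # Reuse path: an identical tree was evaluated recently.
        if tree in self.entries:
            self.hits += 1
            self.entries.move_to_end(tree)  # mark as most recently used
            return self.entries[tree]
        # Fallback path: evaluate from the ground up and remember the result.
        self.misses += 1
        fitness = full_evaluation(tree)
        self.entries[tree] = fitness
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used
        return fitness


# Toy dataset: one feature, two classes (illustrative values only).
data = [((1.0,), 0), ((2.0,), 0), ((3.0,), 1), ((4.0,), 1)]
tree = (0, 2.5, 0, 1)  # split on feature 0 at 2.5

repo = FitnessRepository(capacity=2)
first = repo.evaluate(tree, lambda t: accuracy(t, data))   # full evaluation
second = repo.evaluate(tree, lambda t: accuracy(t, data))  # served from the repository
```

In this sketch the whole entry lives in host memory. The abstract instead splits each entry, keeping fitness-related data on the CPU and tree structures on the GPU so that the similarity search runs on the device.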
