Parallel Algorithm for Mining Frequent Closed Sequences

Abstract

Previous studies have presented convincing arguments that a frequent sequence mining algorithm should mine only the closed frequent sequences rather than all frequent sequences, because doing so yields a more compact yet complete result set as well as better efficiency. However, mining frequent closed sequences on a stand-alone machine remains challenging because of the large size and high dimensionality of the data. This paper presents PFCSeq, an algorithm for mining frequent closed sequences on a distributed-memory parallel machine. Each processor independently mines its local set of frequent closed sequences using a combination of task parallelism and data parallelism, and only two communication steps are needed unless load imbalance is detected. As a result, the time spent on communication is significantly reduced. To ensure good load balance among processors, a dynamic workload-balancing strategy is proposed. Experiments show that PFCSeq scales linearly with both database size and the number of processors.
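The argument for mining only closed sequences rests on a standard definition: a frequent sequence is closed if no proper supersequence has the same support. The sketch below illustrates that definition only. It is not PFCSeq and contains no parallelism or load balancing. It assumes each sequence element is a single item (the general itemset case is omitted), enumerates frequent sequences with a minimal PrefixSpan-style projection, and then removes non-closed sequences by brute force. The helper names (`is_subsequence`, `frequent_sequences`, `closed_sequences`) and the toy database `DB` are hypothetical.

```python
from typing import Dict, List, Tuple

Seq = Tuple[str, ...]  # a sequence of single items (simplifying assumption)


def is_subsequence(sub: Seq, seq: Seq) -> bool:
    """True if the items of `sub` appear in `seq` in the same order (gaps allowed)."""
    it = iter(seq)
    return all(item in it for item in sub)  # `in` advances the shared iterator


def frequent_sequences(db: List[Seq], min_sup: int) -> Dict[Seq, int]:
    """Enumerate all frequent sequences with a minimal PrefixSpan-style recursion."""
    result: Dict[Seq, int] = {}

    def grow(prefix: Seq, projected: List[Seq]) -> None:
        # Support of prefix + item = number of projected suffixes containing item.
        counts: Dict[str, int] = {}
        for suffix in projected:
            for item in set(suffix):
                counts[item] = counts.get(item, 0) + 1
        for item, sup in counts.items():
            if sup < min_sup:
                continue
            new_prefix = prefix + (item,)
            result[new_prefix] = sup
            # Project each suffix past the first occurrence of `item`.
            new_projected = [s[s.index(item) + 1:] for s in projected if item in s]
            grow(new_prefix, new_projected)

    grow((), db)
    return result


def closed_sequences(freq: Dict[Seq, int]) -> Dict[Seq, int]:
    """Keep sequences that have no proper supersequence with equal support.

    Such a supersequence would itself be frequent, so searching `freq` suffices.
    """
    return {
        s: sup
        for s, sup in freq.items()
        if not any(
            len(t) > len(s) and t_sup == sup and is_subsequence(s, t)
            for t, t_sup in freq.items()
        )
    }


# Hypothetical toy database of four item sequences.
DB: List[Seq] = [("a", "b", "c"), ("a", "c"), ("a", "b", "c", "d"), ("b", "d")]

if __name__ == "__main__":
    freq = frequent_sequences(DB, min_sup=2)
    closed = closed_sequences(freq)
    print(len(freq), "frequent,", len(closed), "closed")
    for seq, sup in sorted(closed.items()):
        print("".join(seq), sup)
```

With a minimum support of 2, the nine frequent sequences of this toy database reduce to four closed ones (`abc`: 2, `ac`: 3, `b`: 3, `bd`: 2). No information is lost: the support of any discarded sequence equals the largest support among its closed supersequences; for example, sup(`a`) = sup(`ac`) = 3.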