首页> 外文学位 >Supporting efficient and scalable frequent pattern mining.
【24h】

Supporting efficient and scalable frequent pattern mining.

机译:支持高效且可扩展的频繁模式挖掘。

获取原文
获取原文并翻译 | 示例

摘要

With the large amount of data collected in various applications, data mining has become an essential tool for data analysis and decision support. Frequent itemset mining is an important problem in the data mining area with a wide range of applications. In this dissertation, we investigate several techniques to support efficient and scalable frequent itemset mining.; We first identify the key factors of a frequent itemset mining algorithm, and propose an algorithm AFOPT for efficient mining of frequent itemsets. The AFOPT algorithm adopts the pattern growth approach. It uses a new structure, called Ascending Frequency Ordered Prefix-Trie (or AFOPT for short), and two other structures to store conditional databases. An adaptive strategy is employed to choose proper structures according to the density of the conditional databases, which makes the AFOPT algorithm perform well on both sparse and dense databases.; To deal with very large databases with millions of transactions, we propose a Search Space based Partitioning approach SSP for scalable out-of-core mining. The novel idea of SSP is to partition the database based on the search space. Different partitions share data but not frequent itemsets. As such, frequent itemsets can be mined from each partition without the need to scan the whole database to verify their support. Furthermore, techniques are developed so that the available memory can be utilized during the mining process to minimize disk I/O. Our experiment results show that the proposed algorithms achieve significant performance improvement over existing out-of-core mining algorithms.; Many decision support systems need to support online interactive frequent itemset mining, which is very challenging because frequent itemset mining is a computation intensive repetitive process. We propose a compact disk-based data structure---CFP-tree (Condensed Frequent Pattern tree) for storing precomputed frequent itemsets to support online mining requests. Efficient algorithms for retrieving frequent itemsets and association rules from a CFP-tree, as well as efficient algorithms to construct and update a CFP-tree, are developed. One obstacle to online association rule mining is the undesirably large result size. We introduce the concept of frequent itemset class and the class association rule to reduce the result size without information loss. The benefits of organizing itemsets into classes are two-fold. On the one hand, the output size is reduced and it is easier for users to browse the results. On the other hand, the computation cost is saved because the rule generation cost of the itemsets in the same class is shared. Our performance study demonstrates that with CFP-tree and the concept of frequent itemset class, frequent itemset and association rule mining requests can be responded to promptly.
机译:随着在各种应用程序中收集大量数据,数据挖掘已成为数据分析和决策支持的重要工具。频繁项集挖掘是数据挖掘领域中一个重要的问题,具有广泛的应用范围。本文研究了几种支持有效且可扩展的频繁项集挖掘的技术。我们首先确定频繁项集挖掘算法的关键因素,然后提出一种用于高效挖掘频繁项集的算法AFOPT。 AFOPT算法采用模式增长方法。它使用一种称为“升序频率前缀-Trie”(简称“ AFOPT”)的新结构以及另外两个结构来存储条件数据库。根据条件数据库的密度,采用自适应策略选择合适的结构,这使AFOPT算法在稀疏和密集数据库上都能表现良好。为了处理具有数百万个事务的超大型数据库,我们提出了一种基于搜索空间的分区方法SSP,用于可扩展的内核外挖掘。 SSP的新颖思想是根据搜索空间对数据库进行分区。不同的分区共享数据,但不共享频繁的项目集。这样,可以从每个分区中挖掘频繁的项目集,而无需扫描整个数据库以验证其支持。此外,还开发了一些技术,以便可以在挖掘过程中利用可用内存以最小化磁盘I / O。我们的实验结果表明,与现有的核外挖掘算法相比,所提算法具有显着的性能提升。许多决策支持系统需要支持在线交互式频繁项目集挖掘,这非常具有挑战性,因为频繁项目集挖掘是一个计算密集型重复过程。我们提出了一种基于磁盘的紧凑数据结构-CFP树(压缩频繁模式树),用于存储预先计算的频繁项目集以支持在线挖掘请求。开发了用于从CFP树检索频繁项集和关联规则的高效算法,以及用于构造和更新CFP树的高效算法。在线关联规则挖掘的一个障碍是结果大小过大。我们引入频繁项集类的概念和类关联规则,以减少结果大小而不会丢失信息。将项目集组织到类中的好处是双重的。一方面,减小了输出大小,使用户更容易浏览结果。另一方面,因为共享了同一类别中的项目集的规则生成成本,所以节省了计算成本。我们的性能研究表明,使用CFP-tree和频繁项集类的概念,可以快速响应频繁项集和关联规则挖掘请求。

著录项

  • 作者

    Liu, Guimei.;

  • 作者单位

    Hong Kong University of Science and Technology (People's Republic of China).;

  • 授予单位 Hong Kong University of Science and Technology (People's Republic of China).;
  • 学科 Computer Science.
  • 学位 Ph.D.
  • 年度 2005
  • 页码 134 p.
  • 总页数 134
  • 原文格式 PDF
  • 正文语种 eng
  • 中图分类 自动化技术、计算机技术;
  • 关键词

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号