Frequent Itemset Hiding Algorithm Using Frequent Pattern Tree Approach.


Abstract

The frequent itemset hiding (FIH) problem has been the focus of much recent research in privacy-preserving data mining. Identifying itemsets that appear together frequently in customer transactions is a common task in association rule mining. Organizations that share data with business partners may consider some of the frequent itemsets sensitive and aim to hide them by removing items from certain transactions. Because such modifications reduce the utility of the database for data mining applications, the goal is to remove as few items as possible. The FIH problem is NP-hard, and practical instances are too large to be solved optimally, so heuristic methods that provide good solutions are needed. This dissertation developed a new method, Min_Items_Removed, that uses the Frequent Pattern Tree (FP-Tree) and outperforms extant methods for the FIH problem. The FP-Tree compresses large databases into significantly smaller data structures, which allows a search to be performed with increased speed and efficiency.

To evaluate the effectiveness and performance of the Min_Items_Removed algorithm, eight experiments were conducted. The results showed that Min_Items_Removed yields better-quality solutions than extant methods in terms of minimizing the number of removed items. In addition, the results showed that the newly introduced metric (normalized number of leaves) is a very good indicator of the size or difficulty of a problem instance, and that it is independent of the number of sensitive itemsets.
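To make the hiding task in the abstract concrete, the sketch below counts the support of an itemset and then removes items until every sensitive itemset falls below a support threshold. It is a naive greedy baseline written for illustration only; it is not the dissertation's Min_Items_Removed algorithm and does not use an FP-Tree. The names `support`, `naive_hide`, and `min_support`, and the toy database, are assumptions introduced here.

```python
def support(db, itemset):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in db if itemset <= t)


def naive_hide(db, sensitive, min_support):
    """Greedy baseline (hypothetical, NOT Min_Items_Removed).

    Removes one item at a time until every sensitive itemset has
    support below `min_support`. Returns (sanitized_db, items_removed).
    Assumes min_support >= 1 and that sensitive itemsets are non-empty.
    """
    if min_support < 1 or any(len(s) == 0 for s in sensitive):
        raise ValueError("need min_support >= 1 and non-empty itemsets")
    db = [set(t) for t in db]  # work on a copy
    removed = 0
    while True:
        # Sensitive itemsets that are still frequent.
        exposed = [s for s in sensitive if support(db, s) >= min_support]
        if not exposed:
            return db, removed
        # Pick the single (transaction, item) removal that breaks the
        # largest number of exposed itemsets in that transaction.
        best = None
        for i, t in enumerate(db):
            supported = [s for s in exposed if s <= t]
            for item in sorted(set().union(*supported)):
                gain = sum(1 for s in supported if item in s)
                if best is None or gain > best[0]:
                    best = (gain, i, item)
        _, i, item = best
        db[i].discard(item)
        removed += 1


# Toy transactions; {"a", "b"} is treated as the sensitive itemset.
db = [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "d"}, {"b", "c"}, {"a", "c"}]
secret = frozenset({"a", "b"})
sanitized, removed = naive_hide(db, [secret], min_support=2)
print(removed, support(sanitized, secret))
```

Each removal here lowers the support of a sensitive itemset by one, so the number of removed items is the quantity the abstract says should be minimized. A greedy choice like this one gives no optimality guarantee, which is consistent with the abstract's point that the problem is NP-hard and calls for better heuristics.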

Bibliographic record

  • Author

    Alnatsheh, Rami

  • Author affiliation

    Nova Southeastern University

  • Degree-granting institution: Nova Southeastern University
  • Subject: Information Technology; Computer Science
  • Degree: Ph.D.
  • Year: 2012
  • Pages: 89 p.
  • Total pages: 89
  • Original format: PDF
  • Language: English