Efficient weighted probabilistic frequent itemset mining in uncertain databases


Abstract

Uncertain data mining has attracted considerable interest in many emerging applications over the past decade. An issue of particular interest is discovering frequent itemsets in uncertain databases. Because an item's presence in a transaction of such a database is not certain, several probability models have been proposed to measure the frequency of an itemset, and frequent itemsets over probabilistic data generally have two different definitions: the expected support-based frequent itemset and the probabilistic frequent itemset. Meanwhile, frequency alone cannot identify useful or meaningful patterns in some scenarios; other measures, such as the importance of items, should also be taken into account. To this end, some recent studies have addressed weighted (importance-aware) frequent itemset mining in uncertain databases. However, they are designed only for the expected support-based frequent itemset, and they suffer from low efficiency because they generate too many frequent itemset candidates. To address this issue, we propose a novel algorithm for mining weighted probabilistic frequent itemsets (w-PFIs). Moreover, we derive a probability model for the support of a w-PFI candidate in our method and present three pruning techniques to narrow the search space and remove unpromising candidates immediately. Extensive experiments have been conducted on both real and synthetic datasets to evaluate the performance of our w-PFI algorithm in terms of runtime, accuracy and scalability. Results show that our algorithm outperforms the existing algorithms.
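To make the two definitions mentioned in the abstract concrete, the sketch below contrasts the expected support of an itemset with the probability that its support reaches a minimum threshold. It is an illustration only, not the paper's algorithm: it assumes items occur independently within and across transactions, so the support follows a Poisson binomial distribution that can be computed by dynamic programming. The function names, the toy database, and the weighting rule in `weighted_frequentness` (mean item weight multiplied by the frequentness probability) are all assumptions introduced for this example; the abstract does not state how the paper combines weights with probabilities, and the three pruning techniques are not shown.

```python
from typing import Dict, FrozenSet, List

# An uncertain transaction maps each item to its existence probability.
Transaction = Dict[str, float]


def itemset_probs(db: List[Transaction], itemset: FrozenSet[str]) -> List[float]:
    """Per-transaction probability of containing the whole itemset,
    assuming items occur independently within a transaction."""
    probs = []
    for t in db:
        p = 1.0
        for item in itemset:
            p *= t.get(item, 0.0)  # an absent item has probability 0
        probs.append(p)
    return probs


def expected_support(probs: List[float]) -> float:
    """Expected support: the sum of per-transaction probabilities."""
    return sum(probs)


def support_pmf(probs: List[float]) -> List[float]:
    """Poisson binomial pmf by dynamic programming: pmf[k] = Pr(support == k)."""
    pmf = [1.0]
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)   # transaction does not contain the itemset
            nxt[k + 1] += mass * p       # transaction contains the itemset
        pmf = nxt
    return pmf


def frequentness(probs: List[float], minsup: int) -> float:
    """Pr(support >= minsup), the quantity a probabilistic frequent itemset thresholds."""
    return sum(support_pmf(probs)[minsup:])


def weighted_frequentness(weights: Dict[str, float], itemset: FrozenSet[str],
                          probs: List[float], minsup: int) -> float:
    """Hypothetical weighted score: mean item weight times Pr(support >= minsup)."""
    w = sum(weights[i] for i in itemset) / len(itemset)
    return w * frequentness(probs, minsup)


# Toy data, invented for illustration.
db = [{"a": 0.5, "b": 1.0}, {"a": 0.5}, {"a": 1.0, "b": 0.5}]
weights = {"a": 0.8, "b": 0.4}

ab = frozenset({"a", "b"})
p_ab = itemset_probs(db, ab)
print(expected_support(p_ab))                     # 1.0
print(frequentness(p_ab, 1))                      # 0.75
print(weighted_frequentness(weights, ab, p_ab, 1))
```

In this toy database the itemset {a, b} has an expected support of 1.0 but reaches a support of at least 1 with probability 0.75, which shows that the two definitions can rank the same itemset differently depending on the thresholds chosen.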

Bibliographic details

  • Source
    Expert Systems | 2021, Issue 5 | e12551.1-e12551.17 | 17 pages
  • Author affiliations

    Dalian Maritime Univ Informat Sci & Technol Coll Dalian Peoples R China;

    Dalian Maritime Univ Informat Sci & Technol Coll Dalian Peoples R China;

    Dalian Ocean Univ Sch Informat Engn Dalian Peoples R China;

    Dalian Maritime Univ Informat Sci & Technol Coll Dalian Peoples R China;

    Dalian Maritime Univ Informat Sci & Technol Coll Dalian Peoples R China;

  • Indexed in: Science Citation Index (SCI, USA); Engineering Index (EI, USA);
  • Original format: PDF
  • Language: English
  • Chinese Library Classification
  • Keywords

    probability model; pruning; uncertain database; weighted probabilistic frequent itemset;
