...
首页> 外文期刊>The VLDB journal >Mining frequent subgraphs over uncertain graph databases under probabilistic semantics
【24h】

Mining frequent subgraphs over uncertain graph databases under probabilistic semantics

机译:概率语义在不确定图数据库上挖掘频繁子图

获取原文
获取原文并翻译 | 示例
           

摘要

Frequent subgraph mining has been extensively studied on certain graph data. However, uncertainty is intrinsic in graph data in practice, but there is very few work on mining uncertain graph data. This paper focuses on mining frequent subgraphs over uncertain graph data under the probabilistic semantics. Specifically, a measure called φ-frequent probability is introduced to evaluate the degree of recurrence of subgraphs. Given a set of uncertain graphs and two real numbers 0 < φ, τ < 1, the goal is to quickly find all subgraphs with φ-frequent probability at least τ. Due to the NP-hardness of the problem and to the #P-hardness of computing the φ-frequent probability of a subgraph, an approximate mining algorithm is proposed to produce an (ε, δ)-approximate set ∏ of "frequent subgraphs", where 0 < ε < τ is error tolerance, and 0 < δ < 1 is a confidence bound. The algorithm guarantees that (1) any frequent subgraph S is contained in ∏ with probability at least ((1 - δ)/2)~s, where s is the number of edges in S; (2) any infrequent subgraph with φ-frequent probability less than τ-ε is contained in ∏ with probability at most δ/2. The theoretical analysis shows that to obtain any frequent subgraph with probability at least 1 - Δ, the input parameter S of the algorithm must be set to at most 1 - 2(1 - Δ)~(l/e)_(max), where 0 < Δ < 1, and e_(max) is the maximum number of edges in frequent subgraphs. Extensive experiments on real uncertain graph data verify that the proposed algorithm is practically efficient and has very high approximation quality. Moreover, the difference between the probabilistic semantics and the expected semantics on mining frequent subgraphs over uncertain graph data has been discussed in this paper for the first time.
机译:频繁的子图挖掘已在某些图数据上进行了广泛的研究。但是,在实践中,不确定性是图形数据的内在因素,但是挖掘不确定性图形数据的工作很少。本文重点研究概率语义下不确定图数据上的频繁子图挖掘。具体而言,引入了一种称为φ概率的度量,以评估子图的重复程度。给定一组不确定图和两个实数0 <φ,τ<1,目标是快速找到所有φ概率至少为τ的子图。由于问题的NP难度和计算子图的φ概率的#P难度,提出了一种近似挖掘算法来生成(ε,δ)近似的“频繁子图”集合∏ ,其中0 <ε<τ是容错能力,0 <δ<1是置信界。该算法保证(1)在frequent中包含任何频繁的子图S,且概率至少为((1-δ)/ 2)〜s,其中s是S中的边数; (2)在∏中包含φ概率小于τ-ε的任何不频繁子图,概率最大为δ/ 2。理论分析表明,要获得概率至少为1-Δ的频繁子图,算法的输入参数S必须设置为至多1-2(1-Δ)〜(l / e)_(max),其中0 <Δ<1,并且e_(max)是频繁子图中的最大边数。对实际不确定图数据的大量实验证明,该算法实用有效,并且具有很高的近似质量。此外,本文首次讨论了在不确定的图数据上挖掘频繁子图的概率语义和预期语义之间的差异。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号