Journal: Knowledge and Information Systems

Maximum entropy based significance of itemsets


Abstract

We consider the problem of defining the significance of an itemset. We say that the itemset is significant if we are surprised by its frequency when compared to the frequencies of its sub-itemsets. In other words, we estimate the frequency of the itemset from the frequencies of its sub-itemsets and compute the deviation between the real value and the estimate. For the estimation we use Maximum Entropy and for measuring the deviation we use Kullback–Leibler divergence. A major advantage compared to the previous methods is that we are able to use richer models whereas the previous approaches only measure the deviation from the independence model. We show that our measure of significance goes to zero for derivable itemsets and that we can use the rank as a statistical test. Our empirical results demonstrate that for our real datasets the independence assumption is too strong but applying more flexible models leads to good results.
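The procedure described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the use of iterative scaling to fit the Maximum Entropy distribution, the choice of all proper sub-itemsets as constraints, and the direction of the Kullback–Leibler divergence (empirical against estimate) are all assumptions made for illustration.

```python
import itertools
import math


def empirical_distribution(rows, k):
    """Empirical distribution over all 2^k presence/absence patterns of k items."""
    counts = {x: 0 for x in itertools.product((0, 1), repeat=k)}
    for row in rows:
        counts[tuple(row)] += 1
    return {x: c / len(rows) for x, c in counts.items()}


def frequency(dist, subset):
    """Probability that every item whose index is in `subset` is present."""
    return sum(p for x, p in dist.items() if all(x[i] for i in subset))


def proper_subsets(k):
    """All non-empty proper subsets of {0, ..., k-1}, as index tuples."""
    return [s for r in range(1, k) for s in itertools.combinations(range(k), r)]


def maxent_estimate(q, subsets, iters=500):
    """Fit a maximum entropy distribution by iterative scaling (assumed method).

    Starts from the uniform distribution and repeatedly rescales it so that
    the frequency of each sub-itemset matches its frequency under q.
    """
    p = {x: 1.0 / len(q) for x in q}
    targets = {s: frequency(q, s) for s in subsets}
    for _ in range(iters):
        for s, f in targets.items():
            m = frequency(p, s)
            if m <= 0.0 or m >= 1.0:
                continue  # nothing left to rescale for this constraint
            for x in p:
                covered = all(x[i] for i in s)
                p[x] *= f / m if covered else (1.0 - f) / (1.0 - m)
    return p


def kl_divergence(q, p):
    """KL(q || p) in nats, using the convention 0 * log 0 = 0."""
    return sum(qx * math.log(qx / p[x]) for x, qx in q.items() if qx > 0)


# Toy data (hypothetical): the third item is present exactly when one of the
# first two is, so every pair looks independent but the triple never occurs.
rows = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
q = empirical_distribution(rows, 3)
p = maxent_estimate(q, proper_subsets(3))
print("observed frequency:", frequency(q, (0, 1, 2)))
print("estimated frequency:", frequency(p, (0, 1, 2)))
print("divergence:", kl_divergence(q, p))
```

On this toy data every single item has frequency 0.5 and every pair has frequency 0.25, so the fitted distribution is uniform and predicts a frequency of 0.125 for the full itemset, while the observed frequency is 0. The resulting divergence is log 2, which marks the itemset as surprising even though no pair of items deviates from independence. Passing only the singleton subsets to `maxent_estimate` would reproduce the independence model that the abstract contrasts against.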