Journal: Artificial Intelligence Research

Heavy path based super-sequence frequent pattern mining on web log dataset

Abstract

Mining web log datasets has been extensively studied using Frequent Pattern Mining (FPM) and its variants. Identifying frequent patterns across sequences helps in analyzing the most common sub-sequences (e.g., pages visited together). However, this approach cannot identify general structures that span multiple sequences. To capture such general structures, we introduce a new form of sequential pattern mining called super-sequence frequent pattern mining (SS-FPM). In contrast to the sub-sequences determined by FPM, SS-FPM determines super-sequences that can contain the common parts of different sequences. This can be useful for understanding the general behavior and flow of users in web usage mining, classifying web pages and users, making predictions, and so on. In essence, finding frequent super-sequence patterns turns out to be related to the well-known heaviest (longest) path problem in graphs, which is known to be NP-hard. Accordingly, we transform a given sequential dataset into a sequence graph and formulate the problem as a k-hop heaviest path problem. We then propose an efficient heuristic, called the sequence matrix method, that uses dynamic programming techniques. We compared our method to the existing Heavypath method. The results show that our method is more efficient, especially on large datasets.
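To make the graph formulation concrete, the following is a minimal sketch, not the paper's sequence matrix method or the Heavypath method. It assumes that the sequence graph weights each directed edge by how often one page immediately follows another, and it relaxes the k-hop heaviest path problem to the heaviest k-hop walk (vertices may repeat), which a simple dynamic program can solve. The function names `build_sequence_graph` and `heaviest_k_hop_walk` and the sample sessions are hypothetical.

```python
from collections import defaultdict


def build_sequence_graph(sequences):
    """Weight each directed edge (a, b) by how often b immediately follows a.

    Assumption: this edge weighting is illustrative; the abstract does not
    specify how the paper's sequence graph is weighted.
    """
    weights = defaultdict(int)
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            weights[(a, b)] += 1
    return dict(weights)


def heaviest_k_hop_walk(weights, k):
    """Return (total_weight, walk) for the heaviest walk with exactly k edges.

    Vertices may repeat, so this is a walk rather than a simple path;
    it is a relaxation of the NP-hard heaviest path problem.
    """
    nodes = {n for edge in weights for n in edge}
    # best[v] = (weight, walk) of the heaviest walk ending at v
    # for the current hop count (0 hops initially).
    best = {v: (0, [v]) for v in nodes}
    for _ in range(k):
        nxt = {}
        for (a, b), w in weights.items():
            if a in best:
                cand = best[a][0] + w
                if b not in nxt or cand > nxt[b][0]:
                    nxt[b] = (cand, best[a][1] + [b])
        best = nxt
    if not best:
        return 0, []  # no walk with k edges exists
    return max(best.values(), key=lambda t: t[0])


# Hypothetical web sessions (page identifiers are made up).
sessions = [
    ["home", "search", "item"],
    ["home", "search", "cart"],
    ["search", "item", "cart"],
    ["home", "item"],
]
graph = build_sequence_graph(sessions)
print(heaviest_k_hop_walk(graph, 2))  # (4, ['home', 'search', 'item'])
```

Each round of the dynamic program extends every best walk by one edge, so the cost is proportional to k times the number of edges. Requiring a simple path (no repeated pages) is what makes the general problem hard, which is why heuristics such as those compared in the paper are needed.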