Conference: 2011 International Conference on Research and Innovation in Information Systems

Sequential pattern mining using personalized minimum support threshold with minimum items

Abstract

One of the challenges of sequential pattern mining is finding frequent sequential patterns in very large click-stream data (web logs), because such data has a very low support distribution. In frequent pattern discovery, a sequence is considered frequent if it occurs more often than the minimum support (min_sup) threshold value. The conventional approach of assuming that a single min_sup value is valid for all levels of k-sequence may affect the overall results or pattern generation. This paper introduces a personalized minimum support (P_minsup) threshold with a user-specified minimum number of items, min_i. P_minsup is generated for each k-sequence by analyzing the overall support distribution of the click-stream data, while min_i gives the user control over the number of patterns generated for the next k-sequence by using only the top min_i items. The approach is applied in the SPADE algorithm using a vector array, extending the previous method, which used a relational database and a predefined threshold. The experimental results demonstrate that P_minsup, complemented by min_i, can assist in determining a suitable threshold value for detecting users' frequent k-sequential topics when navigating the World Wide Web (WWW).
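The level-by-level idea in the abstract can be illustrated with a small Python sketch. The abstract does not say how P_minsup is derived from the support distribution, so the rule used here (the mean of the nonzero supports at each level) is purely an assumption. The function names and the toy sessions are hypothetical, and the sketch does not reproduce the SPADE vector-array implementation described in the paper.

```python
from itertools import product
from statistics import mean


def is_subsequence(pattern, session):
    """True if the pattern's items occur in the session in order (gaps allowed)."""
    it = iter(session)
    return all(item in it for item in pattern)


def count_support(candidates, sessions):
    """Support of a candidate = number of sessions that contain it."""
    return {c: sum(is_subsequence(c, s) for s in sessions) for c in candidates}


def personalized_minsup(support):
    """ASSUMPTION: threshold = mean of the nonzero supports at this level.
    The paper's actual P_minsup rule is not given in the abstract."""
    values = [v for v in support.values() if v > 0]
    return mean(values) if values else 0


def mine_level(candidates, sessions, min_i):
    """Return (threshold, frequent patterns, top min_i patterns) for one level."""
    support = count_support(candidates, sessions)
    p_minsup = personalized_minsup(support)
    # A pattern is frequent if it occurs more often than the threshold.
    frequent = {c: n for c, n in support.items() if n > p_minsup}
    # Keep only the top min_i patterns to seed the next level.
    top = sorted(frequent, key=lambda c: (-frequent[c], c))[:min_i]
    return p_minsup, frequent, top


# Hypothetical click-stream sessions (page topics in visit order).
sessions = [
    ["news", "sports", "news"],
    ["news", "sports"],
    ["sports", "weather"],
    ["news", "weather", "sports"],
    ["news"],
]

# Level 1: every distinct item is a candidate 1-sequence.
items = sorted({page for s in sessions for page in s})
p1, freq1, top1 = mine_level([(i,) for i in items], sessions, min_i=2)

# Level 2: candidates are built only from the top min_i items of level 1.
seeds = [t[0] for t in top1]
p2, freq2, top2 = mine_level(list(product(seeds, repeat=2)), sessions, min_i=2)

print("level 1:", p1, freq1)
print("level 2:", p2, freq2)
```

On these toy sessions, only news and sports pass the level-1 threshold, and ("news", "sports") is the only 2-sequence kept. Because each level gets its own threshold, a lower min_i shrinks the candidate set passed to the next level, which is the control the abstract attributes to min_i.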
