
Reliable identification of significant sets of episodes in event sequences.


Abstract

In this thesis we present a solution to the problem of identifying significant sets of episodes in event sequences. To determine the significance of an episode in a monitored event sequence, we compare its observed frequency to its frequency in a reference sequence. In our work the reference sequence is represented by a variable-length Markov model that generates its symbols. An episode is significant if the probability that it would have a given frequency by chance in the reference sequence is very small. To identify significant episodes, we first show how to select the sliding window size so that a discovered episode is meaningful, and then show how to compute a lower threshold for under-represented and an upper threshold for over-represented significant episodes. Frequency of occurrence alone is not enough to determine significance: an infrequent episode can be more significant than a frequent one, and significance depends on the structure of the episode and on the probabilistic characteristics of the reference and monitored event streams. As an extension, we propose a novel method for providing approximate answers, with probabilistic guarantees, to a class of ad hoc sliding window queries that reference past data in data streams. The queries in this class compute the frequency of past windows that satisfy given join conditions among tuples in a window comprising multiple streams. To represent join conditions consisting of intra-stream and inter-stream constraints between tuples in the window, we introduce the concept of a 2D-episode.
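The frequency comparison described in the abstract can be illustrated with a small Python sketch. It is not the thesis's method and makes several simplifying assumptions. The reference model is a memoryless (order-0) symbol distribution, not a variable-length Markov model. An episode is taken to be a serial episode, that is, an ordered subsequence inside a window. The thresholds use a normal approximation that treats windows as independent, which overlapping windows are not. All function names and the `z` parameter are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple


def window_hit_probability(episode: Sequence[str],
                           symbol_probs: Dict[str, float],
                           window: int) -> float:
    """Probability that a window of i.i.d. symbols contains `episode`
    as an ordered subsequence (assumed order-0 reference model)."""
    m = len(episode)
    # state[j] = probability that exactly the first j symbols are matched so far
    state = [0.0] * (m + 1)
    state[0] = 1.0
    for _ in range(window):
        nxt = [0.0] * (m + 1)
        for j in range(m):
            p = symbol_probs[episode[j]]
            nxt[j + 1] += state[j] * p        # next episode symbol appears
            nxt[j] += state[j] * (1.0 - p)    # any other symbol appears
        nxt[m] += state[m]                    # already fully matched
        state = nxt
    return state[m]


def observed_frequency(sequence: Sequence[str],
                       episode: Sequence[str],
                       window: int) -> float:
    """Fraction of sliding windows in `sequence` that contain `episode`."""
    n_windows = len(sequence) - window + 1
    if n_windows <= 0:
        raise ValueError("window is longer than the sequence")
    hits = 0
    for start in range(n_windows):
        j = 0
        for sym in sequence[start:start + window]:
            if j < len(episode) and sym == episode[j]:
                j += 1
        hits += (j == len(episode))
    return hits / n_windows


def thresholds(p: float, n_windows: int, z: float = 3.0) -> Tuple[float, float]:
    """Lower and upper frequency thresholds under a crude normal
    approximation that assumes independent windows."""
    sd = math.sqrt(p * (1.0 - p) / n_windows)
    return max(0.0, p - z * sd), min(1.0, p + z * sd)


def classify(freq: float, lower: float, upper: float) -> str:
    """Label an observed frequency relative to the two thresholds."""
    if freq < lower:
        return "under-represented"
    if freq > upper:
        return "over-represented"
    return "not significant"
```

In use, one would compute `p = window_hit_probability(...)` from the reference model, derive `(lower, upper)` with `thresholds(p, n_windows)`, and pass the monitored stream's `observed_frequency(...)` to `classify`. Even in this toy version the episode's structure matters: the hit probability depends on the length of the episode, the probabilities of its symbols, and the window size, so two episodes with equal observed frequency can differ in significance.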

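Bibliographic record

  • Author: Gwadera, Robert
  • Author affiliation: Purdue University
  • Degree-granting institution: Purdue University
  • Discipline: Computer Science
  • Degree: Ph.D.
  • Year: 2005
  • Pages: 112 p.
  • Total pages: 112
  • Original format: PDF
  • Language of text: eng
  • Chinese Library Classification: Automation technology, computer technology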
