Journal: Knowledge and Information Systems
Detecting anomalies in cross-classified streams: a Bayesian approach

Abstract

We consider the problem of detecting anomalies in data that arise as multidimensional arrays, with each dimension corresponding to the levels of a categorical variable. In typical data mining applications, the number of cells in such arrays is usually large. Our primary focus is detecting anomalies by comparing information at the current time to historical data. Naive approaches advocated in the process control literature do not work well in this scenario because of the multiple testing problem: performing multiple statistical tests on the same data produces an excessive number of false positives. We use an empirical Bayes method that works by fitting a two-component Gaussian mixture to deviations at the current time. The approach scales to problems that involve monitoring a massive number of cells and is fast enough to be potentially useful in many streaming scenarios. We show the superiority of the method relative to a naive “per component error rate” procedure through simulation. A novel feature of our technique is the ability to suppress deviations that are merely the consequence of sharp changes in the marginal distributions. This research was motivated by the need to extract critical application information and business intelligence from the daily logs that accompany large-scale spoken dialog systems. We illustrate our method on one such system.
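The abstract does not spell out the model details, so what follows is only a minimal sketch of the general idea. It rests on several assumptions. First, each cell's deviation at the current time is taken to be already standardized into a score z, for example the observed value minus its historically expected value, divided by a historical standard deviation. Second, a two-component Gaussian mixture is fitted to those scores with EM. Third, a cell is flagged when its posterior probability of belonging to the minority component exceeds 0.5. The function names (`fit_two_component_mixture`, `anomaly_posterior`), the EM initialization, the 0.5 cutoff and the simulated data are hypothetical illustrations, not the authors' procedure. For contrast, the sketch also applies a naive per-cell test at the 5% level, in the spirit of a "per component error rate" rule.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def fit_two_component_mixture(z, n_iter=500, tol=1e-8, min_sigma=1e-3):
    """Hypothetical helper: fit p*N(mu0, s0^2) + (1 - p)*N(mu1, s1^2) to scores z by EM.

    Returns (p, mu0, s0, mu1, s1); component 0 is the majority ("null") component.
    """
    n = len(z)
    median = sorted(z)[n // 2]
    mad = sorted(abs(x - median) for x in z)[n // 2]
    mean = sum(z) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in z) / n)
    # Start the null component narrow around the median and the other one wide.
    p, mu0, s0 = 0.9, median, max(1.4826 * mad, min_sigma)
    mu1, s1 = mean, max(3.0 * sd, min_sigma)
    prev_ll = -math.inf
    for _ in range(n_iter):
        # E-step: responsibility r of the non-null component for each score.
        r, ll = [], 0.0
        for x in z:
            a = p * normal_pdf(x, mu0, s0)
            b = (1.0 - p) * normal_pdf(x, mu1, s1)
            tot = max(a + b, 1e-300)  # guard against underflow
            r.append(b / tot)
            ll += math.log(tot)
        w1 = sum(r)
        w0 = n - w1
        if w0 < 1e-9 or w1 < 1e-9:  # one component has collapsed; stop
            break
        # M-step: weighted mixing proportion, means and standard deviations.
        p = w0 / n
        mu0 = sum((1.0 - ri) * x for ri, x in zip(r, z)) / w0
        mu1 = sum(ri * x for ri, x in zip(r, z)) / w1
        s0 = max(math.sqrt(sum((1.0 - ri) * (x - mu0) ** 2 for ri, x in zip(r, z)) / w0), min_sigma)
        s1 = max(math.sqrt(sum(ri * (x - mu1) ** 2 for ri, x in zip(r, z)) / w1), min_sigma)
        if ll - prev_ll < tol:  # log-likelihood has stopped improving
            break
        prev_ll = ll
    if p < 0.5:  # make sure component 0 is the majority component
        p, mu0, s0, mu1, s1 = 1.0 - p, mu1, s1, mu0, s0
    return p, mu0, s0, mu1, s1


def anomaly_posterior(z, params):
    """Hypothetical helper: posterior probability that each score came from the non-null component."""
    p, mu0, s0, mu1, s1 = params
    post = []
    for x in z:
        a = p * normal_pdf(x, mu0, s0)
        b = (1.0 - p) * normal_pdf(x, mu1, s1)
        post.append(b / max(a + b, 1e-300))
    return post


# Simulated (made-up) example: 1960 ordinary cells and 40 shifted "anomalous" cells.
rng = random.Random(42)
truth = [False] * 1960 + [True] * 40
z = [rng.gauss(6.0, 1.0) if bad else rng.gauss(0.0, 1.0) for bad in truth]

# Naive per-cell test at the 5% level: each cell is tested on its own.
naive = [abs(x) > 1.96 for x in z]
# Mixture-based rule: flag a cell when its posterior anomaly probability exceeds 0.5.
params = fit_two_component_mixture(z)
mixture = [q > 0.5 for q in anomaly_posterior(z, params)]


def count(flags, want):
    """Number of flagged cells whose true label equals `want`."""
    return sum(1 for f, t in zip(flags, truth) if f and t == want)


naive_fp, naive_tp = count(naive, False), count(naive, True)
mix_fp, mix_tp = count(mixture, False), count(mixture, True)
print("naive:   false positives", naive_fp, "true positives", naive_tp)
print("mixture: false positives", mix_fp, "true positives", mix_tp)
```

On data like this, the per-cell rule should flag roughly 5% of the ordinary cells as false positives. The mixture rule flags far fewer of them because its cutoff adapts to the estimated fraction of anomalous cells rather than being fixed per test. The sketch does not attempt the paper's suppression of deviations caused by sharp changes in the marginal distributions.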