Journal: International Journal of Machine Learning and Cybernetics

An online ensembles approach for handling concept drift in data streams: diversified online ensembles detection


Abstract

Data streams are continuous sequences of data instances that arrive at very high speed and whose underlying concept distribution changes over time. We present a novel online ensemble approach, Diversified Online Ensembles Detection (DOED), for handling these drifting concepts in data streams. Our approach maintains two ensembles of weighted experts, one with low diversity and one with high diversity, which are updated according to their accuracy in classifying new data instances. It detects drift by comparing two accuracies for an ensemble: its accuracy on recent examples and its accuracy since the beginning of learning. The final prediction for an instance is the class predicted by the ensemble that is more accurate on the recent examples. When an ensemble detects a drift, it is reinitialized while keeping its diversity level. Experimental evaluation on various artificial and real-world datasets shows that DOED achieves very high accuracy in classifying new data instances, irrespective of dataset size, type of drift, or presence of noise. We compare DOED with other learners in terms of new performance metrics such as the kappa statistic, model cost, evaluation time, and memory requirements. The approach proved highly resource-efficient, achieving very high accuracy even in a resource-constrained environment.
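
The drift check and ensemble selection described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say how the two accuracies are compared, so the sliding window size (`window`), the fixed threshold (`margin`), and the names `AccuracyTracker` and `choose_ensemble` are all assumptions made for illustration. The ensembles themselves (weighted experts, diversity control) are left out.

```python
from collections import deque


class AccuracyTracker:
    """Tracks one ensemble's accuracy since the start of learning and
    over a sliding window of recent examples (window size is an assumption)."""

    def __init__(self, window=50):
        self.recent = deque(maxlen=window)  # correctness of the last `window` predictions
        self.total = 0                      # examples seen since (re)initialization
        self.correct = 0                    # correct predictions since (re)initialization

    def update(self, was_correct):
        """Record whether the ensemble classified the latest instance correctly."""
        self.recent.append(bool(was_correct))
        self.total += 1
        self.correct += int(bool(was_correct))

    def overall_accuracy(self):
        return self.correct / self.total if self.total else 0.0

    def recent_accuracy(self):
        return sum(self.recent) / len(self.recent) if self.recent else 0.0

    def drift_detected(self, margin=0.2):
        """Flag a drift when recent accuracy falls below overall accuracy
        by more than `margin` (a hypothetical threshold rule)."""
        if len(self.recent) < self.recent.maxlen:
            return False  # wait until the recent window is full
        return self.overall_accuracy() - self.recent_accuracy() > margin

    def reset(self):
        """Clear statistics, as would happen when an ensemble is reinitialized."""
        self.recent.clear()
        self.total = 0
        self.correct = 0


def choose_ensemble(low_tracker, high_tracker):
    """Return which ensemble's prediction to use: the one that is more
    accurate on recent examples (ties go to the low-diversity ensemble)."""
    if low_tracker.recent_accuracy() >= high_tracker.recent_accuracy():
        return "low"
    return "high"
```

In use, each ensemble would own one tracker: after every labeled instance, call `update` on both trackers, take the final prediction from the ensemble named by `choose_ensemble`, and reinitialize any ensemble whose tracker reports `drift_detected()`.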
