Journal: Data Mining and Knowledge Discovery

Flexible decision tree for data stream classification in the presence of concept change, noise and missing values


Abstract

In recent years, classification learning for data streams has become an important and active research topic. A major challenge posed by data streams is that their underlying concepts can change over time, which requires current classifiers to be revised promptly and accordingly. To detect concept change, a common methodology is to observe the online classification accuracy: if accuracy drops below some threshold value, a concept change is deemed to have taken place. An implicit assumption behind this methodology is that any drop in classification accuracy can be interpreted as a symptom of concept change. Unfortunately, this assumption is often violated in the real world, where data streams carry noise that can also significantly reduce classification accuracy. To compound the problem, traditional noise-cleansing methods are ill-suited to data streams: they normally need to scan the data multiple times, whereas learning from data streams can afford only a single pass because of the data's high speed and huge volume. Another open problem in data stream classification is how to deal with missing values. When new instances containing missing values arrive, how a learning model should classify them and how it should update itself in response remain largely unexplored. To solve these problems, this paper proposes a novel classification algorithm, flexible decision tree (FlexDT), which extends fuzzy logic to data stream classification. The advantages are three-fold. First, FlexDT offers a flexible structure to handle concept change effectively and efficiently. Second, FlexDT is robust to noise; hence it can prevent noise from interfering with classification accuracy, and an accuracy drop can be safely attributed to concept change. Third, it deals with missing values in an elegant way.
Extensive evaluations are conducted to compare FlexDT with representative existing data stream classification algorithms using a large suite of data streams and various statistical tests. Experimental results suggest that FlexDT offers a significant benefit to data stream classification in real-world scenarios where concept change, noise and missing values coexist.
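
The accuracy-threshold methodology that the abstract describes (and criticizes) can be illustrated with a minimal Python sketch. This is not FlexDT, whose internals the abstract does not specify. The class name `AccuracyDriftMonitor`, the helper `first_alarm`, the window size, the threshold, and both synthetic streams are assumptions made purely for illustration.

```python
from collections import deque


class AccuracyDriftMonitor:
    """Hypothetical monitor: suspects a concept change when the accuracy
    over the most recent predictions falls below a fixed threshold."""

    def __init__(self, window_size=50, threshold=0.75):
        # Sliding window of outcomes: 1 = correct prediction, 0 = wrong.
        self.window = deque(maxlen=window_size)
        self.threshold = threshold

    def update(self, predicted, actual):
        """Record one prediction; return True if a change is suspected."""
        self.window.append(1 if predicted == actual else 0)
        if len(self.window) < self.window.maxlen:
            return False  # window not yet full, too little evidence
        accuracy = sum(self.window) / len(self.window)
        return accuracy < self.threshold


def first_alarm(outcomes, **kwargs):
    """Index of the first alarm over (predicted, actual) pairs, or None."""
    monitor = AccuracyDriftMonitor(**kwargs)
    for i, (predicted, actual) in enumerate(outcomes):
        if monitor.update(predicted, actual):
            return i
    return None


# Synthetic stream A: the concept changes at step 100, after which
# every prediction of the (unchanged) classifier is wrong.
drifting = [(1, 1)] * 100 + [(1, 0)] * 100

# Synthetic stream B: no concept change, but label noise makes
# 2 of every 5 predictions look wrong.
noisy = [(1, 1), (1, 1), (1, 0), (1, 1), (1, 0)] * 40

print(first_alarm(drifting, window_size=50, threshold=0.75))
print(first_alarm(noisy, window_size=50, threshold=0.75))
```

With these assumed settings, the monitor raises an alarm on the drifting stream shortly after the change, but it also raises one on the noisy stream as soon as its window fills, even though no concept change occurred. That false alarm is the failure mode the abstract attributes to threshold-based detection on noisy data.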
