Conference: International Conference on Industrial Engineering and Other Applications of Applied Intelligent Systems

Mining Concept-Drifting Data Streams Containing Labeled and Unlabeled Instances

Abstract

Mining data streams has recently attracted significant attention and is considered a challenging task in supervised classification. Most existing methods for this problem assume that the data streams are entirely labeled. Unfortunately, this assumption is often violated in real-world applications: obtaining labels is time-consuming and expensive, while large numbers of unlabeled instances are readily available. In this paper, we propose a new approach for handling concept-drifting data streams containing both labeled and unlabeled instances. First, we use KL divergence and a bootstrapping method to quantify and detect three possible kinds of drift: feature, conditional, or dual. Then, if any drift occurs, a new classifier is learned using the EM algorithm; otherwise, the current classifier is kept unchanged. Our approach is general, so it can be applied with different classification models. Experiments performed with naive Bayes and logistic regression on two benchmark datasets show that our approach performs well using only limited amounts of labeled instances.
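The drift-detection step described above can be illustrated with a minimal sketch. It is not the authors' procedure: the abstract does not say how the bootstrap is set up, so the pooled-resampling null distribution, the Laplace smoothing, and all function names and parameters below (`empirical_dist`, `detect_feature_drift`, `n_boot`, `level`) are assumptions made for illustration. The sketch covers only feature drift on a single categorical feature; conditional drift and the EM relearning step are omitted.

```python
import numpy as np


def kl_divergence(p, q):
    """KL(p || q) for two strictly positive discrete probability vectors."""
    return float(np.sum(p * np.log(p / q)))


def empirical_dist(values, n_categories, alpha=1.0):
    """Laplace-smoothed relative frequencies of integer category codes.

    Smoothing (an assumption of this sketch) keeps every probability
    positive, so the KL divergence is always finite.
    """
    counts = np.bincount(values, minlength=n_categories).astype(float)
    return (counts + alpha) / (counts.sum() + alpha * n_categories)


def detect_feature_drift(old_batch, new_batch, n_categories,
                         n_boot=500, level=0.05, seed=0):
    """Hypothetical bootstrap test for a change in one feature's distribution.

    Returns (observed KL, bootstrap p-value, drift flag).
    """
    rng = np.random.default_rng(seed)
    observed = kl_divergence(empirical_dist(old_batch, n_categories),
                             empirical_dist(new_batch, n_categories))

    # Under the no-drift hypothesis both batches come from one distribution,
    # so resample with replacement from the pooled data and split again.
    pooled = np.concatenate([old_batch, new_batch])
    n_old = len(old_batch)
    null_kl = np.empty(n_boot)
    for b in range(n_boot):
        resample = rng.choice(pooled, size=len(pooled), replace=True)
        null_kl[b] = kl_divergence(
            empirical_dist(resample[:n_old], n_categories),
            empirical_dist(resample[n_old:], n_categories))

    # Fraction of no-drift replicates at least as extreme as the observation.
    p_value = float(np.mean(null_kl >= observed))
    return observed, p_value, p_value < level


if __name__ == "__main__":
    old = np.zeros(200, dtype=int)   # every instance in category 0
    new = np.full(200, 3)            # every instance in category 3
    kl, p, drift = detect_feature_drift(old, new, n_categories=4)
    print(f"KL={kl:.3f}  p={p:.3f}  drift={drift}")
```

In a stream setting, a flag like this would decide whether to relearn the classifier on the new batch or keep the current one, as the abstract describes.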