Mining Concept-Drifting Data Streams Containing Labeled and Unlabeled Instances

Abstract

Recently, mining data streams has attracted significant attention and has been considered a challenging task in supervised classification. Most existing methods dealing with this problem assume the availability of entirely labeled data streams. Unfortunately, such an assumption is often violated in real-world applications, given that obtaining labels is a time-consuming and expensive task, while large numbers of unlabeled instances are readily available. In this paper, we propose a new approach for handling concept-drifting data streams containing labeled and unlabeled instances. First, we use KL divergence and a bootstrapping method to quantify and detect three possible kinds of drift: feature, conditional, or dual. Then, if any drift occurs, a new classifier is learned using the EM algorithm; otherwise, the current classifier is kept unchanged. Our approach is general, so it can be applied with different classification models. Experiments performed with naive Bayes and logistic regression on two benchmark datasets show the good performance of our approach using only limited amounts of labeled instances.
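
The drift-detection step mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a single discrete feature, compares an old and a new window of the stream with a smoothed KL divergence, and uses a pooled bootstrap to judge whether the observed divergence is larger than expected when both windows share one distribution. The function names (`kl_divergence`, `window_kl`, `detect_drift`), the smoothing constant, the number of bootstrap replicates, and the 0.05 significance level are all illustrative assumptions. Conditional drift and the EM retraining step are not covered.

```python
import math
import random
from collections import Counter


def kl_divergence(p_counts, q_counts, categories, smoothing=1.0):
    """KL(P || Q) for two categorical distributions estimated from counts.

    Laplace smoothing (an assumption of this sketch) keeps every
    probability strictly positive so the logarithm is always defined.
    """
    k = len(categories)
    p_total = sum(p_counts.get(c, 0) for c in categories) + smoothing * k
    q_total = sum(q_counts.get(c, 0) for c in categories) + smoothing * k
    kl = 0.0
    for c in categories:
        p = (p_counts.get(c, 0) + smoothing) / p_total
        q = (q_counts.get(c, 0) + smoothing) / q_total
        kl += p * math.log(p / q)
    return kl


def window_kl(old, new, categories):
    """KL divergence between the empirical distributions of two windows."""
    return kl_divergence(Counter(old), Counter(new), categories)


def detect_drift(old, new, n_boot=500, level=0.05, seed=0):
    """Bootstrap test for a change in a discrete feature's distribution.

    Null hypothesis: both windows come from the same distribution.
    The pooled data are resampled with replacement into two windows of
    the original sizes; the p-value is the fraction of bootstrap KL
    values that are at least as large as the observed one.
    Returns (observed_kl, p_value, drift_flag).
    """
    rng = random.Random(seed)
    pooled = list(old) + list(new)
    categories = sorted(set(pooled))
    observed = window_kl(old, new, categories)
    exceed = 0
    for _ in range(n_boot):
        boot_old = rng.choices(pooled, k=len(old))
        boot_new = rng.choices(pooled, k=len(new))
        if window_kl(boot_old, boot_new, categories) >= observed:
            exceed += 1
    p_value = exceed / n_boot
    return observed, p_value, p_value < level
```

Calling `detect_drift(old_window, new_window)` on two lists of feature values returns the observed divergence, the bootstrap p-value, and a drift flag. In a pipeline like the one the abstract outlines, a raised flag would trigger learning a new classifier, and otherwise the current one would be kept.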

Bibliographic details

Source: International Conference on Industrial Engineering and Other Applications of Applied Intelligent Systems, 2010, 10 pages
Authors: Hanen Borchani; Pedro Larrañaga; Concha Bielza
Original format: PDF
Chinese Library Classification: TP18-53

Similar literature

1. Mining multi-dimensional concept-drifting data streams using Bayesian network classifiers [J]. Borchani Hanen, Larrañaga Pedro, Gama Joao. Intelligent Data Analysis. 2016, No. 2
2. Ambiguous decision trees for mining concept-drifting data streams [J]. Jing Liu, Xue Li, Weicai Zhong. Pattern Recognition Letters. 2009, No. 15
3. An Efficient and Sensitive Decision Tree Approach to Mining Concept-Drifting Data Streams [J]. Cheng-Jung TSAI, Chien-I LEE, Wei-Pang YANG. Informatica. 2008, No. 1
4. Mining Concept-Drifting Data Streams Containing Labeled and Unlabeled Instances [C]. Hanen Borchani, Pedro Larrañaga, Concha Bielza. IEA/AIE 2010; International Conference on Industrial Engineering and Other Applications of Applied Intelligent Systems. 2010
5. Reducing Labeling Complexity in Streaming Data Mining [D]. Izenov, Yesdaulet. 2018
6. Discriminatory Target Learning: Mining Significant Dependence Relationships from Labeled and Unlabeled Data [O]. Zhi-Yi Duan, Li-Min Wang, Musa Mammadov. 2019
7. Mining multi-dimensional concept-drifting data streams using Bayesian network classifiers [O]. Borchani Hanen, Larrañaga Múgica Pedro María, Gama João. 2016
8. Cognitive Study of Learning with Labeled and Unlabeled Data [R]. Zhu, X., Rogers, T. T. 2012
