Identifying mislabeled training data with the aid of unlabeled data

Donghai Guan; Weiwei Yuan; Young-Koo Lee; Sungyoung Lee

Abstract

This paper presents a new approach for identifying and eliminating mislabeled training instances for supervised learning algorithms. Its novelty lies in using unlabeled instances to aid the detection of mislabeled training instances. Existing methods, by contrast, rely only on the labeled training instances. Our approach is straightforward and can be applied to many existing noise detection methods with only minor modifications. To assess its benefit, we choose two popular noise detection methods: majority filtering (MF) and consensus filtering (CF). MFAUD and CFAUD are the proposed variants of MF and CF built on our approach; the names denote majority and consensus filtering with the aid of unlabeled data. An empirical study validates the superiority of our approach and shows that MFAUD and CFAUD can significantly improve the performance of MF and CF under different noise ratios and labeled ratios. Moreover, the improvement is more pronounced at higher noise ratios.
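The abstract names majority filtering (MF) and consensus filtering (CF) without spelling them out. In their commonly described form, the labeled training set is split into folds. Several different learners are trained on all folds but one, and each learner then classifies the held-out instances. An instance is flagged as possibly mislabeled when a majority of the learners (MF) or all of them (CF) disagree with its given label.

The sketch below implements only this labeled-data baseline. It does not implement MFAUD or CFAUD, because the abstract does not say how the unlabeled data enters the procedure. The following are illustrative assumptions, not choices taken from the paper:
- the base learners (k-nearest neighbours and nearest centroid),
- the index-based fold assignment,
- the function names `knn`, `nearest_centroid` and `filter_noise`.

```python
from collections import Counter
from typing import Callable, List, Sequence

Vector = Sequence[float]
Predictor = Callable[[Vector], int]
# A learner takes training features and labels and returns a predict function.
Learner = Callable[[List[Vector], List[int]], Predictor]


def _sq_dist(a: Vector, b: Vector) -> float:
    return sum((p - q) ** 2 for p, q in zip(a, b))


def knn(k: int) -> Learner:
    """k-nearest-neighbour learner (illustrative base classifier)."""
    def learn(X: List[Vector], y: List[int]) -> Predictor:
        def predict(x: Vector) -> int:
            nearest = sorted(range(len(X)), key=lambda j: _sq_dist(X[j], x))[:k]
            return Counter(y[j] for j in nearest).most_common(1)[0][0]
        return predict
    return learn


def nearest_centroid(X: List[Vector], y: List[int]) -> Predictor:
    """Assign each point to the class whose mean vector is closest."""
    sums: dict = {}
    counts = Counter(y)
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for d, v in enumerate(x):
            acc[d] += v
    centroids = {c: [v / counts[c] for v in s] for c, s in sums.items()}

    def predict(x: Vector) -> int:
        return min(centroids, key=lambda c: _sq_dist(centroids[c], x))
    return predict


def filter_noise(X: List[Vector], y: List[int], learners: List[Learner],
                 n_folds: int = 5, consensus: bool = False) -> List[int]:
    """Return indices of instances flagged as possibly mislabeled.

    consensus=False: majority filtering (more than half the learners err).
    consensus=True:  consensus filtering (every learner errs).
    """
    n = len(X)
    flagged = []
    for fold in range(n_folds):
        # Instance i goes to fold i % n_folds (a simple, deterministic split).
        test_idx = [i for i in range(n) if i % n_folds == fold]
        train_idx = [i for i in range(n) if i % n_folds != fold]
        X_train = [X[i] for i in train_idx]
        y_train = [y[i] for i in train_idx]
        # Train every learner on the other folds only.
        models = [learn(X_train, y_train) for learn in learners]
        for i in test_idx:
            errors = sum(1 for m in models if m(X[i]) != y[i])
            if consensus:
                noisy = errors == len(models)
            else:
                noisy = errors > len(models) / 2
            if noisy:
                flagged.append(i)
    return sorted(flagged)


if __name__ == "__main__":
    X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
         (5, 5), (5, 6), (6, 5), (6, 6), (5.5, 5.5)]
    y = [0, 0, 1, 0, 0, 1, 1, 1, 1, 1]  # label of index 2 deliberately flipped
    learners = [knn(1), knn(3), nearest_centroid]
    print(filter_noise(X, y, learners), filter_noise(X, y, learners, consensus=True))
    # prints [2] [2]
```

In the toy data the two clusters are well separated, so every learner trained without index 2 labels it 0. Both filters therefore flag it. CF is the stricter rule: it flags fewer instances than MF, at the risk of keeping some noisy ones.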

机译：本文提出了一种新方法，用于识别和消除监督学习算法中标记错误的训练实例。这种方法的新颖之处在于使用未标记的实例来帮助检测错误标记的训练实例。这与仅依赖于标记的训练实例的现有方法相反。我们的方法简单明了，可以应用到许多现有的噪声检测方法中，仅需根据需要对其进行少量修改。为了评估我们方法的好处，我们选择了两种流行的噪声检测方法：多数滤波（MF）和共识滤波（CF）。 MFAUD / CFAUD是MF / CF的新提议变体，它依赖于我们的方法，并借助未标记的数据表示多数/共识过滤。实证研究证实了我们方法的优越性，并表明MFAUD和CFAUD在不同的噪声比率和标记比率下可以显着改善MF和CF的性能。另外，当噪声比更大时，改进更加显着。

Bibliographic information

Source: Applied Intelligence, 2011, no. 3, pp. 345-358 (14 pages)
Authors: Donghai Guan; Weiwei Yuan; Young-Koo Lee; Sungyoung Lee
Original format: PDF
Language: English

Similar documents

1. Identifying mislabeled training data with the aid of unlabeled data [J]. Guan D., Yuan W., Lee Y.-K. Applied Intelligence: The International Journal of Artificial Intelligence, Neural Networks, and Complex Problem-Solving Technologies. 2011, no. 3.
2. Identifying Mislabeled Training Data [J]. Brodley C. E., Friedl M. A. The Journal of Artificial Intelligence Research. 1999, no. 7.
3. Identifying mislabeled training data [J]. Brodley C. E., Friedl M. A. The Journal of Artificial Intelligence Research. 1999.
4. Improving automated land cover mapping by identifying and eliminating mislabeled observations from training data [C]. Brodley C. E., Friedl. International Geoscience and Remote Sensing Symposium (IGARSS '96), "Remote Sensing for a Sustainable Future". 1996.
5. Improving named entity recognition with co-training and unlabeled bilingual data [D]. Ma X. 2008.
6. Identifying mislabeled and contaminated DNA methylation microarray data: an extended quality control toolset with examples from GEO [O]. Heiss J. A., Just A. C. 2018.
7. Identifying Mislabeled Training Data [O]. Brodley C. E., Friedl M. A. 2011.
8. Methods for Identifying AIDS Cases in Medicare and Medicaid Claims Data [R]. Thornton C., Fasciano N., Turner B. J. 1997.