International Conference on Knowledge Discovery and Information Retrieval; International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management

Using the Cluster-based Tree Structure of k-Nearest Neighbor to Reduce the Effort Required to Classify Unlabeled Large Datasets



Abstract

The usual practice in classification is to create a set of labeled data for training and then use it to tune a classifier that predicts the classes of the remaining items in the dataset. However, labeling data demands great human effort, and classification by specialists is normally expensive and time-consuming. In this paper, we discuss how a cluster-based tree kNN structure can be used to quickly build a training dataset from scratch. We evaluated the proposed method on several classification datasets, and the results are promising: the labeling work required of the specialists was reduced to 4% of the number of documents in the evaluated datasets. Furthermore, we achieved an average accuracy of 72.19% on the tested datasets, versus 77.12% when using 90% of each dataset for training.
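The abstract does not spell out the algorithm, but the general idea (cluster the unlabeled data, have a specialist label only a small set of cluster representatives, then propagate those labels to the rest via kNN) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's method: it uses flat KMeans in place of the paper's cluster-based tree structure, scikit-learn estimators, and a synthetic dataset in which the hidden true labels stand in for the specialist. The 20 labeled points out of 500 mirror the 4% labeling budget reported above.

```python
# Hedged sketch of label-budget reduction via clustering + kNN propagation.
# NOT the authors' exact method: flat KMeans stands in for their tree structure.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.neighbors import KNeighborsClassifier

# Synthetic "unlabeled" dataset; y is hidden and only simulates the specialist.
X, y = make_blobs(n_samples=500, centers=3, random_state=0)

# Step 1: cluster the data; the cluster count bounds the labeling budget
# (20 clusters on 500 points = a 4% budget, matching the figure above).
km = KMeans(n_clusters=20, n_init=10, random_state=0).fit(X)

# Step 2: pick one representative per cluster, the point nearest its centroid.
reps = []
for c in range(km.n_clusters):
    members = np.where(km.labels_ == c)[0]
    dists = np.linalg.norm(X[members] - km.cluster_centers_[c], axis=1)
    reps.append(members[np.argmin(dists)])
reps = np.array(reps)

# Step 3: the specialist labels only the representatives.
specialist_labels = y[reps]

# Step 4: propagate the few labels to the whole dataset with kNN.
knn = KNeighborsClassifier(n_neighbors=3).fit(X[reps], specialist_labels)
pred = knn.predict(X)
accuracy = (pred == y).mean()
```

With well-separated clusters, labeling only the 20 representatives recovers most of the dataset's labels; the accuracy gap to full supervision on harder data corresponds to the 72.19% vs. 77.12% trade-off reported in the abstract.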
