International Conference on Knowledge Discovery and Information Retrieval; International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management

Using the Cluster-based Tree Structure of k-Nearest Neighbor to Reduce the Effort Required to Classify Unlabeled Large Datasets



Abstract

The usual practice in classification is to create a set of labeled data for training and then use it to tune a classifier that predicts the classes of the remaining items in the dataset. However, labeling data demands great human effort, and classification by specialists is normally expensive and time-consuming. In this paper, we discuss how a cluster-based tree kNN structure can be used to quickly build a training dataset from scratch. We evaluated the proposed method on several classification datasets, and the results are promising: the labeling work required of the specialists was reduced to 4% of the number of documents in the evaluated datasets. Furthermore, we achieved an average accuracy of 72.19% on the tested datasets, versus 77.12% when using 90% of each dataset for training.
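The abstract does not spell out the algorithm, but the general idea (cluster the unlabeled data, have a specialist label only a small set of cluster representatives, then propagate those labels to the rest via kNN) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's method: it uses flat KMeans in place of the paper's cluster-based tree structure, scikit-learn estimators, and a synthetic dataset in which the hidden true labels stand in for the specialist. The 20 labeled points out of 500 mirror the 4% labeling budget reported above.

```python
# Hedged sketch of label-budget reduction via clustering + kNN propagation.
# NOT the authors' exact method: flat KMeans stands in for their tree structure.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.neighbors import KNeighborsClassifier

# Synthetic "unlabeled" dataset; y is hidden and only simulates the specialist.
X, y = make_blobs(n_samples=500, centers=3, random_state=0)

# Step 1: cluster the data; the cluster count bounds the labeling budget
# (20 clusters on 500 points = a 4% budget, matching the figure above).
km = KMeans(n_clusters=20, n_init=10, random_state=0).fit(X)

# Step 2: pick one representative per cluster, the point nearest its centroid.
reps = []
for c in range(km.n_clusters):
    members = np.where(km.labels_ == c)[0]
    dists = np.linalg.norm(X[members] - km.cluster_centers_[c], axis=1)
    reps.append(members[np.argmin(dists)])
reps = np.array(reps)

# Step 3: the specialist labels only the representatives.
specialist_labels = y[reps]

# Step 4: propagate the few labels to the whole dataset with kNN.
knn = KNeighborsClassifier(n_neighbors=3).fit(X[reps], specialist_labels)
pred = knn.predict(X)
accuracy = (pred == y).mean()
```

With well-separated clusters, labeling only the 20 representatives recovers most of the dataset's labels; the accuracy gap to full supervision on harder data corresponds to the 72.19% vs. 77.12% trade-off reported in the abstract.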
