Journal: Journal of Computers
An Improved KNN Text Classification Algorithm Based on Clustering

Abstract

The traditional KNN text classification algorithm uses all training samples for classification, so it must process a huge number of training samples at high computational complexity, and it does not reflect the differing importance of individual samples. To address these problems, this paper proposes an improved KNN text classification algorithm based on cluster centers. First, the given training sets are compressed and the samples near the border are deleted, which eliminates the multi-peak effect of the training sample sets. Second, the training samples of each category are clustered with the k-means algorithm, and all cluster centers are taken as the new training samples. Third, a weight is introduced for each new training sample, indicating its importance according to the number of samples in the cluster that the center represents. Finally, the modified samples are used to perform KNN text classification. Simulation results show that the proposed algorithm not only effectively reduces the actual number of training samples and lowers the computational complexity, but also improves the accuracy of KNN text classification.
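The clustering, weighting, and classification steps described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say how border samples are detected, which similarity measure is used, or how the weight enters the vote. The sketch therefore skips the border-deletion step, assumes cosine similarity, and assumes each cluster center votes with (cluster size × similarity). All function names (`cosine`, `kmeans`, `build_model`, `classify`) and parameters (`clusters_per_class`, `k`) are hypothetical.

```python
import math
import random
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns (centers, cluster_sizes)."""
    rng = random.Random(seed)
    k = min(k, len(points))
    centers = [list(p) for p in rng.sample(points, k)]
    sizes = [0] * k
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            # Assign each point to the nearest center (squared Euclidean).
            j = min(
                range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centers[c])),
            )
            groups[j].append(p)
        for j, g in enumerate(groups):
            if g:  # keep the old center if a cluster went empty
                centers[j] = [sum(col) / len(g) for col in zip(*g)]
        sizes = [len(g) for g in groups]
    return centers, sizes


def build_model(samples, labels, clusters_per_class=2):
    """Replace each class's samples by weighted cluster centers.

    Returns a list of (center, label, weight) triples, where weight is the
    number of original samples in that center's cluster.
    """
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    model = []
    for y, pts in by_class.items():
        centers, sizes = kmeans(pts, clusters_per_class)
        model.extend((c, y, n) for c, n in zip(centers, sizes) if n > 0)
    return model


def classify(model, x, k=3):
    """Weighted KNN over cluster centers (assumed vote: weight * similarity)."""
    nearest = sorted(model, key=lambda m: cosine(x, m[0]), reverse=True)[:k]
    votes = defaultdict(float)
    for center, label, weight in nearest:
        votes[label] += weight * cosine(x, center)
    return max(votes, key=votes.get)


if __name__ == "__main__":
    # Toy 2-D "term frequency" vectors, purely for illustration.
    samples = [[5, 1], [4, 1], [5, 0], [6, 1], [1, 5], [0, 4], [1, 6], [1, 4]]
    labels = ["sports"] * 4 + ["tech"] * 4
    model = build_model(samples, labels)
    print(len(model), "centers replace", len(samples), "samples")
    print(classify(model, [5, 1]))
```

In this sketch the classifier compares a query against at most `clusters_per_class` centers per category rather than against every training sample, which is where the reduction in computation claimed in the abstract would come from.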