Journal of Zhejiang university science

Accelerated k-nearest neighbors algorithm based on principal component analysis for text categorization



Abstract

Text categorization is an important technique for managing the surging volume of text data on the Internet. The k-nearest neighbors (kNN) algorithm is an effective, but not efficient, classification model for text categorization. In this paper, we propose an effective strategy to accelerate the standard kNN, based on a simple principle: points that are near each other in space are usually also near when projected onto a direction, which means that points that are distant along the projection direction are also distant in the original space. Using the proposed strategy, most of the irrelevant points can be discarded when searching for the k-nearest neighbors of a query point, which greatly decreases the computational cost. Experimental results show that the proposed strategy greatly improves the time performance of the standard kNN, with little degradation in accuracy. In particular, it is especially advantageous in applications with large, high-dimensional datasets.
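The pruning principle can be made concrete with a simple bound: for a unit direction u, |u·x − u·y| ≤ ‖x − y‖ (Cauchy–Schwarz). So once a training point's projection lies farther from the query's projection than the current k-th best distance, that point (and every point beyond it along the sorted projections) cannot be among the k nearest neighbors. The following is a minimal Python sketch under stated assumptions, not the authors' implementation. Dense vectors stand in for text feature vectors such as TF-IDF. The direction is the first principal component, estimated by power iteration. Candidates are scanned outward from the query in projection order and stop under an exact rule. The names `ProjectionKNN`, `first_principal_component` and `brute_force` are hypothetical. The paper reports a small accuracy loss, so its actual strategy may prune more aggressively than this exact variant.

```python
import bisect
import heapq
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def first_principal_component(points, iters=50, seed=0):
    """Estimate the leading principal direction by power iteration.

    The covariance matrix is applied implicitly as X^T (X u) on the
    centered data, so it is never materialized. Returns a unit vector.
    """
    dim = len(points[0])
    mu = [sum(p[j] for p in points) / len(points) for j in range(dim)]
    centered = [[x - m for x, m in zip(p, mu)] for p in points]
    rng = random.Random(seed)
    u = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(dot(u, u))
    u = [x / norm for x in u]
    for _ in range(iters):
        w = [0.0] * dim
        for row in centered:
            s = dot(row, u)
            for j, r in enumerate(row):
                w[j] += s * r
        norm = math.sqrt(dot(w, w))
        if norm == 0.0:  # all points identical: keep the current unit u
            break
        u = [x / norm for x in w]
    return u


class ProjectionKNN:
    """Hypothetical kNN index that prunes candidates by their projection."""

    def __init__(self, points, labels):
        self.points = points
        self.labels = labels
        self.u = first_principal_component(points)
        pairs = sorted((dot(p, self.u), i) for i, p in enumerate(points))
        self.keys = [v for v, _ in pairs]   # sorted projections
        self.order = [i for _, i in pairs]  # point index for each key

    def neighbors(self, q, k):
        """Return the exact k nearest (distance, index) pairs, nearest first."""
        if k <= 0:
            return []
        qp = dot(q, self.u)
        n = len(self.keys)
        hi = bisect.bisect_left(self.keys, qp)
        lo = hi - 1
        heap = []  # max-heap of the best k so far, stored as (-distance, index)
        while lo >= 0 or hi < n:
            # Visit the unscanned point whose projection is closest to qp.
            gap_lo = qp - self.keys[lo] if lo >= 0 else math.inf
            gap_hi = self.keys[hi] - qp if hi < n else math.inf
            if gap_lo <= gap_hi:
                gap, j = gap_lo, lo
                lo -= 1
            else:
                gap, j = gap_hi, hi
                hi += 1
            # Gaps only grow from here, and distance >= gap, so stop early.
            if len(heap) == k and gap > -heap[0][0]:
                break
            idx = self.order[j]
            d = euclidean(q, self.points[idx])
            if len(heap) < k:
                heapq.heappush(heap, (-d, idx))
            elif d < -heap[0][0]:
                heapq.heapreplace(heap, (-d, idx))
        return sorted((-neg_d, idx) for neg_d, idx in heap)

    def classify(self, q, k):
        """Majority vote over the k nearest neighbors' labels."""
        votes = {}
        for _, idx in self.neighbors(q, k):
            votes[self.labels[idx]] = votes.get(self.labels[idx], 0) + 1
        return max(votes, key=votes.get)


def brute_force(points, q, k):
    """Reference exact kNN that scans every point."""
    return sorted((euclidean(q, p), i) for i, p in enumerate(points))[:k]
```

The stopping rule relies only on the projection bound, so `neighbors` should agree with `brute_force` for any unit direction. The principal component is chosen for speed: a direction of large variance spreads the projections out, which lets the scan stop after fewer distance computations.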
