Conference: International Conference on Advanced Language Processing and Web Information Technology

A Weighted κ-Nearest Neighborhood for BaseNP Detection under Covariate Shift

Abstract

Most machine learning methods assume that training data and test data are sampled from the same distribution. In practice, however, this assumption is often violated. The situation in which training and test data are generated from different distributions is called covariate shift. In natural language processing, covariate shift is highly likely to occur because of the size of the sample space: natural language data are theoretically infinite, so the distribution of the training data cannot reflect that of the entire data. In this paper, we try to verify that the performance of natural language processing methods can be improved by reducing the error caused by covariate shift. For this purpose, we propose an importance-weighted k-NN for base noun phrase (BaseNP) detection. In the proposed method, the weights are set according to the difference between the training and test distributions. Theoretically, importance weighting can improve performance under covariate shift. In our experiments, the proposed method outperforms standard k-NN.
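
The abstract does not state how the weights are computed or how they enter the k-NN vote, so the following is only a minimal sketch of one common reading of importance weighting, not the authors' implementation. It assumes that each training point is weighted by the density ratio p_test(x) / p_train(x) and that the k nearest neighbours vote with those weights. The 1-D features, the Gaussian densities, the labels "NP" and "O", and all function names are hypothetical choices made for illustration.

```python
import math
from collections import defaultdict


def gaussian_pdf(x, mean, std):
    """Density of a 1-D normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def importance_weights(train_x, p_train, p_test):
    """Weight each training point by the density ratio p_test(x) / p_train(x).

    Assumption: both densities are known. In practice they would have to be
    estimated from the training and test samples.
    """
    return [p_test(x) / p_train(x) for x in train_x]


def weighted_knn_predict(query, train_x, train_y, weights, k):
    """Classify `query` by a weighted vote among its k nearest neighbours."""
    # Indices of the k training points closest to the query (1-D distance).
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - query))[:k]
    votes = defaultdict(float)
    for i in nearest:
        votes[train_y[i]] += weights[i]  # each neighbour votes with its weight
    return max(votes, key=votes.get)


# Toy data (hypothetical): training inputs drawn around 0, test inputs around 2.
train_x = [-1.0, -0.5, 0.0, 0.8, 0.9, 1.3, 2.5]
train_y = ["O", "O", "O", "O", "O", "NP", "NP"]


def p_train(x):
    return gaussian_pdf(x, 0.0, 1.0)


def p_test(x):
    return gaussian_pdf(x, 2.0, 1.0)


weights = importance_weights(train_x, p_train, p_test)
uniform = [1.0] * len(train_x)

plain = weighted_knn_predict(1.0, train_x, train_y, uniform, k=3)
shifted = weighted_knn_predict(1.0, train_x, train_y, weights, k=3)
print(plain, shifted)
```

With this toy data, the unweighted vote at the query 1.0 returns "O" (two of the three neighbours), while the importance-weighted vote returns "NP", because the single "NP" neighbour lies in a region that is much more probable under the test distribution. Estimating the density ratio from real data is the harder part and is left out of this sketch.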