Conference: International Conference on Program Comprehension

Structural information based term weighting in text retrieval for feature location

Abstract

Many recent feature location techniques (FLTs) apply text retrieval (TR) techniques to corpora built from text embedded in source code. Term weighting is a standard preprocessing step in TR and is used to adjust the importance of a term within a document or corpus. Common term weighting schemes such as tf-idf may not be optimal for use with source code, because they originate from a natural language context and were designed for use with unstructured documents. In this paper we propose a new approach to term weighting in which term weights are assigned using the structural information from the source code. We then evaluate the proposed approach by conducting an empirical study of a TR-based FLT. In all, we study over 400 bugs and features from five open source Java systems and find that structural term weighting can cause a statistically significant improvement in the accuracy of the FLT.
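The abstract does not say how structural information is mapped to term weights, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes each term is tagged with the structural location it was extracted from, and that a hypothetical multiplier per location scales the term's frequency before a standard tf-idf computation. The location names (`method_name`, `comment`, `body`), the multiplier values, and all function names are illustrative assumptions.

```python
import math
from collections import Counter

# Hypothetical multipliers per structural location; the abstract gives no values.
STRUCTURAL_WEIGHTS = {"method_name": 4.0, "comment": 1.0, "body": 1.0}


def weighted_tf(document, weights):
    """Sum a location-dependent weight for every occurrence of a term.

    `document` is a list of (term, location) pairs extracted from one method.
    Unknown locations fall back to a weight of 1.0 (plain term frequency).
    """
    tf = Counter()
    for term, location in document:
        tf[term] += weights.get(location, 1.0)
    return tf


def tf_idf(corpus, weights=STRUCTURAL_WEIGHTS):
    """Return {doc_name: {term: weight}} using structurally weighted tf and log idf."""
    tfs = {name: weighted_tf(doc, weights) for name, doc in corpus.items()}
    n_docs = len(corpus)
    df = Counter()
    for tf in tfs.values():
        df.update(tf.keys())  # each document counts once per distinct term
    return {
        name: {t: f * math.log(n_docs / df[t]) for t, f in tf.items()}
        for name, tf in tfs.items()
    }


def rank(query_terms, vectors):
    """Rank documents by the summed weights of the query terms they contain."""
    scores = {
        name: sum(vec.get(t, 0.0) for t in query_terms)
        for name, vec in vectors.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy corpus: each "document" is one method, as (term, location) pairs.
corpus = {
    "saveFile": [("save", "method_name"), ("file", "method_name"), ("log", "body")],
    "printReport": [("print", "method_name"), ("report", "method_name"),
                    ("save", "body"), ("save", "body")],
    "closeWindow": [("close", "method_name"), ("window", "method_name")],
}

structural_ranking = rank(["save"], tf_idf(corpus))
plain_ranking = rank(["save"], tf_idf(corpus, weights={}))
```

In this toy corpus, the query `save` ranks `saveFile` first under the assumed structural weights, because the term occurs in the method name, whereas plain tf-idf ranks `printReport` first, because the term occurs twice in its body. Whether such a reordering improves accuracy on real systems is the question the paper's empirical study addresses.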