IEEE/ACM International Conference on Mining Software Repositories

Exploring Word Embedding Techniques to Improve Sentiment Analysis of Software Engineering Texts

Abstract

Sentiment analysis (SA) of text-based software artifacts is increasingly used to extract information for various tasks, including providing code suggestions, improving development team productivity, recommending software packages and libraries, and recommending comments on defects in source code, code quality, and possibilities for improving applications. Studies of state-of-the-art sentiment analysis tools applied to software-related texts have shown results that vary with the techniques and training approaches used. In this paper, we investigate the impact of two potential opportunities to improve the training for sentiment analysis of software engineering (SE) artifacts, in the context of neural networks customized using the Stack Overflow data developed by Lin et al. First, we customize the process of sentiment analysis to the software domain, using software domain-specific word embeddings learned from Stack Overflow (SO) posts, and study the impact of these embeddings on the performance of the sentiment analysis tool, as compared to generic word embeddings learned from Google News. We find that the word embeddings learned from the Google News data perform mostly similarly to, and in some cases better than, the word embeddings learned from SO posts. Second, we study the impact of two machine learning techniques, oversampling and undersampling of data, on the training of a sentiment classifier for handling small SE datasets with a skewed distribution. We find that oversampling alone, as well as the combination of oversampling and undersampling, helps improve the performance of a sentiment classifier.
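
The abstract does not say which resampling algorithms were used, so the following is only a minimal sketch of the general idea, assuming plain random oversampling (duplicating minority-class examples) and random undersampling (dropping majority-class examples). The function name `resample`, the `target` parameter, and the toy posts are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter, defaultdict


def resample(texts, labels, target, seed=0):
    """Randomly resample so every class ends up with `target` examples.

    Classes smaller than `target` are oversampled (extra copies drawn with
    replacement); classes larger than `target` are undersampled (a random
    subset is kept, drawn without replacement).
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in zip(texts, labels):
        by_label[label].append(text)

    out_texts, out_labels = [], []
    for label, items in by_label.items():
        if len(items) >= target:
            # Undersample: keep `target` distinct examples.
            chosen = rng.sample(items, target)
        else:
            # Oversample: keep all originals and add random duplicates.
            chosen = items + rng.choices(items, k=target - len(items))
        out_texts.extend(chosen)
        out_labels.extend([label] * target)
    return out_texts, out_labels


# Hypothetical toy data: a small, skewed three-class sentiment set.
texts = [f"neutral post {i}" for i in range(8)]
texts += ["great fix, thanks", "this API is awful"]
labels = ["neutral"] * 8 + ["positive", "negative"]

bal_texts, bal_labels = resample(texts, labels, target=4)
print(Counter(bal_labels))  # class counts after resampling
```

Resampling of this kind would normally be applied to the training split only, so that evaluation still reflects the original skewed distribution.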