Exploring Word Embedding Techniques to Improve Sentiment Analysis of Software Engineering Texts

Abstract

Sentiment analysis (SA) of text-based software artifacts is increasingly used to extract information for various tasks, including providing code suggestions, improving development team productivity, recommending software packages and libraries, and recommending comments on defects in source code, code quality, and possibilities for improving applications. Studies of state-of-the-art sentiment analysis tools applied to software-related texts have shown results that vary with the techniques and training approaches used. In this paper, we investigate the impact of two potential opportunities to improve the training for sentiment analysis of software engineering (SE) artifacts, in the context of neural networks customized using the Stack Overflow data developed by Lin et al. First, we customize the process of sentiment analysis to the software domain, using software domain-specific word embeddings learned from Stack Overflow (SO) posts, and study their impact on the performance of the sentiment analysis tool compared to generic word embeddings learned from Google News. We find that the word embeddings learned from the Google News data perform mostly similarly to, and in some cases better than, the word embeddings learned from SO posts. Second, we study the impact of two machine learning techniques, oversampling and undersampling of data, on the training of a sentiment classifier for handling small SE datasets with a skewed distribution. We find that oversampling alone, as well as the combination of oversampling and undersampling, helps improve the performance of a sentiment classifier.
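
The abstract names oversampling and undersampling but does not say which resampling algorithms were used. As a rough illustration only, the sketch below assumes plain random resampling, which is not necessarily the authors' method. The function name `resample`, the `target` parameter, and the toy data are all hypothetical. Setting `target` to the largest class size gives oversampling, the smallest class size gives undersampling, and a value in between combines the two.

```python
import random
from collections import Counter, defaultdict


def resample(texts, labels, target, seed=0):
    """Return a shuffled copy in which every class has exactly `target` items.

    Assumption: plain random resampling, chosen for illustration only.
    Classes larger than `target` are undersampled without replacement;
    classes smaller than `target` are oversampled by duplicating items.
    """
    rng = random.Random(seed)

    # Group the texts by their sentiment label.
    by_class = defaultdict(list)
    for text, label in zip(texts, labels):
        by_class[label].append(text)

    pairs = []
    for label, items in by_class.items():
        if len(items) >= target:
            # Undersample: keep a random subset of the class.
            chosen = rng.sample(items, target)
        else:
            # Oversample: keep every item and add random duplicates.
            chosen = items + rng.choices(items, k=target - len(items))
        pairs.extend((text, label) for text in chosen)

    rng.shuffle(pairs)
    return [t for t, _ in pairs], [l for _, l in pairs]


# Hypothetical skewed toy dataset: 6 neutral, 2 positive, 1 negative.
labels = ["neutral"] * 6 + ["positive"] * 2 + ["negative"]
texts = [f"post {i}" for i in range(len(labels))]
counts = Counter(labels)

_, over = resample(texts, labels, target=max(counts.values()))   # oversampling
_, under = resample(texts, labels, target=min(counts.values()))  # undersampling
_, both = resample(texts, labels, target=3)                      # combination

print(Counter(over), Counter(under), Counter(both))
```

On this toy data, oversampling yields 6 items per class, undersampling yields 1 per class, and the combined setting yields 3 per class. In practice, resampling would be applied to the training split only, so that evaluation data keeps its original distribution.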

Bibliographic details

Source: IEEE/ACM International Conference on Mining Software Repositories | 2019 | xxxiv 606 p. | 11 pages
Authors: Eeshita Biswas; K. Vijay-Shanker; Lori Pollock
Original format: PDF
Chinese Library Classification: Security and confidentiality
Keywords: data mining; learning (artificial intelligence); neural nets; pattern classification; social networking (online); software engineering; text analysis

