Man vs Machine – A Study into Language Identification of Stack Overflow Code Snippets

Abstract

Software engineers produce large amounts of publicly accessible data that enables researchers to mine knowledge, fostering a better understanding of the field. Knowledge extraction often relies on meta data. This meta data can either be harvested from user-provided tags, or inferred by algorithms from the respective data. The question arises to what extent either type of meta data can be trusted and relied upon. We study this problem in the context of language identification of code snippets posted on Stack Overflow. We analyse the consistency between user-provided tags and the classification obtained with GitHub Linguist, an industry-strength automated language recognition tool. We find that the results obtained by both approaches are often not consistent. This indicates that both have to be used with great care. Our results also suggest that developers may not follow the evolutionary path of programming languages beyond one step when seeking or providing answers to software engineering challenges encountered.
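As a rough illustration of what "consistency between user-provided tags and an automated classification" can mean in practice, here is a minimal Python sketch. It assumes each snippet already carries a language label produced separately by a classifier such as GitHub Linguist; the `Snippet` class, `is_consistent`, and `agreement_rate` are hypothetical names, and the matching rule (case-insensitive equality of the detected label with any of the post's tags) is an assumption, not the method used in the paper.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Snippet:
    """Hypothetical record: a snippet's post tags plus a tool-assigned language."""
    post_id: int
    tags: set[str]            # user-provided tags, e.g. {"java", "swing"}
    detected: Optional[str]   # label from an automated classifier, or None if unclassified


def is_consistent(snippet: Snippet) -> bool:
    """Assumed rule: consistent if the detected label equals one of the tags (case-insensitive)."""
    if snippet.detected is None:
        return False  # no automated label, so nothing to agree with
    tags = {t.lower() for t in snippet.tags}
    return snippet.detected.lower() in tags


def agreement_rate(snippets: list[Snippet]) -> float:
    """Fraction of snippets whose detected language matches a user tag."""
    if not snippets:
        return 0.0
    return sum(is_consistent(s) for s in snippets) / len(snippets)


if __name__ == "__main__":
    # Made-up sample data for illustration only, not results from the study.
    sample = [
        Snippet(1, {"java", "swing"}, "Java"),
        Snippet(2, {"python"}, "Ruby"),
        Snippet(3, {"c++"}, "C++"),
        Snippet(4, {"javascript"}, None),
    ]
    print(f"agreement: {agreement_rate(sample):.2f}")  # agreement: 0.50
```

A real comparison would likely need an alias table, since tag names and classifier language names do not always coincide (for example, a `bash` tag versus a `Shell` label); the plain string match above would count such pairs as disagreements.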

Bibliographic details

Source: IEEE/ACM International Conference on Mining Software Repositories, 2019, pp. 205-209 (5 pages)
Authors: Jens Dietrich; Markus Luczak-Roesch; Elroy Dalefield
Original format: PDF
Keywords: data mining; meta data; software engineering; source code (software)

Similar literature

1. SCC++: Predicting the programming language of questions and snippets of Stack Overflow [J]. Kamel Alrashedy, Dhanush Dharmaretnam, Daniel M. German. The Journal of Systems and Software, 2020.
2. Toxic Code Snippets on Stack Overflow [J]. Chaiyong Ragkhitwetsagul, Jens Krinke, Matheus Paixao. IEEE Transactions on Software Engineering, 2021, no. 3.
3. Generating Question Titles for Stack Overflow from Mined Code Snippets [J]. Zhipeng Gao, Xin Xia, John Grundy. ACM Transactions on Software Engineering and Methodology, 2020, no. 4.
4. Man vs Machine – A Study into Language Identification of Stack Overflow Code Snippets [C]. Jens Dietrich, Markus Luczak-Roesch, Elroy Dalefield. IEEE/ACM International Conference on Mining Software Repositories, 2019.
5. Study of Outdated Cryptography Algorithms Posts of Stack Overflow [D]. Shraddha Kharche. 2021.
6. A novel framework for the identification of drug target proteins: Combining stacked auto-encoders with a biased support vector machine [O]. Qi Wang, YangHe Feng, JinCai Huang.
7. SOTorrent: Studying the Origin, Evolution, and Usage of Stack Overflow Code Snippets [O]. Sebastian Baltes, Christoph Treude, Stephan Diehl. 2019.