Man vs Machine – A Study into Language Identification of Stack Overflow Code Snippets

Abstract

Software engineers produce large amounts of publicly accessible data that enables researchers to mine knowledge, fostering a better understanding of the field. Knowledge extraction often relies on meta data. This meta data can either be harvested from user-provided tags, or inferred by algorithms from the respective data. The question arises to which extent either type of meta data can be trusted and relied upon. We study this problem in the context of language identification of code snippets posted on Stack Overflow. We analyse the consistency between user-provided tags and the classification obtained with GitHub linguist, an industry-strength automated language recognition tool. We find that the results obtained by both approaches are often not consistent. This indicates that both have to be used with great care. Our results also suggest that developers may not follow the evolutionary path of programming languages beyond one step when seeking or providing answers to software engineering challenges encountered.
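
The consistency check described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `tag_agreement`, the sample data, and the matching rule (case-insensitive membership of the detected language in the tag set) are all assumptions made for illustration, and the detected labels are assumed to come from some automated classifier such as GitHub linguist, which is not called here.

```python
from typing import Iterable, Optional, Set, Tuple

# Each record pairs the user-provided tags of a post with the language
# reported by an automated classifier (None if nothing was detected).
Record = Tuple[Set[str], Optional[str]]


def tag_agreement(records: Iterable[Record]) -> float:
    """Return the fraction of snippets whose detected language is among the tags.

    Hypothetical helper: matching is a simple case-insensitive comparison,
    which is an assumption and not the rule used in the paper.
    """
    total = 0
    consistent = 0
    for tags, detected in records:
        total += 1
        normalized_tags = {tag.lower() for tag in tags}
        if detected is not None and detected.lower() in normalized_tags:
            consistent += 1
    return consistent / total if total else 0.0


# Made-up sample data, for illustration only.
sample = [
    ({"java", "android"}, "Java"),        # tags and classifier agree
    ({"python"}, "Python"),               # tags and classifier agree
    ({"javascript", "jquery"}, "HTML"),   # classifier disagrees with tags
    ({"c++"}, None),                      # classifier found no language
]

rate = tag_agreement(sample)
print(f"agreement rate: {rate:.2f}")
```

A real study would also need to map tag names to classifier language names (for example `c++` versus `C++`, or framework tags that imply a language), which this sketch deliberately leaves out.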

Bibliographic details

Source: IEEE/ACM International Conference on Mining Software Repositories, 2019, xxxiv, 606 p. (5 pages)
Authors: Jens Dietrich; Markus Luczak-Roesch; Elroy Dalefield
Original format: PDF
Chinese Library Classification: Security and confidentiality
Keywords: data mining; meta data; software engineering; source code (software)
