IEEE/ACM International Conference on Mining Software Repositories

Man vs Machine – A Study into Language Identification of Stack Overflow Code Snippets

Abstract

Software engineers produce large amounts of publicly accessible data that enable researchers to mine knowledge, fostering a better understanding of the field. Knowledge extraction often relies on metadata, which can either be harvested from user-provided tags or inferred algorithmically from the data itself. The question arises to what extent either type of metadata can be trusted and relied upon. We study this problem in the context of language identification of code snippets posted on Stack Overflow. We analyse the consistency between user-provided tags and the classification obtained with GitHub Linguist, an industry-strength automated language recognition tool. We find that the results obtained by the two approaches are often inconsistent, which indicates that both have to be used with great care. Our results also suggest that developers may not follow the evolutionary path of programming languages beyond one step when seeking or providing answers to the software engineering challenges they encounter.
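
The consistency check described in the abstract can be illustrated with a minimal sketch. It assumes each snippet has already been reduced to a pair of labels: the user-provided tag and the label assigned by an automated classifier. The function names and the sample pairs are hypothetical, written by hand for illustration; they are not data from the study, and the code does not call GitHub Linguist.

```python
from collections import Counter


def agreement_rate(pairs):
    """Return the fraction of snippets whose user tag equals the tool's label."""
    if not pairs:
        raise ValueError("no snippets to compare")
    matches = sum(1 for tag, predicted in pairs if tag == predicted)
    return matches / len(pairs)


def disagreements(pairs):
    """Count each (user tag, tool label) combination that does not match."""
    return Counter((tag, predicted) for tag, predicted in pairs if tag != predicted)


# Hypothetical sample: (user-provided tag, label from an automated classifier).
sample = [
    ("python", "python"),
    ("java", "java"),
    ("javascript", "typescript"),
    ("c", "c++"),
]

print(agreement_rate(sample))
print(disagreements(sample).most_common())
```

Listing the mismatched pairs alongside the overall rate shows which pairs of languages account for the disagreement, which a single percentage would hide.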
