Conference: IEEE/ACM International Conference on Mining Software Repositories

Man vs Machine – A Study into Language Identification of Stack Overflow Code Snippets


Abstract

Software engineers produce large amounts of publicly accessible data that enables researchers to mine knowledge, fostering a better understanding of the field. Knowledge extraction often relies on metadata, which can either be harvested from user-provided tags or inferred by algorithms from the data itself. The question arises to what extent either type of metadata can be trusted and relied upon. We study this problem in the context of language identification of code snippets posted on Stack Overflow. We analyse the consistency between user-provided tags and the classification obtained with GitHub Linguist, an industry-strength automated language recognition tool. We find that the results obtained by the two approaches are often inconsistent, which indicates that both have to be used with great care. Our results also suggest that developers may not follow the evolutionary path of programming languages beyond one step when seeking or providing answers to the software engineering challenges they encounter.
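The consistency analysis described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes each snippet has already been paired with its user-provided tags and with a language label produced separately by a classifier such as GitHub Linguist. The record layout (`tags`, `detected`), the function names, and the sample data are all hypothetical, and a match is defined here simply as the detected language appearing among the tags (case-insensitive).

```python
from collections import Counter


def agreement_rate(snippets):
    """Fraction of snippets whose detected language appears among the user tags."""
    if not snippets:
        return 0.0
    matches = sum(
        1
        for s in snippets
        if s["detected"].lower() in {t.lower() for t in s["tags"]}
    )
    return matches / len(snippets)


def disagreements(snippets):
    """Count (user tags, detected language) pairs for snippets that do not match."""
    counts = Counter()
    for s in snippets:
        tags = {t.lower() for t in s["tags"]}
        detected = s["detected"].lower()
        if detected not in tags:
            counts[(tuple(sorted(tags)), detected)] += 1
    return counts


# Hypothetical records: tags as entered by users, "detected" as reported
# by an automated classifier that was run beforehand.
SAMPLE = [
    {"tags": ["python"], "detected": "Python"},
    {"tags": ["java", "android"], "detected": "Java"},
    {"tags": ["c++"], "detected": "C"},
    {"tags": ["javascript"], "detected": "TypeScript"},
]

if __name__ == "__main__":
    print(agreement_rate(SAMPLE))
    for (tags, detected), n in disagreements(SAMPLE).items():
        print(tags, "vs", detected, n)
```

Listing the mismatched pairs alongside the overall rate makes it easier to see whether disagreements cluster around closely related languages, which is the kind of pattern the abstract alludes to.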
