Conference: IEEE/ACM International Conference on Mining Software Repositories

Can Duplicate Questions on Stack Overflow Benefit the Software Development Community?

Abstract

Duplicate questions on Stack Overflow are questions that are flagged as being conceptually equivalent to a previously posted question. Stack Overflow suggests that duplicate questions should not be discussed by users, but rather that attention should be redirected to their previously posted counterparts. Roughly 53% of closed Stack Overflow posts are closed due to duplication. Despite their supposedly overlapping content, user activity suggests duplicates may generate additional or superior answers. Approximately 9% of duplicates receive more views than their original counterparts despite being closed. In this paper, we analyze duplicate questions from two perspectives. First, we analyze the experience of those who post duplicates using activity and reputation-based heuristics. Second, we compare the content of duplicates in terms of both their questions and their answers to determine the degree of similarity between each duplicate pair. Through analysis of the MSR challenge dataset, we find that although duplicate questions are more likely to be created by inexperienced users, they often receive answers that differ from those of their original counterparts. Indeed, supplementary textual analysis using Natural Language Processing (NLP) techniques suggests duplicate questions provide additional information about the underlying concepts being discussed. We recommend that Stack Overflow's duplication policy be revised to account for the benefits that leaving duplicate questions open may have for the developer community.
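As an illustration of what measuring the "degree of similarity between each duplicate pair" could look like, here is a minimal sketch that scores two texts with bag-of-words cosine similarity. The abstract does not say which similarity measure or NLP techniques the authors used, so the measure, the tokenizer, and the function names `tokenize` and `cosine_similarity` are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (an assumed, very simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Return the cosine similarity of the two texts' term-frequency vectors, in [0, 1]."""
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))
    # Dot product over the tokens that both texts share.
    dot = sum(counts_a[t] * counts_b[t] for t in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    norm = norm_a * norm_b
    # An empty text has no direction; treat it as having zero similarity.
    return dot / norm if norm else 0.0


# Hypothetical answer texts for an original question and its duplicate.
original_answer = "Use sorted(my_list) to get a new sorted list."
duplicate_answer = "Call my_list.sort() to sort the list in place."
score = cosine_similarity(original_answer, duplicate_answer)
```

A low score for the answers of a duplicate pair would be consistent with the kind of dissimilar answers the abstract reports, although any real analysis would need the paper's actual method and data.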