IEEE/ACM International Conference on Mining Software Repositories

An Exploratory Study of Log Placement Recommendation in an Enterprise System

Abstract

Logging is a development practice that plays an important role in the operation and monitoring of complex systems. Developers place log statements in the source code and use log data to understand how the system behaves in production. Unfortunately, anticipating where to log during development is challenging. Previous studies show that machine learning can feasibly recommend log placement despite the data imbalance that arises because logging code makes up only a small fraction of the overall code base. However, it remains unknown how those techniques apply to an industry setting, and little is known about the effect of imbalanced data and sampling techniques. In this paper, we study the log placement problem in the code base of Adyen, a large-scale payment company. We analyze 34,526 Java files and 309,527 methods, totaling more than 2M SLOC. We systematically measure the effectiveness of five models based on code metrics, explore the effect of sampling techniques, identify which features the models consider relevant for the prediction, and evaluate whether we can exploit 388,086 methods from 29 Apache projects to learn where to log in an industry setting. Our best-performing model achieves 79% balanced accuracy, 81% precision, and 60% recall. While sampling techniques improve recall, they penalize precision at a prohibitive cost. Experiments with open-source data yield underperforming models on Adyen’s test set; nevertheless, they are useful due to their low rate of false positives. Our supporting scripts and tools are available to the community.
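The three metrics reported in the abstract can be computed from a confusion matrix, and a minimal Python sketch shows why balanced accuracy is preferred over plain accuracy when logged methods are rare. The function name and the counts below are hypothetical, chosen only for illustration; they are not taken from the paper or from Adyen's data.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute precision, recall, and balanced accuracy from confusion-matrix counts.

    tp: methods correctly predicted as "should log"
    fp: methods wrongly predicted as "should log"
    tn: methods correctly predicted as "no log needed"
    fn: methods that should log but were missed
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)        # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    balanced_accuracy = (recall + specificity) / 2
    return {
        "precision": precision,
        "recall": recall,
        "balanced_accuracy": balanced_accuracy,
    }


# Hypothetical imbalanced test set: 50 logged methods out of 1,000.
counts = {"tp": 30, "fp": 10, "tn": 940, "fn": 20}
metrics = classification_metrics(**counts)

# Plain accuracy is dominated by the majority (unlogged) class.
accuracy = (counts["tp"] + counts["tn"]) / sum(counts.values())

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
print(f"accuracy: {accuracy:.3f}")
```

With these made-up counts, plain accuracy looks high even though 40% of the logged methods are missed; balanced accuracy averages the per-class rates and so reflects the weak recall on the minority class.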
