Conference: International Conference on Business Information Systems

Materia: A Data Quality Control Embedded Domain Specific Language in Python


Abstract

Current solutions for data quality control (QC) in the environmental sciences are locked within proprietary platforms or rely on specialized software, which makes it difficult for data users to integrate QC into their existing workflows. To address this limitation, we developed Materia, an embedded domain-specific language (EDSL) in Python that provides functions, data structures, and a fluent syntax for defining and executing quality control tests on data. Materia lets developers integrate QC into complex data pipelines more easily and makes QC more accessible to students and citizen scientists. We evaluate Materia in two ways: productivity examples and a quantitative performance analysis. The productivity examples show how Materia can simplify complex descriptions of tests in Pandas and mirror natural-language descriptions of common QC tests. We also demonstrate that Materia achieves satisfactory performance, processing over 200,000 floating-point values in under three seconds.
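To make the idea of a fluent syntax for QC tests concrete, the following is a minimal sketch in plain Python. It is not Materia's API, which the abstract does not show: the class `QCTest`, the methods `flag_when` and `run`, the flag names, and the temperature thresholds are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


# Hypothetical names throughout; this does NOT reproduce Materia's actual API.
@dataclass
class QCTest:
    name: str
    _checks: List[Tuple[str, Callable[[float], bool]]] = field(default_factory=list)

    def flag_when(self, flag: str, predicate: Callable[[float], bool]) -> "QCTest":
        """Register a flag to apply when predicate(value) is true; return self so calls chain."""
        self._checks.append((flag, predicate))
        return self

    def run(self, values: List[Optional[float]]) -> List[str]:
        """Return one flag per value: 'missing' for None, the first matching flag, else 'ok'."""
        flags = []
        for value in values:
            if value is None:
                flags.append("missing")
                continue
            flags.append(
                next((flag for flag, pred in self._checks if pred(value)), "ok")
            )
        return flags


# A range test written to read close to its natural-language description:
# "flag as bad below -5 or above 40, flag as suspicious above 30" (illustrative thresholds).
water_temp_range = (
    QCTest("water temperature range")
    .flag_when("bad", lambda t: t < -5.0 or t > 40.0)
    .flag_when("suspicious", lambda t: t > 30.0)
)

readings = [12.5, 31.0, None, 55.2, -7.0]
flags = water_temp_range.run(readings)
print(flags)
```

Returning `self` from `flag_when` is what lets the test definition chain into a single expression. Checks are evaluated in registration order, so the stricter "bad" rule takes precedence over "suspicious".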
