Provenance-aware framework for lightweight capture and high quality data regeneration.

Abstract

The provenance, or derivation history, of a dataset is additional data needed to determine data quality and to repeat the science behind the results, which in turn enables data regeneration. Existing models and techniques for provenance capture identify and record provenance only during the execution of an experiment. These mechanisms are coarse-grained: they are frequently unable to capture the exact mapping between the inputs and outputs of an experiment, and they capture only the processes and the inputs to the processes that generate the data. However, the underlying programs change, and the applications that make up an experiment are frequently regenerated with different configurations or parameters. Additionally, provenance capture from different programming environments and execution platforms has so far required human intervention in the form of manually annotating applications, which can result in incorrect or missing information. Currently, no generic model or framework exists that automatically identifies and captures provenance while addressing these issues, which must be resolved to regenerate data from a wide range of applications running in different environments.

This dissertation proposes a unified model of provenance capture for high-quality data regeneration and builds a framework for automatically capturing provenance with extremely low overhead. The generic framework captures provenance for both the application and the data generated by the application. A combination of compilers, a runtime system, and a rule engine is used to collect provenance at different levels of application deployment and execution. This dissertation highlights the generality of this framework and studies the viability of capturing provenance through static (compile-time) and dynamic (runtime) components. We evaluate our methodology on different types of applications and benchmarks and show that capturing static and dynamic provenance contributes to high-quality data analysis and regeneration. Finally, we study and recommend provenance capture approaches, taking into consideration application and platform limitations.

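Bibliographic record

  • Author

    Ghoshal, Devarshi

  • Author affiliation

    Indiana University

  • Degree-granting institution: Indiana University
  • Discipline: Computer Science
  • Degree: Ph.D.
  • Year: 2014
  • Pages: 242 p.
  • Total pages: 242
  • Original format: PDF
  • Language: English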