A transparent collaborative framework for efficient data analysis and knowledge annotation on the Web.

Abstract

High-throughput experiments and ultrascale computing generate scientific data of growing size and complexity. These trends challenge, in a number of ways, traditional data analysis environments, most of which are based on scripting languages such as R, MATLAB or IDL. To address some of these challenges, this research proposes a framework whose overarching goal is to enable large-scale, high-performance data analytics and collaborative knowledge annotation over the Web.

The proposed framework has three major components, which parallel the three core steps of the knowledge discovery cycle.

(1) For the first step, defining the data analysis pipeline, the research designs and implements a Web-enabled, interactive and collaborative statistical environment based on R. This component implements a memory management system that minimizes memory requirements, thereby enabling multi-user scalability. To the best of our knowledge, this is the first Web-enabled R system that supports interactive remote access to R servers and enables users to share data, results and analysis sessions.

(2) For the second step, executing the data analysis pipeline, the research investigates and proposes a transparent, low-overhead means of executing external parallel codes written in compiled languages from within R, thus seamlessly bridging two code development paradigms: efficient, compiled parallel codes on the one hand, and highly abstract, easy-to-use scripting codes on the other. This component contains three elements: transparent bidirectional translation of data objects between R and compiled languages such as C/C++/Fortran; seamless integration of external parallel codes; and automatic parallelization of data-parallel computations in hybrid multi-core and multi-node execution environments.
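
The abstract does not describe the interfaces of step (2). The Python sketch below is a hedged illustration of two of its ideas, with Python standing in for both R and the compiled side: a bidirectional translation between an R-style integer vector and a contiguous buffer of C ints, and automatic data-parallel execution that splits the buffer into contiguous chunks, applies an element-wise kernel to each, and reassembles the results in order. The only detail taken from R itself is that `NA_integer_` is stored as the most negative 32-bit integer. The function names (`to_native`, `from_native`, `chunk_bounds`, `parallel_map`) are hypothetical, and threads merely stand in for the multi-core and multi-node workers the dissertation targets.

```python
from array import array
from concurrent.futures import ThreadPoolExecutor

# R stores NA_integer_ as INT_MIN, the most negative 32-bit integer.
NA_INTEGER = -(2**31)


def to_native(values):
    """Hypothetical R -> C step: an R-style integer vector (None marks NA)
    becomes one contiguous buffer of C ints (32-bit on common platforms)."""
    return array("i", [NA_INTEGER if v is None else v for v in values])


def from_native(buf):
    """Hypothetical C -> R step: map the NA sentinel back to None."""
    return [None if v == NA_INTEGER else v for v in buf]


def chunk_bounds(n, workers):
    """Split range(n) into at most `workers` contiguous, near-equal slices."""
    base, extra = divmod(n, workers)
    bounds, start = [], 0
    for i in range(workers):
        stop = start + base + (1 if i < extra else 0)
        if stop > start:  # skip empty slices when n < workers
            bounds.append((start, stop))
        start = stop
    return bounds


def parallel_map(kernel, buf, workers=4):
    """Apply an element-wise kernel to each slice concurrently and
    concatenate the partial results in their original order.
    NA elements are passed through unchanged, as in R arithmetic."""

    def run(bound):
        lo, hi = bound
        return [x if x == NA_INTEGER else kernel(x) for x in buf[lo:hi]]

    out = array("i")
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in the order of the input slices.
        for part in pool.map(run, chunk_bounds(len(buf), workers)):
            out.extend(part)
    return out
```

For example, `from_native(parallel_map(lambda x: x * 2, to_native([1, None, 3, 4, 5]), workers=2))` yields `[2, None, 6, 8, 10]`. Because the slices are contiguous and reassembled in order, the result matches a serial loop; a real multi-core or multi-node backend would use processes or message passing rather than Python threads.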

(3) For the third step, annotating the predictive knowledge derived from community analysis pipelines, the research explores an environment for semantically rich, structured and queryable annotation of facts, of relationships between those facts, and of complex events reported in the scientific literature. The social-networking nature of this component allows the community to improve the predictions and to generate new, higher-level inferences, thus filling in the gaps in the community's understanding of physical phenomena. The environment offers mechanisms for streamlining the annotated and curated knowledge into distributed public databases, thus enabling a feedback loop into the database-publication cycle that allows scientists to make connections between data-driven predictions and published evidence.
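
The abstract does not specify the annotation schema. As a minimal sketch under that caveat, the Python below models the three kinds of annotation named in step (3): individual facts stored as subject-predicate-object assertions with a provenance field, typed relationships between facts, and complex events that group several facts, plus a field-by-field query and a flat export of the kind that could feed an external database. Every class, method and field name (`Fact`, `AnnotationStore`, `relate`, `add_event`, `query`, `export_records`) is hypothetical and is not the dissertation's data model.

```python
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class Fact:
    """One structured assertion: subject --predicate--> obj."""
    fact_id: str
    subject: str
    predicate: str
    obj: str
    source: str  # provenance, e.g. an identifier for the reporting paper


@dataclass
class AnnotationStore:
    """Hypothetical store for facts, fact-to-fact relationships and events."""
    facts: dict = field(default_factory=dict)      # fact_id -> Fact
    relations: list = field(default_factory=list)  # (fact_id, relation, fact_id)
    events: dict = field(default_factory=dict)     # event name -> tuple of fact_ids

    def _require(self, *fact_ids):
        missing = [f for f in fact_ids if f not in self.facts]
        if missing:
            raise KeyError(f"unannotated facts: {missing}")

    def add_fact(self, fact):
        self.facts[fact.fact_id] = fact

    def relate(self, a, relation, b):
        """Record a typed relationship between two already-annotated facts."""
        self._require(a, b)
        self.relations.append((a, relation, b))

    def add_event(self, name, fact_ids):
        """A complex event groups several annotated facts under one name."""
        self._require(*fact_ids)
        self.events[name] = tuple(fact_ids)

    def query(self, subject=None, predicate=None, obj=None):
        """Return the facts matching every field that is not None."""
        return [
            f for f in self.facts.values()
            if (subject is None or f.subject == subject)
            and (predicate is None or f.predicate == predicate)
            and (obj is None or f.obj == obj)
        ]

    def related(self, fact_id, relation):
        """Return the ids of facts linked from fact_id by the given relation."""
        return [b for a, r, b in self.relations if a == fact_id and r == relation]

    def export_records(self):
        """Flatten facts into plain dicts, e.g. for loading into a database."""
        return [asdict(f) for f in self.facts.values()]
```

For instance, after adding two placeholder facts `f1` and `f2` (not real findings) and calling `store.relate("f1", "precedes", "f2")`, `store.related("f1", "precedes")` returns `["f2"]`. Keeping facts, relationships and events in separate structures is what lets each be queried on its own.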

Bibliographic record

  • Author

    Breimyer, Paul

  • Author affiliation

    North Carolina State University

  • Degree-granting institution: North Carolina State University
  • Discipline: Computer Science
  • Degree: Ph.D.
  • Year: 2009
  • Pages: 133
  • Original format: PDF
  • Language: English