
Using Pilot-Jobs for Developing eThread, a Meta-threading Pipeline.


Abstract

The genome revolution has produced a vast amount of sequence information, but the functional annotation of most gene products has yet to be explored in depth. Functional inference at low sequence identity is achieved through structure-based template methods. To model and understand function at the proteome scale, state-of-the-art algorithms such as eThread are used. They are compute-intensive and demand efficient, optimal use of the underlying resources. The combination of large-scale data and complex workloads raises the need for pilot-based approaches. eThread is a meta-threading protein structure modeling algorithm supported by ten independent single-threading algorithms, whose computational complexity also depends on the number and size of the input sequences. In this thesis, the eThread pipeline is developed on an extensible, scalable, and interoperable pilot-job-based framework that supports concurrent task execution and data parallelization on heterogeneous resources deployed on Amazon EC2, with S3 as the data repository. The study aims to understand the dominant factors that influence the performance of eThread on EC2. The analysis suggests an optimized solution based on execution time and cost of implementation, which primarily achieves better resource utilization by scaling the workload across multiple resources. Further ideas on increasing resource capacity and a discussion of the importance of dynamic task execution are also laid out.

Record details

  • Author: Ragothaman, Anjanibhargavi
  • Author affiliation: Rutgers The State University of New Jersey - New Brunswick
  • Degree-granting institution: Rutgers The State University of New Jersey - New Brunswick
  • Subjects: Electrical engineering; Bioinformatics
  • Degree: M.S.
  • Year: 2014
  • Pages: 60
  • Original format: PDF
  • Language: English
