Conference proceedings: IEEE International Conference on Bioinformatics and Bioengineering

Extracting the Co-occurrences of DNA Maximal Repeats in both Human and Viruses

Abstract

This paper aims to extract significant DNA sequences that appear in the genomes of both human and viruses. To extract co-occurring DNA sequences that are as long as possible, the study adopts a scalable approach, based on the Hadoop MapReduce programming model, that extracts maximal repeats and also computes the class frequency distribution of those repeats. The human genome and the genomes of all 4,388 viruses available in NCBI were downloaded on 2017/1/14. To take viral taxonomy into account for further observation, only the 2,712 viruses that had been assigned a genus were selected from those 4,388. In this study, the taxonomic level “genus” serves as the unit (class) when comparing viruses and human in the experiments. Experimental results show that the longest DNA sequence found in both human and viruses is 463 base pairs (bp) long; that sequence, consisting of the tandem repeat “(CTAACC)n”, appears in human chromosome 5 and in the virus “Human herpesvirus 6B”. Virologists may find it worthwhile to investigate why such a long DNA fragment exists in both human and that virus. More broadly, this study may offer a new direction for comparing genomic sequences across classes, one that can provide clues for examining whether a relationship exists between DNA maximal repeats (genotypes) with a biased class frequency distribution and the features of the classes (phenotypes).
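To make the two quantities in the abstract concrete, the following is a minimal in-memory sketch of maximal-repeat extraction with a per-class frequency count. It is not the authors' Hadoop MapReduce implementation, which the abstract does not describe in detail. The function name `maximal_repeats_with_classes`, the `min_len` parameter, and the toy sequences are hypothetical, and the sketch assumes a repeat is "maximal" when it occurs at least twice overall and every one-character extension, to the left or to the right, occurs fewer times.

```python
from collections import defaultdict


def maximal_repeats_with_classes(sequences, min_len=2):
    """Naive maximal-repeat extraction with class frequency counts.

    sequences: dict mapping a class label (e.g. "human" or a virus genus)
               to a list of DNA strings.
    Returns a dict mapping each maximal repeat to {class label: occurrences}.

    Assumption: a repeat is maximal if it occurs at least twice in total and
    no one-character extension occurs as often as the repeat itself.
    Enumerating every substring is quadratic per sequence, so this only
    suits toy inputs, not whole genomes.
    """
    # counts[substring][label] = number of occurrences within that class
    counts = defaultdict(lambda: defaultdict(int))
    for label, seqs in sequences.items():
        for seq in seqs:
            n = len(seq)
            for i in range(n):
                for j in range(i + 1, n + 1):
                    counts[seq[i:j]][label] += 1

    totals = {s: sum(per_class.values()) for s, per_class in counts.items()}
    alphabet = {ch for seqs in sequences.values() for seq in seqs for ch in seq}

    result = {}
    for s, total in totals.items():
        if len(s) < min_len or total < 2:
            continue
        # If some extension occurs exactly as often, s is never seen without it,
        # so s is not maximal.
        extendable = any(
            totals.get(ch + s, 0) == total or totals.get(s + ch, 0) == total
            for ch in alphabet
        )
        if not extendable:
            result[s] = dict(counts[s])
    return result


# Toy data, invented for illustration only.
example = {"human": ["AACGTT"], "virus": ["GACGTC"]}
print(maximal_repeats_with_classes(example))
# prints {'ACGT': {'human': 1, 'virus': 1}}
```

In this toy case the only maximal repeat of length two or more is "ACGT", found once in each class. Shorter shared substrings such as "ACG" are discarded because they never occur outside "ACGT". Co-occurrences between human and viruses would then be the repeats whose class counts include "human" together with at least one virus genus.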