Conference paper, Frontiers in the Convergence of Bioscience and Information Technologies

A method for evaluating quality of clustering DNA fragments encoded in different nucleotide frequencies


Abstract

The whole-genome shotgun sequencing technique has been successfully applied to environmental genomes. However, a considerable number of DNA sequences and small contigs generally remain unassembled after shotgun sequencing. Binning is the step of grouping these sequences on the basis of biological and molecular features. The combination of oligonucleotide frequency and the Self-Organising Map (SOM) clustering algorithm shows high potential as a compositional binning tool. Because previous work did not provide methods for assessing the results, we propose a systematic quantitative method for evaluating clustering results specifically for this type of application. We used this method to investigate the suitability of di-, tri-, tetra- and pentanucleotide frequencies as training features for this binning technique. The results show that dinucleotide frequency is unable to bin 10 kb DNA sequence fragments into well-clustered species groups. Furthermore, we observed that increasing the order of the oligonucleotide frequency may degrade the assignment of DNA sequences to classes in our test, which indicates that an optimal species-specific oligonucleotide frequency may exist. The results suggest that, when oligonucleotide frequency is combined with SOM as a binning process, trinucleotide frequency gives sufficiently good clustering quality in this case.
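A minimal sketch of the compositional binning idea described above, assuming plain Python with no third-party libraries: each fragment becomes a normalised k-mer frequency vector (k = 2 to 5 for di- to pentanucleotides), and a small rectangular SOM is trained on those vectors so that compositionally similar fragments map to the same or neighbouring nodes. The function names, the Gaussian neighbourhood, the linear learning-rate decay and the toy GC-rich/AT-rich "genomes" are illustrative assumptions, not the authors' implementation or data.

```python
import itertools
import math
import random

ALPHABET = "ACGT"


def kmer_frequencies(seq, k):
    """Normalised frequencies of all 4**k DNA k-mers in seq (lexicographic order).

    Windows containing characters other than A/C/G/T (e.g. N) are skipped.
    """
    kmers = ["".join(p) for p in itertools.product(ALPHABET, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            counts[j] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(kmers)
    return [c / total for c in counts]


def best_matching_unit(weights, x):
    """Grid coordinates (row, col) of the node whose prototype is closest to x."""
    best, best_dist = (0, 0), float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            dist = sum((a - b) ** 2 for a, b in zip(w, x))
            if dist < best_dist:
                best, best_dist = (r, c), dist
    return best


def train_som(data, rows, cols, epochs=30, lr0=0.5, seed=0):
    """Online SOM training with a Gaussian neighbourhood and linear decay (assumed schedule)."""
    rng = random.Random(seed)
    # Initialise each node's prototype with a copy of a randomly chosen training vector.
    weights = [[list(rng.choice(data)) for _ in range(cols)] for _ in range(rows)]
    sigma0 = max(rows, cols) / 2
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for idx in order:
            x = data[idx]
            frac = step / total_steps
            lr = lr0 * (1 - frac)                  # learning rate decays towards 0
            sigma = max(sigma0 * (1 - frac), 0.5)  # neighbourhood radius shrinks
            br, bc = best_matching_unit(weights, x)
            for r in range(rows):
                for c in range(cols):
                    # Nodes close to the winner on the grid move further towards x.
                    h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * sigma ** 2))
                    w = weights[r][c]
                    for d, xd in enumerate(x):
                        w[d] += lr * h * (xd - w[d])
            step += 1
    return weights


def random_genome(length, gc, rng):
    """Hypothetical toy sequence with the given GC content (independent bases)."""
    at = (1 - gc) / 2
    return "".join(rng.choices(ALPHABET, weights=[at, gc / 2, gc / 2, at], k=length))


if __name__ == "__main__":
    rng = random.Random(1)
    k = 3  # trinucleotides; use 2, 4 or 5 for di-, tetra- or pentanucleotides
    # 2 kb toy fragments (shorter than 10 kb) keep this pure-Python demo fast.
    gc_rich = [random_genome(2000, 0.7, rng) for _ in range(10)]
    at_rich = [random_genome(2000, 0.3, rng) for _ in range(10)]
    data = [kmer_frequencies(s, k) for s in gc_rich + at_rich]
    som = train_som(data, rows=2, cols=2)
    for name, vecs in (("GC-rich", data[:10]), ("AT-rich", data[10:])):
        print(name, [best_matching_unit(som, v) for v in vecs])
```

Changing `k` is the only change needed to compare the di-, tri-, tetra- and pentanucleotide feature sets; note that the vector length grows as 4**k (16, 64, 256, 1024), so higher orders give sparser counts for a fragment of fixed length.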
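The abstract does not spell out the authors' quantitative evaluation method, so the sketch below substitutes two standard external cluster-quality measures, purity and inverse purity, plus their harmonic mean. They compare the cluster each fragment was assigned to (for example its SOM node) with the species it is known to come from. These measures and the function names are stand-ins chosen for illustration, not the method proposed in the paper.

```python
from collections import Counter, defaultdict


def _contingency(clusters, species):
    """n[c][s] = number of fragments of species s assigned to cluster c."""
    if len(clusters) != len(species) or not clusters:
        raise ValueError("need equally long, non-empty label lists")
    table = defaultdict(Counter)
    for c, s in zip(clusters, species):
        table[c][s] += 1
    return table


def purity(clusters, species):
    """Share of fragments that belong to the majority species of their cluster."""
    table = _contingency(clusters, species)
    return sum(max(row.values()) for row in table.values()) / len(clusters)


def inverse_purity(clusters, species):
    """Share of fragments lying in the largest cluster of their own species."""
    # Swapping the roles of clusters and species turns purity into inverse purity.
    return purity(species, clusters)


def f_score(clusters, species):
    """Harmonic mean of purity and inverse purity."""
    p, ip = purity(clusters, species), inverse_purity(clusters, species)
    return 2 * p * ip / (p + ip)


if __name__ == "__main__":
    # Hypothetical labels: cluster ids (e.g. SOM nodes) and true species.
    clusters = [0, 0, 0, 1, 1, 2]
    species = ["A", "A", "B", "B", "B", "B"]
    print(round(purity(clusters, species), 3))          # 0.833
    print(round(inverse_purity(clusters, species), 3))  # 0.667
    print(round(f_score(clusters, species), 3))         # 0.741
```

Cluster ids only need to be hashable, so the `(row, col)` tuples returned by a SOM's best-matching-unit lookup work directly. Purity alone rewards splitting the data into many small clusters (one fragment per cluster gives purity 1.0), whereas inverse purity penalises spreading one species over several clusters; reporting both guards against either failure mode.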
