Conference: National Conference on Biomedical Engineering
A Clustering-Based Algorithm for De Novo Motif Discovery in DNA Sequences

Abstract

Motif discovery is a challenging problem in molecular biology and has attracted researchers' attention for years. Different kinds of data and computational methods have been used to tackle this problem, but there is still room for improvement. In this study, our goal was to develop a method able to identify all transcription factor binding site (TFBS) signals, both known and unknown, in the input set of sequences. As part of our algorithm, we developed a specialized clustering method that outperforms existing clustering methods such as DNACLUST and CD-HIT-EST at clustering short sequences. A scoring system was needed to determine how close a cluster is to being a real motif. Multiple features are calculated from the contents of each cluster to determine its score. These features include a set of divergence measures as well as positional and occurrence information. The feature scores are combined so that a trade-off between them determines each cluster's status. Optionally, the final results can be compared against motif databases such as Jolma2013 and UniProbe using the Tomtom motif comparison tool. The algorithm was evaluated on three datasets from the ABS database.
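The abstract does not say which divergence measures are used, so the following is only an illustrative sketch of how one such feature could be computed for a cluster of equal-length short sequences. It assumes a Kullback-Leibler divergence of each alignment column from a uniform background, summed over columns. The function names, the pseudocount, the uniform background, and the example sequences are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def position_frequencies(kmers, pseudocount=0.5):
    """Column-wise base frequencies for the equal-length k-mers of one cluster.

    The pseudocount is an assumption; it keeps every frequency above zero.
    """
    length = len(kmers[0])
    if any(len(k) != length for k in kmers):
        raise ValueError("all k-mers in a cluster must have the same length")
    total = len(kmers) + pseudocount * len(ALPHABET)
    columns = []
    for i in range(length):
        counts = Counter(k[i] for k in kmers)
        columns.append(
            {b: (counts.get(b, 0) + pseudocount) / total for b in ALPHABET}
        )
    return columns


def kl_divergence_score(kmers, background=None):
    """Sum over columns of the KL divergence (in bits) from the background.

    A uniform background is assumed when none is given.
    """
    if background is None:
        background = {b: 0.25 for b in ALPHABET}
    score = 0.0
    for column in position_frequencies(kmers):
        score += sum(p * math.log2(p / background[b]) for b, p in column.items())
    return score


# Hypothetical clusters: one well conserved, one with unrelated members.
conserved = ["TGACTCA", "TGACTCA", "TGAGTCA", "TGACTCA"]
random_like = ["TGACTCA", "ACGTTGC", "GATCAAG", "CCTGAGT"]

print(kl_divergence_score(conserved) > kl_divergence_score(random_like))
```

Under these assumptions a conserved cluster scores higher than a cluster of unrelated sequences. A full scoring system of the kind the abstract describes would combine a feature like this with positional and occurrence information.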
