Journal of Intelligent Systems

A New Approach to Automatic Species Identification Using Biological Data Mining


Abstract

The paper aims to design a scheme for automatically identifying a species from its genome sequence. A set of 64 three-tuples is first generated from the four types of bases: A, T, C, and G. These three-tuples are searched for in N randomly sampled genome sequences, each of a given length (10,000 elements), and the frequency of each of the 4^3 = 64 three-tuples is counted to obtain a DNA-descriptor for each sample. Principal component analysis is then applied to the DNA-descriptors of the N sampled instances, yielding a unique feature descriptor for identifying the species from its genome sequence. Because the variance of the descriptors for a given genome sequence is negligible, the proposed scheme is well suited to automatic species identification. Next, a computational map is trained by the Self-Organizing Feature Map algorithm, using the DNA-descriptors from different species as the training inputs. The map is shown to provide an easier technique for recognizing and classifying a species based on its genomic data.
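
The descriptor step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `dna_descriptor` and `sample_descriptors`, the use of overlapping windows when counting, and the choice to skip windows containing symbols other than A, T, C, and G are all assumptions made for illustration. The principal component analysis and Self-Organizing Feature Map stages are not covered here.

```python
import random
from itertools import product

BASES = "ATCG"
# All 4^3 = 64 three-tuples, in a fixed order so descriptors are comparable.
TRIPLES = ["".join(p) for p in product(BASES, repeat=3)]
INDEX = {t: i for i, t in enumerate(TRIPLES)}


def dna_descriptor(sample):
    """Return a 64-element frequency count for one sampled sequence.

    Assumption: three-tuples are counted over overlapping windows.
    """
    counts = [0] * len(TRIPLES)
    for i in range(len(sample) - 2):
        j = INDEX.get(sample[i:i + 3])
        if j is not None:  # skip windows with symbols other than A, T, C, G
            counts[j] += 1
    return counts


def sample_descriptors(genome, n_samples, length=10_000, seed=0):
    """Draw n_samples random windows of the given length and describe each."""
    rng = random.Random(seed)
    descriptors = []
    for _ in range(n_samples):
        start = rng.randrange(len(genome) - length + 1)
        descriptors.append(dna_descriptor(genome[start:start + length]))
    return descriptors
```

Calling `sample_descriptors(genome, n_samples=N)` on a genome string gives N vectors of 64 counts each, which is the form of input the abstract feeds to principal component analysis.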
