Conference: International Conference on Advanced Computer Science and Information Systems

Bootstrap aggregating of classification and regression trees in identification of single nucleotide polymorphisms



Abstract

Big data in molecular biology has grown rapidly since the introduction of Next-Generation Sequencing (NGS), a technology for sequencing DNA with high throughput. Identifying nucleotide polymorphisms is an upstream analysis that supports downstream analyses such as producing quality seed based on molecular markers for plant breeding. This paper discusses the identification of Single Nucleotide Polymorphisms (SNPs) in NGS data of cultivated soybean (Glycine max L.) using CART (Classification and Regression Tree). CART identified 51% of the true positive SNPs with a precision of 67%. To improve the model's performance, Bootstrap Aggregating (bagging) of CART was developed with varying numbers of bootstrap samples (11, 21, 31, 41, 51, 61, 71, 81, 91). The evaluation indicated a trade-off between the true positive rate (TPR) and precision: when the model's TPR increased, its precision decreased. For that reason, the F-measure was used as the evaluation metric. Bagging CART with 51 bootstrap samples was the best model, identifying 60% of the true positive SNPs with a precision of 66% and an F-measure of 0.63, whereas the F-measure of the model with raw CART was 0.58. The results indicate that applying bagging to CART can increase the model's performance as measured by the F-measure.
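The reported F-measures can be checked from the stated TPR and precision. The sketch below assumes the F-measure is the balanced F1 score (the harmonic mean of precision and TPR), which is consistent with the figures in the abstract. The function names `f_measure` and `majority_vote` are illustrative, and the majority-vote step is an assumption about how the bagged trees are combined, not a detail given in the abstract.

```python
from collections import Counter


def f_measure(tpr: float, precision: float) -> float:
    """Balanced F-measure (F1): harmonic mean of TPR (recall) and precision."""
    if tpr + precision == 0:
        return 0.0
    return 2 * tpr * precision / (tpr + precision)


def majority_vote(votes):
    """Combine per-tree class labels by taking the most common one.

    Assumption: bagged CART predictions are aggregated by majority vote.
    """
    return Counter(votes).most_common(1)[0][0]


# Values taken from the abstract.
cart_f = f_measure(tpr=0.51, precision=0.67)    # raw CART
bagged_f = f_measure(tpr=0.60, precision=0.66)  # bagging CART, 51 bootstrap samples

print(round(cart_f, 2), round(bagged_f, 2))  # prints: 0.58 0.63
```

With a two-class problem (SNP versus non-SNP), an odd number of trees such as those listed in the abstract cannot produce a tied vote, for example `majority_vote(["SNP", "SNP", "non-SNP"])` returns `"SNP"`.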
