Journal: BMC Medical Informatics and Decision Making

A machine learning framework for accurately recognizing circular RNAs for clinical decision-supporting


Abstract

Circular RNAs (circRNAs) are RNA molecules that lack poly(A) tails and form a closed-loop structure. Recent studies have emphasized that some circRNAs have functions different from those of canonical transcripts and are associated with complex diseases. Several computational methods have been developed for detecting circRNAs from RNA-seq data. However, the existing methods favor high-sensitivity strategies, which introduce many false positives. Thus, a clinical decision-support system needs a comprehensive filtering approach that accurately recognizes real circRNAs for its decision models. In this paper, we first review the detection strategies of the existing methods. Using features from RNA-seq data, we show that no single feature (data signal) selected by the existing strategies can accurately distinguish a circRNA. However, we find that some combinations of those features (data signals) can be used as signatures for recognizing circRNAs. To avoid the high computational complexity of the combinatorial optimization problem, we present CIRCPlus2, which adopts a machine learning framework to recognize real circRNAs according to multiple data signals captured from RNA-seq data. After comparing multiple machine learning frameworks, CIRCPlus2 adopts a Gradient Boosting Decision Tree (GBDT) framework. Given a set of candidate circRNAs reported by any existing detection tool(s), the features of each candidate are extracted from the aligned reads. The GBDT framework is trained on a training dataset. Applying the trained framework to the selected features yields a true/false-positive prediction for each candidate. To verify the performance of the proposed approach, we conducted several groups of experiments on both real RNA-seq datasets and a series of simulated datasets with different preset configurations.
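
The filtering idea described above (train a gradient-boosted tree model on per-candidate features, then label each candidate as a true or false positive) can be illustrated with a minimal pure-Python sketch. This is not the CIRCPlus2 implementation: the two feature names (`junction_reads`, `paired_end_support`), the toy values, the use of depth-1 trees (stumps), and all hyperparameters are assumptions chosen for illustration. The toy data is built so that neither feature separates the classes alone, while the boosted combination does.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(X, residuals):
    """Least-squares regression stump: best (feature, threshold, left mean, right mean)."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            lmean = sum(left) / len(left)
            rmean = sum(right) / len(right)
            err = (sum((r - lmean) ** 2 for r in left)
                   + sum((r - rmean) ** 2 for r in right))
            if best is None or err < best[0]:
                best = (err, j, thr, lmean, rmean)
    return best[1:]

def stump_predict(stump, row):
    j, thr, lmean, rmean = stump
    return lmean if row[j] <= thr else rmean

def fit_gbdt(X, y, n_rounds=50, learning_rate=0.5):
    """Simplified gradient boosting on log loss: each stump fits the residuals y - p."""
    pos_rate = sum(y) / len(y)
    base = math.log(pos_rate / (1.0 - pos_rate))  # initial log-odds
    scores = [base] * len(X)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        scores = [s + learning_rate * stump_predict(stump, row)
                  for s, row in zip(scores, X)]
    return base, learning_rate, stumps

def predict_proba(model, row):
    base, learning_rate, stumps = model
    return sigmoid(base + learning_rate * sum(stump_predict(s, row) for s in stumps))

# Hypothetical features per candidate: [junction_reads, paired_end_support].
# Toy values only; a candidate is "real" here when BOTH signals are high.
X = [[9, 8], [10, 9], [8, 9], [9, 10],   # real circRNAs
     [9, 1], [10, 2],                    # high junction reads only
     [1, 9], [2, 8],                     # high paired-end support only
     [1, 1], [2, 2]]                     # neither signal
y = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

model = fit_gbdt(X, y)
predicted = [1 if predict_proba(model, row) >= 0.5 else 0 for row in X]
print(predicted)
```

In practice one would more likely rely on an established GBDT library and richer features extracted from the aligned reads; the stump-based version only shows how an additive ensemble can capture a combination of signals that no single threshold captures.
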
The results demonstrated that CIRCPlus2 clearly improved specificity while maintaining a high level of sensitivity. Filtering false positives is an important step in an RNA-seq data analysis pipeline, and a machine learning framework is well suited to this filtering problem. CIRCPlus2 is an efficient approach for distinguishing false-positive circRNAs from real ones.
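
For readers less familiar with the two metrics, the trade-off between sensitivity and specificity can be sketched as follows. The labels below are invented for illustration and are not results from the paper; the function name `sensitivity_specificity` is likewise an assumption. The sketch compares an unfiltered candidate list, where every candidate is reported as a circRNA, with a hypothetical filtered list.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = real circRNA)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity

# Invented ground truth for ten candidates: four real, six false positives.
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# A high-sensitivity detector with no filtering reports every candidate.
unfiltered = [1] * len(truth)

# A hypothetical filter that drops most false candidates but also one real one.
filtered = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

for name, pred in [("unfiltered", unfiltered), ("filtered", filtered)]:
    sens, spec = sensitivity_specificity(truth, pred)
    print(f"{name}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

With these made-up labels the filter gives up some sensitivity in exchange for a large gain in specificity, which is the kind of balance a false-positive filter is evaluated on.
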
