Journal: BMC Bioinformatics

IPCARF: improving lncRNA-disease association prediction using incremental principal component analysis feature selection and a random forest classifier



Abstract

Identifying lncRNA-disease associations not only improves our understanding of the mechanisms underlying human diseases at the lncRNA level but also speeds up the identification of potential biomarkers for disease diagnosis, treatment, prognosis, and drug response prediction. However, as the amount of archived biological data continues to grow, it has become increasingly difficult to detect potential human lncRNA-disease associations in these enormous datasets using traditional biological experimental methods. Consequently, developing new and effective computational methods to predict potential human lncRNA-disease associations is essential. We propose a new algorithm, IPCARF, based on integrated machine learning techniques: it combines incremental principal component analysis (IPCA) with a random forest (RF) classifier and integrates multiple similarity matrices to predict lncRNA-disease associations. First, we use two different models to compute a semantic similarity matrix of diseases from a directed acyclic graph of diseases. Second, we obtain a feature vector for each lncRNA-disease pair by integrating disease similarity, lncRNA similarity, and Gaussian kernel similarity. Third, we apply IPCA to reduce the dimensionality of the original feature set and obtain the best feature subspace. Finally, we train an RF model to predict potential lncRNA-disease associations. The experimental results show that IPCARF improves the AUC when predicting potential lncRNA-disease associations. Under 10-fold cross-validation, IPCARF reached an AUC of 0.8529 before parameter optimization and 0.8611 after the optimal parameters were selected by grid search.
We compared IPCARF with the existing LRLSLDA, LRLSLDA-LNCSIM, TPGLDA, NPCMF, and ncPred prediction methods, which have shown excellent performance in predicting lncRNA-disease associations. Under 10-fold cross-validation, IPCARF produced better predictions than each of these methods.
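
The last two steps of the method (IPCA dimensionality reduction followed by an RF classifier, tuned by grid search under 10-fold cross-validation) can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' implementation: the feature matrix is synthetic stand-in data rather than real lncRNA-disease pair features, and the parameter grid, component counts, and batch size are assumptions chosen only to keep the example small.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import IncrementalPCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline

# Synthetic stand-in for the lncRNA-disease pair feature vectors.
# Assumption: in the paper each row would concatenate similarity-based
# features; here the rows are random data with a binary label.
X, y = make_classification(
    n_samples=400, n_features=60, n_informative=10, random_state=0
)

# IPCA reduces the feature dimension, then a random forest classifies pairs.
pipeline = Pipeline([
    ("ipca", IncrementalPCA(n_components=20, batch_size=100)),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
])

# 10-fold cross-validation, scored by area under the ROC curve (AUC).
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# Hypothetical grid; the abstract does not state which parameters were tuned.
param_grid = {
    "ipca__n_components": [10, 20],
    "rf__n_estimators": [50, 100],
}

search = GridSearchCV(pipeline, param_grid, scoring="roc_auc", cv=cv)
search.fit(X, y)

best_auc = search.best_score_
best_params = search.best_params_
print(f"best mean AUC: {best_auc:.4f}")
print(f"best parameters: {best_params}")
```

Placing IPCA inside the pipeline means the projection is refit on the training folds only, so the held-out fold does not influence the feature subspace. The AUC printed here reflects the synthetic data and is not comparable to the values reported in the abstract.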
