Journal: Computational statistics & data analysis

Weighted kernel Fisher discriminant analysis for integrating heterogeneous data


Abstract

Data integration is becoming an essential tool for coping with and making sense of the ever-increasing amount of biological data. Genomic data arise in various forms, including vectors, graphs, and sequences, so it is essential to choose strategies that capture as much as possible of the information contained in each data type. The need to integrate heterogeneous data measured on the same individuals also arises in a wide range of clinical applications. We propose weighted kernel Fisher discriminant (wKFD) analysis for integrating heterogeneous data sets. The weights measure the relative importance of each data set to be integrated. Simulation studies are conducted to assess the performance of the proposed method. The results show that the method performs very well, including in the presence of noisy data. We also illustrate the method using gene expression and clinical data from breast cancer patients. Weighted integration of heterogeneous data leads to improved predictive accuracy. The amount of improvement, however, depends on the quality and informativeness of each of the data sets being integrated. If a data set is of poor quality and/or non-informative, one should not expect a significant improvement from adding it to other informative data sets. Likewise, a substantial improvement might not be obtained if the data sets do not contain independent information, that is, if there is redundancy in the data.
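The abstract does not spell out the estimator, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that integration means a convex combination of one kernel matrix per data source, followed by a standard regularised two-class kernel Fisher discriminant. The function names, the RBF kernel, `gamma`, the weights, `reg`, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def combine_kernels(kernels, weights):
    """Weighted sum of kernel matrices; weights are rescaled to sum to 1 (assumed scheme)."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return sum(wi * K for wi, K in zip(w, kernels))

def fit_kfd(K, y, reg=1e-2):
    """Two-class kernel Fisher discriminant on a training kernel K with labels y in {0, 1}."""
    n = K.shape[0]
    N = np.zeros((n, n))          # within-class scatter in kernel space
    means = []                    # kernelised class means
    for c in (0, 1):
        Kc = K[:, y == c]
        nc = Kc.shape[1]
        means.append(Kc.mean(axis=1))
        center = np.eye(nc) - np.full((nc, nc), 1.0 / nc)
        N += Kc @ center @ Kc.T
    # Ridge term keeps the linear system well conditioned.
    alpha = np.linalg.solve(N + reg * np.eye(n), means[1] - means[0])
    # Decision threshold: midpoint between the two projected class means.
    threshold = 0.5 * (alpha @ means[0] + alpha @ means[1])
    return alpha, threshold

def predict_kfd(K_test_train, alpha, threshold):
    """Classify test points from their kernel values against the training points."""
    return (K_test_train @ alpha > threshold).astype(int)

def demo_accuracy(seed=0, weights=(0.8, 0.2), gamma=0.1):
    """Held-out accuracy on synthetic data: one informative source, one pure-noise source."""
    rng = np.random.default_rng(seed)
    y = np.repeat([0, 1], 50)
    X1 = rng.normal(size=(100, 5)) + 3.0 * y[:, None]   # informative source
    X2 = rng.normal(size=(100, 5))                      # noise source
    idx = rng.permutation(100)
    tr, te = idx[:60], idx[60:]
    sources = (X1, X2)
    K_tr = combine_kernels([rbf_kernel(X[tr], X[tr], gamma) for X in sources], weights)
    K_te = combine_kernels([rbf_kernel(X[te], X[tr], gamma) for X in sources], weights)
    alpha, thr = fit_kfd(K_tr, y[tr])
    return float((predict_kfd(K_te, alpha, thr) == y[te]).mean())
```

Calling `demo_accuracy()` with different `weights` gives a rough feel for the abstract's point that the gain from integration depends on how informative each source is. In this sketch the weights are fixed by hand; how the paper chooses them is not stated in the abstract.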
