Conference: International Meeting on Computational Intelligence Methods for Bioinformatics and Biostatistics

Random Sample Consensus for the Robust Identification of Outliers in Cancer Data



Abstract

Random sample consensus (Ransac) is a technique that has been widely used for modeling data with a large amount of noise. Although it has been successfully employed in areas such as computer vision, extensive testing and application to clinical data, particularly in oncology, are still lacking. We applied this technique to synthetic and biomedical datasets, publicly available from The Cancer Genome Atlas (TCGA) and the UC Irvine Machine Learning Repository, to identify outliers in the classification of tumor samples. The results obtained by combining Ransac with logistic regression were compared against a baseline classical logistic model. To evaluate the robustness of this method, the original datasets were then perturbed by generating noisy data and by artificially switching the labels. The flagged outlier observations were compared against the misclassifications of the baseline logistic model, and the overall accuracy of both strategies was evaluated. Ransac showed high precision in classifying a subset of core (inlier) observations in the datasets evaluated while simultaneously identifying the outlier observations, and it remained robust to increasingly perturbed data.
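
The combination of Ransac with logistic regression described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`fit_logistic`, `ransac_logistic`), the sample size, the number of trials, the 0.5 residual threshold, and the toy data with two switched labels are all assumptions made for illustration. The sketch repeatedly fits a logistic model on a small random subset, counts the observations whose label agrees with the fitted probability (the consensus set), keeps the largest consensus set, refits on it, and flags the remaining observations as outliers.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(model, xi):
    # Probability of class 1 for one observation under (weights, bias).
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)


def fit_logistic(X, y, lr=0.5, epochs=200):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba((w, b), xi) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def ransac_logistic(X, y, sample_size=10, n_trials=50, threshold=0.5, seed=0):
    """Return (final model, inlier indices, outlier indices).

    All hyperparameters are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    indices = list(range(len(X)))

    def consensus(model):
        # An observation is an inlier if |label - probability| is below the threshold.
        return [i for i in indices
                if abs(y[i] - predict_proba(model, X[i])) < threshold]

    best = []
    for _ in range(n_trials):
        subset = rng.sample(indices, sample_size)
        model = fit_logistic([X[i] for i in subset], [y[i] for i in subset])
        inliers = consensus(model)
        if len(inliers) > len(best):
            best = inliers
    if not best:
        raise ValueError("no consensus set found")

    # Refit on the largest consensus set, then flag everything else as an outlier.
    final = fit_logistic([X[i] for i in best], [y[i] for i in best])
    inliers = consensus(final)
    inlier_set = set(inliers)
    outliers = [i for i in indices if i not in inlier_set]
    return final, inliers, outliers


# Toy data (hypothetical): one feature, two separated classes, two switched labels.
X = [[-(2 + 0.1 * i)] for i in range(20)] + [[2 + 0.1 * i] for i in range(20)]
y = [0] * 20 + [1] * 20
y[10], y[30] = 1, 0  # artificially switch two labels, as in a perturbation test
model, inliers, outliers = ransac_logistic(X, y)
print("flagged outliers:", outliers)
```

On real tumor data the threshold and sample size would need tuning, and the flagged observations could be compared with the misclassifications of a logistic model fitted on all samples, which is the comparison the abstract describes.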
