Conference: International Conference on Materials, Alloys and Experimental Mechanics

A Novel Multi-Class Ensemble Model for Classifying Imbalanced Biomedical Datasets

Abstract

This paper focuses on developing a Hadoop-based framework of feature selection and classification models for classifying high-dimensional data in heterogeneous biomedical databases. Extensive research has been carried out in machine learning, big data, and data mining to identify patterns. The main challenge is extracting useful features from data generated by diverse biological systems. The proposed model can be used to predict diseases in various applications and to identify the features relevant to particular diseases. Because biomedical repositories such as PubMed and Medline are growing exponentially, an accurate predictive model is essential for knowledge discovery in a Hadoop environment. Extracting key features from unstructured documents often leads to uncertain results because of outliers and missing values. In this paper, we propose a two-phase MapReduce framework consisting of a text preprocessor and a classification model. In the first phase, a mapper-based preprocessing method is designed to eliminate irrelevant features, missing values, and outliers from the biomedical data. In the second phase, a MapReduce-based multi-class ensemble decision tree model is designed and applied to the preprocessed mapper output to improve the true positive rate and reduce computation time. Experimental results on complex biomedical datasets show that the proposed Hadoop-based multi-class ensemble model significantly outperforms state-of-the-art baselines.
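
The two-phase flow described in the abstract can be illustrated with a small in-process sketch. It is not the authors' implementation: the abstract does not specify the filtering rule, the tree learner, or the combination rule, so the z-score outlier filter, the depth-limited trees, the round-robin partitioning, and the majority vote below are all assumptions, and plain Python functions stand in for Hadoop mappers and reducers. All function names and the toy data are hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev

def column_stats(records):
    """Per-feature (mean, std) over records that have no missing values."""
    complete = [f for f, y in records if y is not None and None not in f]
    return [(mean(col), pstdev(col)) for col in zip(*complete)]

def preprocess_mapper(record, stats, z_max=3.0):
    """Phase 1 (map): emit a record only if it has no missing value or outlier.
    The z-score rule is an assumed stand-in for the paper's filter."""
    features, label = record
    if label is None or any(v is None for v in features):
        return  # drop records with missing values
    for v, (mu, sigma) in zip(features, stats):
        if sigma > 0 and abs(v - mu) / sigma > z_max:
            return  # drop records with an outlying feature value
    yield features, label

def build_tree(rows, depth=2):
    """Phase 2 (map): fit a small decision tree on one partition.
    Leaves are class labels (strings); internal nodes are tuples."""
    labels = Counter(y for _, y in rows)
    majority = labels.most_common(1)[0][0]
    if depth == 0 or len(labels) == 1:
        return majority
    best = None
    for j in range(len(rows[0][0])):
        values = sorted({f[j] for f, _ in rows})
        # Candidate thresholds are midpoints, so both sides are non-empty.
        for t in ((a + b) / 2 for a, b in zip(values, values[1:])):
            left = [r for r in rows if r[0][j] <= t]
            right = [r for r in rows if r[0][j] > t]
            # Training records classified correctly by the two majority labels.
            correct = (Counter(y for _, y in left).most_common(1)[0][1]
                       + Counter(y for _, y in right).most_common(1)[0][1])
            if best is None or correct > best[0]:
                best = (correct, j, t, left, right)
    if best is None:
        return majority  # no usable split
    _, j, t, left, right = best
    return (j, t, build_tree(left, depth - 1), build_tree(right, depth - 1))

def tree_predict(tree, features):
    """Walk from the root to a leaf label."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if features[j] <= t else right
    return tree

def vote_reducer(predictions):
    """Phase 2 (reduce): combine per-tree predictions by majority vote."""
    return Counter(predictions).most_common(1)[0][0]

def ensemble_predict(trees, features):
    return vote_reducer(tree_predict(t, features) for t in trees)

# Toy three-class data (hypothetical), plus one missing value and one outlier.
raw = [
    ([1.0, 0.2], "a"), ([1.2, 0.1], "a"), ([0.9, 0.3], "a"), ([1.1, 0.2], "a"),
    ([5.0, 0.2], "b"), ([5.2, 0.1], "b"), ([4.9, 0.3], "b"), ([5.1, 0.2], "b"),
    ([9.0, 0.2], "c"), ([9.2, 0.1], "c"), ([8.9, 0.3], "c"), ([9.1, 0.2], "c"),
    ([None, 0.2], "c"),   # missing value
    ([5.0, 100.0], "b"),  # outlier in the second feature
]

stats = column_stats(raw)
cleaned = [out for rec in raw for out in preprocess_mapper(rec, stats)]

# Round-robin split into three partitions, one tree per partition.
trees = [build_tree(cleaned[i::3]) for i in range(3)]

for x in ([1.0, 0.2], [5.0, 0.2], [9.0, 0.2]):
    print(x, ensemble_predict(trees, x))
```

In a real Hadoop job the statistics pass, the filtering mapper, and the per-partition training would run as separate distributed stages; the sketch only shows how the data flows between the two phases.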
