A Novel Multi-Class Ensemble Model for Classifying Imbalanced Biomedical Datasets

Abstract

This paper focuses on developing a Hadoop-based framework for feature selection and classification, used to classify high-dimensional data in heterogeneous biomedical databases. Extensive research has been carried out in machine learning, big data and data mining to identify patterns, and the main challenge is extracting useful features from data generated by diverse biological systems. The proposed model can be used to predict diseases in various applications and to identify the features relevant to particular diseases. Because biomedical repositories such as PubMed and Medline are growing exponentially, an accurate predictive model is essential for knowledge discovery in a Hadoop environment. Extracting key features from unstructured documents often leads to uncertain results because of outliers and missing values. In this paper, we propose a two-phase Map-Reduce framework consisting of a text preprocessor and a classification model. In the first phase, a mapper-based preprocessing method eliminates irrelevant features, missing values and outliers from the biomedical data. In the second phase, a Map-Reduce-based multi-class ensemble decision tree model is designed and applied to the preprocessed mapper output to improve the true positive rate and reduce computation time. Experimental results on complex biomedical datasets show that the proposed Hadoop-based multi-class ensemble model significantly outperforms state-of-the-art baselines.
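
The abstract describes the two phases only at a high level. As a rough illustration, the following hypothetical Python sketch simulates them in a single process rather than on Hadoop: a phase-1 mapper drops records with missing values and per-class outliers, phase-2 mappers each train a small decision tree on their own partition, and a reducer combines the trees by majority vote. The function names (`preprocess_mapper`, `train_stump`, `vote_reducer`, `run_pipeline`), the modified z-score outlier rule, the use of depth-one trees (decision stumps) as base learners and plain majority voting are all assumptions for illustration, not the authors' method; feature-relevance filtering and text preprocessing are omitted.

```python
# Hypothetical in-process sketch of a two-phase Map-Reduce classifier.
# Not the authors' implementation: function names, the per-class modified
# z-score outlier rule, decision stumps and majority voting are assumptions.
from collections import Counter, defaultdict
from statistics import median


def preprocess_mapper(partition, threshold=3.5):
    """Phase 1 map: drop rows with missing values, then per-class outliers."""
    rows = [(x, y) for x, y in partition if all(v is not None for v in x)]
    keep = [True] * len(rows)
    by_class = defaultdict(list)  # class label -> indices of its rows
    for i, (_, y) in enumerate(rows):
        by_class[y].append(i)
    for idx in by_class.values():
        for f in range(len(rows[idx[0]][0])):
            col = [rows[i][0][f] for i in idx]
            med = median(col)
            mad = median(abs(v - med) for v in col)
            if mad == 0:
                continue  # no spread, so the modified z-score is undefined
            for i, v in zip(idx, col):
                # Modified z-score (Iglewicz and Hoaglin); above threshold = outlier.
                if abs(0.6745 * (v - med) / mad) > threshold:
                    keep[i] = False
    return [row for row, k in zip(rows, keep) if k]


def train_stump(rows):
    """Phase 2 map: fit a depth-one multi-class decision tree on one partition."""
    overall = Counter(y for _, y in rows).most_common(1)[0][0]
    best = (-1, None, None, overall, overall)  # (correct, feature, threshold, left, right)
    for f in range(len(rows[0][0])):
        values = sorted({x[f] for x, _ in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint between adjacent distinct values
            left = Counter(y for x, y in rows if x[f] <= t).most_common(1)[0]
            right = Counter(y for x, y in rows if x[f] > t).most_common(1)[0]
            correct = left[1] + right[1]  # rows matched by each leaf's majority class
            if correct > best[0]:
                best = (correct, f, t, left[0], right[0])
    _, feat, thr, lab_left, lab_right = best
    if feat is None:
        return lambda x: overall  # no split possible: predict the majority class
    return lambda x: lab_left if x[feat] <= thr else lab_right


def vote_reducer(models, x):
    """Phase 2 reduce: combine per-partition predictions by majority vote."""
    return Counter(m(x) for m in models).most_common(1)[0][0]


def run_pipeline(partitions):
    """Run both phases over in-memory partitions (stand-ins for HDFS splits)."""
    cleaned = [preprocess_mapper(p) for p in partitions]
    models = [train_stump(p) for p in cleaned if p]
    if not models:
        raise ValueError("no partition survived preprocessing")
    return lambda x: vote_reducer(models, x)


# Toy three-class data split into three partitions (made-up values).
partitions = [
    [([0.1], "a"), ([0.2], "a"), ([0.3], "a"), ([50.0], "a"),
     ([1.1], "b"), ([1.2], "b"), ([None], "b")],
    [([1.1], "b"), ([1.2], "b"), ([1.3], "b"), ([2.1], "c"), ([2.2], "c")],
    [([0.1], "a"), ([0.2], "a"), ([0.3], "a"), ([2.1], "c"), ([2.2], "c")],
]
predict = run_pipeline(partitions)
print([predict([x]) for x in (0.2, 1.5, 2.5)])  # ['a', 'b', 'c']
```

In this toy run, each stump separates only two groups, yet the vote across the three partitions labels all three test points correctly. Computing outlier statistics per class rather than per partition keeps a minority class from being discarded wholesale as "outliers"; on a real cluster, the list comprehensions would plausibly map to mapper tasks and the vote to a reducer, which is likewise an assumption about deployment.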

Bibliographic Record

Source: International Conference on Materials, Alloys and Experimental Mechanics | 2017 | pp. 798-1610 | 15 pages
Authors: Thulasi Bikku; Dr. N. Sambasiva Rao; Dr. Ananda Rao Akepogu
Original format: PDF
CLC classification: Engineering materials science
Keywords: Ensemble model; Map-Reduce; Medical databases; Bioinformatics; Textual Decision Patterns

Similar Literature

1. A novel multi-class ensemble model based on feature selection using Hadoop framework for classifying imbalanced biomedical data [J]. Thulasi Bikku, N. Sambasiva Rao, Ananda Rao Akepogu. International Journal of Business Intelligence and Data Mining. 2019, issue 1a2.

2. AdaBoost-CNN: An adaptive boosting algorithm for convolutional neural networks to classify multi-class imbalanced datasets using transfer learning [J]. Taherkhani Aboozar, Cosma Georgina, McGinnity T. M. Neurocomputing. 2020, issue Sepa3.

3. Classifier Selection and Ensemble Model for Multi-class Imbalance Learning in Education Grants Prediction [J]. Sun Yu, Li Zhanli, Li Xuewen. Applied Artificial Intelligence. 2021, issue 1a4.

4. A Novel Multi-Class Ensemble Model for Classifying Imbalanced Biomedical Datasets [C]. Thulasi Bikku, Dr. N. Sambasiva Rao, Dr. Ananda Rao Akepogu. International Conference on Materials, Alloys and Experimental Mechanics. 2017.

5. Classifier design to improve pattern classification and knowledge discovery for imbalanced datasets [D]. Wang, Kun. 2009.

6. iPPBS-Opt: A Sequence-Based Ensemble Classifier for Identifying Protein-Protein Binding Sites by Optimizing Imbalanced Training Datasets [O]. Jianhua Jia, Zi Liu, Xuan Xiao. 2016.

7. Identification and Optimization of Classifier Genes from Multi-Class Earthworm Microarray Dataset [R]. 2010.
