A contemporary feature selection and classification framework for imbalanced biomedical datasets

Thulasi Bikku; Sambasiva Rao Nandam; Ananda Rao Akepogu

Abstract

Because the PubMed and Medline repositories hold a very large number of biomedical documents, it is difficult to analyze, predict, and interpret the information in these documents using traditional document clustering and classification models. These traditional models fail to analyze document sets based on the user's keywords and MeSH terms. Because of the large number of features, conventional models such as SVM, neural networks, and multinomial naïve Bayes have been used for feature classification, with additional text-filtering measures typically applied for feature selection. In addition, as the document collection grows, it becomes difficult to find outliers using the documents' features and MeSH terms. Biomedical document clustering and classification is one of the essential machine learning tasks in the knowledge extraction process of real-time user recommendation systems. In this paper, we develop a novel biomedical document feature clustering and classification model, built on the Hadoop framework, as a user recommendation system for large document sets. In this model, a novel gene feature clustering method with ensemble document classification is implemented on biomedical repositories (PubMed and Medline) using the MapReduce framework. Experimental results show that the proposed model achieves a higher cluster quality rate and a higher true positive classification rate than traditional document clustering and classification models.
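The abstract names multinomial naïve Bayes as one conventional baseline and MapReduce as the processing model, but gives no algorithmic detail. As a hedged illustration only (this is not the authors' gene feature clustering or ensemble method), the sketch below simulates in plain Python, without Hadoop, how the per-class term counts that a multinomial naïve Bayes classifier needs can be computed with a map, shuffle, and reduce pipeline. The function names `map_phase`, `shuffle`, `reduce_phase`, `train`, `predict`, the `DOC` sentinel, and the toy corpus are all hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical sentinel "term" used to count documents per class
# (assumes no real term is spelled "__doc__").
DOC = "__doc__"


def map_phase(label, terms):
    """Mapper: emit ((label, DOC), 1) once per document and
    ((label, term), 1) once per term occurrence."""
    yield (label, DOC), 1
    for term in terms:
        yield (label, term), 1


def shuffle(pairs):
    """Shuffle: group mapper output values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reducer: sum the grouped counts for each key."""
    return {key: sum(values) for key, values in groups.items()}


def train(corpus):
    """Run map -> shuffle -> reduce over (label, terms) documents."""
    pairs = (kv for label, terms in corpus for kv in map_phase(label, terms))
    counts = reduce_phase(shuffle(pairs))
    labels = sorted(label for (label, term) in counts if term == DOC)
    vocab = {term for (_, term) in counts if term != DOC}
    return counts, labels, vocab


def predict(model, terms, alpha=1.0):
    """Multinomial naive Bayes with add-alpha (Laplace) smoothing."""
    counts, labels, vocab = model
    n_docs = sum(counts[(c, DOC)] for c in labels)
    best_label, best_score = None, -math.inf
    for c in labels:
        # Total term occurrences observed for class c.
        class_total = sum(v for (lab, t), v in counts.items() if lab == c and t != DOC)
        score = math.log(counts[(c, DOC)] / n_docs)  # log prior
        for term in terms:
            if term not in vocab:
                continue  # terms never seen in training are ignored
            likelihood = (counts.get((c, term), 0) + alpha) / (class_total + alpha * len(vocab))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = c, score
    return best_label


# Hypothetical toy corpus of (class label, MeSH-like terms) pairs.
CORPUS = [
    ("cancer", ["tumor", "gene", "mutation"]),
    ("cancer", ["tumor", "oncogene", "gene"]),
    ("cardio", ["heart", "artery", "gene"]),
    ("cardio", ["heart", "valve"]),
]

if __name__ == "__main__":
    model = train(CORPUS)
    print(predict(model, ["tumor", "mutation"]))  # cancer
    print(predict(model, ["heart"]))              # cardio
```

Because the reducer only sums counts, this style of job can in principle be spread across many nodes. Note that the log prior is derived from per-class document counts, so on an imbalanced corpus it favors the majority class, a common reason imbalanced-data methods add resampling or class reweighting.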

Bibliographic Details

Source
Egyptian Informatics Journal, 2018, Issue 3, 8 pages
Authors
Thulasi Bikku; Sambasiva Rao Nandam; Ananda Rao Akepogu
Original format
PDF
Chinese Library Classification (CLC)
Applications of bioinformatics technology and electronic computers
Keywords
Biomedical data; Document clustering; Document classification; Bioinformatics; User recommended system

Similar Articles

1. Dealing with high-dimensional class-imbalanced datasets: Embedded feature selection for SVM classification [J]. Maldonado Sebastian, Lopez Julio. Applied Soft Computing, 2018.

2. Feature selection and classification of imbalanced datasets: application to PET images of children with autistic spectrum disorders [J]. Duchesnay E, Cachia A, Boddaert N. NeuroImage, 2011, Issue 3.

3. A novel multi-class ensemble model based on feature selection using Hadoop framework for classifying imbalanced biomedical data [J]. Thulasi Bikku, N. Sambasiva Rao, Ananda Rao Akepogu. International Journal of Business Intelligence and Data Mining, 2019, Issue 1a2.

4. Addressing Overlapping in Classification with Imbalanced Datasets: A First Multi-objective Approach for Feature and Instance Selection [C]. Alberto Fernandez, Maria Jose del Jesus, Francisco Herrera. International Conference on Intelligent Data Engineering and Automated Learning, 2015.

5. A Model Fusion Based Framework For Imbalanced Classification Problem with Noisy Dataset [D]. He, Miao, 2014.

6. Adaptive swarm cluster-based dynamic multi-objective synthetic minority oversampling technique algorithm for tackling binary imbalanced datasets in biomedical data classification [O]. Jinyan Li, Simon Fong, Yunsick Sung, 2016.

7. Feature selection and classification of imbalanced datasets. Application to PET images of children with Autistic Spectrum Disorders [O]. Duchesnay Edouard, Cachia Arnaud, Boddaert Nathalie, 2011.
