
An Improved SMOTE Algorithm Based on Genetic Algorithm for Imbalanced Data Classification



Abstract

Classification of imbalanced data is recognized as a crucial problem in machine learning and data mining. In an imbalanced dataset, minority class instances are likely to be misclassified. When the synthetic minority over-sampling technique (SMOTE) is applied to imbalanced dataset classification, the same sampling rate is used for every minority class sample when synthesizing new samples, so the sampling is blind to differences among individual samples. To overcome this problem, an improved SMOTE algorithm based on a genetic algorithm (GA), named GASMOTE, is proposed. First, GASMOTE assigns different sampling rates to different minority class samples; each combination of sampling rates corresponds to one individual in the population. Second, the selection, crossover, and mutation operators of the GA are applied iteratively to the population until the stopping criteria are met, yielding the best combination of sampling rates. Finally, this best combination of sampling rates is used in SMOTE to synthesize new samples. Experimental results on 10 typical imbalanced datasets show that, compared with SMOTE, GASMOTE increases the F-measure by 5.9% and the G-mean by 1.6%; compared with borderline-SMOTE, it increases the F-measure by 3.7% and the G-mean by 2.3%. GASMOTE can be used as a new over-sampling technique for imbalanced dataset classification. GASMOTE is then applied to a practical engineering problem: predicting rockburst on the VCR rockburst datasets. The experimental results indicate that GASMOTE can accurately predict rockburst occurrence and thus provide guidance for the design and construction of safe deep-mining engineering structures.
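The core idea of GASMOTE, giving each minority sample its own sampling rate inside SMOTE, can be illustrated with a minimal pure-Python sketch. It assumes integer rates (the number of synthetic points generated from each minority sample), Euclidean k-nearest neighbours computed within the minority class, and the usual SMOTE interpolation x + gap * (neighbour - x) with gap drawn uniformly from [0, 1). The names `smote_per_sample` and `k_nearest` are hypothetical and do not come from the paper.

```python
import math
import random


def k_nearest(index, points, k):
    """Return indices of the k nearest minority neighbours of points[index] (Euclidean)."""
    others = [j for j in range(len(points)) if j != index]
    others.sort(key=lambda j: math.dist(points[index], points[j]))
    return others[:k]


def smote_per_sample(minority, rates, k=5, seed=0):
    """Hypothetical SMOTE variant: minority sample i yields rates[i] synthetic points.

    Classic SMOTE applies one shared rate to every minority sample; here each
    sample may have a different rate. Every synthetic point lies on the segment
    between a sample and a randomly chosen one of its k nearest minority neighbours.
    """
    if len(rates) != len(minority):
        raise ValueError("need one sampling rate per minority sample")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    synthetic = []
    for i, (x, n_i) in enumerate(zip(minority, rates)):
        if n_i <= 0 or k <= 0:
            continue  # rate 0: this sample contributes no synthetic points
        neighbours = k_nearest(i, minority, k)
        for _ in range(n_i):
            nn = minority[rng.choice(neighbours)]
            gap = rng.random()  # uniform in [0, 1)
            synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nn)))
    return synthetic


minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (3.0, 3.0)]
new_points = smote_per_sample(minority, rates=[2, 0, 1, 3], k=2)
print(len(new_points))  # 6
```

Setting every entry of `rates` to the same value recovers ordinary SMOTE, which is the uniform-rate behaviour the abstract identifies as the weakness.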
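The GA search over sampling-rate combinations can be sketched in the same spirit: each individual is a vector of per-sample rates, and selection, crossover, and mutation are applied until a stopping criterion is met. This is an illustrative sketch, not the authors' implementation. The operator choices (binary tournament selection, one-point crossover, random-reset mutation, elitism), the rate range [0, `max_rate`], the fixed-generation stopping criterion, and the `fitness` callable are all assumptions; the abstract does not say how fitness is computed, though in GASMOTE it would plausibly score a classifier trained on data oversampled with the candidate rates.

```python
import random
from typing import Callable, List

Individual = List[int]  # one integer sampling rate per minority sample


def genetic_search(
    n_samples: int,
    fitness: Callable[[Individual], float],
    max_rate: int = 5,
    pop_size: int = 20,
    generations: int = 30,
    crossover_prob: float = 0.8,
    mutation_prob: float = 0.1,
    seed: int = 0,
) -> Individual:
    """Hypothetical GA that searches for the rate vector maximising `fitness`.

    Stopping criterion (assumed): a fixed number of generations.
    """
    rng = random.Random(seed)
    population = [[rng.randint(0, max_rate) for _ in range(n_samples)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)

    def tournament() -> Individual:
        # Selection: keep the fitter of two randomly drawn individuals.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        offspring = [best[:]]  # elitism: the best individual survives unchanged
        while len(offspring) < pop_size:
            p1, p2 = tournament(), tournament()
            c1, c2 = p1[:], p2[:]
            if n_samples > 1 and rng.random() < crossover_prob:
                # One-point crossover: swap the tails after a random cut.
                cut = rng.randint(1, n_samples - 1)
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            for child in (c1, c2):
                # Mutation: reset each gene to a random rate with small probability.
                for g in range(n_samples):
                    if rng.random() < mutation_prob:
                        child[g] = rng.randint(0, max_rate)
                offspring.append(child)
        population = offspring[:pop_size]
        best = max(population + [best], key=fitness)
    return best


# Toy usage: a stand-in fitness that rewards closeness to a fixed target vector.
target = [3, 1, 4, 1, 5, 0]


def toy_fitness(rates: Individual) -> float:
    return -sum(abs(r - t) for r, t in zip(rates, target))


best_rates = genetic_search(len(target), toy_fitness, max_rate=5)
print(best_rates, toy_fitness(best_rates))
```

Because of elitism, the best fitness found never decreases from one generation to the next; in a real setting the fitness function would dominate the running time, so caching scores per individual would be a sensible addition.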
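The two reported metrics, F-measure and G-mean, are standard for imbalanced classification because plain accuracy can look high even when most minority instances are missed. A minimal sketch of the usual definitions from confusion-matrix counts follows, assuming the minority class is the positive class and the F-measure is the balanced F1 score; the abstract does not state which variant the paper uses.

```python
import math
from typing import Tuple


def f_measure_and_g_mean(tp: int, fn: int, fp: int, tn: int) -> Tuple[float, float]:
    """F-measure (F1) and G-mean with the minority class treated as positive."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0       # true positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0  # true negative rate
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    g_mean = math.sqrt(recall * specificity)          # geometric mean of TPR and TNR
    return f_measure, g_mean


# Illustrative counts only (not from the paper): 50 minority and 950 majority samples.
f, g = f_measure_and_g_mean(tp=40, fn=10, fp=20, tn=930)
print(round(f, 4), round(g, 4))
```

F-measure rewards precision and recall on the minority class, while G-mean penalizes a classifier that sacrifices either class, which is why both are reported side by side.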

Bibliographic record

  • Source
    Journal of Digital Information Management | 2016, Issue 2 | pp. 92-103 (12 pages)
  • Author affiliations

    School of Mathematics and Computer Science, Hubei University of Arts and Science, Xiangyang 441053, China; Institute of Logic and Intelligence, Southwest University, Chongqing 400715, China

    Oujiang College, Wenzhou University, Wenzhou 325035, China

    School of Mathematics and Computer Science, Hubei University of Arts and Science, Xiangyang 441053, China

    School of Mathematics and Computer Science, Hubei University of Arts and Science, Xiangyang 441053, China

    School of Mathematics and Computer Science, Hubei University of Arts and Science, Xiangyang 441053, China; Department of Electrical and Computer Engineering, Old Dominion University, Norfolk, USA

  • Original format: PDF
  • Text language: English
  • Keywords

    Imbalanced Dataset; Classification; SMOTE; Sampling Rate; Genetic Algorithm; Rockburst

