Published in: Iberian Conference on Pattern Recognition and Image Analysis

Addressing the Big Data Multi-class Imbalance Problem with Oversampling and Deep Learning Neural Networks



Abstract

The class imbalance problem is a challenging situation in machine learning, and it also appears frequently in recent Big Data applications. The most studied techniques for dealing with class imbalance have been Random Over Sampling (ROS), Random Under Sampling (RUS) and the Synthetic Minority Over-sampling Technique (SMOTE), especially in two-class scenarios. At Big Data scale, however, multi-class imbalance scenarios have not yet been extensively studied, and only a few investigations have been performed. In this work, the effectiveness of the ROS and SMOTE techniques is analyzed in the Big Data multi-class imbalance context. The KDD99 dataset, a popular multi-class imbalanced big data set, was used to test these oversampling techniques before applying a Deep Learning Multi-Layer Perceptron. Results show that ROS and SMOTE are not always enough to improve classifier performance on the minority classes. However, they slightly increase the overall performance of the classifier compared with the unsampled data.
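To make the two oversampling techniques named in the abstract concrete, here is a minimal, toy-scale sketch in plain Python. It is not the authors' implementation and says nothing about how the paper handled Big Data scale. The function names (`random_oversample`, `smote_like`), the neighbour count `k`, and the toy labels and counts are all assumptions made for illustration. ROS duplicates existing minority samples, while the SMOTE-style step creates a new point on the line segment between a minority sample and one of its nearest same-class neighbours.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """ROS sketch: duplicate randomly chosen samples of each smaller class
    until every class is as large as the largest one."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(members) for members in by_class.values())
    out_samples, out_labels = list(samples), list(labels)
    for y, members in by_class.items():
        extra = target - len(members)
        out_samples.extend(rng.choice(members) for _ in range(extra))
        out_labels.extend([y] * extra)
    return out_samples, out_labels


def smote_like(members, n_new, k=2, seed=0):
    """SMOTE-style sketch for ONE minority class: interpolate between a
    random sample and one of its k nearest same-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(members)
        others = [m for m in members if m is not base]
        if not others:  # a single-sample class can only be copied
            synthetic.append(list(base))
            continue
        # Nearest neighbours by squared Euclidean distance
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy data: the label names and counts are invented for illustration
labels = ["normal"] * 6 + ["dos"] * 3 + ["u2r"] * 1
samples = [[float(i)] for i in range(len(labels))]
balanced_samples, balanced_labels = random_oversample(samples, labels)
print(Counter(balanced_labels))
print(smote_like([[0.0], [1.0], [4.0]], n_new=3))
```

In a pipeline like the one the abstract describes, resampling of this kind would be applied to the training split only, before the classifier is fitted, so that evaluation still reflects the original class distribution.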
