A comparison of improving multi-class imbalance for internet traffic classification

Qiong Liu; Zhen Liu


Abstract

To date, most research on class imbalance has focused on the two-class problem. Multi-class imbalance is considerably more complicated, and there is little knowledge of or experience with it in Internet traffic classification. In this paper we study the challenges that multi-class imbalanced data pose for machine-learning-based Internet traffic classification, and the ability of several adjusting methods to address them, including resampling (random under-sampling and random over-sampling) and cost-sensitive learning. We then empirically compare the effectiveness of these methods for Internet traffic classification and determine which produces the better overall classifier and under what circumstances. The main contributions are as follows. (1) Cost-sensitive learning is derived with MetaCost, which incorporates misclassification costs into the learning algorithm, to improve multi-class imbalance based on flow ratio. (2) A new resampling model, comprising under-sampling and over-sampling, is presented to make the multi-class training data more balanced. (3) A procedure is presented for comparing the three methods with one another and with the original (unadjusted) case. Experimental results on sixteen datasets show that, when the three methods are compared with the original case, flow g-mean and byte g-mean show statistically significant increases of 8.6 % and 3.7 %; 4.4 % and 2.8 %; and 11.1 % and 8.2 %. Cost-sensitive learning is the first choice when the sample size is sufficient, but resampling is more practical otherwise.
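The two ingredients named in the abstract, random resampling and the g-mean metric, can be illustrated with a minimal Python sketch. It rests on assumptions that the abstract does not confirm: g-mean is taken as the geometric mean of per-class recall (a common multi-class definition), and resampling simply brings every class to one fixed size. The function names `g_mean` and `random_resample`, the `target` parameter, and the toy labels are hypothetical. This is not the authors' resampling model, and it does not distinguish flow g-mean from byte g-mean.

```python
import math
import random
from collections import Counter, defaultdict


def g_mean(y_true, y_pred):
    """Geometric mean of per-class recall (assumed definition of g-mean)."""
    total = Counter(y_true)          # number of true samples per class
    correct = Counter()              # correctly predicted samples per class
    for t, p in zip(y_true, y_pred):
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return math.prod(recalls) ** (1.0 / len(recalls))


def random_resample(samples, labels, target, seed=0):
    """Bring every class to `target` samples.

    Classes larger than `target` are randomly under-sampled (without
    replacement); smaller classes are randomly over-sampled (duplicates
    drawn with replacement).
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)

    out_x, out_y = [], []
    for y, xs in by_class.items():
        if len(xs) >= target:
            chosen = rng.sample(xs, target)
        else:
            chosen = xs + rng.choices(xs, k=target - len(xs))
        out_x.extend(chosen)
        out_y.extend([y] * target)
    return out_x, out_y


# Toy usage: four "web" flows and two "p2p" flows (hypothetical labels).
y_true = ["web"] * 4 + ["p2p"] * 2
y_pred = ["web", "web", "web", "p2p", "p2p", "web"]
print(g_mean(y_true, y_pred))

flows, balanced_labels = random_resample(list(range(6)), y_true, target=3)
print(Counter(balanced_labels))
```

Because a single class with zero recall drives the geometric mean to zero, the metric penalizes classifiers that ignore minority classes, which is why it is often preferred over overall accuracy on imbalanced data.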


Bibliographic information

Source
Information systems frontiers | 2014, Issue 3 | pp. 509-521 | 13 pages
Authors
Qiong Liu; Zhen Liu
Author affiliations

School of Software Engineering, School of Computer Science and Engineering, South China University of Technology, Guangzhou, Guangdong, China;

Indexing information
Original format: PDF
Language of text: English
Keywords
Multi-class imbalance; Resampling; Cost-sensitive learning; Internet traffic classification;

