This paper first surveys machine-learning techniques for IP traffic identification based on flow features and introduces the relevant background. For feature set selection, the features obtained after feature extraction are grouped, and each group is normalized with its own criterion, chosen according to the characteristics that the selected base features express. For clustering, an improved algorithm that combines the classic DBSCAN and BIRCH algorithms is proposed, drawing on the properties of both. Experimental results show that, compared with classification on the original feature set, classification on the normalized feature set achieves higher overall accuracy and needs less processing time, and that the improved algorithm achieves clearly higher precision and recall than the traditional BIRCH algorithm.
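The grouping and normalization step can be illustrated with a minimal sketch. The abstract does not state which features are grouped together or which normalization criterion each group receives, so the feature names (`mean_pkt_len`, `max_pkt_len`, `mean_iat`), the two groups, and the choice of min-max scaling versus z-score scaling below are all assumptions made for illustration, not the paper's actual configuration.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple


def min_max(values: Sequence[float]) -> List[float]:
    """Scale values linearly into [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values: Sequence[float]) -> List[float]:
    """Center on the mean and divide by the population standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


# Hypothetical grouping: each group names its own normalization criterion.
FEATURE_GROUPS: Dict[str, Tuple[str, List[str]]] = {
    "packet_size": ("min_max", ["mean_pkt_len", "max_pkt_len"]),
    "timing": ("z_score", ["mean_iat"]),
}

SCALERS: Dict[str, Callable[[Sequence[float]], List[float]]] = {
    "min_max": min_max,
    "z_score": z_score,
}


def normalize_by_group(
    flows: List[Dict[str, float]],
    groups: Dict[str, Tuple[str, List[str]]],
) -> List[Dict[str, float]]:
    """Return copies of the flow records with each feature rescaled
    by the criterion assigned to its group."""
    out = [dict(flow) for flow in flows]
    for scaler_name, feature_names in groups.values():
        scale = SCALERS[scaler_name]
        for name in feature_names:
            column = scale([flow[name] for flow in flows])
            for row, value in zip(out, column):
                row[name] = value
    return out


if __name__ == "__main__":
    # Made-up flow records, used only to exercise the functions.
    sample = [
        {"mean_pkt_len": 100.0, "max_pkt_len": 1500.0, "mean_iat": 0.1},
        {"mean_pkt_len": 300.0, "max_pkt_len": 1500.0, "mean_iat": 0.3},
    ]
    for record in normalize_by_group(sample, FEATURE_GROUPS):
        print(record)
```

Scaling each group separately keeps features with very different ranges, such as byte counts and inter-arrival times, from dominating the distance computations that clustering algorithms like DBSCAN and BIRCH rely on.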