Journal: 《管理工程学报》 (Journal of Industrial Engineering and Engineering Management)

A Subjective Weight Constraint Method for SVM-Based Classification of Imbalanced Datasets

         

Abstract

In binary classification with support vector machines (SVM), imbalanced datasets can be handled either by reducing the asymmetry of the sample information or by improving the algorithm. To address the imbalance between small and medium-sized enterprises (SMEs) with and without financial risk, this paper classifies the samples with a new SVM model that carries subjective weight constraints. Experiments show that the new model improves recognition of the financially at-risk firms, i.e. the minority class, making it a new method for class imbalance learning.

Imbalanced datasets are prevalent in management practice, and developing methods to handle them has been one of the most challenging problems in business intelligence. SVM is a classification method grounded in statistical learning theory. Although SVM performs well on general datasets, there is still room for improvement on imbalanced ones. Two strategies are available when SVM is used to classify imbalanced datasets: reduce the asymmetry of the sample information, or improve the SVM algorithm itself. This paper follows the latter strategy and presents a new SVM algorithm. Experiments show that, compared with the existing SVM, the proposed algorithm recognizes rare cases in imbalanced datasets better.

When SVM is applied to a linearly separable binary classification problem, it constructs a hyperplane whose normal vector can be regarded as the objective weights of the corresponding attributes: the larger the absolute value of a component, the more influence the corresponding attribute has on the decision. Inspired by methods in decision theory that integrate subjective and objective weights, this paper first converts the empirical range of the subjective weights into lower and upper limits on the normal vector, yielding a constraint inequality. This inequality is then added to the constraints of the SVM quadratic program. Using the Lagrangian, a standard form of this quadratic program is derived; we call the resulting model the SVM method subject to subjective weights. We apply it to financial early warning for SMEs, where datasets are often imbalanced.
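The core idea, adding the box constraint $A \le \omega \le B$ to the soft-margin SVM problem $\min_{\omega,b} \tfrac12\|\omega\|^2 + C\sum_i \max(0,\,1 - y_i(\omega\cdot x_i + b))$, can be sketched as follows. The paper solves the quadratic program with MATLAB's quadprog; to keep this sketch dependency-free it instead swaps in projected subgradient descent on the primal, a different solver that only approximates the QP optimum. The names `train_box_constrained_svm`, `predict`, `lower`, `upper`, `C`, `epochs` and `lr` are hypothetical, and labels are assumed to be +1/-1.

```python
from typing import List, Sequence, Tuple


def train_box_constrained_svm(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lower: Sequence[float],
    upper: Sequence[float],
    C: float = 1.0,
    epochs: int = 2000,
    lr: float = 0.01,
) -> Tuple[List[float], float]:
    """Approximately minimize 0.5*||w||^2 + C*sum_i max(0, 1 - y_i*(w.x_i + b))
    subject to lower <= w <= upper, by projected subgradient descent.

    Illustrative sketch only (hypothetical helper): it is NOT the quadprog-based
    solution of the paper's standard-form QP. Labels y must be +1 / -1.
    """
    d = len(X[0])
    # Start from the point of the box closest to w = 0.
    w = [min(max(0.0, lo), hi) for lo, hi in zip(lower, upper)]
    b = 0.0
    for t in range(1, epochs + 1):
        step = lr / t ** 0.5  # diminishing step size
        grad_w = list(w)  # gradient of the 0.5*||w||^2 term
        grad_b = 0.0
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score < 1.0:  # hinge term active: subgradient is -C*y_i*x_i
                for j in range(d):
                    grad_w[j] -= C * yi * xi[j]
                grad_b -= C * yi
        w = [wj - step * gj for wj, gj in zip(w, grad_w)]
        # Projection onto [lower, upper]: this is where the subjective-weight
        # constraint A <= w <= B enters the optimization.
        w = [min(max(wj, lo), hi) for wj, lo, hi in zip(w, lower, upper)]
        b -= step * grad_b  # the bias term is left unconstrained
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Classify x by the sign of the decision function w.x + b."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


if __name__ == "__main__":
    # Toy, hypothetical data: capping w_0 at 0.1 forces the model to lean on attribute 1.
    X = [[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]]
    y = [1, 1, -1, -1]
    w, b = train_box_constrained_svm(X, y, lower=[0.0, 0.0], upper=[0.1, 1.0])
    print(w, b, [predict(w, b, x) for x in X])
```

Projection keeps every iterate inside the subjective-weight box, so the constraint always holds exactly; the price is slower, less precise convergence than an exact QP solver such as quadprog.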
An approximately linearly separable SVM model is developed to recognize SMEs with financial risk. We downloaded the 2007 financial data of 130 SMEs (including 30 risk SMEs marked ST, i.e. "special treatment") from the CSMAR database and selected 12 financial attributes as indicators. First, we trained the approximately linearly separable SVM model on the dataset. If $\omega_j < 0$, the values of attribute $j$ were replaced by their negations; the model was then retrained, giving the value of $\sum_{j=1}^{12}\omega_j$. Second, by simulation based on the dataset, we obtained the subjective weights $w'$ ($0 \le w' \le 1$), their standard deviation $\sigma$, and the inequality

$$A = (w' - \lambda\sigma)\sum_{j=1}^{12}\omega_j \;\le\; \omega \;\le\; (w' + \lambda\sigma)\sum_{j=1}^{12}\omega_j = B, \qquad \lambda \ge 0.$$

Third, we obtained the integrated weight vector $\omega$ by substituting $A$ and $B$ into the standard form of the proposed SVM method, and used the quadprog function in MATLAB to train and test on the dataset.

Because increasing $\lambda$ can make some components $a_j$ of the vector $A$ negative, we ran two experiments to test the robustness of the proposed model. In the first, $a_j < 0$ was allowed; in the second, $a_j$ was set to 0 whenever its computed value was negative. For every component $\omega_j$, the trends in the two experiments were essentially the same as the constraint imposed by the subjective weights was progressively loosened. Once the constraint was loosened far enough, the two experiments produced the same weight vector, equal to the $\omega$ obtained by the existing SVM method without subjective-weight constraints. With $\lambda = 2.3$, the proposed method recognized 11 risk SMEs, roughly twice as many as the existing SVM method.
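The sign-flipping step and the construction of the bounds $A$ and $B$ can be sketched as below. This is a minimal illustration, assuming per-attribute vectors for $w'$ and $\sigma$ and a scalar $S = \sum_{j=1}^{12}\omega_j$ from the retrained standard SVM; the function names and the numbers in the test are hypothetical, not the paper's data (only $\lambda = 2.3$ comes from the text). The `clip_negative` flag mirrors the second experiment, in which negative $a_j$ are set to 0.

```python
from typing import List, Sequence, Tuple


def flip_negative_attributes(
    X: Sequence[Sequence[float]], w: Sequence[float]
) -> List[List[float]]:
    """Negate attribute j in every sample wherever the first SVM run gave w_j < 0,
    so that retraining yields non-negative objective weights (hypothetical helper)."""
    signs = [-1.0 if wj < 0 else 1.0 for wj in w]
    return [[s * xij for s, xij in zip(signs, row)] for row in X]


def subjective_weight_bounds(
    w_subj: Sequence[float],
    sigma: Sequence[float],
    weight_sum: float,
    lam: float,
    clip_negative: bool = False,
) -> Tuple[List[float], List[float]]:
    """Return (A, B) with A_j = (w'_j - lam*sigma_j)*S and B_j = (w'_j + lam*sigma_j)*S,
    where S = weight_sum is sum_j omega_j from the retrained standard SVM.

    clip_negative=False mirrors the first experiment (negative a_j allowed);
    clip_negative=True mirrors the second (negative a_j replaced by 0).
    """
    if lam < 0:
        raise ValueError("lambda must be non-negative")
    lower = [(ws - lam * s) * weight_sum for ws, s in zip(w_subj, sigma)]
    upper = [(ws + lam * s) * weight_sum for ws, s in zip(w_subj, sigma)]
    if clip_negative:
        lower = [max(a, 0.0) for a in lower]
    return lower, upper


if __name__ == "__main__":
    # Hypothetical subjective weights and deviations for three attributes.
    A, B = subjective_weight_bounds([0.5, 0.3, 0.2], [0.1, 0.05, 0.1], 2.0, 2.3)
    print(A, B)
```

Flipping signs first makes every $\omega_j \ge 0$, so $S \ge 0$ and $A \le B$ holds component-wise; the resulting vectors could then be passed as `lower` and `upper` to a box-constrained SVM solver.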
Its recall on the rare class is 20% higher than that of the existing method.

In short, this paper proposes a new SVM method that achieves good recognition of rare cases at the cost of some performance on majority cases, and the proposed classification method is effective on imbalanced datasets.
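The trade-off described above is measured with per-class recall, TP / (TP + FN). The sketch below computes it for any class label; the function name and the label vectors are hypothetical and only illustrate how minority recall can rise while majority recall falls.

```python
from typing import Sequence


def class_recall(y_true: Sequence[int], y_pred: Sequence[int], cls: int = 1) -> float:
    """Recall for one class: TP / (TP + FN), i.e. the share of samples whose true
    label is `cls` that the classifier also labels `cls`."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    return tp / (tp + fn) if tp + fn else 0.0


if __name__ == "__main__":
    # Hypothetical labels: 1 = risk firm (minority), -1 = no risk (majority).
    y_true = [1, 1, 1, 1, -1, -1, -1, -1, -1, -1]
    baseline = [1, -1, -1, -1, -1, -1, -1, -1, -1, -1]
    constrained = [1, 1, -1, -1, 1, -1, -1, -1, -1, -1]
    print(class_recall(y_true, baseline), class_recall(y_true, constrained))  # 0.25 0.5
    print(class_recall(y_true, baseline, -1), class_recall(y_true, constrained, -1))
```

In this toy case the constrained classifier doubles minority recall (0.25 to 0.5) while one majority sample is misclassified, which is the kind of exchange the method makes.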
