Bagging
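Between 2003 and 2023, 227 documents related to Bagging were published, concentrated mainly in automation technology, computer technology, economic planning and management, radio electronics, and telecommunications technology. They comprise 136 journal papers, 2 conference papers, and 89 patent documents.
The literature spans 105 journals, including 科学技术与工程 (Science Technology and Engineering), 计算机工程 (Computer Engineering), and 计算机工程与科学 (Computer Engineering & Science), and 2 conferences: the 25th National Database Conference of China (NDBC2008) and the 15th National Conference on Computer-Aided Design and Computer Graphics.
It was contributed by 678 authors, including 玛丽亚·卡特里纳·图尔科, 杨新武, and 王宇飞.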
Bagging
- Researchers
- 玛丽亚·卡特里纳·图尔科
- 杨新武
- 王宇飞
- 翟飞
- 亚历山德拉·罗萨蒂
- 刘汉明
- 刘赵发
- 周钢
- 毛力
- 温琴佐·德劳伦兹
- 胡声洲
- 郑金萍
- D·W·康
- 丁里
- 严群
- 伍杰
- 何勇
- 余之刚
- 余广民
- 刘丽媛
- 刘伟荣
- 刘天明
- 刘志刚
- 刘明启
- 刘杰民
- 刘梓毅
- 刘相涛
- 刘维国
- 刘金花
- 卞希慧
- 吴尽
- 吴平东
- 周凡雅
- 周静
- 姚剑敏
- 安睿
- 张承瑞
- 张晓勇
- 张瑞
- 彭军
- 彭威
- 彭洋
- 彭英
- 徐忠武
- 徐晓君
- 徐茹枝
- 时晓宇
- 朱志腾
- 李恒
- 李政誉
-
-
Yang Yang;
Pengfei Zheng;
Fanru Zeng;
Peng Xin;
Guoxi He;
Kexi Liao
-
-
Abstract:
Accurate prediction of the internal corrosion rates of oil and gas pipelines could be an effective way to prevent pipeline leaks. This study developed a framework for predicting corrosion rates from a small sample of laboratory metal corrosion data, offering a new perspective on pipeline corrosion prediction when real samples are insufficient. The approach employed the bagging algorithm to construct a strong learner by integrating several KNN learners. A total of 99 data points were collected and split into training and test sets at a 9:1 ratio. The training set was used to obtain the best hyperparameters by 10-fold cross-validation and grid search, and the test set was used to determine the performance of the model. The results showed that the Mean Absolute Error (MAE) of this framework is 28.06% of that of the traditional model, and that the framework outperforms other ensemble methods. The proposed framework is therefore suitable for metal corrosion prediction under small-sample conditions.
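The core idea in this abstract, bagging several KNN learners and scoring the result by MAE on a 9:1 split, can be sketched in plain Python. This is only an illustrative sketch under stated assumptions: the data are synthetic stand-ins, the feature is one-dimensional, and the function names (`knn_predict`, `bagged_knn_predict`) and default settings are hypothetical rather than the authors'. The grid search and 10-fold cross-validation mentioned in the abstract are omitted.

```python
import random

def knn_predict(train, x, k):
    """Average the targets of the k training points nearest to x (1-D feature)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def bagged_knn_predict(train, x, k=3, n_estimators=25, seed=0):
    """Average KNN predictions over bootstrap resamples of the training set."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n_estimators):
        # Bootstrap: draw len(train) points with replacement.
        sample = [rng.choice(train) for _ in train]
        preds.append(knn_predict(sample, x, k))
    return sum(preds) / len(preds)

def mean_absolute_error(y_true, y_pred):
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)

# Synthetic stand-in data (NOT the paper's corrosion measurements).
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(99)]
data = [(x, 2.0 * x + rng.gauss(0, 1.0)) for x in xs]
rng.shuffle(data)
split = int(0.9 * len(data))  # 9:1 train/test split, as in the abstract
train, test = data[:split], data[split:]

y_true = [y for _, y in test]
y_pred = [bagged_knn_predict(train, x) for x, _ in test]
mae = mean_absolute_error(y_true, y_pred)
print(f"MAE on held-out points: {mae:.3f}")
```

Averaging over bootstrap resamples is what distinguishes this from a single KNN model: each resample yields a slightly different neighbourhood, and the mean of the resulting predictions tends to be less sensitive to individual noisy points.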
-
-
Quang-Hieu Tran;
Hoang Nguyen;
Xuan-Nam Bui
-
-
Abstract:
This study predicted blast-induced ground vibration, measured as peak particle velocity (PPV), in open-pit mines using bagging and sibling techniques in combination with machine learning algorithms. Four machine learning algorithms, namely support vector regression (SVR), extra trees (ExTree), K-nearest neighbors (KNN), and decision tree regression (DTR), were used as base models for the initial PPV prediction. The bagging regressor (BA) was then applied to these base models to reduce variance, counter overfitting, and produce more robust predictive models, abbreviated as BA-ExTree, BA-KNN, BA-SVR, and BA-DTR. The authors emphasize that the ExTree model had not previously been considered for predicting blast-induced ground vibration, and that bagging ExTree is an innovation aimed at improving the accuracy of the ExTree model itself. In addition, two empirical models (USBM and Ambraseys) were compared with the bagging models for a comprehensive assessment. To this end, 300 blasting events with different parameters were collected at the Sin Quyen copper mine (Vietnam), and the resulting PPV values were measured. These were compiled into the dataset used to develop the PPV predictive models. The results showed that the bagging models outperformed the empirical models, except for the BA-DTR model. Of these, BA-ExTree was the best model, with the highest accuracy (88.8%), whereas the empirical models achieved accuracies of only 73.6% to 76%. Details of the comparisons and assessments are also presented in the study.
-
-
郑金萍;
刘赵发;
胡珍珍;
李泽南;
黎姿;
刘汉明;
汪廷华;
胡声洲
-
-
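Abstract:
A random forest is an ensemble of decision trees built with the Bagging combination method and is widely used in data classification and prediction. Bagging is a representative combination method in machine learning, but it still has shortcomings for practical big-data mining. mBagging is an improvement on Bagging with higher statistical power, a lower false-positive rate, and faster computation. Experiments on simulated genome-wide SNP datasets show that random forests based on mBagging run markedly faster than conventional random forests and, without degrading the out-of-bag (OOB) error rate, identify risk SNPs more accurately.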
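The out-of-bag (OOB) error mentioned in this abstract can be illustrated with a small sketch of standard Bagging (not mBagging, whose details the abstract does not give). Each training point is scored only by the models whose bootstrap sample did not contain it. The base learner here is a one-feature threshold "stump" rather than a full decision tree, and the data and all function names are assumptions made for illustration.

```python
import random
from collections import Counter

def fit_stump(sample):
    """Fit a one-feature threshold classifier: predict 1 when x > threshold."""
    best_thr, best_err = None, None
    for thr in sorted({x for x, _ in sample}):
        err = sum((x > thr) != bool(y) for x, y in sample)
        if best_err is None or err < best_err:
            best_thr, best_err = thr, err
    return best_thr

def oob_error(data, n_estimators=50, seed=0):
    """Estimate generalization error using only out-of-bag votes."""
    rng = random.Random(seed)
    n = len(data)
    votes = [Counter() for _ in range(n)]
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap indices
        thr = fit_stump([data[i] for i in idx])
        in_bag = set(idx)
        for i in range(n):
            if i not in in_bag:  # this model never saw point i
                votes[i][int(data[i][0] > thr)] += 1
    # Points that were in every bag have no OOB votes and are skipped.
    scored = [(v.most_common(1)[0][0], data[i][1]) for i, v in enumerate(votes) if v]
    return sum(pred != y for pred, y in scored) / len(scored)

# Synthetic data: label is 1 when x > 0.5, with about 10% of labels flipped.
rng = random.Random(1)
data = []
for _ in range(200):
    x = rng.random()
    y = int(x > 0.5)
    if rng.random() < 0.1:
        y = 1 - y
    data.append((x, y))

print(f"OOB error estimate: {oob_error(data):.3f}")
```

Because roughly a third of the points fall outside each bootstrap sample, the OOB estimate comes for free from training and needs no separate validation set.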
-
-
司晶硕
-
-
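Abstract:
In traditional claim amount prediction, the generalized linear model (GLM) is a commonly used method. In recent years, machine learning algorithms have also achieved good results in this field, offering new options for claim amount prediction. In the era of big data, how to predict more accurately is a pressing problem. To address it, this paper uses a two-layer Stacking model, two other ensemble learning algorithms, and a GLM to predict cumulative claim amounts. Comparing the algorithms' root mean square error and absolute error shows that all the ensemble algorithms, including Stacking, are more accurate than the traditional GLM. Finally, the paper uses cumulative claim amounts to establish transition rules for a bonus-malus system; combining this with ensemble learning allows new insurance products to be developed more reasonably.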
-
-
赵存秀
-
-
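Abstract:
This paper is based on two ensemble combination strategies, voting and stacking. Both are combined with existing models (linear classifier, support vector machine, classification and regression tree) to handle imbalanced data and predict whether a breast tumor is benign or malignant. The experiments show that the random-forest voting and stacking ensemble methods both identify benign versus malignant tumors with high accuracy, reaching 97.41% and 97.33%, and the kappa values indicate strong agreement.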
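The voting strategy and the kappa statistic referred to in this abstract can be sketched as follows. This is an illustrative sketch only: the three prediction lists are made-up values, not the paper's models or data, and `majority_vote` and `cohen_kappa` are hypothetical helper names. Kappa is computed with the usual definition, (observed agreement - chance agreement) / (1 - chance agreement).

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-model label lists into one list by plurality vote per sample."""
    return [Counter(sample).most_common(1)[0][0] for sample in zip(*predictions)]

def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement between two labelings, corrected for chance."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (observed - expected) / (1 - expected)

# Made-up labels: "B" = benign, "M" = malignant.
y_true  = ["B", "B", "M", "M", "B", "M", "B", "B"]
model_a = ["B", "B", "M", "B", "B", "M", "B", "M"]
model_b = ["B", "M", "M", "M", "B", "M", "B", "B"]
model_c = ["B", "B", "B", "M", "B", "M", "M", "B"]

combined = majority_vote([model_a, model_b, model_c])
print("kappa, model_a alone:", round(cohen_kappa(y_true, model_a), 3))
print("kappa, voted ensemble:", round(cohen_kappa(y_true, combined), 3))
```

In this toy case each model makes two mistakes, but never on the same sample as another model, so the vote corrects all of them. Real ensembles gain less when their members' errors are correlated.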
-
-
Lejun Gong;
Shehai Zhou;
Jingmei Chen;
Yongmin Li;
Li Zhang;
Zhihong Gao
-
-
Abstract:
Long non-coding RNAs (lncRNAs) play an important role in many life activities, such as epigenetic regulation, cell cycle regulation, dosage compensation, and cell differentiation regulation, and are associated with many human diseases. Traditional biological experimental methods have many limitations in identifying and annotating lncRNAs. With the development of high-throughput sequencing technology, identifying lncRNAs from massive RNA sequence data using machine learning is of great practical significance. Based on the Bagging method and the Decision Tree algorithm from ensemble learning, this paper proposes an lncRNA gene sequence identification method called BDLR. Its identification results are compared with those of several models, including Bayes, Support Vector Machine, Logistic Regression, Decision Tree, and Random Forest. The experimental results show that BDLR achieves an accuracy of 86.61% on the human test set and 90.34% on the mouse test set, higher than the other methods. The proposed method also offers a reference for researchers identifying lncRNAs with ensemble learning.
-
-
Moheb R. Girgis;
Rofida M. Gamal;
Enas Elgeldawi
-
-
Abstract:
Protein structure prediction is one of the most essential objectives in theoretical chemistry and bioinformatics, as it is of vital importance in medicine, biotechnology, and other fields. Protein secondary structure prediction (PSSP) plays a significant role in predicting protein tertiary structure, as it bridges the gap between protein primary sequences and tertiary structure prediction. Protein secondary structures are classified into two categories: 3-state and 8-state. Predicting the 3 states and the 8 states of secondary structure from protein sequences are called the Q3 and Q8 prediction problems, respectively. The 8 classes reveal more precise structural information for a variety of applications than the 3 classes. However, Q8 prediction has proven very challenging, which is why previous work in PSSP has focused on Q3 prediction. In this paper, we develop an ensemble Machine Learning (ML) approach for Q8 PSSP to explore the performance of ensemble learning algorithms compared with that of individual ML algorithms. The ensemble members are well-known classifiers, namely SVM (Support Vector Machines), KNN (K-Nearest Neighbor), DT (Decision Tree), RF (Random Forest), and NB (Naïve Bayes), with two feature extraction techniques, namely LDA (Linear Discriminant Analysis) and PCA (Principal Component Analysis). Experiments were conducted to evaluate the performance of single models and ensemble models, with PCA and LDA, in Q8 PSSP. The novelty of this paper lies in introducing ensemble learning to the Q8 PSSP problem. The experimental results confirmed that ensemble ML models are more accurate than individual ML models. They also indicated that features extracted by LDA are more effective than those extracted by PCA.
-
-
杜嘉伟;
余粟
-
-
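Abstract:
This paper proposes RFPG (Random Frequency Pattern Growth), an anomaly detection algorithm based on an improved fuzzy FP-Growth. The algorithm builds a two-layer FP-Tree. The first layer follows the bagging idea: it randomly samples to generate sets and obtains the set of long frequent itemsets. The second layer takes the long frequent itemsets as input and produces a set of strong pattern association rules; anomalies are then detected and classified through similarity computation. Experimental results show that the proposed algorithm achieves good overall efficiency and quality in anomaly detection.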
-
-
Naglaa F. El Abady;
Mohamed Taha;
Hala H. Zayed
-
-
Abstract:
Because of the widespread availability of low-cost printers and scanners, document forgery has become extremely common. Watermarks or signatures are used to protect important papers such as certificates, passports, and identification cards. Identifying the origin of printed documents is helpful for criminal investigations and for authenticating digital versions of a document. Source printer identification (SPI) has become increasingly popular for identifying fraud in printed documents. This paper proposes an algorithm for identifying the source printer and categorizing the questioned document into one of the printer classes. It achieved significant identification results on a dataset of 1200 papers from 20 distinct printers (13 laser and 7 inkjet). The algorithm is based on global features such as the Histogram of Oriented Gradients (HOG) and local features such as Local Binary Pattern (LBP) descriptors. For classification, Decision Trees (DT), k-Nearest Neighbors (k-NN), Random Forests, bootstrap aggregating (bagging), adaptive boosting (boosting), Support Vector Machine (SVM), and mixtures of these classifiers were employed. The proposed algorithm can accurately classify the questioned documents into their appropriate printer classes. The adaptive boosting classifier attained 96% accuracy. Compared with four recently published algorithms that used the same dataset, the proposed algorithm gives better classification accuracy.
-
-
Nguyen Thanh Hoan;
Nguyen Van Dung;
Ho Le Thu;
Hoa Thuy Quynh;
Nadhir Al-Ansari;
Tran Van Phong;
Phan Trong Trinh;
Dam Duc Nguyen;
Hiep Van Le;
Hanh Bich Thi Nguyen;
Mahdis Amiri;
Indra Prakash;
Binh Thai Pham
-
-
Abstract:
Water level predictions for rivers, lakes, and deltas play an important role in flood management. Every year the Mekong River delta of Vietnam experiences flooding due to heavy monsoon rains and high tides. Land subsidence may also aggravate flooding problems in this area. Accurate predictions of water levels in this region are therefore very important for forewarning people and authorities so that they can take timely and adequate remedial measures to prevent loss of life and property. Many methods are available to predict water levels from historical data, but Machine Learning (ML) methods are nowadays considered the best tool for accurate prediction. In this study, we used surface water level data from 18 water level measurement stations in the Mekong River delta from 2000 to 2018 to build novel time-series Bagging-based hybrid ML models, namely Bagging (RF), Bagging (SOM), and Bagging (M5P), to predict historical water levels in the study area. The performance of the Bagging-based hybrid models was compared with that of Reduced Error Pruning Trees (REPT), a benchmark ML model. The 19 years of data were divided in a 70:30 ratio for modeling. The data from 1/2000 to 5/2013 (about 70% of the total) were used for training, and the data from 5/2013 to 12/2018 (about 30% of the total) were used for testing (validating) the models. Model performance was evaluated using standard statistical measures: Coefficient of Determination (R²), Root Mean Square Error (RMSE), and Mean Absolute Error (MAE). The results show that all the developed models perform well (R² > 0.9) in predicting water levels in the study area. However, the Bagging-based hybrid models are slightly better than the benchmark REPT model. These Bagging-based hybrid time-series models can thus be used for predicting water levels in the Mekong delta.
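The evaluation protocol described in this abstract, a chronological 70:30 split scored with R², RMSE, and MAE, can be sketched as below. The numbers are toy values rather than the Mekong station data, the seasonal "repeat the value from one period earlier" baseline is a hypothetical stand-in for a real model, and R² is assumed to mean 1 - SS_res / SS_tot (some studies instead report the squared correlation coefficient).

```python
import math

def chronological_split(series, train_fraction=0.7):
    """Split a time-ordered list without shuffling: earliest part trains, latest part tests."""
    cut = round(len(series) * train_fraction)
    return series[:cut], series[cut:]

def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    """Coefficient of determination, defined here as 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Toy water levels with a period of 10 steps (made-up numbers).
levels = [1.2, 1.4, 1.9, 2.5, 3.1, 3.0, 2.4, 1.8, 1.3, 1.1,
          1.3, 1.5, 2.0, 2.6, 3.2, 3.1, 2.5, 1.9, 1.4, 1.2]
train, test = chronological_split(levels)

# Hypothetical baseline: repeat the value observed one period (10 steps) earlier.
predictions = [levels[len(train) + i - 10] for i in range(len(test))]
print(f"MAE={mae(test, predictions):.3f}  "
      f"RMSE={rmse(test, predictions):.3f}  "
      f"R2={r2(test, predictions):.3f}")
```

Splitting by time rather than at random matters for series like these: a random split would let the model train on observations that come after the ones it is tested on.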
-
-
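- 25th National Database Conference of China (NDBC2008)
| 2008
-
Abstract:
Building on the Bagging and Boosting methods, an improved support vector machine (SVM) ensemble method is proposed to further improve the generalization performance of the ensemble. A method is given for generating individual SVMs based on mixed kernel functions and parallel perturbation of the related parameters; having more model parameters to perturb can further increase the diversity of the ensemble. The corresponding ensemble methods are named HKBaggingSVM and HKBoostingSVM. In addition, after each individual SVM is generated, a test step is used to ensure the accuracy of the ensemble. Simulation experiments on standard UCI and StatLog datasets show that the two ensemble learning methods, HKBaggingSVM and HKBoostingSVM, achieve higher classification performance and generalization ability.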
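The abstract does not say how the mixed kernel is formed or which parameters are perturbed. One common construction, assumed here purely for illustration, is a convex combination of an RBF kernel and a polynomial kernel, with the mixing weight and kernel parameters drawn at random for each ensemble member. All names and parameter ranges below are hypothetical.

```python
import math
import random

def rbf_kernel(x, z, gamma):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def poly_kernel(x, z, degree, coef0=1.0):
    return (sum(a * b for a, b in zip(x, z)) + coef0) ** degree

def mixed_kernel(x, z, weight, gamma, degree):
    """Convex combination of two valid kernels, which is itself a valid kernel."""
    return weight * rbf_kernel(x, z, gamma) + (1.0 - weight) * poly_kernel(x, z, degree)

def perturbed_kernel_params(rng):
    """Draw one random kernel configuration per ensemble member (assumed ranges)."""
    return {
        "weight": rng.uniform(0.0, 1.0),
        "gamma": 10 ** rng.uniform(-2, 1),
        "degree": rng.choice([1, 2, 3]),
    }

rng = random.Random(0)
x, z = (1.0, 0.0), (0.5, 0.5)
for member in range(3):
    params = perturbed_kernel_params(rng)
    print(member, params, round(mixed_kernel(x, z, **params), 4))
```

Giving each member a different kernel configuration, on top of a different bootstrap sample, is one way to obtain the extra ensemble diversity the abstract refers to.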
-
-
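- 15th National Conference on Computer-Aided Design and Computer Graphics
| 2008
-
Abstract:
A neural network ensemble trains multiple neural networks and combines their outputs by voting using the Bagging method. Bagging is a method for improving the accuracy of learning algorithms and can significantly improve the generalization ability of a learning system. For a 3D model retrieval system, this paper designs and implements a Bagging-based 3D model category recognition system that uses neural networks as weak classifiers. Experiments on the Princeton Shape Benchmark show that, compared with a single neural network classifier, the ensemble achieves better results in 3D model category recognition.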
-
-
-
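- East China Jiaotong University (华东交通大学)
- Publication date: 2022-04-01
-
Abstract:
The invention discloses a method for detecting abnormal electricity usage with a genetically optimized Bagging heterogeneous ensemble model, in the technical field of data-driven electricity theft detection. The invention first applies SMOTE oversampling to augment the samples of electricity-theft users, then uses principal component analysis for dimensionality reduction to extract abnormal-usage features. It builds an electricity theft detection model based on Bagging heterogeneous ensemble learning with multiple types of individual learners, combines their outputs through a voting strategy, and optimizes the hyperparameters of the individual learners with a genetic algorithm. Compared with detection methods such as decision trees, support vector machines, random forests, and traditional artificial neural networks, the genetically optimized Bagging ensemble detection model shows clear improvements in accuracy, false detection rate, and AUC. It thereby contributes to more efficient supervision of electricity transmission by power companies in China, stronger investigation and punishment of electricity theft, maintenance of normal power supply and consumption order, and protection of business performance.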
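The first step of the pipeline in this abstract, SMOTE oversampling of the minority (electricity-theft) class, can be sketched as follows. This is a minimal illustration of the basic SMOTE idea under stated assumptions: each synthetic point is an interpolation between a minority sample and one of its k nearest minority neighbours. The feature vectors are made up, the function name and defaults are hypothetical, and the later steps (PCA, heterogeneous Bagging, voting, genetic-algorithm tuning) are not shown.

```python
import math
import random

def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples by neighbour interpolation."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        # k nearest minority-class neighbours of the chosen sample.
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(neighbours)
        t = rng.random()  # position along the segment from base to neighbour
        synthetic.append(tuple(b + t * (o - b) for b, o in zip(base, other)))
    return synthetic

# Made-up 2-D feature vectors for the minority class.
theft_users = [(0.10, 0.90), (0.20, 0.80), (0.15, 0.95), (0.30, 0.70), (0.25, 0.85)]
new_samples = smote(theft_users, n_new=10)
for sample in new_samples[:3]:
    print(tuple(round(v, 3) for v in sample))
```

Because every synthetic point lies on a segment between two existing minority samples, the new points stay inside the region the minority class already occupies instead of simply duplicating records.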