The data complexity index to construct an efficient cross-validation method

Der-Chiang Li; Yao-Hwei Fang; Y.M. Frank Fang

Abstract

Cross-validation is a widely used model evaluation method in data mining applications. However, determining appropriate parameter values, such as the training data size and the number of experiment runs, usually takes considerable effort before a valid evaluation can be carried out. This study develops an efficient cross-validation method for binary classification problems, called Complexity-based Efficient (CBE) cross-validation. CBE cross-validation establishes a complexity index, the CBE index, by exploring the geometric structure and noise of the data. The CBE index is used to calculate the optimal training data size and number of experiment runs, reducing model evaluation time on computationally expensive classification data sets. One simulated and three real data sets are used to validate the proposed method, which is compared with repeated random sub-sampling validation and K-fold cross-validation. The results show that the three methods have similar validation performance, but CBE cross-validation requires less training time than the other two.
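
For readers unfamiliar with the two baseline schemes named in the abstract, the following is a minimal sketch of how K-fold cross-validation and repeated random sub-sampling validation generate their train/test splits. It does not implement the CBE index, whose formula is not given in this record. The function names, the `seed` parameter, and the "total training samples" cost proxy are assumptions introduced here for illustration only.

```python
import random


def k_fold_splits(n_samples, k, seed=0):
    """Yield k (train_idx, test_idx) pairs; every sample is tested exactly once."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def random_subsampling_splits(n_samples, train_size, n_runs, seed=0):
    """Yield n_runs independent random (train_idx, test_idx) splits."""
    rng = random.Random(seed)
    for _ in range(n_runs):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        yield idx[:train_size], idx[train_size:]


def total_training_samples(splits):
    """Crude cost proxy (an assumption): samples seen summed over all model fits."""
    return sum(len(train) for train, _ in splits)


if __name__ == "__main__":
    kf = list(k_fold_splits(n_samples=10, k=5))
    rs = list(random_subsampling_splits(n_samples=10, train_size=6, n_runs=3))
    print("K-fold fits:", len(kf), "cost proxy:", total_training_samples(kf))
    print("Sub-sampling fits:", len(rs), "cost proxy:", total_training_samples(rs))
```

In both schemes the evaluation cost grows with the training size per fit and the number of fits, which are the two parameters the abstract says the CBE index is used to choose.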

Bibliographic details

Source
Decision Support Systems | 2010, Issue 1 | pp. 93-102 | 10 pages
Authors
Der-Chiang Li; Yao-Hwei Fang; Y.M. Frank Fang
Author affiliations

Department of Industrial and Information Management, National Cheng Kung University, Taiwan;

Division of Biostatistics and Bioinformatics, National Health Research Institutes, Taiwan;

Geographic Information System Research Center, Feng Chia University, Taiwan;

Indexed in: Science Citation Index (SCI); Engineering Index (EI)
Original format: PDF
Language: English
Keywords
binary classification problem; cross-validation; data complexity;

Similar literature

1. Development of Efficient Data Sampling Method to Construct Surrogate Model of Severe Accident Analysis Code for SBO Aiming Probabilistic Safety Margin Analysis [J] . Masaki MATSUSHITA, Tomohiro ENDO, Akio YAMAMOTO Transactions of the American Nuclear Society . 2019, Issue Nova

2. Testing robustness of relative complexity measure method constructing robust phylogenetic trees for Galanthus L. Using the relative complexity measure [J] . Yasin Bakış, Hasan H Otu, Nivart Taşçı, BMC Bioinformatics . 2013, Issue 1

3. DISCRIMINANT ANALYSIS WITH SINGULAR COVARIANCE MATRICES. A METHOD INCORPORATING CROSS-VALIDATION AND EFFICIENT RANDOMIZED PERMUTATION TESTS [J] . PHILIP JONATHAN, W. V. (MAC) MCCARTHY, ADRIAN M. I. ROBERTS Journal of Chemometrics . 1996, Issue 3

4. Bootstrap and cross-validation to assess complexity of data-driven regression models [C] . Willi Sauerbrei, Martin Schumacher International Symposium on Medical Data Analysis . 2000

5. Predicting Individual Treatment Effect from Randomized Clinical Trial Data: A Nested Cross-Validation Evaluation Framework for Machine Learning Methods [D] . Liu, Yu. 2021

6. Testing robustness of relative complexity measure method constructing robust phylogenetic trees for Galanthus L. Using the relative complexity measure [O] . Yasin Bakış, Hasan H Otu, Nivart Taşçı, 2013

7. THE ENGINEERING COMPLEXITY OF CONSTRUCTING THEORIES ON THE BASIS OF USE OF MEANS OF DISPLAY OF THE VARIOUS MODELS AND METHODS FOR DESIGNING DATA STRUCTURES [O] . V. N. Laptev, V. V. Stepanov, K. �. Lipin, 2020

8. Data-Driven Choice of a Spectrum Estimate: Extending the Applicability of Cross-Validation Methods [R] . Hurvich, C. M. 1984

