Measure for data partitioning in m x 2 cross-validation

Wang Yu; Li Jihong; Li Yanfang

Abstract

An m x 2 cross-validation, which is based on m half-half partitions of the data, is widely used in machine learning. However, cross-validation performance often depends on the quality of the data partitioning. Poor data partitioning may lead to poor inference results, such as a large variance and large Type I and Type II errors of the corresponding test. To evaluate the quality of the data partitioning, we propose a statistic based on the difference between the observed and expected numbers of overlapped samples of two training sets in an m x 2 cross-validation. The expectation and variance of the proposed statistic are also given. Furthermore, by studying the quantiles of the distribution of the statistic, we find that the occurrence of poor data partitioning is not a small-probability event. Thus, data partitioning should be predesigned before conducting m x 2 cross-validation experiments in machine learning, such that the observed number of overlapped samples is equal to, or as close as possible to, the expected number. © 2015 Elsevier B.V. All rights reserved.
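
The abstract does not state the exact form of the statistic, so the following Python sketch is only an illustration under stated assumptions: each half-half partition is represented by one of its halves (a training set of n/2 samples), the partitions are drawn independently and uniformly at random, and n is even. Under these assumptions the overlap between two training sets follows a hypergeometric distribution with mean n/4 and variance n^2/(16(n-1)). All function names, and the sum-of-squared-deviations aggregate in `partition_score`, are hypothetical choices for illustration, not the authors' definitions. Using the complementary half of either partition only flips the sign of a pair's deviation, so one half per partition is enough.

```python
import random
from itertools import combinations
from math import comb


def half_half_partitions(n, m, seed=None):
    """Draw m independent, uniformly random half-half partitions of range(n).

    Each partition is represented by one half, i.e. a training set of size
    n // 2; the other training set is its complement.
    """
    if n % 2:
        raise ValueError("n must be even for a half-half split")
    rng = random.Random(seed)
    population = range(n)
    return [frozenset(rng.sample(population, n // 2)) for _ in range(m)]


def expected_overlap(n):
    # Overlap of two independent random halves is hypergeometric
    # (population n, n/2 marked items, n/2 draws), so its mean is n/4.
    return n / 4


def overlap_variance(n):
    # Hypergeometric variance d*(K/N)*((N-K)/N)*((N-d)/(N-1))
    # with N = n and K = d = n/2 simplifies to n^2 / (16 (n - 1)).
    return n * n / (16 * (n - 1))


def overlap_pmf(n, k):
    # P(|A & B| = k) for two independent random halves A, B of range(n),
    # valid for 0 <= k <= n // 2.
    h = n // 2
    return comb(h, k) * comb(n - h, h - k) / comb(n, h)


def overlap_deviations(parts, n):
    """Observed minus expected overlap for every pair of training sets."""
    mu = expected_overlap(n)
    return {
        (i, j): len(parts[i] & parts[j]) - mu
        for i, j in combinations(range(len(parts)), 2)
    }


def partition_score(parts, n):
    # Hypothetical aggregate (an assumption, not necessarily the paper's
    # statistic): sum of squared deviations over all pairs. A score of 0
    # means every pair of training sets overlaps in exactly n/4 samples.
    return sum(d * d for d in overlap_deviations(parts, n).values())


if __name__ == "__main__":
    n, m = 100, 5
    parts = half_half_partitions(n, m, seed=1)
    for pair, dev in overlap_deviations(parts, n).items():
        print(pair, dev)
    print("score:", partition_score(parts, n))
    print("P(overlap == n/4):", overlap_pmf(n, n // 4))
```

Running the script prints the pairwise deviations for five random partitions of 100 samples. For n = 100, `overlap_pmf(100, 25)` is roughly 0.16, so most independent random pairs of training sets do not overlap in exactly the expected 25 samples, which illustrates why the abstract recommends predesigning the partitions rather than relying on unplanned random splits.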

Bibliographic information

Source
Pattern Recognition Letters, 2015, no. 1, pp. 211-217 (7 pages)
Authors
Wang Yu; Li Jihong; Li Yanfang
Affiliations

Computer Center, Shanxi University, Taiyuan 030006, People's Republic of China

Computer Center, Shanxi University, Taiyuan 030006, People's Republic of China

School of Mathematical Sciences, Shanxi University, Taiyuan 030006, People's Republic of China

Indexed in
Science Citation Index (SCI); Engineering Index (EI)
Original format
PDF
Language
English
Keywords
Data partitioning; measure; cross-validation; small probability event

Similar articles

1. Study on the Impact of Partition-Induced Dataset Shift on k-Fold Cross-Validation [J]. Moreno-Torres J. G., Sáez J. A., Herrera F. IEEE Transactions on Neural Networks and Learning Systems, 2012, no. 8.

2. Embedded Performance Validity Measures with Postdeployment Veterans: Cross-Validation and Efficiency with Multiple Measures [J]. Robert D. Shura, Holly M. Miskey, Jared A. Rowl, Ruth E. Yoash-Gantz. Applied Neuropsychology: Adult, 2016, no. 1a2.

3. New Workflow for QSAR Model Development from Small Data Sets: Small Dataset Curator and Small Dataset Modeler. Integration of Data Curation, Exhaustive Double Cross-Validation, and a Set of Optimal Model Selection Techniques [J]. Ambure Pravin, Gajewicz-Skretna Agnieszka, Cordeiro M. Natalia D. S. Journal of Chemical Information and Modeling, 2019, no. 10.

4. Source-Aware Partitioning for Robust Cross-Validation [C]. Ozsel Kilinc, Ismail Uysal. IEEE International Conference on Machine Learning and Applications, 2015.

5. Predicting Individual Treatment Effect from Randomized Clinical Trial Data: A Nested Cross-Validation Evaluation Framework for Machine Learning Methods [D]. Liu Yu. 2021.

6. Challenges of machine learning model validation using correlated behaviour data: Evaluation of cross-validation strategies and accuracy measures [O]. Bence Ferdinandy, Linda Gerencsér, Luca Corrieri. 2020.

7. Study on the impact of partition-induced dataset shift on k-fold cross-validation [O]. Jose García Moreno-Torres, José A. Sáez, Francisco Herrera. 2012.

