Comparison of methods for imputing limited-range variables: a simulation study

Laura Rodwell; Katherine J Lee; Helena Romaniuk; John B Carlin

Abstract

Background: Multiple imputation (MI) was developed as a method to enable valid inferences to be obtained in the presence of missing data rather than to re-create the missing values. Within the applied setting, it remains unclear how important it is that imputed values should be plausible for individual observations. One variable type for which MI may lead to implausible values is a limited-range variable, where imputed values may fall outside the observable range. The aim of this work was to compare methods for imputing limited-range variables, with a focus on those that restrict the range of the imputed values.

Methods: Using data from a study of adolescent health, we consider three variables based on responses to the General Health Questionnaire (GHQ), a tool for detecting minor psychiatric illness. These variables, based on different scoring methods for the GHQ, resulted in three continuous distributions with mild, moderate and severe positive skewness. In an otherwise complete dataset, we set 33% of the GHQ observations to missing completely at random or missing at random; repeating this process to create 1000 datasets with incomplete data for each scenario. For each dataset, we imputed values on the raw scale and following a zero-skewness log transformation using: univariate regression with no rounding; post-imputation rounding; truncated normal regression; and predictive mean matching. We estimated the marginal mean of the GHQ and the association between the GHQ and a fully observed binary outcome, comparing the results with complete data statistics.

Results: Imputation with no rounding performed well when applied to data on the raw scale. Post-imputation rounding and imputation using truncated normal regression produced higher marginal means than the complete data estimate when data had a moderate or severe skew, and this was associated with under-coverage of the complete data estimate. Predictive mean matching also produced under-coverage of the complete data estimate. For the estimate of association, all methods produced similar estimates to the complete data.

Conclusions: For data with a limited range, multiple imputation using techniques that restrict the range of imputed values can result in biased estimates for the marginal mean when data are highly skewed.
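
The direction of the bias reported for the marginal mean can be illustrated with a small simulation. The sketch below is not the authors' procedure: it uses a single stochastic imputation from a normal model with no covariates and no parameter uncertainty, an exponential distribution as a stand-in for a positively skewed score, assumed bounds of 0 and 36, and it interprets "post-imputation rounding" as setting out-of-range draws to the nearest bound. All function names and constants are hypothetical and chosen only for illustration.

```python
import random
import statistics

# Assumed bounds of the limited-range variable (illustrative, not from the paper).
LOWER, UPPER = 0.0, 36.0


def simulate_skewed(n, rng):
    """Positively skewed stand-in data: exponential with mean 4, capped at UPPER."""
    return [min(rng.expovariate(1 / 4.0), UPPER) for _ in range(n)]


def make_mcar(values, prop, rng):
    """Set each value to missing (None) independently with probability prop."""
    return [None if rng.random() < prop else v for v in values]


def draw_unrestricted(mu, sd, rng):
    """Normal draw with no restriction; may fall outside [LOWER, UPPER]."""
    return rng.gauss(mu, sd)


def draw_rounded(mu, sd, rng):
    """Normal draw, then out-of-range values are set to the nearest bound."""
    return min(max(rng.gauss(mu, sd), LOWER), UPPER)


def draw_truncated(mu, sd, rng):
    """Draw from a normal truncated to [LOWER, UPPER] by rejection sampling."""
    while True:
        x = rng.gauss(mu, sd)
        if LOWER <= x <= UPPER:
            return x


def imputed_mean(incomplete, draw, rng):
    """Fill missing values using draw() and return the mean of the completed data."""
    observed = [v for v in incomplete if v is not None]
    mu = statistics.fmean(observed)
    sd = statistics.stdev(observed)
    completed = [v if v is not None else draw(mu, sd, rng) for v in incomplete]
    return statistics.fmean(completed)


def compare(n=5000, prop=0.33, seed=1):
    """Return the complete-data mean and the mean after each imputation method."""
    rng = random.Random(seed)
    complete = simulate_skewed(n, rng)
    incomplete = make_mcar(complete, prop, rng)
    results = {"complete": statistics.fmean(complete)}
    methods = [
        ("unrestricted", draw_unrestricted),
        ("rounded", draw_rounded),
        ("truncated", draw_truncated),
    ]
    for name, draw in methods:
        # Same seed for every method so the underlying normal draws are comparable.
        results[name] = imputed_mean(incomplete, draw, random.Random(seed + 1))
    return results


if __name__ == "__main__":
    for name, value in compare().items():
        print(f"{name:>12}: {value:.3f}")
```

In this toy setting the unrestricted draws include negative values that offset the long right tail the normal model cannot reproduce, so the mean stays close to the complete-data mean. Moving or excluding those negative draws removes the offset, which pushes the mean upward, consistent with the pattern described in the Results.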

Bibliographic details

Source: BMC Medical Research Methodology, 2014, Issue 1
Authors: Laura Rodwell; Katherine J Lee; Helena Romaniuk; John B Carlin
Original format: PDF
Chinese Library Classification: Medicine and health