Journal of Hydrology

Imputation of missing sub-hourly precipitation data in a large sensor network: A machine learning approach

Abstract

Precipitation data collected at sub-hourly resolution present specific challenges for missing-data recovery, because they are largely stochastic in nature and highly unbalanced in the duration of rain versus non-rain. Here we present a two-step analysis utilising current machine learning techniques to impute precipitation data sampled at 30-minute intervals, dividing the task into (a) the classification of samples as rain or non-rain, and (b) regression of the absolute values of the samples predicted as rain. Across 37 weather stations in the UK, this machine learning process produces more accurate predictions for recovering precipitation data than an established surface-fitting technique based on neighbouring rain gauges. Increasing the number of features available for training the machine learning algorithms increases performance, with the integration of weather data at the target site and externally sourced rain-gauge data providing the highest performance. The method informs the machine learning models with information from concurrently collected environmental data to make accurate predictions of missing rain data. Capturing complex non-linear relationships from weakly correlated variables is critical for data recovery at sub-hourly resolutions. Such data-recovery pipelines can be developed and deployed for highly automated, near-instantaneous imputation of missing values in ongoing datasets at high temporal resolution.