BMC Bioinformatics

Lag penalized weighted correlation for time series clustering

Abstract

The similarity or distance measure used for clustering can generate intuitive and interpretable clusters when it is tailored to the unique characteristics of the data. In time series datasets generated with high-throughput biological assays, measurements such as gene expression levels or protein phosphorylation intensities are collected sequentially over time, and the similarity score should capture this special temporal structure. We propose a clustering similarity measure called Lag Penalized Weighted Correlation (LPWC) to group pairs of time series that exhibit closely related behaviors over time, even if the timing is not perfectly synchronized. LPWC aligns time series profiles to identify common temporal patterns. It down-weights aligned profiles based on the length of the temporal lags that are introduced. We demonstrate the advantages of LPWC versus existing time series and general clustering algorithms. In a simulated dataset based on the biologically motivated impulse model, LPWC is the only method to recover the true clusters for almost all simulated genes. LPWC also identifies clusters with distinct temporal patterns in our yeast osmotic stress response and axolotl limb regeneration case studies. LPWC achieves both of its time series clustering goals. It groups time series with correlated changes over time, even if those patterns occur earlier or later in some of the time series. In addition, it refrains from introducing large shifts in time when searching for temporal patterns by applying a lag penalty. The LPWC R package is available at https://github.com/gitter-lab/LPWC and CRAN under an MIT license.
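The idea of aligning two profiles and then down-weighting the alignment by the size of the lag can be illustrated with a small sketch. This is not the LPWC package's implementation: the abstract does not give the formula, so the exponential penalty `exp(-penalty * lag**2)`, the plain (unweighted) Pearson correlation, and the names `lag_penalized_corr`, `max_lag`, and `penalty` are all assumptions chosen for illustration.

```python
import numpy as np

def lag_penalized_corr(x, y, max_lag=2, penalty=0.5):
    """Return (best_score, best_lag) for two equal-length series.

    Hypothetical sketch: for each candidate lag, correlate the
    overlapping parts of the shifted series and multiply the Pearson
    correlation by exp(-penalty * lag**2), so larger shifts score lower.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    best_score, best_lag = -np.inf, 0
    for lag in range(-max_lag, max_lag + 1):
        # Pair x[t + lag] with y[t] on the overlapping time points.
        if lag >= 0:
            xs, ys = x[lag:], y[:n - lag]
        else:
            xs, ys = x[:n + lag], y[-lag:]
        # Skip overlaps that are too short or constant (correlation undefined).
        if len(xs) < 3 or xs.std() == 0 or ys.std() == 0:
            continue
        r = np.corrcoef(xs, ys)[0, 1]
        score = np.exp(-penalty * lag ** 2) * r
        if score > best_score:
            best_score, best_lag = score, lag
    return best_score, best_lag

# Two profiles with the same pulse, one occurring a time step earlier.
x = [0, 0, 1, 3, 1, 0, 0, 0]
y = [0, 1, 3, 1, 0, 0, 0, 0]
print(lag_penalized_corr(x, y, penalty=0.5))  # mild penalty
print(lag_penalized_corr(x, y, penalty=5.0))  # strong penalty
```

With a mild penalty the sketch prefers the one-step shift that aligns the two pulses; with a strong penalty the unshifted comparison wins, which mirrors the trade-off the abstract describes. The sketch does not treat anti-correlated profiles specially, and it assumes evenly spaced time points.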
