Journal: American Journal of Bioinformatics Research

A Quick Computational Statistical Pipeline Developed in R Programing Environment for Agronomic Metric Data Analysis



Abstract

Data collection, data pre-treatment, and statistical analysis and interpretation are closely linked steps in biological and agronomic experimental studies. Integrating these procedures smoothly requires rigorous experimental and statistical designs that pay attention to the types of data being processed. Many researchers continue to generate and analyse quantitative and qualitative phenotypic data in their agronomic experiments. Considering the substantial heterogeneity and size of such data, we propose a semi-automated analysis procedure based on a computational statistical approach in the R programming environment. The aim is to provide a tool that is simple (users do not need programming skills), efficient (output files and/or figures are obtained within a few minutes), and flexible (users can add their own scripts and/or bypass some functions), and that makes the analysis of relationships among heterogeneous metric data straightforward in biostatistical studies. The pipeline starts by loading a raw data matrix, followed by an optional data standardization step. Next, the data undergo multivariate descriptive and analytical statistical analysis, comprising data quality control through correlation matrix heat maps and p-value clustering graphics, and normality assessment with the Shapiro-Wilk test. The data are then subjected to principal component analysis (PCA), including an assessment of the number of components needed to explain the variability in the data. Finally, the data are submitted to simple and/or multiple linear regression (MLR) to model the relationships among the variables. The pipeline saves considerable time when processing large, heterogeneous quantitative data sets and provides a complete descriptive and analytical statistical framework.
In conclusion, we provide a quick and useful semi-automatic computational biostatistical pipeline in a simple programming language that spares researchers the need for advanced programming and statistical skills, although it is not exhaustive in terms of features.
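The abstract does not include the pipeline's code, so the following is only a minimal sketch of three of the steps it names: standardization, the correlation matrix behind the heat map, and simple linear regression. It is written in standard-library Python rather than R, and it is not the authors' implementation. The function names, variable names, and toy values are all hypothetical. The Shapiro-Wilk test and PCA are left out because they would need a numerical library.

```python
import math
import statistics


def standardize(columns):
    """Z-score each variable: subtract its mean, divide by its sample SD."""
    out = {}
    for name, values in columns.items():
        mean = statistics.fmean(values)
        sd = statistics.stdev(values)
        out[name] = [(v - mean) / sd for v in values]
    return out


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Pairwise Pearson correlations, the numbers a heat map would display."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


def simple_linear_regression(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx


# Hypothetical toy measurements (not taken from the paper).
data = {
    "plant_height_cm": [50.0, 55.0, 60.0, 65.0, 70.0],
    "yield_kg": [2.0, 2.5, 3.0, 3.5, 4.0],
    "leaf_count": [12.0, 9.0, 14.0, 10.0, 11.0],
}

scaled = standardize(data)
corr = correlation_matrix(scaled)
slope, intercept = simple_linear_regression(data["plant_height_cm"], data["yield_kg"])
print(f"r(height, yield) = {corr['plant_height_cm']['yield_kg']:.3f}")
print(f"yield = {intercept:.2f} + {slope:.2f} * height")
```

Standardizing before computing correlations does not change the Pearson coefficients. It matters for later steps such as PCA, where variables on different scales would otherwise dominate the components.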
