SYSTEM AND METHOD FOR PREDICTING APPLICATION PERFORMANCE FOR LARGE DATA SIZE ON BIG DATA CLUSTER

Abstract

A system and method for estimating the execution time of an application on the Spark platform in a production environment. The application on the Spark platform is executed as a sequence of Spark jobs. Each Spark job is executed as a directed acyclic graph (DAG) consisting of stages. Each stage has multiple executors running in parallel, and each executor has a set of concurrent tasks. Each executor spawns multiple threads, one for each task. All jobs in the same executor share the same JVM memory. The execution time for each Spark job is predicted as the summation of the estimated execution times of all its stages. The execution time comprises scheduler delay, serialization time, de-serialization time, and JVM overheads. The JVM time estimation depends on the type of computation hardware system and the number of threads.
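
The summation described in the abstract can be illustrated with a minimal Python sketch. It is not the patented method: the names `StageEstimate`, `predict_job_time`, and `predict_application_time` are hypothetical, the per-stage components are assumed to be simply additive, and adding job times to get an application time is an assumption drawn only from the statement that jobs run as a sequence. How each component is estimated (for example, JVM overhead from hardware type and thread count) is not specified in the abstract and is left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StageEstimate:
    """Estimated time components of one stage, in seconds (hypothetical container)."""
    scheduler_delay: float
    serialization: float
    deserialization: float
    jvm_overhead: float

    def total(self) -> float:
        # Assumption: the components named in the abstract are additive.
        return (self.scheduler_delay + self.serialization
                + self.deserialization + self.jvm_overhead)


def predict_job_time(stages: List[StageEstimate]) -> float:
    # Job time is the summation of the estimated times of all its stages.
    return sum(stage.total() for stage in stages)


def predict_application_time(jobs: List[List[StageEstimate]]) -> float:
    # Assumption: jobs run one after another, so their times add up.
    return sum(predict_job_time(stages) for stages in jobs)
```

Given a list of `StageEstimate` objects for one job, `predict_job_time` returns their combined total; the input figures would have to come from measurements or a separate model, which this sketch does not provide.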

Bibliographic data

  • Publication number: US2019065336A1

    Patent type:

  • Publication date: 2019-02-28

    Original format: PDF

  • Applicant/Assignee: TATA CONSULTANCY SERVICES LIMITED

    Application number: US201816107425

  • Inventors: REKHA SINGHAL; PRAVEEN KUMAR SINGH

    Filing date: 2018-08-21

  • Classification: G06F11/34; G06F17/50; G06F11/30; G06F9/455

  • Country: US

  • Date added to database: 2022-08-21 12:05:20
