Published in: 《计算机工程与应用》 (Computer Engineering and Applications)

Research on a Sub-Schema-Based ETL Method for Transforming Relational Data to Graph Data

Abstract

Graph databases outperform relational databases on problems such as multi-layer relationship queries and community detection. However, most data in existing applications is stored in relational form. How to extract, transform, and load (ETL) relational data into graph data efficiently and completely is therefore an important problem in deploying graph database applications. Existing research on this problem has three major limitations: (1) the quality of the converted graph data is poor; (2) the transformation efficiency is low; (3) the transformed results are not suitable for distributed storage. To overcome these limitations, this paper proposes a sub-schema-based ETL method for transforming relational data into graph data, which improves the procedure and algorithms of previous ETL methods. The method splits the relational database schema into several sub-schemas and runs ETL on them in parallel. This improves ETL efficiency, and the transformed results satisfy the requirements of distributed graph data storage and can also serve as base data for the Spark GraphX computing framework. Finally, a prototype system is implemented with Java EE and Neo4j for experimental verification. The comparative results show that the improved ETL method achieves better transformation performance than previous methods.
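The idea of splitting a schema into sub-schemas and transforming them in parallel can be illustrated with a minimal sketch. The abstract does not state how sub-schemas are chosen, so this sketch assumes one simple criterion: tables linked by foreign keys form one sub-schema (a connected component of the foreign-key graph). The names `SCHEMA`, `ROWS`, `split_sub_schemas`, `etl_sub_schema`, and `parallel_etl` are hypothetical, and the toy data is invented for illustration. This is not the paper's algorithm.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy schema: table -> list of (fk_column, referenced_table).
SCHEMA = {
    "customer": [],
    "orders": [("customer_id", "customer")],
    "sensor": [],
    "reading": [("sensor_id", "sensor")],
}

# Hypothetical rows; assumption: every table uses "id" as its primary key.
ROWS = {
    "customer": [{"id": 1, "name": "Ann"}],
    "orders": [{"id": 10, "customer_id": 1}, {"id": 11, "customer_id": 1}],
    "sensor": [{"id": 7, "kind": "temp"}],
    "reading": [{"id": 70, "sensor_id": 7}],
}


def split_sub_schemas(schema):
    """Group tables into connected components of the foreign-key graph."""
    adjacency = defaultdict(set)
    for table, fks in schema.items():
        adjacency[table]  # make sure tables without foreign keys appear
        for _, target in fks:
            adjacency[table].add(target)
            adjacency[target].add(table)
    seen, components = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # iterative depth-first search
            table = stack.pop()
            if table in component:
                continue
            component.add(table)
            stack.extend(adjacency[table] - component)
        seen |= component
        components.append(sorted(component))
    return components


def etl_sub_schema(tables, schema, rows):
    """Turn each row into a node and each non-null foreign key into an edge."""
    nodes, edges = [], []
    for table in tables:
        fks = schema.get(table, [])
        fk_columns = {col for col, _ in fks}
        for row in rows.get(table, []):
            node_id = (table, row["id"])
            props = {k: v for k, v in row.items()
                     if k != "id" and k not in fk_columns}
            nodes.append((node_id, props))
            for col, target in fks:
                if row.get(col) is not None:
                    edges.append((node_id, col, (target, row[col])))
    return nodes, edges


def parallel_etl(schema, rows):
    """Run the transformation for every sub-schema in a thread pool."""
    sub_schemas = split_sub_schemas(schema)
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(
            lambda tables: etl_sub_schema(tables, schema, rows), sub_schemas))
    return sub_schemas, results
```

Because no foreign key crosses a sub-schema boundary under this assumption, each worker produces a self-contained set of nodes and edges, which is one way the output could be partitioned for distributed storage. The node and edge lists resemble the vertex and edge collections that a framework such as Spark GraphX works with. A thread pool is used only to keep the sketch short; a real system would more likely use separate processes or a cluster.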
