
Method for writing a plurality of small files of 2 MB or less to HDFS including a data merge module and an HBase cache module based on Hadoop


Abstract

The invention discloses a Hadoop-based method for writing massive numbers of small files, suited to an HDFS system that has a data merge module and an HBase cache module. The method comprises: receiving a small-file write command input by a user; querying the HBase cache module by user ID and small-file name and, if first file content is found, uploading the first file content as written into the small file and updating the HBase cache module with it; if the first file content is not found, querying the database of the HDFS system and, if second file content is found, uploading the second file content as written into the small file and updating the database with it; otherwise, calling the API of the Hadoop archive tool to access the corresponding HAR file, uploading the HAR file as written into the small file, and updating the database with the HAR file. According to the invention, the writing method can improve the reading efficiency of small files.
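The three-tier lookup described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: plain dictionaries stand in for the HBase cache module, the HDFS system's database, and the HAR files, and no real HBase or Hadoop API is called. The class name `SmallFileStore`, the field `har_refs`, and the `<user_id>.har` archive naming are all hypothetical, since the abstract does not say how the database references a HAR file.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

Key = Tuple[str, str]  # (user ID, small-file name), the lookup key named in the abstract


@dataclass
class SmallFileStore:
    """In-memory stand-ins for the three storage tiers (all hypothetical)."""
    hbase_cache: Dict[Key, bytes] = field(default_factory=dict)   # HBase cache module
    database: Dict[Key, bytes] = field(default_factory=dict)      # HDFS system database
    har_files: Dict[str, Dict[str, bytes]] = field(default_factory=dict)  # archive -> files
    har_refs: Dict[Key, str] = field(default_factory=dict)        # assumed database record of the HAR

    def write(self, user_id: str, file_name: str, content: bytes) -> str:
        """Apply a write and return the name of the tier that handled it."""
        key = (user_id, file_name)

        # Step 1: query the HBase cache; on a hit, update the cached content.
        if key in self.hbase_cache:
            self.hbase_cache[key] = content
            return "hbase_cache"

        # Step 2: on a cache miss, query the database; on a hit, update it.
        if key in self.database:
            self.database[key] = content
            return "database"

        # Step 3: otherwise go to the corresponding HAR file (assumed naming),
        # write the content there, and record the archive in the database.
        archive_name = f"{user_id}.har"
        self.har_files.setdefault(archive_name, {})[file_name] = content
        self.har_refs[key] = archive_name
        return "har"
```

A short usage example, with one file seeded in each of the first two tiers:

```python
store = SmallFileStore()
store.hbase_cache[("u1", "a.txt")] = b"old"
store.database[("u1", "b.txt")] = b"old"

print(store.write("u1", "a.txt", b"new"))  # handled by the cache tier
print(store.write("u1", "b.txt", b"new"))  # handled by the database tier
print(store.write("u1", "c.txt", b"new"))  # falls through to the HAR tier
```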
