Method for Extracting InputFormat for Binary Format Data in Hadoop MapReduce and Binary Data Analysis Using the Same

Abstract

The present invention relates to an input format for the distributed processing of binary data made up of fixed-length records in Hadoop MapReduce, and to a binary data analysis method that uses this input format. The input format comprises the steps of: (A) receiving the record length of the binary data; (B) defining an InputSplit for each data block stored in the Hadoop Distributed File System (HDFS) by taking as its starting point the position nearest the start of the block to be processed that is an integer multiple (n) of the record length, and setting the split's boundary against the adjacent InputSplit; (C) generating and returning a RecordReader that reads, one record length at a time, from the starting point across the entire InputSplit defined above; and (D) extracting each record through the RecordReader as a (Key, Value) pair of type (LongWritable, BytesWritable). With this input format, fixed-length binary data can be processed in a distributed Hadoop environment without first converting the data format, which requires less storage space than other data types and enables faster processing.
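Steps (A) through (D) can be illustrated with a minimal sketch in plain Python. It shows only the boundary arithmetic and the record loop, not the Hadoop API or the patented implementation. Several details are assumptions that the abstract does not state: the "nearest" multiple is taken to be the first one at or after the block start, each split ends where the next one begins, a trailing partial record is ignored, and the key is the record's byte offset. The names `aligned_start`, `define_splits`, and `read_records` are hypothetical.

```python
from typing import Iterator, List, Tuple


def aligned_start(block_start: int, record_length: int) -> int:
    """Smallest multiple of record_length that is >= block_start (assumed rounding)."""
    return -(-block_start // record_length) * record_length


def define_splits(file_size: int, block_size: int,
                  record_length: int) -> List[Tuple[int, int]]:
    """Step (B): one (start, end) byte range per block in which a record begins."""
    # Assumption: bytes after the last whole record are ignored.
    end_of_records = (file_size // record_length) * record_length
    starts = []
    for block_start in range(0, file_size, block_size):
        start = aligned_start(block_start, record_length)
        # Keep the block only if a whole record actually begins inside it.
        if start < block_start + block_size and start < end_of_records:
            starts.append(start)
    # Assumption: each split ends where the next split starts.
    bounds = starts + [end_of_records]
    return list(zip(bounds, bounds[1:]))


def read_records(data: bytes, split: Tuple[int, int],
                 record_length: int) -> Iterator[Tuple[int, bytes]]:
    """Steps (C) and (D): yield (key, value) pairs across one split.

    The key stands in for LongWritable (here: the byte offset, an assumption)
    and the value stands in for BytesWritable (the raw record bytes).
    """
    start, end = split
    for offset in range(start, end, record_length):
        yield offset, data[offset:offset + record_length]


if __name__ == "__main__":
    data = bytes(range(100))           # 100-byte "file"
    record_length = 10                 # step (A): record length supplied by the caller
    splits = define_splits(len(data), block_size=32, record_length=record_length)
    print(splits)
    for split in splits:
        for key, value in read_records(data, split, record_length):
            print(key, value.hex())
```

With a 100-byte file, 32-byte blocks, and 10-byte records, the splits come out as `[(0, 40), (40, 70), (70, 100)]`. No record straddles two splits, so each mapper can read whole records without consulting its neighbours.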

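Bibliographic data

  • Publication number: KR101218087B1
  • Patent type:
  • Publication date: 2013-01-09
  • Original format: PDF
  • Applicant/patentee:
  • Application number: KR20110005424
  • Inventors: 이영석; 이연희
  • Filing date: 2011-01-19
  • Classification: G06F15/16
  • Country: KR
  • Added to database: 2022-08-21 16:25:56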
