When designed the HDFS, it was usually only considered how to handle large files better, and HDFS was not optimized for massive small files. When used HDFS to manage massive small files such as fin-gerprint datafiles there were some difficulties. For example, overloading of the NameNode and the perform-ances of upload and query were not satisfied. The serialization technology named SequenceFile to merge small files was used and some targeted optimization about the merging of small files, the storage of metadata and the caching strategies were considered. Experimental results showed that the proposed scheme could effectively deal with the problem of NameNode memory′s overloading. The upload and query performances about massive small files sucn as fingerprint datafiles were also improved.%HDFS设计之初只考虑到如何更好地处理大文件,并没有针对海量小文件进行优化,因此,当使用HDFS管理海量指纹数据小文件时会出现 NameNode 内存负载过重、上传及查询性能过低等问题。采用SequenceFile序列化技术进行小文件的合并,并且对于小文件合并、元数据存储、缓存策略等进行了针对性优化。实验证明,该优化方案可以有效地解决NameNode内存负载过重的问题,并且海量指纹数据小文件的上传和查询性能得到了提高。
展开▼