
MapReduce Techniques in Text Processing

Abstract

With the growth of the internet, many datasets used in text processing have reached terabyte or petabyte scale and beyond, and traditional single-machine approaches cannot process data of this volume effectively. The MapReduce framework, which emerged in recent years, addresses large-scale parallel data processing with a simple programming model and a distributed design, and it has been widely recognized and adopted in both academia and industry. It has been applied successfully in natural language processing, machine learning, large-scale graph processing, and statistical machine translation. In this paper we first give a brief introduction to MapReduce and analyze its characteristics, advantages, limitations, and differences from traditional techniques. We then classify and summarize recent applications of MapReduce in various areas of text processing. Finally, we introduce research on the system and performance aspects of MapReduce and discuss possible future applications.
