Conference: International Conference on Data Analytics for Business and Industry: Way Towards a Sustainable Economy

Comparing Sequential and Parallel Word Counting Methods in Finding Commonly Used Words by Different Authors

Abstract

Parallel computing is increasingly part of everyday life and has become a target for parallelism experiments. This paper compares sequential and parallel methods of word counting to find the words commonly used by different authors in a selected public dataset of around 500,000 quotes. The Python multiprocessing library, with a pool of worker processes, was used for this purpose. The sequential process (Sequential Method, SM) is grouped into sub-processes ("operations") so that the execution time of each operation can be evaluated. The operation that consumes the largest portion of the execution time is then parallelized with the multiprocessing pool function (Partially Parallelized Method, PPM). After that, the whole process is parallelized to evaluate the overall performance (Fully Parallelized Method, FPM). Results show that the overall parallelized process improves efficiency by 48%, and the words most used by the selected authors appear relevant to the authors' professions.
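The contrast between a sequential count and a pool-based count can be illustrated with a minimal sketch. This is not the authors' code: the tokenizer, the chunking scheme, the function names, and the toy quotes are all assumptions made for illustration, and grouping by author and per-operation timing are omitted. Only `multiprocessing.Pool` and its `map` method reflect the library named in the abstract.

```python
import re
import time
from collections import Counter
from multiprocessing import Pool

# Hypothetical tokenizer (lowercase letters and apostrophes); not from the paper.
WORD_RE = re.compile(r"[a-z']+")


def count_chunk(quotes):
    """Count words in one list of quotes (this is what each worker runs)."""
    counts = Counter()
    for quote in quotes:
        counts.update(WORD_RE.findall(quote.lower()))
    return counts


def split_into_chunks(items, n_chunks):
    """Split a list into at most n_chunks contiguous, roughly equal slices."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def merge_counts(partials):
    """Sum the per-chunk Counters into a single total."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def count_sequential(quotes):
    """Sequential baseline: one process counts every quote."""
    return count_chunk(quotes)


def count_parallel(quotes, processes=4):
    """Pool-based variant: one chunk per worker, then merge the results."""
    chunks = split_into_chunks(quotes, processes)
    with Pool(processes=processes) as pool:
        partials = pool.map(count_chunk, chunks)
    return merge_counts(partials)


if __name__ == "__main__":
    # Toy data standing in for the quotes dataset (assumption).
    quotes = ["the quick brown fox jumps over the lazy dog"] * 20000
    for label, fn in (("sequential", count_sequential),
                      ("parallel", count_parallel)):
        start = time.perf_counter()
        result = fn(quotes)
        elapsed = time.perf_counter() - start
        print(f"{label}: {elapsed:.3f}s, top words: {result.most_common(3)}")
```

On small inputs the parallel variant may well be slower than the sequential one, because starting worker processes and pickling the chunks has a fixed cost. Any speedup depends on the data size and the number of available cores.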
