Signal Processing and Communications Applications Conference

Fast text classification with Naive Bayes method on Apache Spark

Abstract

As the Internet of Things (IoT) brings more devices and users online, the volume of big data is growing exponentially. Classifying this growing data, removing irrelevant data, and extracting meaning from it have become vitally important by today's standards. Text data can be analyzed in various ways, such as text classification, spam analysis, and personality analysis. In this study, fast text classification was performed with machine learning on Apache Spark using the Naive Bayes method. To provide fast storage and analysis of data, the Spark architecture uses a distributed in-memory data collection instead of the distributed data structure used in the Hadoop architecture. The analyses were carried out with the Naive Bayes method on comment data from Reddit, an open-source social news site. The results are presented in tables and graphs.
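Multinomial Naive Bayes is trained purely by counting, so the counts can be computed independently on each data partition and then summed. That is what makes it a natural fit for Spark's map-and-reduce style of distributed processing. The following is a minimal pure-Python sketch of that idea, not the authors' implementation. The `partitions` list only imitates Spark partitions, the whitespace tokenizer and the Laplace smoothing parameter `alpha` are assumptions, and the spam/ham documents are hypothetical toy data rather than the Reddit data used in the study. On an actual cluster, Spark MLlib provides `pyspark.ml.classification.NaiveBayes` (multinomial by default, with a `smoothing` parameter) for the same model.

```python
import math
from collections import Counter, defaultdict
from functools import reduce


def tokenize(text):
    # Assumption: simple lowercase whitespace tokenization.
    return text.lower().split()


def partition_counts(docs):
    """'Map' step: count documents per class and words per class
    within one partition of (label, text) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    for label, text in docs:
        class_counts[label] += 1
        word_counts[label].update(tokenize(text))
    return class_counts, word_counts


def merge_counts(a, b):
    """'Reduce' step: add two partial count tables together."""
    merged_words = defaultdict(Counter)
    for table in (a[1], b[1]):
        for label, counter in table.items():
            merged_words[label].update(counter)
    return a[0] + b[0], merged_words


def train(partitions, alpha=1.0):
    """Fit multinomial Naive Bayes with additive (Laplace) smoothing alpha.
    Returns {label: (log prior, {word: log P(word | label)})}."""
    class_counts, word_counts = reduce(merge_counts, map(partition_counts, partitions))
    vocab = set()
    for counter in word_counts.values():
        vocab.update(counter)
    total_docs = sum(class_counts.values())
    model = {}
    for label, n_docs in class_counts.items():
        denom = sum(word_counts[label].values()) + alpha * len(vocab)
        log_likelihood = {
            w: math.log((word_counts[label][w] + alpha) / denom) for w in vocab
        }
        model[label] = (math.log(n_docs / total_docs), log_likelihood)
    return model


def predict(model, text):
    """Return the label with the highest log posterior; words outside
    the training vocabulary are ignored."""
    tokens = tokenize(text)

    def score(label):
        log_prior, log_likelihood = model[label]
        return log_prior + sum(log_likelihood[t] for t in tokens if t in log_likelihood)

    return max(model, key=score)


# Hypothetical toy data split across two "partitions".
partitions = [
    [("spam", "win free money now"), ("ham", "meeting notes for monday")],
    [("spam", "free prize claim now"), ("ham", "notes from the monday meeting")],
]
model = train(partitions)
print(predict(model, "claim your free money"))  # spam
print(predict(model, "monday meeting notes"))   # ham
```

Only the small per-partition count tables are combined in the reduce step, so the raw documents never have to be gathered in one place, and training on one partition or on many yields the same model.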
