Conference: International Convention on Information and Communication Technology, Electronics and Microelectronics

An overview and comparison of free Python libraries for data mining and big data analysis

Abstract

The popularity of Python is growing, especially in the field of data science. Consequently, an increasing number of free libraries are available for use. The aim of this review paper is to describe and compare the characteristics of different data mining and big data analysis libraries in Python. There is currently no paper that deals with this subject and describes the pros and cons of all these libraries. Here we consider more than 20 libraries and separate them into six groups: core libraries, data preparation, data visualization, machine learning, deep learning and big data. Besides the functionality of a given library, important factors for comparison are the number of contributors developing and maintaining the library and the size of its community. A bigger community means a better chance of easily finding a solution to a given problem. We currently recommend: pandas for data preparation; Matplotlib, seaborn or Plotly for data visualization; scikit-learn for machine learning; TensorFlow, Keras and PyTorch for deep learning; and Hadoop Streaming and PySpark for big data.
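As a rough illustration of how two of the recommended libraries might be combined, the sketch below uses pandas for a simple data preparation step and scikit-learn for a small classifier. It is not taken from the paper: the toy data, the column names, the median-fill strategy and the helper `prepare` are all hypothetical and chosen only to keep the example short.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data with two missing values (not from the paper).
raw = pd.DataFrame({
    "height_cm": [150.0, 160.0, None, 180.0, 190.0, 175.0],
    "weight_kg": [50.0, 58.0, 65.0, None, 95.0, 80.0],
    "label": ["small", "small", "small", "large", "large", "large"],
})


def prepare(frame):
    """Data preparation with pandas: fill missing values with the column median."""
    features = frame[["height_cm", "weight_kg"]]
    return features.fillna(features.median())


# Machine learning with scikit-learn: fit a small decision tree on the cleaned data.
features = prepare(raw)
model = DecisionTreeClassifier(random_state=0).fit(features, raw["label"])
predictions = model.predict(features)
```

Median imputation and a decision tree are only one possible choice here; any other pandas cleaning step or scikit-learn estimator could take their place.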
