Journal: Astronomy and Astrophysics

Vaex: big data exploration in the era of Gaia



Abstract

We present a new Python library, called vaex, intended to handle extremely large tabular datasets such as astronomical catalogues like the Gaia catalogue, N-body simulations, or other datasets which can be structured in rows and columns. Fast computations of statistics on regular N-dimensional grids allow analysis and visualization at a rate on the order of a billion rows per second on a high-end desktop computer. We use streaming algorithms, memory-mapped files, and a zero-memory-copy policy to allow exploration of datasets larger than memory, i.e. out-of-core algorithms. Vaex allows arbitrary (mathematical) transformations using normal Python expressions and (a subset of) numpy functions, which are "lazily" evaluated and computed when needed in small chunks, which avoids wasting memory. Boolean expressions (which are also lazily evaluated) can be used to explore subsets of the data, which we call selections. Vaex uses a DataFrame API similar to that of Pandas, a very popular library, which eases migration from Pandas. Visualization is one of the key points of vaex, and is done using binned statistics in 1d (e.g. histograms), in 2d (e.g. 2d histograms with colour mapping) and in 3d (using volume rendering). Vaex is split into several packages: vaex-core for the computational part, vaex-viz for visualization mostly based on matplotlib, vaex-jupyter for visualization in the Jupyter notebook/lab based on IPyWidgets, vaex-server for the (optional) client-server communication, vaex-ui for the Qt-based interface, vaex-hdf5 for HDF5-based memory-mapped storage, and vaex-astro for astronomy-related selections, transformations, and memory-mapped (column-based) FITS storage.
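The combination of chunked streaming, lazily evaluated expressions, selections, and binned statistics on a regular grid described in the abstract can be illustrated with a small standard-library sketch. This is not vaex's API or implementation: the names `chunks`, `binned_count_1d`, and `demo_rows`, the dictionary-per-row layout, and the chunk size are all assumptions made for illustration only.

```python
import math
from itertools import islice


def chunks(rows, size):
    """Yield successive lists of at most `size` rows from any iterable."""
    iterator = iter(rows)
    while True:
        block = list(islice(iterator, size))
        if not block:
            return
        yield block


def binned_count_1d(rows, expression, selection, lo, hi, nbins, chunk_size=1000):
    """Count selected values per bin on a regular 1d grid over [lo, hi).

    Hypothetical helper: rows are consumed chunk by chunk, so the full
    dataset never has to be held in memory at once.
    """
    counts = [0] * nbins
    scale = nbins / (hi - lo)
    for block in chunks(rows, chunk_size):
        for row in block:
            # The selection is a boolean expression evaluated per row.
            if not selection(row):
                continue
            # The expression is only evaluated here, when it is needed.
            value = expression(row)
            index = math.floor((value - lo) * scale)
            if 0 <= index < nbins:
                counts[index] += 1
    return counts


def demo_rows():
    """Generate a tiny table lazily, one row (a dict) at a time."""
    return ({"x": i, "y": i % 3} for i in range(10))


counts = binned_count_1d(
    demo_rows(),
    expression=lambda row: row["x"] * 0.5,  # a derived ("virtual") quantity
    selection=lambda row: row["y"] != 0,    # keep only rows with y != 0
    lo=0.0,
    hi=5.0,
    nbins=5,
    chunk_size=4,
)
print(counts)  # [1, 1, 2, 1, 1]
```

Only the fixed-size array of bin counts persists between chunks, which is why statistics of this kind can be computed over data larger than memory. A real implementation would operate on memory-mapped columns with vectorized code rather than on Python dictionaries.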
