Vaex: big data exploration in the era of Gaia

Maarten A. Breddels; Jovan Veljanoski

Abstract

We present a new Python library, called vaex, intended to handle extremely large tabular datasets such as astronomical catalogues like the Gaia catalogue, N-body simulations, or other datasets which can be structured in rows and columns. Fast computation of statistics on regular N-dimensional grids allows analysis and visualization on the order of a billion rows per second on a high-end desktop computer. We use streaming algorithms, memory-mapped files, and a zero-memory-copy policy to allow exploration of datasets larger than memory, that is, using out-of-core algorithms. Vaex allows arbitrary (mathematical) transformations using normal Python expressions and (a subset of) numpy functions, which are "lazily" evaluated and computed when needed in small chunks, which avoids wasting memory. Boolean expressions (which are also lazily evaluated) can be used to explore subsets of the data, which we call selections. Vaex uses a DataFrame API similar to that of Pandas, a very popular library, which eases migration from Pandas. Visualization is one of the key points of vaex, and is done using binned statistics in 1d (e.g. histograms), in 2d (e.g. 2d histograms with colourmapping), and in 3d (using volume rendering). Vaex is split into several packages: vaex-core for the computational part, vaex-viz for visualization mostly based on matplotlib, vaex-jupyter for visualization in the Jupyter notebook/lab based on IPyWidgets, vaex-server for the (optional) client-server communication, vaex-ui for the Qt-based interface, vaex-hdf5 for HDF5-based memory-mapped storage, and vaex-astro for astronomy-related selections, transformations, and memory-mapped (column-based) FITS storage.
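The combination of chunked streaming, lazily evaluated expressions, selections, and binned statistics on a regular grid can be illustrated with a minimal pure-Python sketch. This is not vaex's implementation or its API: the helper names `iter_chunks` and `binned_mean`, the row-wise callables standing in for expressions, and the toy data are all assumptions made for illustration. A real implementation would operate on memory-mapped columns with vectorized code.

```python
from typing import Callable, Dict, Iterator, List, Optional, Sequence, Tuple

Columns = Dict[str, Sequence[float]]      # column name -> column values
Row = Dict[str, float]                    # column name -> value for one row
Expression = Callable[[Row], float]       # hypothetical stand-in for a lazy expression
Selection = Callable[[Row], bool]         # hypothetical stand-in for a boolean selection


def iter_chunks(columns: Columns, chunk_size: int) -> Iterator[Columns]:
    """Yield consecutive row ranges so only one chunk is handled at a time."""
    n_rows = len(next(iter(columns.values())))
    for start in range(0, n_rows, chunk_size):
        yield {name: col[start:start + chunk_size] for name, col in columns.items()}


def binned_mean(
    columns: Columns,
    x: Expression,
    value: Expression,
    limits: Tuple[float, float],
    bins: int,
    selection: Optional[Selection] = None,
    chunk_size: int = 1000,
) -> Tuple[List[float], List[int]]:
    """Mean of `value` per bin of `x` on a regular 1-d grid, computed in one pass.

    Only per-bin counts and sums are kept in memory; expressions are evaluated
    row by row when needed, so no derived column is ever materialized.
    """
    lo, hi = limits
    counts = [0] * bins
    sums = [0.0] * bins
    for chunk in iter_chunks(columns, chunk_size):
        n = len(next(iter(chunk.values())))
        for i in range(n):
            row = {name: col[i] for name, col in chunk.items()}
            if selection is not None and not selection(row):
                continue  # row is outside the selection
            xv = x(row)
            if not (lo <= xv < hi):
                continue  # row falls outside the grid
            index = min(int((xv - lo) * bins / (hi - lo)), bins - 1)
            counts[index] += 1
            sums[index] += value(row)
    means = [s / c if c else float("nan") for s, c in zip(sums, counts)]
    return means, counts


# Toy data (assumed for illustration only).
data = {"x": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0], "y": [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]}

means, counts = binned_mean(
    data,
    x=lambda row: row["x"],
    value=lambda row: row["x"] * row["y"],   # derived quantity, evaluated lazily
    limits=(0.0, 6.0),
    bins=3,
    selection=lambda row: row["y"] > 1.0,    # boolean selection
    chunk_size=2,
)
print(means, counts)
```

Because only the per-bin counts and sums accumulate across chunks, the result does not depend on the chunk size, and memory use is set by the grid size rather than by the number of rows.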


Bibliographic details

Source: Astronomy and Astrophysics, 2018, issue 1, 13 pages
Authors: Maarten A. Breddels; Jovan Veljanoski
Original format: PDF
Classification (Chinese Library Classification): Astronomy
Keywords: methods: numerical; methods: statistical
