Finding approximate solutions to combinatorial problems with very large data sets using BIRCH

Harrington J.; Salibián-Barrera M.




Abstract

Computing estimators with good robustness properties generally requires solving highly complex optimization problems. The current state-of-the-art algorithms for finding approximate solutions to these problems need to access the data set a large number of times and become infeasible when the data do not fit in memory. In this paper the BIRCH algorithm is adapted to calculate approximate solutions to problems in this class. For data sets that fit in memory, this approach is able to find approximate Least Trimmed Squares (LTS) and Minimum Covariance Determinant (MCD) estimators that compare very well with those returned by the fast-LTS and fast-MCD algorithms, and in some cases it finds a better solution (in terms of the value of the objective function) than those algorithms return. This methodology can also be applied to the Linear Grouping Algorithm and its robust variant for very large data sets. Finally, results from a simulation study indicate that this algorithm performs comparably to fast-LTS in simple situations (large data sets with a small number of covariates and a small proportion of outliers) and does much better than fast-LTS in more challenging situations, without requiring extra computational time. These findings seem to confirm that this approach provides the first computationally feasible and reliable approximating algorithm in the literature to compute the LTS and MCD estimators for data sets that do not fit in memory.
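The abstract does not spell out how BIRCH is adapted, so the following is only a minimal sketch of the general idea that makes a single pass over the data possible. It assumes each subcluster is represented by an additive summary (count, sum of rows, and sum of cross-products), a variant of BIRCH's clustering feature, from which a mean and covariance can be recovered without revisiting the raw rows. The names `SubclusterSummary` and `summarize_in_chunks` are hypothetical and are not taken from the paper.

```python
import numpy as np


class SubclusterSummary:
    """Additive summary of a group of rows (assumed form, not the paper's exact one)."""

    def __init__(self, dim):
        self.n = 0                               # number of rows absorbed
        self.linear_sum = np.zeros(dim)          # sum of the rows
        self.cross_sum = np.zeros((dim, dim))    # sum of outer products x x^T

    def add_rows(self, rows):
        """Absorb a block of rows; the rows can be discarded afterwards."""
        rows = np.atleast_2d(np.asarray(rows, dtype=float))
        self.n += rows.shape[0]
        self.linear_sum += rows.sum(axis=0)
        self.cross_sum += rows.T @ rows

    def merge(self, other):
        """Combine two summaries; the result describes the union of their rows."""
        merged = SubclusterSummary(self.linear_sum.size)
        merged.n = self.n + other.n
        merged.linear_sum = self.linear_sum + other.linear_sum
        merged.cross_sum = self.cross_sum + other.cross_sum
        return merged

    def mean(self):
        return self.linear_sum / self.n

    def covariance(self):
        """Sample covariance recovered from the summary alone (needs n >= 2)."""
        m = self.mean()
        return (self.cross_sum - self.n * np.outer(m, m)) / (self.n - 1)


def summarize_in_chunks(chunks, dim):
    """Build one summary from an iterable of row blocks, reading each block once."""
    total = SubclusterSummary(dim)
    for chunk in chunks:
        total.add_rows(chunk)
    return total
```

Because the summaries add, the covariance determinant of any union of subclusters (the quantity an MCD-type search compares across candidate subsets) can be evaluated from the merged summary, for example with `np.linalg.det(a.merge(b).covariance())`, without touching the original observations.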


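Bibliographic details

Source: Computational statistics & data analysis, 2010, Issue 3, 13 pages
Authors: Harrington J.; Salibián-Barrera M.
Original format: PDF
Language: English
Chinese Library Classification: Probability theory and mathematical statistics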

