
Large-Scale Methods for Nonlinear Manifold Learning.



Abstract

High-dimensional data representation is an important problem in many areas of science, and correctly interpreting data of varying dimensionality has become crucial. Dimensionality reduction methods process the data to help visualize it, reduce its complexity, or find a latent representation of the original problem. Nonlinear dimensionality reduction algorithms (also known as manifold learning) decrease the dimensionality of the problem while preserving the general structure of the data. Both spectral methods (such as Laplacian Eigenmaps or ISOMAP) and nonlinear embedding (NLE) algorithms (such as t-SNE or the Elastic Embedding) have been shown to provide very good nonlinear embeddings of high-dimensional data sets. However, these methods are notorious for very slow optimization, which in practice prevents their use on data sets larger than a few thousand points.

In this thesis we investigate several techniques to improve different stages of nonlinear dimensionality reduction algorithms. First, we analyze entropic affinities as a better way to build a similarity matrix; we explore their properties and propose a nearly-optimal algorithm to construct them. Second, we present a novel, faster method to optimize NLE by using second-order information during the optimization. Third, for spectral methods, we investigate landmark-based optimization, which cleverly substitutes a much smaller, easy-to-solve subproblem for the original large-scale problem. Finally, we apply a Fast Multipole Methods approximation that allows fast computation of the gradient and the objective function of NLE, reducing their computational complexity from O(N²) to O(N).

Each of the proposed methods accelerates the optimization dramatically, by one to two orders of magnitude compared to existing techniques, effectively allowing the corresponding methods to run on data sets with millions of points.
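The entropic affinities mentioned above assign each point its own kernel bandwidth so that every point's conditional neighbor distribution has a fixed perplexity (effective number of neighbors). The thesis proposes a nearly-optimal construction algorithm; as a rough illustration of the underlying idea, here is a minimal per-point bisection sketch (the approach popularized by t-SNE preprocessing; the function name and parameters are illustrative, not from the thesis):

```python
import numpy as np

def entropic_affinities(D2, perplexity=5.0, tol=1e-5, max_iter=50):
    """For each point i, bisect on the precision beta_i so that the entropy
    of p_{j|i} ∝ exp(-beta_i * d_ij^2) equals log(perplexity).
    D2: (N, N) matrix of squared pairwise distances."""
    N = D2.shape[0]
    target = np.log(perplexity)
    P = np.zeros((N, N))
    for i in range(N):
        d = np.delete(D2[i], i)           # exclude self-distance
        d = d - d.min()                   # shift for numerical stability
        lo, hi, beta = 0.0, np.inf, 1.0
        for _ in range(max_iter):
            w = np.exp(-beta * d)
            sw = w.sum()
            p = w / sw
            # Shannon entropy: H = log(sum_j w_j) + beta * <d>_p
            H = np.log(sw) + beta * (d * p).sum()
            if abs(H - target) < tol:
                break
            if H > target:                # distribution too flat: raise beta
                lo = beta
                beta = beta * 2 if hi == np.inf else (lo + hi) / 2
            else:                         # too peaked: lower beta
                hi = beta
                beta = (lo + hi) / 2
        P[i, np.arange(N) != i] = p
    return P
```

Each row of the returned matrix sums to one and has entropy log(perplexity); this naive version costs O(N²) per sweep, which is exactly the bottleneck the thesis's nearly-optimal algorithm and fast-multipole approximations address.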

Record details

  • Author

    Vladymyrov, Maksym.

  • Author affiliation

    University of California, Merced.

  • Degree-granting institution University of California, Merced.
  • Subjects Computer science; Computer engineering
  • Degree Ph.D.
  • Year 2014
  • Pages 141 p.
  • Total pages 141
  • Format PDF
  • Language English

