Fast nonparametric machine learning algorithms for high-dimensional massive data and applications.

Abstract

Nonparametric methods have become increasingly popular in the statistics and probabilistic AI communities. One well-known nonparametric method is "nearest-neighbor". It uses the observations in the training set T closest in input space to a query q to form the prediction for q. Specifically, when k of the observations in T are considered, it is called k-nearest-neighbor (or k-NN).

Despite its simplicity, k-NN and its variants have been successful in many machine learning problems, including pattern recognition, text categorization, information retrieval, computational statistics, databases, and data mining. It is also used for estimating sample distributions and Bayes error.

However, k-NN and many related nonparametric methods remain hampered by their computational complexity. Many spatial methods, such as metric-trees, have been proposed to alleviate the computational cost, but their effectiveness decreases as the dimensionality of the feature vectors increases. From another direction, researchers are developing ways to find approximate answers. The premise of this research is that in many cases it is not necessary to insist on the exact answer; an approximate one is sufficient. Indeed, some approximate methods perform well in a number of applications, and some enjoy very good theoretical soundness. However, when facing hundreds or thousands of dimensions, many algorithms still do not work well in practice.

I propose four new spatial methods for fast k-NN and its variants, namely KNS2, KNS3, IOC, and spill-tree. The first three algorithms are designed to speed up k-NN classification, and they all share the same insight: finding the majority class among the k nearest neighbors of q does not require explicitly finding those k nearest neighbors. Spill-tree is designed for approximate k-NN search. By relaxing metric-trees into a more flexible data structure, spill-tree adapts to the distribution of the data and scales well even to huge high-dimensional data sets. Significant efficiency improvements have been observed compared to LSH (locality-sensitive hashing), the state-of-the-art approximate k-NN algorithm. I applied spill-tree to three real-world applications: video shot segmentation, drug activity detection, and image clustering, all of which are described in the thesis.
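To make the basic setting concrete, here is a minimal brute-force k-NN classifier in Python, assuming a Euclidean input space and NumPy arrays; the function and variable names are illustrative, not from the thesis.

```python
import numpy as np

def knn_predict(T_X, T_y, q, k=5):
    """Predict the label of query q by majority vote among the k
    observations in the training set (T_X, T_y) closest to q."""
    # Brute force: one distance per training point. This O(n) scan
    # is exactly the cost the thesis's spatial methods try to avoid.
    dists = np.linalg.norm(T_X - q, axis=1)
    nn = np.argsort(dists)[:k]          # indices of the k nearest points
    labels, counts = np.unique(T_y[nn], return_counts=True)
    return labels[np.argmax(counts)]
```

The insight shared by KNS2, KNS3 and IOC can be illustrated in the binary case: for odd k, class 1 holds the majority among the k nearest neighbors exactly when its m-th nearest point, with m = (k+1)/2, is closer to q than the (k-m+1)-th nearest point of class 0, so the neighbor set itself never has to be identified. Below is a toy sketch of that decision rule only; the actual algorithms obtain the two order statistics cheaply from metric-tree distance bounds rather than sorting all distances as done here.

```python
import numpy as np

def knn_majority_binary(pos_X, neg_X, q, k=5):
    """Return the k-NN majority class (k odd) without materializing
    the neighbor set. Assumes at least m class-1 points and at least
    k - m + 1 class-0 points in the training data."""
    m = (k + 1) // 2
    d_pos = np.sort(np.linalg.norm(pos_X - q, axis=1))
    d_neg = np.sort(np.linalg.norm(neg_X - q, axis=1))
    # Class 1 wins iff its m-th nearest point beats the
    # (k - m + 1)-th nearest class-0 point.
    return 1 if d_pos[m - 1] < d_neg[k - m] else 0
```

Because only two order statistics are compared, a tree search can stop as soon as its bounds on those two distances separate, which is where the speedup over exact neighbor retrieval comes from.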
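The spill-tree itself is described publicly (Liu, Moore, Gray and Yang, NIPS 2004) as a metric-tree whose two children are allowed to overlap: points within a buffer of width tau around the splitting plane go to both children, which lets approximate search descend greedily to the nearer child only ("defeatist" search) with a good chance the true neighbors were spilled along the path. The following is a minimal sketch of that idea under Euclidean distance; names like SpillNode, tau and leaf_size are illustrative, and the real structure adds hybrid non-overlapping nodes and random projections for very high dimensions.

```python
import numpy as np

class SpillNode:
    """Metric-tree node with overlapping children (spill-tree sketch)."""

    def __init__(self, points, tau=0.1, leaf_size=32):
        self.points = None
        if len(points) <= leaf_size:
            self.points = points                  # leaf: store the points
            return
        # Approximate farthest pair (a, b) to define the split direction.
        a = points[0]
        b = points[np.argmax(np.linalg.norm(points - a, axis=1))]
        a = points[np.argmax(np.linalg.norm(points - b, axis=1))]
        if np.linalg.norm(b - a) == 0:            # all points identical
            self.points = points
            return
        self.u = (b - a) / np.linalg.norm(b - a)  # split direction
        self.mid = (a + b) / 2                    # point on the midplane
        proj = (points - self.mid) @ self.u       # signed plane distance
        left, right = proj < tau, proj > -tau     # overlap of width 2*tau
        if left.all() or right.all():
            # Overlap swallowed a whole side: fall back to a plain
            # non-overlapping split (roughly the hybrid rule).
            left, right = proj < 0, proj >= 0
        self.left = SpillNode(points[left], tau, leaf_size)
        self.right = SpillNode(points[right], tau, leaf_size)

    def search(self, q, k=1):
        """Defeatist search: follow only the nearer child, no backtracking."""
        if self.points is not None:
            d = np.linalg.norm(self.points - q, axis=1)
            return self.points[np.argsort(d)[:k]]
        child = self.left if (q - self.mid) @ self.u < 0 else self.right
        return child.search(q, k)
```

A query thus descends a single root-to-leaf path, so search cost is roughly tree depth plus one leaf scan, at the price of occasionally missing a true neighbor that falls outside every buffer along the path.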

Bibliographic record

  • Author: Liu, Ting.
  • Affiliation: Carnegie Mellon University.
  • Awarding institution: Carnegie Mellon University.
  • Subject: Computer Science.
  • Degree: Ph.D.
  • Year: 2006
  • Pagination: 124 p.
  • Total pages: 124
  • Format: PDF
  • Language: eng
  • CLC classification: Automation and computer technology.
  • Keywords:
