Hierarchical learning of discriminative features and classifiers for large-scale visual recognition.

Abstract

Enabling computers to recognize objects in images has been a long-standing and tremendously challenging problem in computer vision for decades. Beyond the difficulties caused by huge appearance variations, large-scale visual recognition poses unprecedented challenges when the number of visual categories reaches the thousands and the number of images grows to millions. This dissertation addresses several of the challenging issues in large-scale visual recognition.

First, we develop an automatic image-text alignment method to collect massive amounts of labeled images from the Web for training visual concept classifiers. Specifically, we first crawl a large number of cross-media Web pages containing Web images and their auxiliary texts, and then segment them into a collection of image-text pairs. We then show that clustering near-duplicate images by visual similarity can significantly reduce the uncertainty about how the semantics of Web images relate to their auxiliary text terms or phrases. Finally, we empirically demonstrate that a random walk over a newly proposed phrase correlation network helps achieve more precise image-text alignment by refining the relevance scores between Web images and their auxiliary text terms.

Second, we propose a visual tree model that reduces the computational complexity of a large-scale visual recognition system by hierarchically organizing and learning the classifiers for a large number of visual categories in a tree structure. Unlike previous tree models, such as the label tree, our visual tree model does not require training a large number of classifiers in advance, which is computationally expensive. Nevertheless, we show experimentally that the proposed visual tree achieves recognition accuracy and efficiency comparable to, or even better than, those of other tree models.

Third, we present a joint dictionary learning (JDL) algorithm that exploits inter-category visual correlations to learn more discriminative dictionaries for image content representation. Given a group of visually correlated categories, JDL simultaneously learns one common dictionary and multiple category-specific dictionaries to explicitly separate the shared visual atoms from the category-specific ones. We accordingly develop three classification schemes that make full use of the dictionaries learned by JDL for visual content representation in image categorization. Experiments on two image data sets, containing 17 and 1,000 categories respectively, demonstrate the effectiveness of the proposed algorithm.

In the last part of the dissertation, we develop a novel data-driven algorithm to quantitatively characterize the semantic gaps of different visual concepts for learning complexity estimation and inference model selection. The semantic gaps are estimated directly in the visual feature space, since this is the common space for concept classifier training and automatic concept detection. We show that the quantitative characterization of the semantic gaps helps to automatically select more effective inference models for classifier training, which further improves recognition accuracy.
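
The relevance-score refinement mentioned in the first contribution can be illustrated with a generic random walk with restart. The abstract does not give the dissertation's actual formulation, so everything below is an assumption made for illustration: the function name `refine_relevance`, the damping factor `alpha`, the row-normalized transition weights, and the toy phrases and scores. It is a minimal sketch of the general technique, not the author's method.

```python
def refine_relevance(initial, correlation, alpha=0.85, iters=50):
    """Random walk with restart over a phrase correlation graph.

    initial:     dict phrase -> non-negative initial relevance score
    correlation: dict phrase -> dict phrase -> non-negative edge weight
    alpha:       probability of following an edge (assumed value);
                 1 - alpha is the probability of restarting at the prior
    """
    phrases = list(initial)
    total = sum(initial.values())
    if total <= 0:
        raise ValueError("initial scores must have a positive sum")
    # Normalize the initial scores into a restart distribution.
    prior = {p: initial[p] / total for p in phrases}

    # Row-normalize edge weights into transition probabilities.
    # A phrase with no usable edges gets a self-loop so mass is conserved.
    trans = {}
    for p in phrases:
        row = {q: w for q, w in correlation.get(p, {}).items()
               if q in prior and w > 0}
        weight_sum = sum(row.values())
        if weight_sum > 0:
            trans[p] = {q: w / weight_sum for q, w in row.items()}
        else:
            trans[p] = {p: 1.0}

    scores = dict(prior)
    for _ in range(iters):
        # Restart term, then propagate current scores along the edges.
        nxt = {p: (1 - alpha) * prior[p] for p in phrases}
        for p in phrases:
            for q, t in trans[p].items():
                nxt[q] += alpha * scores[p] * t
        scores = nxt
    return scores


# Toy example with made-up phrases and weights (hypothetical data).
example_initial = {
    "golden gate bridge": 0.40,
    "san francisco": 0.35,
    "click here": 0.25,
}
example_correlation = {
    "golden gate bridge": {"san francisco": 0.9, "click here": 0.1},
    "san francisco": {"golden gate bridge": 0.9, "click here": 0.1},
    "click here": {"golden gate bridge": 0.5, "san francisco": 0.5},
}
example_scores = refine_relevance(example_initial, example_correlation)
```

In this toy graph the two strongly correlated phrases reinforce each other, while the weakly connected phrase ends up with a lower score than it started with, which is the kind of refinement the abstract describes.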

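Bibliographic record

  • Author

    Zhou, Ning

  • Author affiliation

    The University of North Carolina at Charlotte

  • Degree-granting institution: The University of North Carolina at Charlotte
  • Discipline: Computer Science
  • Degree: Ph.D.
  • Year: 2014
  • Pages: 210 p.
  • Total pages: 210
  • Original format: PDF
  • Language of text: English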
