Exploring image and video by classification and clustering on global and local visual features.


Abstract

Images and videos are complex two-dimensional spatially correlated data patterns or three-dimensional spatio-temporally correlated data volumes. Associating the correlations between visual data signals (acquired by imaging sensors) and high-level semantic human knowledge is the core challenge of supervised pattern recognition and computer vision. Finding the underlying correlations within large amounts of image or video data themselves is a complementary unsupervised self-structuring problem. Both the prior literature and our own research address these two tasks statistically, making use of recently developed supervised (classification) and unsupervised (clustering) statistical machine learning paradigms.

In this dissertation, we study four specific computer vision problems involving unsupervised visual data partitioning, discriminative multi-class classification, and online adaptive appearance learning, using statistical machine learning techniques. All four tasks rest on extracting both global and local visual appearance patterns in general image and video domains.

First, we develop a new clustering algorithm that decomposes temporal video structure into piecewise elements (video shot segmentation) by combining central and subspace constraints in a unified solution. We also demonstrate the algorithm's applicability to illumination-invariant face clustering.

Second, we detect and recognize spatio-temporal video subvolumes as action units using a trained 3D-surface action model and multi-scale temporal search. The dynamic 3D-surface action model is built as an empirical distribution over basic static posture elements, in the spirit of texton representations, so action matching reduces to measuring similarity between histograms (see the first sketch below). The basic posture units are intermediate visual representations learned by a three-stage clustering algorithm from figure-segmented image sequences.

Third, we train a discriminative-probabilistic multi-modal density classifier to evaluate the responses of 20 semantic material classes on a large collection of challenging home photos. Photo categories are then learned from global image features extracted from the material-class-specific density response maps over the spatial domain. We adopt a classifier combination technique over a set of random weak discriminators to handle the complex multi-modal photo-feature distributions in a high-dimensional parameter space.

Fourth, we propose a unified nonparametric approach for three applications: location-based dynamic-template video tracking at low to medium resolution, segmentation-based object-level image matching across viewpoints, and binary foreground/background segmentation tracking. The main contributions lie in three areas: (1) we demonstrate that an online classification framework allows very flexible constructions of image density matching functions to address the general data-driven classification problem; (2) we devise an effective dynamic appearance modeling algorithm that requires only simple nonparametric computations (mean, median, standard deviation) and is therefore easy to implement (see the second sketch below); (3) we present a random-patch-based computational representation for classifying image segments in object-specific matching and tracking that is highly descriptive and discriminative compared with general image segment descriptors. Extensive experiments demonstrate that the proposed approach maintains an effective object-level appearance model robustly over time under a variety of challenging conditions, such as severely changing, occluded, and deformable appearance templates and moving cameras.
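As a concrete illustration of the histogram-based action matching described above, the first sketch summarizes a video subvolume as an empirical distribution over a learned posture vocabulary and compares it to a trained action model with a chi-square distance. The vocabulary size, the choice of chi-square measure, and all identifiers are illustrative assumptions, not the dissertation's exact formulation.

```python
import numpy as np

def posture_histogram(posture_labels, vocab_size):
    """Empirical distribution over quantized posture 'textons' in a video subvolume."""
    hist = np.bincount(posture_labels, minlength=vocab_size).astype(float)
    return hist / max(hist.sum(), 1.0)

def chi_square_distance(h1, h2, eps=1e-10):
    """Chi-square distance between normalized histograms (smaller = more similar)."""
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

# Toy usage: score a candidate subvolume against a trained action model.
rng = np.random.default_rng(0)
vocab_size = 32  # size of the learned posture vocabulary (assumed)
model_hist = posture_histogram(rng.integers(0, vocab_size, 500), vocab_size)
query_hist = posture_histogram(rng.integers(0, vocab_size, 200), vocab_size)
print(f"chi-square distance: {chi_square_distance(model_hist, query_hist):.4f}")
```

In practice the posture labels would come from the three-stage clustering described in the abstract; only the histogram-matching rule is sketched here.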
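The second sketch illustrates the kind of nonparametric appearance computation named in the fourth task: a dynamic template summarized by per-pixel order statistics (median and standard deviation) over a sliding window of recent observations, with a candidate patch scored by how many of its pixels fall inside that envelope. The window length and the two-sigma test are assumptions chosen for illustration.

```python
import numpy as np

class DynamicAppearanceModel:
    """Sliding-window template model maintained with simple order statistics."""

    def __init__(self, window=10):
        self.window = window
        self.patches = []  # recent template observations (grayscale arrays)

    def update(self, patch):
        """Append the newest observation and drop the oldest beyond the window."""
        self.patches.append(patch.astype(float))
        if len(self.patches) > self.window:
            self.patches.pop(0)

    def match_score(self, candidate):
        """Fraction of pixels within two per-pixel standard deviations of the median."""
        stack = np.stack(self.patches)
        med = np.median(stack, axis=0)
        std = np.std(stack, axis=0) + 1e-6  # give constant pixels a tiny tolerance
        return float(np.mean(np.abs(candidate - med) <= 2.0 * std))

# Toy usage: the model tolerates small appearance drift around the median.
rng = np.random.default_rng(0)
model = DynamicAppearanceModel(window=5)
template = rng.random((16, 16))
for _ in range(5):
    model.update(template + 0.05 * rng.standard_normal((16, 16)))
print(f"match score: {model.match_score(template):.3f}")
```

Because only medians and standard deviations are kept, the model adapts to gradual appearance change while staying cheap to update, in line with the easy-implementation claim in contribution (2).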

Record Details

  • Author: Lu, Le.
  • Affiliation: The Johns Hopkins University.
  • Degree grantor: The Johns Hopkins University.
  • Subject: Artificial Intelligence; Computer Science
  • Degree: Ph.D.
  • Year: 2007
  • Pages: 157 p.
  • Total pages: 157
  • Format: PDF
  • Language: eng
  • Classification (CLC): Theory of artificial intelligence; Automation technology, computer technology
  • Keywords
