Robust Pose Invariant Face Recognition Using 3D Thin Plate Spline Spatial Transformer Networks

Abstract

In recent years, face recognition has advanced with incredible speed thanks to the advent of deep learning, large-scale datasets, and improvements in GPU computing. While many of these methods claim to be able to match faces from images captured in the wild, they still perform poorly when matching non-frontal faces to frontal ones, which is the practical scenario law enforcement faces every day in processing criminal cases. Learning these large pose variations implicitly is a very hard problem, both from a deep neural network modeling perspective and because of the lack of structured datasets for training and evaluating these models. Because they are often made up of celebrity images found online, the datasets used for training and evaluating new methods contain a large bias in the types of images present. Perhaps the largest bias is in the distribution of face pose: most celebrity images are captured from a frontal or near-frontal view, which has traditionally been the easiest pose for face recognition. Most importantly, since the training and evaluation datasets share this bias, artificially high results have been reported.

The goal of this thesis is to design a system that can take advantage of the large amount of data already available while still performing robust face recognition across large pose variations. We propose that the most efficient way to do this is to transform and reduce the entire pose distribution to just frontal faces by re-rendering the off-angle faces from a frontal viewpoint. By doing this, the mismatch in pose between the training, evaluation, and real-world multi-modal distributions is eliminated. To solve this problem we must explicitly understand and model the 3D structure of faces, since faces are not planar objects. This 3D model of the face must be generated from a single 2D image, since that is all that is usually available in a recognition scenario. This is also the hardest scenario, and it is often sidestepped by using temporal fusion to perform some kind of data reconstruction. By improving the performance of models in this worst-case scenario, we can always improve further later by utilizing temporal information, while maintaining high accuracy on single images.

To achieve this, we first design a new method of 3D facial alignment and modeling from a single 2D image using our 3D Thin Plate Spline Spatial Transformer Networks (3DTPS-STN). We evaluate this method against several previous methods on the Annotated Facial Landmarks in the Wild (AFLW) dataset and the synthetic AFLW2000-3D dataset and show that it achieves very high performance at a much faster speed. Using the pose implicitly extracted by the 3D modeling, we also confirm the intuition that most recognition datasets in use have a heavy bias towards frontal faces. We then show how the 3D models created by the 3DTPS-STN method can be used to frontalize the face from any angle and, by careful selection of the face region, generate a more stable face image across all poses. Finally, we train a 28-layer ResNet, a common face recognition framework, on these faces and show that this model outperforms all comparable models on the CMU Multi-PIE dataset, along with a detailed analysis on other datasets.
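The abstract's 3DTPS-STN warps a face using a thin plate spline (TPS) transform whose control-point offsets are predicted by a spatial transformer. As a rough illustration of the TPS machinery involved, the following NumPy sketch fits a 2D TPS from a grid of control points to displaced targets and uses it to warp query points; the function names, the 5x5 control grid, and the 2D (rather than 3D) setting are assumptions for this example, not the thesis's 3DTPS-STN implementation.

```python
# Minimal 2D thin plate spline (TPS) fit-and-warp sketch (illustrative only).
import numpy as np

def tps_kernel(r2):
    # Radial basis U(r) = r^2 * log(r^2), with U(0) defined as 0.
    return np.where(r2 == 0, 0.0, r2 * np.log(r2 + 1e-12))

def fit_tps(src_pts, dst_pts):
    """Solve the standard TPS linear system mapping src control points to dst."""
    n = src_pts.shape[0]
    d2 = np.sum((src_pts[:, None, :] - src_pts[None, :, :]) ** 2, axis=-1)
    K = tps_kernel(d2)                           # (n, n) radial basis block
    P = np.hstack([np.ones((n, 1)), src_pts])    # (n, 3) affine block [1, x, y]
    L = np.zeros((n + 3, n + 3))
    L[:n, :n] = K
    L[:n, n:] = P
    L[n:, :n] = P.T
    Y = np.zeros((n + 3, 2))
    Y[:n] = dst_pts
    return np.linalg.solve(L, Y)                 # (n+3, 2): n RBF weights + 3 affine terms

def warp_points(pts, src_pts, params):
    """Apply the fitted TPS to an arbitrary set of 2D points."""
    d2 = np.sum((pts[:, None, :] - src_pts[None, :, :]) ** 2, axis=-1)
    U = tps_kernel(d2)                           # (m, n)
    P = np.hstack([np.ones((len(pts), 1)), pts]) # (m, 3)
    return np.hstack([U, P]) @ params            # (m, 2) warped coordinates

# Toy usage: a regular 5x5 control grid with small random target offsets.
grid = np.stack(np.meshgrid(np.linspace(0, 1, 5),
                            np.linspace(0, 1, 5)), axis=-1).reshape(-1, 2)
np.random.seed(0)
params = fit_tps(grid, grid + 0.02 * np.random.randn(25, 2))
warped = warp_points(np.random.rand(10, 2), grid, params)
print(warped.shape)  # (10, 2)
```

In the thesis's setting the spline acts on a 3D face model and the control-point displacements are regressed by a network rather than fixed by hand, but the fit-then-warp structure sketched above is the same underlying idea.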

Bibliographic details

  • Author

    Bhagavatula, Chandrasekhar

  • Author affiliation

    Carnegie Mellon University

  • Granting institution: Carnegie Mellon University
  • Subject: Artificial intelligence
  • Degree: Ph.D.
  • Year: 2018
  • Pages: 140 p.
  • Total pages: 140
  • Format: PDF
  • Language: English (eng)
  • Chinese Library Classification
  • Keywords
