Conference: IEEE Winter Conference on Applications of Computer Vision

Learning From Less Data: A Unified Data Subset Selection and Active Learning Framework for Computer Vision


Abstract

State-of-the-art computer vision techniques based on supervised machine learning are, in general, data hungry. Curating their training data poses three challenges: expensive human labeling, inadequate computing resources, and longer experiment turnaround times. Training-data subset selection and active learning techniques have been proposed as possible solutions to these challenges. A special class of subset selection functions naturally models notions of diversity, coverage, and representation, and can be used to eliminate redundancy, which makes these functions well suited to training-data subset selection. They can also improve the efficiency of active learning and further reduce human labeling effort by selecting a subset of the examples obtained with conventional uncertainty-sampling-based techniques. In this work, we empirically demonstrate the effectiveness of two diversity models, namely the Facility-Location and Dispersion models, for training-data subset selection and for reducing labeling effort. We demonstrate this across a variety of computer vision tasks, including Gender Recognition, Face Recognition, Scene Recognition, Object Detection, and Object Recognition. Our results show that diversity-based subset selection, done in the right way, can increase accuracy by up to 5-10% over existing baselines, particularly in settings where less training data is available. This allows complex machine learning models such as Convolutional Neural Networks to be trained with much less training data and lower labeling costs while incurring minimal performance loss.
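
The abstract names the Facility-Location and Dispersion models but gives no formulas, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the commonly used forms of the two objectives: facility location scores a subset S by the sum, over all points, of each point's maximum similarity to S, and dispersion scores S by the minimum pairwise distance within S. Both are maximized here with a simple greedy heuristic. The function names, the `start` parameter, and the toy data are hypothetical and chosen for illustration.

```python
import numpy as np


def facility_location_greedy(sim, k):
    """Greedily pick k indices maximizing sum_i max_{j in S} sim[i, j].

    Assumes `sim` is a square matrix of non-negative similarities
    (illustrative assumption; the abstract does not specify one).
    """
    n = sim.shape[0]
    selected = []
    chosen = np.zeros(n, dtype=bool)
    best = np.zeros(n)  # best[i] = max similarity of point i to the selected set
    for _ in range(k):
        # Marginal gain of adding each candidate column j.
        gains = np.maximum(sim, best[:, None]).sum(axis=0) - best.sum()
        gains[chosen] = -np.inf  # never pick the same point twice
        j = int(np.argmax(gains))
        selected.append(j)
        chosen[j] = True
        best = np.maximum(best, sim[:, j])
    return selected


def dispersion_greedy(dist, k, start=0):
    """Greedy farthest-point heuristic for the min-pairwise-distance objective.

    `start` (hypothetical parameter) is the index of the first selected point.
    """
    selected = [start]
    min_dist = dist[:, start].astype(float)  # distance of each point to the set
    min_dist[start] = -np.inf
    for _ in range(k - 1):
        j = int(np.argmax(min_dist))  # point farthest from the current set
        selected.append(j)
        min_dist = np.minimum(min_dist, dist[:, j])
        min_dist[j] = -np.inf
    return selected


# Toy data: two tight clusters on a line, standing in for feature vectors.
x = np.array([0.0, 0.1, 0.2, 10.0, 10.1, 10.2])
dist = np.abs(x[:, None] - x[None, :])
sim = dist.max() - dist  # one simple way to turn distances into similarities

print(facility_location_greedy(sim, 2))
print(dispersion_greedy(dist, 2))
```

With redundant data like this, both heuristics spread a budget of two picks across the two clusters rather than spending it on near-duplicates, which is the redundancy-elimination behavior the abstract describes. In an active learning loop, the same routines could be run over only the candidates already shortlisted by uncertainty sampling.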
