Conference: International Conference on Field Programmable Logic and Applications

Optimizing hardware design for Human Action Recognition


Abstract

Human action recognition (HAR) is an important topic in computer vision with a wide range of applications: health care, assisted living, surveillance, security, gaming, etc. Despite a significant amount of work in this area in recent years, execution speed still limits real-time applications. Moreover, it is highly desirable to perform the compute-intensive feature extraction stage directly at the output of the camera, so that in a multi-camera network setting only action features are extracted and transferred, which reduces the network bandwidth requirement. In this work, we first evaluate the feasibility of performing feature extraction under reduced-precision fixed-point arithmetic to ease hardware resource requirements. We compare Histogram of Oriented Gradients in 3D (HOG3D) feature extraction with state-of-the-art Convolutional Neural Network (CNN) methods and show the latter to be 75× slower than the former. Our experiments show that by re-training the classifier with reduced data precision, classification performs as well as with the original double-precision floating-point data. Based on this result, we implement FPGA-based HAR feature extraction for near-camera processing using fixed-point data representation and arithmetic. This implementation, using a single Xilinx Virtex 6 FPGA, achieves about 70× speedup over a multicore CPU. Furthermore, a GPU implementation of HAR is introduced with 80× speedup over the CPU (on an Nvidia Tesla K20). Last but not least, a power comparison is presented for the three platforms.
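
To make the reduced-precision idea concrete, the following is a minimal sketch of signed fixed-point quantization and arithmetic of the kind a gradient computation might use. It is not the paper's implementation: the 16-bit word with 8 fractional bits and the helper names `to_fixed`, `from_fixed`, and `fixed_mul` are assumptions chosen for illustration only.

```python
# Hypothetical helpers; the word length and fractional bits are assumptions,
# not values taken from the paper.

def to_fixed(x: float, frac_bits: int = 8, total_bits: int = 16) -> int:
    """Quantize x to a signed fixed-point integer, saturating on overflow."""
    scale = 1 << frac_bits
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    q = int(round(x * scale))
    return max(lo, min(hi, q))


def from_fixed(q: int, frac_bits: int = 8) -> float:
    """Convert a fixed-point integer back to a float."""
    return q / (1 << frac_bits)


def fixed_mul(a: int, b: int, frac_bits: int = 8) -> int:
    """Multiply two fixed-point values and rescale the product."""
    return (a * b) >> frac_bits


# Squared gradient magnitude gx^2 + gy^2 computed entirely in integers.
gx = to_fixed(0.75)
gy = to_fixed(-0.5)
mag_sq = fixed_mul(gx, gx) + fixed_mul(gy, gy)
print(from_fixed(mag_sq))
```

With 8 fractional bits, an in-range value is represented to within half a least-significant bit (1/512), which gives a feel for how much precision such a format gives up relative to double-precision floating point.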