
Programmable Manycore Accelerator for Machine Learning, Convolution Neural Network and Binary Neural Network



Abstract

Lightweight Machine Learning (ML) and Convolutional Neural Network (CNN) models can offer solutions for wearable cognitive devices and resource-constrained Internet of Things (IoT) platforms. However, the implementation of ML and CNN kernels is computationally intensive and faces memory storage constraints on tiny embedded platforms. In recent years, heterogeneous hardware acceleration, in which compute-intensive tasks are performed on kernel-specific cores, has gained attention, with growing industry interest in developing tiny, lightweight manycore accelerators that address these issues. In this thesis, we propose two extended versions of an existing manycore architecture, "PENC: Power Efficient Nano Cluster", which can efficiently implement common ML and CNN kernels with much-reduced computation and memory complexity. First, we propose "PACENet: Programmable many-core ACcElerator", which adds CNN-specific instructions for frequently used kernels such as convolution, the ReLU activation function (RELU), and max-pooling (MP), as well as a machine-learning-specific instruction for Manhattan distance calculation (MNT). Second, we propose "BiNMAC: Binarized Neural network Manycore ACcelerator", which implements binary neural networks. Reducing weights to a binary format not only relieves the memory access bottleneck but also reduces computation, since most arithmetic operations are replaced with bit-wise operations. To add binarized CNN capability, we implemented instructions such as Batch XOR and XNOR, PCNT (population count), PCH (patch selection), and BCAST (a communication-based instruction) in the existing instruction set hardware. Both PACENet and BiNMAC cores were fully synthesized, placed, and routed using TSMC 65 nm CMOS technology. Each single processing core of PACENet occupies 98.7 µm² of area and consumes 32.2 mW of power operating at 1 GHz and 1 V, while a single BiNMAC core occupies 97.9 µm² and consumes 31.1 mW. Compared to the existing PENC manycore architecture, PACENet achieves a 13.3% area reduction and a 14.1% power reduction at 1 GHz, while BiNMAC achieves a 17.1% area reduction and a 13.2% power reduction. To conclude this work, we also evaluated the performance of the PACENet and BiNMAC accelerators on personalized biomedical applications, namely stress detection and seizure detection, and on a computer vision application, namely object detection. The stress detection and seizure detection applications are evaluated on the ARL dataset and the Boston hospital dataset using the K-nearest-neighbor algorithm. The proposed PACENet shows a 59% increase in throughput and a 43.7% reduction in energy consumption for the stress detection application, while for the seizure detection application it achieves a 60% throughput improvement and a 43.6% reduction in energy consumption compared to the PENC manycore. For the computer vision application, we evaluated a ResNet-20 network trained on the CIFAR-10 dataset on both the PACENet and BiNMAC accelerators. PACENet achieves 2.3x higher throughput-per-watt performance and consumes 57.3% less energy than the PENC manycore. For the SensorNet implementation, the proposed BiNMAC achieves 1.8x higher throughput and consumes 13x less energy than the PENC manycore, while its ResNet-20 implementation achieves 36x higher throughput while consuming 195x less energy.
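To make the PACENet kernel instructions concrete, the following is a minimal scalar C sketch of the operations the RELU and MP instructions accelerate; the int16_t fixed-point data type, the 2x2 pooling window, and the row-major layout are illustrative assumptions, not details taken from the thesis.

```c
#include <stdint.h>

/* ReLU: the nonlinearity PACENet's RELU instruction applies element-wise. */
static inline int16_t relu(int16_t v)
{
    return v > 0 ? v : 0;
}

/* One output element of a 2x2 max-pool over a row-major feature map,
 * as a scalar reference for what the MP instruction computes
 * (window size and layout are assumptions for illustration). */
static int16_t maxpool2x2(const int16_t *in, int width, int row, int col)
{
    int16_t m = in[row * width + col];
    for (int dr = 0; dr < 2; dr++)
        for (int dc = 0; dc < 2; dc++) {
            int16_t v = in[(row + dr) * width + (col + dc)];
            if (v > m)
                m = v;
        }
    return m;
}
```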
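Similarly, the MNT instruction targets the Manhattan (L1) distance, which is the inner loop of the K-nearest-neighbor classifiers used in the stress and seizure detection evaluations. A minimal sketch follows; the int16_t feature type and the variable dimensionality are assumptions for illustration.

```c
#include <stdint.h>
#include <stdlib.h>

/* Manhattan (L1) distance between two feature vectors: the accumulation
 * of absolute differences that a KNN classifier computes against every
 * stored training sample, and the operation the MNT instruction
 * is meant to accelerate. */
static int32_t manhattan_distance(const int16_t *x, const int16_t *y, int dim)
{
    int32_t acc = 0;
    for (int i = 0; i < dim; i++)
        acc += abs((int)x[i] - (int)y[i]);
    return acc;
}
```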
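The abstract's claim that binarization replaces most arithmetic with bit-wise operations can be seen in a short C sketch: with weights and activations constrained to {-1, +1} and packed one bit per lane, a dot product reduces to an XNOR followed by a population count, the pattern BiNMAC's XNOR and PCNT instructions support. The bit-packing convention (bit 1 for +1, bit 0 for -1) and the 32-bit word width are illustrative assumptions.

```c
#include <stdint.h>

/* Binarized 32-element dot product via XNOR + popcount.
 * Each set bit where the operands agree contributes +1 to the dot
 * product; each disagreement contributes -1, hence 2*matches - 32. */
static int binary_dot32(uint32_t a, uint32_t w)
{
    uint32_t agree = ~(a ^ w);                /* XNOR: 1 where signs match */
    int matches = __builtin_popcount(agree);  /* GCC/Clang popcount intrinsic */
    return 2 * matches - 32;                  /* +1 per match, -1 per mismatch */
}
```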

Bibliographic Information

  • Author: Kulkarni, Adwaya Amey
  • Affiliation: University of Maryland, Baltimore County
  • Degree grantor: University of Maryland, Baltimore County
  • Subjects: Computer engineering; Artificial intelligence
  • Degree: M.S.
  • Year: 2017
  • Pages: 112 p.
  • Format: PDF
  • Language: English
