Article overview
FeCaffe: FPGA-enabled Caffe with OpenCL for Deep Learning Training and Inference on Intel Stratix 10
Ke He; Bo Liu; Yu Zhang; Andrew Ling; Dian Gu
Date: 18 Nov 2019
Abstract: Deep learning and Convolutional Neural Networks (CNNs) have become
increasingly popular and important in both academia and industry
in recent years because they provide better accuracy and results in
classification, detection and recognition than traditional
approaches. Currently, there are many popular frameworks in the market for deep
learning development, such as Caffe, TensorFlow and PyTorch, and most
frameworks natively support CPUs and treat GPUs as the mainline accelerator by
default. FPGA devices, viewed as a potential heterogeneous platform, still
cannot provide comprehensive support for CNN development in popular
frameworks, particularly for the training phase. In this paper, we first
propose FeCaffe, i.e. FPGA-enabled Caffe, a hierarchical software and
hardware design methodology based on Caffe that enables FPGAs to support
mainline deep learning development features, e.g. training and inference with
Caffe. Furthermore, we provide benchmarks of FeCaffe using several
classical CNNs as examples, and analyze kernel execution
time in detail. Finally, some optimization directions, including
FPGA kernel design, system pipeline, network architecture, use-case
application and heterogeneous platform levels, are proposed step by step to
improve FeCaffe performance and efficiency. The results demonstrate that the
proposed FeCaffe is capable of supporting almost all features of CNN
training and inference, with a high degree of design
flexibility, extensibility and reusability for deep learning development.
Compared to prior studies, our architecture supports more network and
training settings, and the current configuration achieves 6.4x and 8.4x average
execution time improvements for the forward and backward passes, respectively, on LeNet.
Source: arXiv, 1911.08905