DeepKit: a multistage exploration framework for hardware implementation of deep learning
Mälardalen University, School of Innovation, Design and Engineering.
2023 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Deep Neural Networks (DNNs) are widely adopted to solve problems ranging from speech recognition to image classification. DNNs demand a large amount of processing power, and their implementation on hardware, i.e., FPGA or ASIC, has received much attention. However, a DNN cannot be implemented on hardware directly from its high-level description, which is usually written in Python using dedicated libraries and APIs. It must therefore either be implemented from scratch at the Register Transfer Level (RTL), e.g., in VHDL or Verilog, or be transformed into a lower-level implementation. One idea that has recently been considered is converting a DNN to C and then using High-Level Synthesis (HLS) to synthesize it on an FPGA. Nevertheless, there are various aspects to take into consideration during the transformation. In this thesis, we propose a multistage framework, DeepKit, that generates a synthesizable C implementation from an input DNN architecture given as a DNN description (Keras). Moving through the stages, various explorations and optimizations are then performed with regard to accuracy, latency, resource utilization, and reliability. The framework is also implemented as a toolchain consisting of DeepHLS, AutoDeepHLS, DeepAxe, and DeepFlexiHLS, and results are provided for DNNs of various types and sizes.
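
As a minimal sketch (not from the thesis) of what such generated flat, synthesizable C can look like, assuming a hypothetical fully connected layer with illustrative names and dimensions:

```c
/* Hypothetical sketch of flat, synthesizable C for one dense layer, in the
 * style a DeepHLS-like generator might emit from a Keras model. All
 * identifiers and dimensions are illustrative, not taken from the thesis. */
#define N_IN  64
#define N_OUT 10

typedef float data_t;   /* data-type knob: float now, fixed-point later */

void dense_layer(const data_t in[N_IN],
                 const data_t w[N_OUT][N_IN],
                 const data_t b[N_OUT],
                 data_t out[N_OUT])
{
    for (int o = 0; o < N_OUT; o++) {     /* flat loop nest: no dynamic   */
        data_t acc = b[o];                /* memory and no recursion, so  */
        for (int i = 0; i < N_IN; i++) {  /* HLS tools can synthesize it  */
            acc += w[o][i] * in[i];
        }
        out[o] = acc > 0 ? acc : 0;       /* fused ReLU activation */
    }
}
```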

Place, publisher, year, edition, pages
Västerås: Mälardalen University, 2023.
Series
Mälardalen University Press Dissertations, ISSN 1651-4238 ; 390
National Category
Embedded Systems
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:mdh:diva-64488
ISBN: 978-91-7485-613-2 (print)
OAI: oai:DiVA.org:mdh-64488
DiVA, id: diva2:1803278
Public defence
2023-12-07, Delta, Mälardalens universitet, Västerås, 13:00 (English)
Available from: 2023-10-09. Created: 2023-10-09. Last updated: 2023-11-16. Bibliographically approved.
List of papers
1. DeepHLS: A complete toolchain for automatic synthesis of deep neural networks to FPGA
2020 (English). In: ICECS 2020 - 27th IEEE International Conference on Electronics, Circuits and Systems, Proceedings, Institute of Electrical and Electronics Engineers Inc., 2020, article id 9294881. Conference paper, Published paper (Refereed)
Abstract [en]

Deep neural networks (DNNs) have achieved high-quality results in various applications of computer vision, especially in image classification problems. DNNs are computationally intensive, and nowadays their acceleration on FPGAs has received much attention. Many methods to accelerate DNNs have been proposed. Despite performance features such as acceptable accuracy or low latency, their use is not widely accepted by software designers, who usually do not have enough knowledge of the hardware details of the proposed accelerators. HLS tools are the most promising bridge between software designers and hardware implementation. However, not only do most HLS tools support just C and C++ descriptions as input, but their results are also very sensitive to coding style. This makes them difficult for software developers to adopt, as DNNs are mostly described in high-level frameworks such as TensorFlow or Keras. In this paper, an integrated toolchain is presented that, in addition to converting Keras DNN descriptions to a simple, flat, and synthesizable C output, provides other features such as accuracy verification, C-level knobs to easily change the data types from floating point to fixed point with arbitrary bit width, and latency and area utilization adjustment using HLS knobs.
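
A minimal sketch of such a C-level data-type knob, assuming a Q7.8 fixed-point format built from scaled integers; the format, names, and helpers are illustrative, not DeepHLS's actual interface:

```c
#include <stdint.h>

#define FRAC_BITS 8            /* knob: fractional bits of the Q-format  */
typedef int16_t data_t;        /* knob: total bit width (was float)      */

/* multiply two Q7.8 values; widen first so the product cannot overflow */
static inline data_t fx_mul(data_t a, data_t b)
{
    return (data_t)(((int32_t)a * (int32_t)b) >> FRAC_BITS);
}

/* convert a float constant, e.g. a trained weight, into Q7.8 */
static inline data_t fx_from_float(float x)
{
    return (data_t)(x * (float)(1 << FRAC_BITS));
}
```

Because the whole generated model shares one typedef, changing the width re-types every layer at once, and accuracy can be re-verified at the C level before synthesis.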

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers Inc., 2020
Keywords
Accelerator, CNN, Convolutional neural networks, Deep Neural Networks, High-level synthesis
National Category
Computer Systems
Identifiers
urn:nbn:se:mdh:diva-53227 (URN), 10.1109/ICECS49266.2020.9294881 (DOI), 000612696300097 (), 2-s2.0-85099485200 (Scopus ID), 9781728160443 (ISBN)
Conference
27th IEEE International Conference on Electronics, Circuits and Systems, ICECS 2020; Glasgow; United Kingdom; 23 November 2020 through 25 November 2020
Available from: 2021-01-28. Created: 2021-01-28. Last updated: 2023-10-09. Bibliographically approved.
2. DeepAxe: A Framework for Exploration of Approximation and Reliability Trade-offs in DNN Accelerators
2023 (English). In: Proceedings - International Symposium on Quality Electronic Design, ISQED, IEEE Computer Society, 2023. Conference paper, Published paper (Refereed)
Abstract [en]

While the role of Deep Neural Networks (DNNs) in a wide range of safety-critical applications is expanding, emerging DNNs experience massive growth in required computation power. This raises the necessity of improving the reliability of DNN accelerators while reducing the computational burden on the hardware platforms, i.e., reducing energy consumption and execution time and increasing the efficiency of DNN accelerators. Therefore, the trade-off between hardware performance, i.e., area, power, and delay, and the reliability of the DNN accelerator implementation becomes critical and requires tools for analysis. In this paper, we propose DeepAxe, a framework for design space exploration of FPGA-based DNN implementations that considers the trilateral impact of applying functional approximation on accuracy, reliability, and hardware performance. The framework enables selective approximation of reliability-critical DNNs, providing a set of Pareto-optimal DNN implementation design space points for the target resource utilization requirements. The design flow starts with a pre-trained network in Keras, uses the high-level synthesis environment DeepHLS, and results in a set of Pareto-optimal design space points as a guide for the designer. The framework is demonstrated in a case study of custom and state-of-the-art DNNs and datasets.
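
The final step the abstract describes, reducing explored configurations to a Pareto-optimal set, can be sketched as follows; the struct fields and dominance test are illustrative assumptions, not DeepAxe's actual data model:

```c
/* Illustrative Pareto filter over explored design points; not DeepAxe's
 * actual code. Fields and objectives are assumptions for the sketch. */
#include <stdbool.h>
#include <stdio.h>

typedef struct {
    double accuracy;     /* higher is better */
    double reliability;  /* higher is better */
    double area;         /* lower is better  */
} design_point;

/* true if a dominates b: no worse in every objective, better in one */
static bool dominates(const design_point *a, const design_point *b)
{
    bool no_worse = a->accuracy >= b->accuracy &&
                    a->reliability >= b->reliability &&
                    a->area <= b->area;
    bool better   = a->accuracy > b->accuracy ||
                    a->reliability > b->reliability ||
                    a->area < b->area;
    return no_worse && better;
}

/* print the non-dominated (Pareto-optimal) points as designer guidance */
static void pareto_front(const design_point pts[], int n)
{
    for (int i = 0; i < n; i++) {
        bool dominated = false;
        for (int j = 0; j < n && !dominated; j++)
            dominated = (j != i) && dominates(&pts[j], &pts[i]);
        if (!dominated)
            printf("acc=%.3f rel=%.3f area=%.0f\n",
                   pts[i].accuracy, pts[i].reliability, pts[i].area);
    }
}
```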

Place, publisher, year, edition, pages
IEEE Computer Society, 2023
Keywords
approximate computing, deep neural networks, fault simulation, reliability, resiliency assessment
National Category
Computer Systems
Identifiers
urn:nbn:se:mdh:diva-63499 (URN), 10.1109/ISQED57927.2023.10129353 (DOI), 001013619400058 (), 2-s2.0-85161606608 (Scopus ID), 9798350334753 (ISBN)
Conference
24th International Symposium on Quality Electronic Design, ISQED 2023, San Francisco, 5 April 2023 through 7 April 2023
Available from: 2023-06-21. Created: 2023-06-21. Last updated: 2023-10-09. Bibliographically approved.
3. DeepFlexiHLS: Deep Neural Network Flexible High-Level Synthesis Directive Generator
2022 (English). In: 2022 IEEE Nordic Circuits and Systems Conference, NORCAS 2022 - Proceedings, Institute of Electrical and Electronics Engineers Inc., 2022. Conference paper, Published paper (Refereed)
Abstract [en]

Deep Neural Networks (DNNs) are now widely adopted to solve problems ranging from speech recognition to image classification. Since DNNs demand a large amount of processing power, their implementation on hardware, i.e., FPGA or ASIC, has received much attention. High-level synthesis (HLS) is widely used since it significantly boosts productivity and flexibility and requires minimal hardware knowledge. However, when HLS transforms a C implementation into a Register-Transfer Level one, the high parallelism capability of the FPGA is not well utilized. HLS tools provide a feature called directives, through which designers can guide the tool using defined C pragma statements to improve performance. Nevertheless, finding appropriate directives is another challenge, which requires considerable expertise and experience. This paper proposes DeepFlexiHLS, a two-stage design space exploration flow that finds a set of directives to achieve minimal latency; an example of such directives is sketched below. In the first stage, a partition-based method finds the directives corresponding to each partition; aggregating all of these directives leads to minimal latency. Experimental results show 54% more speed-up than similar work on the VGG neural network. In the second stage, an estimator finds the latency and resource utilization of various combinations of the found directives. The results form a Pareto frontier from which the designer can choose when FPGA resources are limited or should not be entirely consumed by the DNN module.
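
HLS directives are plain C pragmas attached to loops and arrays. A hedged example of the kind of directive combination such a flow might emit for a small convolution loop (the loop shape, sizes, and chosen factors are assumptions; the pragmas themselves are standard Vitis/Vivado HLS directives):

```c
/* Illustrative only: HLS directives of the kind a DeepFlexiHLS-style flow
 * searches over. Loop, names, and factors are assumed, not from the paper. */
void conv1d(const float in[128], const float k[8], float out[121])
{
    float kbuf[8];
#pragma HLS ARRAY_PARTITION variable=kbuf complete dim=1
    for (int i = 0; i < 8; i++)      /* copy taps into registers */
        kbuf[i] = k[i];

    for (int o = 0; o < 121; o++) {
#pragma HLS PIPELINE II=1            /* overlap iterations of the outer loop */
        float acc = 0.0f;
        for (int t = 0; t < 8; t++) {
#pragma HLS UNROLL                   /* fully unroll the inner tap loop */
            acc += kbuf[t] * in[o + t];
        }
        out[o] = acc;
    }
}
```

Each pragma trades FPGA resources for latency, which is why combinations of found directives form the Pareto frontier the abstract mentions.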

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers Inc., 2022
Keywords
Accelerator, CNN, Deep Neural Networks, Design Space Exploration, High-Level Synthesis (HLS), Field Programmable Gate Arrays (FPGA), Integrated Circuit Design, Register-Transfer Level, Speech Recognition, Image Classification
National Category
Computer Sciences
Identifiers
urn:nbn:se:mdh:diva-61069 (URN), 10.1109/NorCAS57515.2022.9934617 (DOI), 000889469600019 (), 2-s2.0-85142437239 (Scopus ID), 9798350345506 (ISBN)
Conference
8th IEEE Nordic Circuits and Systems Conference, NORCAS 2022, 25 October 2022 through 26 October 2022
Available from: 2022-11-30. Created: 2022-11-30. Last updated: 2023-10-09. Bibliographically approved.
4. AutoDeepHLS: Deep Neural Network High-level Synthesis using fixed-point precision
2022 (English). In: 2022 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS 2022): Intelligent Technology in the Post-Pandemic Era, IEEE, 2022, p. 122-125. Conference paper, Published paper (Refereed)
Abstract [en]

Deep Neural Networks (DNNs) have received much attention in various applications such as visual recognition, self-driving cars, and health care. Hardware implementation, specifically on FPGAs and ASICs, is considered an efficient approach due to their high performance and low power consumption. However, implementation on these platforms is difficult for neural network designers, since they usually have limited knowledge of hardware. High-Level Synthesis (HLS) tools can act as a bridge between high-level DNN designs and hardware implementation. Nevertheless, these tools usually require an implementation at the C level, whereas the design of neural networks is usually performed at a higher level (such as Keras or TensorFlow). In this paper, we propose a fully automated flow for creating a C-level implementation that is synthesizable with HLS tools. Various aspects such as performance, minimal access to memory elements, data type knobs, and design verification are considered. Our results show that the generated C implementation is much more HLS-friendly than those of previous works. Furthermore, a complete flow is proposed to determine different fixed-point precisions for network elements. We show that our method yields 25% and 34% reductions in bit-width for LeNet and VGG, respectively, without any accuracy loss.
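
The bit-width determination can be sketched abstractly as follows; evaluate_accuracy() is a hypothetical hook standing in for the paper's C-level accuracy verification, and the greedy shrink loop is an assumption, not the paper's exact algorithm:

```c
/* Sketch of the idea behind the abstract's bit-width reduction, not the
 * paper's actual method: shrink a fixed-point fractional width while an
 * assumed accuracy-evaluation hook reports no loss against the baseline. */

/* hypothetical hook: re-runs C-level inference at the given Q-format */
extern double evaluate_accuracy(int int_bits, int frac_bits);

/* smallest fractional width that still matches the baseline accuracy */
int min_frac_bits(int int_bits, int frac_start, double baseline)
{
    int frac = frac_start;
    while (frac > 0 && evaluate_accuracy(int_bits, frac - 1) >= baseline)
        frac--;
    return frac;
}
```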

Place, publisher, year, edition, pages
IEEE, 2022
Keywords
Deep Neural Network, Accelerator, High-Level Synthesis, Fixed-Point, Quantization
National Category
Computer Systems
Identifiers
urn:nbn:se:mdh:diva-60590 (URN), 10.1109/AICAS54282.2022.9869907 (DOI), 000859273200032 (), 2-s2.0-85139073458 (Scopus ID), 978-1-6654-0996-4 (ISBN)
Conference
IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS) - Intelligent Technology in the Post-Pandemic Era, June 13-15, 2022, Incheon, South Korea
Available from: 2022-11-09. Created: 2022-11-09. Last updated: 2023-10-09. Bibliographically approved.

Open Access in DiVA

fulltext (1214 kB)
File information
File name: FULLTEXT02.pdf
File size: 1214 kB
Checksum (SHA-512): 98120520d154d399a0e2d404bcd3d28a4e3792fed1405e56b4c3a82fe7c3b852d30c156816f2849d3ecb8336a2e0fd6fde06745209a3cbe17be05808d10ea6f5
Type: fulltext. Mimetype: application/pdf

Authority records

Riazati, Mohammad

