https://www.mdu.se/

mdu.se Publications
1 - 10 of 10
  • 1.
    Loni, Mohammad
    et al.
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Mousavi, Hamid
    Mälardalen University.
    Riazati, Mohammad
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Sjödin, Mikael
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    TAS: Ternarized Neural Architecture Search for Resource-Constrained Edge Devices, 2022. Conference paper (Refereed)
    Abstract [en]

    Ternary Neural Networks (TNNs) compress network weights and activation functions into a 2-bit representation, resulting in remarkable network compression and energy efficiency. However, there remains a significant accuracy gap between TNNs and their full-precision counterparts. Recent advances in Neural Architecture Search (NAS) promise opportunities for automated optimization of various deep learning tasks. Unfortunately, this area is unexplored for optimizing TNNs. This paper proposes TAS, a framework that drastically reduces the accuracy gap between TNNs and their full-precision counterparts by integrating quantization into the network design. We observed that directly applying NAS to the ternary domain leads to accuracy degradation, as the search settings are customized for full-precision networks. To address this problem, we propose (i) a new cell template for ternary networks with maximum gradient propagation; and (ii) a novel learnable quantizer that adaptively relaxes the ternarization mechanism based on the distribution of the weights and activation functions. Experimental results reveal that TAS delivers 2.64% higher accuracy and 2.8x memory savings over competing methods with the same bit-width resolution on the CIFAR-10 dataset. These results suggest that TAS is an effective method that paves the way for the efficient design of the next generation of quantized neural networks.

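    As background to the ternarization discussed in the abstract above, the sketch below shows a generic threshold-based weight ternarization in C++, in the spirit of Ternary Weight Networks. It only illustrates the mechanism TAS builds on; the threshold heuristic and per-layer scaling factor here are standard choices, not the learnable quantizer proposed in the paper.

        // Generic threshold-based ternarization sketch, NOT the TAS quantizer:
        // weights below a threshold delta become 0, the rest become +/-1 and are
        // shared with a per-layer scale alpha.
        #include <cmath>
        #include <cstdint>
        #include <cstdio>
        #include <vector>

        struct TernaryLayer {
            std::vector<std::int8_t> q;  // ternary codes in {-1, 0, +1}
            float alpha;                 // per-layer scaling factor
        };

        TernaryLayer ternarize(const std::vector<float>& w) {
            // delta = 0.7 * mean(|w|) is a common heuristic threshold.
            float sum_abs = 0.f;
            for (float x : w) sum_abs += std::fabs(x);
            const float delta = 0.7f * sum_abs / static_cast<float>(w.size());

            TernaryLayer out{std::vector<std::int8_t>(w.size(), 0), 0.f};
            float kept_abs = 0.f;
            std::size_t kept = 0;
            for (std::size_t i = 0; i < w.size(); ++i) {
                if (std::fabs(w[i]) > delta) {
                    out.q[i] = (w[i] > 0.f) ? 1 : -1;
                    kept_abs += std::fabs(w[i]);
                    ++kept;
                }
            }
            // alpha minimizes the L2 error between w and alpha*q for fixed q.
            out.alpha = kept ? kept_abs / static_cast<float>(kept) : 0.f;
            return out;
        }

        int main() {
            TernaryLayer t = ternarize({0.9f, -0.05f, -1.2f, 0.4f, 0.02f});
            for (std::size_t i = 0; i < t.q.size(); ++i)
                std::printf("w[%zu] -> %d * %.3f\n", i, t.q[i], t.alpha);
        }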
  • 2.
    Riazati, Mohammad
    Mälardalen University, School of Innovation, Design and Engineering.
    DeepKit: a multistage exploration framework for hardware implementation of deep learning, 2023. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    Deep Neural Networks (DNNs) are widely adopted to solve problems ranging from speech recognition to image classification. DNNs demand a large amount of processing power, and their implementation on hardware, i.e., FPGA or ASIC, has received much attention. However, a DNN cannot be implemented on hardware directly from its high-level description, which is usually written in Python using dedicated libraries and APIs. It must therefore either be implemented from scratch at the Register Transfer Level (RTL), e.g., in VHDL or Verilog, or be transformed into a lower-level implementation. One approach that has recently been considered is converting a DNN to C and then using High-Level Synthesis (HLS) to synthesize it on an FPGA. Nevertheless, there are various aspects to take into consideration during the transformation. In this thesis, we propose a multistage framework, DeepKit, that generates a synthesizable C implementation from an input DNN architecture given in a high-level description (Keras). Moving through the stages, various explorations and optimizations are then performed with regard to accuracy, latency, resource utilization, and reliability. The framework is also implemented as a toolchain consisting of DeepHLS, AutoDeepHLS, DeepAxe, and DeepFlexiHLS, and results are provided for DNNs of various types and sizes.

    Download full text (pdf)
  • 3.
    Riazati, Mohammad
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Reliability and Performance in Heterogeneous Systems Generated by High-Level Synthesis, 2021. Licentiate thesis, comprehensive summary (Other academic)
    Abstract [en]

    High-level synthesis (HLS) is now widely used to implement heterogeneous systems. It was introduced to enable designers to use high-level languages such as C or C++, making it possible for software developers to move their implementations to an FPGA or ASIC without having to know the hardware details. HLS tools only convert a high-level software program into a hardware implementation; reliability and performance measures must be taken by the designer before feeding the program to the tool. In this thesis, we propose methods to improve the reliability and performance of heterogeneous systems generated with the help of an HLS tool. We first propose methods to improve the reliability of the generated circuit, either by utilizing pre-existing assertion statements for high-speed design testing and post-synthesis monitoring, or by defining a generic redundancy method for self-healing hardware modules. Then, we propose an automatic toolchain to guide the HLS tool to generate a high-performance circuit.

  • 4.
    Riazati, Mohammad
    et al.
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Sjödin, Mikael
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Lisper, Björn
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    AutoDeepHLS: Deep Neural Network High-level Synthesis using fixed-point precision, 2022. In: 2022 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS 2022): Intelligent Technology in the Post-Pandemic Era, IEEE, 2022, p. 122-125. Conference paper (Refereed)
    Abstract [en]

    Deep Neural Networks (DNNs) have received much attention in applications such as visual recognition, self-driving cars, and health care. Hardware implementation, specifically on FPGAs and ASICs, is considered an efficient approach due to their high performance and low power consumption. However, implementation on these platforms is difficult for neural network designers, since they usually have limited knowledge of hardware. High-Level Synthesis (HLS) tools can act as a bridge between high-level DNN designs and hardware implementation. Nevertheless, these tools usually require an implementation at the C level, whereas the design of neural networks is usually performed at a higher level (such as Keras or TensorFlow). In this paper, we propose a fully automated flow for creating a C-level implementation that is synthesizable with HLS tools. Various aspects such as performance, minimal access to memory elements, data type knobs, and design verification are considered. Our results show that the generated C implementation is much more HLS-friendly than previous works. Furthermore, a complete flow is proposed to determine different fixed-point precisions for network elements. We show that our method results in 25% and 34% reductions in bit-width for LeNet and VGG, respectively, without any accuracy loss.

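    To illustrate the fixed-point precision knobs mentioned in the abstract above, the sketch below quantizes a floating-point value to a signed fixed-point representation with a given total and integer bit-width in plain C++. A real HLS flow would typically use a vendor type such as ap_fixed<W,I>; the function names and the example widths here are hypothetical.

        // Minimal fixed-point quantization sketch (C++17 for std::clamp).
        #include <algorithm>
        #include <cmath>
        #include <cstdint>
        #include <cstdio>

        // Quantize a float to signed fixed point with W total bits,
        // I integer bits (including sign) and W-I fractional bits.
        std::int32_t to_fixed(float x, int W, int I) {
            const int frac_bits = W - I;
            const float scaled = std::round(x * std::ldexp(1.0f, frac_bits));
            const float lo = -std::ldexp(1.0f, W - 1);
            const float hi = std::ldexp(1.0f, W - 1) - 1.0f;
            return static_cast<std::int32_t>(std::clamp(scaled, lo, hi));
        }

        float from_fixed(std::int32_t q, int W, int I) {
            return std::ldexp(static_cast<float>(q), -(W - I));
        }

        int main() {
            // Example: a weight stored with 8 total bits, 2 integer bits.
            const float w = 0.7312f;
            const std::int32_t q = to_fixed(w, 8, 2);
            std::printf("w=%.4f  q=%d  back=%.4f\n", w, q, from_fixed(q, 8, 2));
        }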
  • 5.
    Riazati, Mohammad
    et al.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems. Tallinn University of Technology, Department of Computer Systems, Tallinn, Estonia.
    Sjödin, Mikael
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Lisper, Björn
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    DeepFlexiHLS: Deep Neural Network Flexible High-Level Synthesis Directive Generator, 2022. In: 2022 IEEE Nordic Circuits and Systems Conference, NORCAS 2022 - Proceedings, Institute of Electrical and Electronics Engineers Inc., 2022. Conference paper (Refereed)
    Abstract [en]

    Deep Neural Networks (DNNs) are now widely adopted to solve problems ranging from speech recognition to image classification. Since DNNs demand a large amount of processing power, their implementation on hardware, i.e., FPGA or ASIC, has received much attention. High-level synthesis is widely used because it significantly boosts productivity and flexibility and requires minimal hardware knowledge. However, when HLS transforms a C implementation into a Register-Transfer Level one, the high parallelism capability of the FPGA is not well utilized. HLS tools provide a feature called directives, through which designers can guide the tool using predefined C pragma statements to improve performance. Nevertheless, finding appropriate directives is another challenge, which requires considerable expertise and experience. This paper proposes DeepFlexiHLS, a two-stage design space exploration flow that finds a set of directives to achieve minimal latency. In the first stage, a partition-based method is used to find the directives corresponding to each partition; aggregating all these directives leads to minimal latency. Experimental results show 54% more speed-up than similar work on the VGG neural network. In the second stage, an estimator is implemented to find the latency and resource utilization of various combinations of the found directives. The results form a Pareto frontier from which the designer can choose if FPGA resources are limited or are not to be entirely devoted to the DNN module.

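    The abstract above refers to HLS directives expressed as C pragma statements. The sketch below shows how such directives are typically attached to a loop nest, using Vitis/Vivado-HLS-style pragmas on a small, hypothetical dense-layer kernel; the kernel and the chosen directives are illustrative and not the directive sets generated by DeepFlexiHLS.

        // Hypothetical dense-layer kernel with HLS-style directives.
        // The pragmas are ignored by an ordinary C++ compiler, so this still
        // builds and runs as plain software.
        constexpr int N = 64;

        void dense_layer(const float in[N], float weight[N][N], float out[N]) {
        #pragma HLS ARRAY_PARTITION variable=weight complete dim=2
        #pragma HLS ARRAY_PARTITION variable=in complete dim=1
            for (int o = 0; o < N; ++o) {
                // Pipelining the outer loop lets HLS fully unroll the inner loop,
                // so the partitioned arrays can be read in parallel.
        #pragma HLS PIPELINE II=1
                float acc = 0.f;
                for (int i = 0; i < N; ++i)
                    acc += weight[o][i] * in[i];
                out[o] = acc;
            }
        }

        int main() {
            static float in[N], weight[N][N], out[N];
            for (int i = 0; i < N; ++i) {
                in[i] = 1.0f;
                for (int j = 0; j < N; ++j) weight[i][j] = (i == j) ? 2.0f : 0.0f;
            }
            dense_layer(in, weight, out);
            return (out[0] == 2.0f) ? 0 : 1;
        }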
  • 6.
    Riazati, Mohammad
    et al.
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems. Tallinn University of Technology, Tallinn, Estonia.
    Sjödin, Mikael
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Lisper, Björn
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    DeepHLS: A complete toolchain for automatic synthesis of deep neural networks to FPGA, 2020. In: ICECS 2020 - 27th IEEE International Conference on Electronics, Circuits and Systems, Proceedings, Institute of Electrical and Electronics Engineers Inc., 2020, article id 9294881. Conference paper (Refereed)
    Abstract [en]

    Deep neural networks (DNNs) have achieved high-quality results in various applications of computer vision, especially in image classification problems. DNNs are computationally intensive, and their acceleration on FPGAs has received much attention. Many methods to accelerate DNNs have been proposed. Despite performance features such as acceptable accuracy or low latency, their use is not widely accepted by software designers, who usually do not have enough knowledge of the hardware details of the proposed accelerators. HLS tools are the major promising tools that can act as a bridge between software designers and hardware implementation. However, not only do most HLS tools support just C and C++ descriptions as input, but their results are also very sensitive to coding style. This makes them difficult for software developers to adopt, as DNNs are mostly described in high-level frameworks such as TensorFlow or Keras. In this paper, an integrated toolchain is presented that, in addition to converting Keras DNN descriptions to a simple, flat, and synthesizable C output, provides other features such as accuracy verification, C-level knobs to easily change the data types from floating point to fixed point with arbitrary bit width, and latency and area utilization adjustment using HLS knobs.

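    As a rough illustration of the "simple, flat, and synthesizable C output" described in the abstract above, the sketch below shows one fully connected layer written in flat C-style code with the data type exposed as a single knob. The names, sizes, and values are hypothetical and do not correspond to actual DeepHLS output.

        // Flat, HLS-friendly fully connected layer with a data-type knob.
        #include <cstdio>

        typedef float data_t;          // knob: switch to a fixed-point type for synthesis
        constexpr int IN = 4, OUT = 3;

        void fc1(const data_t x[IN], const data_t w[OUT][IN],
                 const data_t b[OUT], data_t y[OUT]) {
            for (int o = 0; o < OUT; ++o) {
                data_t acc = b[o];
                for (int i = 0; i < IN; ++i)
                    acc += w[o][i] * x[i];
                y[o] = (acc > 0) ? acc : (data_t)0;   // fused ReLU
            }
        }

        int main() {
            data_t x[IN] = {1, 2, 3, 4};
            data_t w[OUT][IN] = {{0.1f, 0, 0, 0}, {0, 0.1f, 0, 0}, {0, 0, 0.1f, 0}};
            data_t b[OUT] = {0, 0, -1};
            data_t y[OUT];
            fc1(x, w, b, y);
            for (int o = 0; o < OUT; ++o)
                std::printf("y[%d]=%.2f\n", o, (double)y[o]);
        }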
  • 7.
    Riazati, Mohammad
    et al.
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Sjödin, Mikael
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Lisper, Björn
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    High-Level Synthesis Design Space Exploration for Highly Optimized Deep Neural Network Implementation. Manuscript (preprint) (Other academic)
  • 8.
    Riazati, Mohammad
    et al.
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Sjödin, Mikael
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Lisper, Björn
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    SHiLA: Synthesizing High-Level Assertions for High-Speed Validation of High-Level Designs, 2020. In: Proceedings - 2020 23rd International Symposium on Design and Diagnostics of Electronic Circuits and Systems, DDECS 2020, Institute of Electrical and Electronics Engineers Inc., 2020, article id 9095728. Conference paper (Refereed)
    Abstract [en]

    In the past, assertions were mostly used to validate the system through the design and simulation process. Later, a new method known as assertion synthesis was introduced, which enabled designers to use assertions for high-speed hardware emulation and for safety and reliability assurance after tape-out. Although the synthesis of assertions at the register transfer level has been proposed and implemented in several works, none of them can be adopted for high-level assertions. In this paper, we propose the SHiLA framework and a detailed implementation guide by which assertion synthesis can also be applied to high-level design processes. The proposed method, which is fully tool independent, is not only an enabler of high-speed assertion-assisted simulation but can also be used in other scenarios that need assertion synthesis, as it has the minimum possible effect on the main design's performance.

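    To illustrate the general idea behind making high-level assertions synthesizable, the sketch below turns an assertion condition into an extra status output of the accelerated function, so the check survives synthesis and can be observed during emulation instead of disappearing like a software assert(). The function and names are hypothetical; this is not the SHiLA transformation itself.

        // Accelerated function with an assertion condition exposed as an output.
        #include <cstdio>

        void scale_accumulate(const int in[8], int gain, int* out, bool* assert_fail) {
            *assert_fail = false;
            int acc = 0;
            for (int i = 0; i < 8; ++i) {
                acc += in[i] * gain;
                // High-level assertion: the accumulator must never go negative.
                if (acc < 0) *assert_fail = true;
            }
            *out = acc;
        }

        int main() {
            int in[8] = {1, 2, 3, 4, 5, 6, 7, -100};
            int out;
            bool fail;
            scale_accumulate(in, 2, &out, &fail);
            std::printf("out=%d, assertion %s\n", out, fail ? "FAILED" : "held");
        }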
  • 9.
    Riazati, Mohammad
    et al.
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems. Mälardalens högskola, Västerås, Sweden.
    Ghasempouri, T.
    Tallinna Tehnikaülikool, Tallinn, Estonia.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Raik, J.
    Tallinna Tehnikaülikool, Tallinn, Estonia.
    Sjodin, M.
    Mälardalens högskola, Västerås, Sweden.
    Lisper, Björn
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Adjustable self-healing methodology for accelerated functions in heterogeneous systems, 2020. In: Proceedings - Euromicro Conference on Digital System Design, DSD 2020, Institute of Electrical and Electronics Engineers Inc., 2020, p. 638-645, article id 9217868. Conference paper (Refereed)
    Abstract [en]

    Self-healing is a promising approach for designing reliable digital systems. It refers to the ability of a system to detect faults and automatically fix them to avoid total failure. As digital systems evolve, heterogeneous systems, in which some parts of the system are executed on programmable logic and other parts run on processing elements (CPUs), are becoming more prevalent. In this work, we propose an adjustable self-healing method that is applicable to heterogeneous systems with accelerated functions and enables designers to add the self-healing feature to their designs. In this method, by manipulating the software code executed on the processing element, we add to the system the ability to verify the accelerated functions on the programmable logic and heal possible failures. This is done not only in a straightforward manner but also without forcing the designer to choose a specific reliability-overhead point: the designer can select the optimum configuration for a desired reliability level. Experimental results on a large design including several accelerated functions show a 42% improvement in reliability at 27% overhead, as one example of a reliability-overhead point.

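    As a rough sketch of the verify-and-heal pattern described in the abstract above, the code below checks an accelerated function's result against a software reference on the CPU and falls back to the software result on mismatch. The functions are hypothetical stand-ins; the paper's method additionally lets the designer tune how much verification is done to pick a reliability-overhead point, which a full recomputation like this does not capture.

        // Software-side verification and healing of an accelerated function.
        #include <cstdio>

        // Software (golden) implementation running on the CPU.
        int sw_sum_sq(const int* v, int n) {
            int s = 0;
            for (int i = 0; i < n; ++i) s += v[i] * v[i];
            return s;
        }

        // Stand-in for the hardware-accelerated version; a real system would
        // invoke the programmable-logic accelerator here.
        int hw_sum_sq(const int* v, int n) { return sw_sum_sq(v, n); }

        int healed_sum_sq(const int* v, int n) {
            const int hw = hw_sum_sq(v, n);
            // Verification step: recompute in software. A cheaper partial check
            // could be used instead to trade reliability against overhead.
            const int golden = sw_sum_sq(v, n);
            if (hw != golden) {
                std::printf("mismatch detected, healing with software result\n");
                return golden;
            }
            return hw;
        }

        int main() {
            int v[4] = {1, 2, 3, 4};
            std::printf("result=%d\n", healed_sum_sq(v, 4));
        }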
  • 10.
    Taheri, M.
    et al.
    Tallinn University of Technology, Tallinn, Estonia.
    Riazati, Mohammad
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Ahmadilivani, M. H.
    Tallinn University of Technology, Tallinn, Estonia.
    Jenihhin, M.
    Tallinn University of Technology, Tallinn, Estonia.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Raik, J.
    Tallinn University of Technology, Tallinn, Estonia.
    Sjodin, M.
    Lisper, Björn
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    DeepAxe: A Framework for Exploration of Approximation and Reliability Trade-offs in DNN Accelerators, 2023. In: Proceedings - International Symposium on Quality Electronic Design, ISQED, IEEE Computer Society, 2023. Conference paper (Refereed)
    Abstract [en]

    While the role of Deep Neural Networks (DNNs) in a wide range of safety-critical applications is expanding, emerging DNNs are growing massively in terms of required computation power. This raises the need to improve the reliability of DNN accelerators while reducing the computational burden on the hardware platforms, i.e., reducing energy consumption and execution time and increasing the efficiency of DNN accelerators. The trade-off between hardware performance, i.e., area, power, and delay, and the reliability of the DNN accelerator implementation therefore becomes critical and requires tools for analysis. In this paper, we propose DeepAxe, a framework for design space exploration of FPGA-based implementations of DNNs that considers the trilateral impact of applying functional approximation on accuracy, reliability, and hardware performance. The framework enables selective approximation of reliability-critical DNNs, providing a set of Pareto-optimal DNN implementation design space points for the target resource utilization requirements. The design flow starts with a pre-trained network in Keras, uses an innovative high-level synthesis environment, DeepHLS, and results in a set of Pareto-optimal design space points as a guide for the designer. The framework is demonstrated on a case study of custom and state-of-the-art DNNs and datasets.

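    Both DeepFlexiHLS and DeepAxe report their exploration results as a Pareto frontier of design points. The sketch below shows a minimal two-objective Pareto filter (latency and resource usage, both minimized); the points and objectives are made up and greatly simplified relative to what the frameworks explore.

        // Minimal Pareto-frontier extraction over explored design points.
        #include <algorithm>
        #include <cstdio>
        #include <vector>

        struct DesignPoint { double latency; double resources; };

        // a dominates b if a is no worse in both objectives and better in one.
        bool dominates(const DesignPoint& a, const DesignPoint& b) {
            return a.latency <= b.latency && a.resources <= b.resources &&
                   (a.latency < b.latency || a.resources < b.resources);
        }

        std::vector<DesignPoint> pareto_front(const std::vector<DesignPoint>& pts) {
            std::vector<DesignPoint> front;
            for (const auto& p : pts) {
                const bool dominated = std::any_of(pts.begin(), pts.end(),
                    [&](const DesignPoint& q) { return dominates(q, p); });
                if (!dominated) front.push_back(p);
            }
            return front;
        }

        int main() {
            std::vector<DesignPoint> pts = {{10, 80}, {12, 60}, {20, 30}, {15, 70}};
            for (const auto& p : pareto_front(pts))
                std::printf("latency=%.0f resources=%.0f\n", p.latency, p.resources);
        }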