Efficient Design of Scalable Deep Neural Networks for Resource-Constrained Edge Devices
Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems (Heterogeneous systems - hardware software co-design). ORCID iD: 0000-0002-9704-7117
2022 (English). Doctoral thesis, comprehensive summary (Other academic).
Abstract [en]

Deep Neural Networks (DNNs) are increasingly being processed on resource-constrained edge nodes (computer nodes used in, e.g., cyber-physical systems or at the edge of computational clouds) due to efficiency, connectivity, and privacy concerns. This thesis investigates and presents new techniques to design and deploy DNNs for resource-constrained edge nodes. We have identified two major bottlenecks that hinder the proliferation of DNNs on edge nodes: (i) the significant computational demand of designing DNNs that consume few resources in terms of energy, latency, and memory footprint; and (ii) the considerable accuracy degradation that arises when resources are further conserved by quantizing the numerical calculations of a DNN.

To address (i), we present novel methods for cost-efficient Neural Architecture Search (NAS) that automate the design of DNNs that must meet multifaceted goals such as accuracy and hardware performance. To address (ii), we extend our NAS approach to handle the quantization of numerical calculations using only the values -1, 0, and 1 (so-called ternary DNNs), achieving higher accuracy than competing ternary approaches. Our experimental evaluation shows that the proposed NAS approach can provide a 5.25x reduction in design time and up to a 44.4x reduction in network size compared to state-of-the-art methods. In addition, the proposed quantization approach delivers 2.64% higher accuracy and a 2.8x memory saving compared to competing methods with the same bit-width resolution. These benefits are attained over a wide range of commercial off-the-shelf edge nodes, showing that this thesis enables seamless deployment of DNNs on resource-constrained edge nodes.
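To make the quantization idea concrete, the following is a minimal sketch (in NumPy, not the thesis's actual method) of threshold-based ternarization, where a full-precision weight tensor is mapped to the values -1, 0, and 1; the 0.7 * mean(|W|) threshold is an assumption borrowed from common ternary-weight-network practice.

```python
import numpy as np

def ternarize(weights, delta_factor=0.7):
    """Map full-precision weights to {-1, 0, +1} with a symmetric threshold.

    delta_factor = 0.7 is an illustrative heuristic, not necessarily the
    rule used in the thesis.
    """
    delta = delta_factor * np.mean(np.abs(weights))   # per-tensor threshold
    ternary = np.zeros_like(weights)
    ternary[weights > delta] = 1.0
    ternary[weights < -delta] = -1.0
    # A scaling factor keeps the ternary tensor close to the original magnitude.
    mask = ternary != 0
    alpha = np.abs(weights[mask]).mean() if mask.any() else 1.0
    return ternary, alpha

# Example: ternarize a random 3x3 convolution kernel.
w = np.random.randn(3, 3)
t, alpha = ternarize(w)
print(t)       # entries are only -1.0, 0.0, or 1.0
print(alpha)   # per-tensor scaling factor
```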

Place, publisher, year, edition, pages
Västerås: Mälardalens universitet, 2022.
Series
Mälardalen University Press Dissertations, ISSN 1651-4238 ; 363
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:mdh:diva-59946. ISBN: 978-91-7485-563-0 (print). OAI: oai:DiVA.org:mdh-59946. DiVA, id: diva2:1695852
Public defence
2022-10-13, Delta and online, Mälardalens universitet, Västerås, 13:30 (English)
Projects
AutoDeep: Automatic Design of Safe, High-Performance and Compact Deep Learning Models for Autonomous Vehicles
DPAC - Dependable Platforms for Autonomous systems and Control
Available from: 2022-09-15. Created: 2022-09-14. Last updated: 2022-11-08. Bibliographically approved.
List of papers
1. DeepMaker: A multi-objective optimization framework for deep neural networks in embedded systems
2020 (English). In: Microprocessors and Microsystems, ISSN 0141-9331, E-ISSN 1872-9436, Vol. 73, article id 102989. Article in journal (Refereed). Published.
Abstract [en]

Deep Neural Networks (DNNs) are compute-intensive learning models with growing applicability in a wide range of domains. Due to their computational complexity, DNNs benefit from implementations that utilize custom hardware accelerators to meet performance and response-time constraints as well as classification accuracy requirements. In this paper, we propose the DeepMaker framework, which aims to automatically design a set of highly robust DNN architectures for embedded devices, the processing units closest to the sensors. DeepMaker explores and prunes the design space to find improved neural architectures. Our proposed framework takes advantage of a multi-objective evolutionary approach that exploits a pruned design space inspired by a dense architecture. DeepMaker considers accuracy along with network size as two objectives to build a highly optimized network that fits within limited computational resource budgets while delivering an acceptable accuracy level. Compared with the best result on the CIFAR-10 dataset, a network generated by DeepMaker achieves up to a 26.4x compression rate while losing only 4% accuracy. In addition, DeepMaker maps the generated CNN onto programmable commodity devices, including an ARM processor, a high-performance CPU, a GPU, and an FPGA.
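As a rough illustration of the two objectives the abstract mentions (accuracy and network size), here is a minimal, hypothetical sketch of the Pareto-dominance filter a multi-objective evolutionary search might use to keep non-dominated candidates; the names and numbers are invented for illustration and are not DeepMaker's implementation.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    accuracy: float   # validation accuracy (higher is better)
    num_params: int   # network size in parameters (lower is better)

def dominates(a: Candidate, b: Candidate) -> bool:
    """True if `a` is at least as good as `b` on both objectives and
    strictly better on at least one."""
    no_worse = a.accuracy >= b.accuracy and a.num_params <= b.num_params
    better = a.accuracy > b.accuracy or a.num_params < b.num_params
    return no_worse and better

def pareto_front(population):
    """Keep only candidates that no other candidate dominates."""
    return [c for c in population
            if not any(dominates(o, c) for o in population if o is not c)]

# Invented example population: the 0.88/900k network is dominated by 0.89/600k.
pop = [Candidate(0.91, 2_500_000), Candidate(0.89, 600_000), Candidate(0.88, 900_000)]
print(pareto_front(pop))
```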

Place, publisher, year, edition, pages
Elsevier B.V., 2020
National Category
Computer Systems
Identifiers
urn:nbn:se:mdh:diva-46792 (URN), 10.1016/j.micpro.2020.102989 (DOI), 000520940000032, 2-s2.0-85077516447 (Scopus ID)
Available from: 2020-01-23. Created: 2020-01-23. Last updated: 2022-11-25. Bibliographically approved.
2. TOT-Net: An Endeavor Toward Optimizing Ternary Neural Networks
2019 (English). In: 22nd Euromicro Conference on Digital System Design (DSD 2019), 2019, p. 305-312, article id 8875067. Conference paper, published (Refereed).
Abstract [en]

High computation demands and large memory requirements are the major implementation challenges of Convolutional Neural Networks (CNNs), especially for low-power and resource-limited embedded devices. Many binarized neural networks have recently been proposed to address these issues. Although they significantly reduce computation and memory footprint, they suffer from accuracy loss, especially on large datasets. In this paper, we propose TOT-Net, a ternarized neural network with [-1, 0, 1] values for both weights and activation functions that simultaneously achieves a higher level of accuracy and a lower computational load. First, TOT-Net introduces a simple bitwise logic for convolution computations to reduce the cost of multiply operations. Selecting a proper activation function and learning rate is influential for accuracy, but also difficult. As the second contribution, we therefore study learning rates for different datasets; our findings reveal that 0.01 is a preferable learning rate for the studied datasets. Third, using an evolutionary optimization approach, we find novel piece-wise activation functions customized for TOT-Net. According to the experimental results, TOT-Net achieves 2.15%, 8.77%, and 5.7%/5.52% better accuracy compared to XNOR-Net on CIFAR-10, CIFAR-100, and ImageNet (top-5/top-1), respectively.
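To illustrate why multiplications can be avoided when both weights and activations take only the values -1, 0, and 1, here is a minimal sketch of a multiply-free ternary dot product; it conveys the idea only and is not TOT-Net's actual bitwise convolution kernel.

```python
import numpy as np

def ternary_dot(x, w):
    """Dot product of two ternary vectors (entries in {-1, 0, +1}) without
    any multiplications: every elementwise product is -1, 0, or +1, so it
    suffices to count agreeing and disagreeing non-zero positions."""
    nonzero = (x != 0) & (w != 0)       # positions that contribute at all
    agree = nonzero & (x == w)          # elementwise product = +1
    disagree = nonzero & (x != w)       # elementwise product = -1
    return int(agree.sum()) - int(disagree.sum())

x = np.array([1, -1,  0, 1, -1])
w = np.array([1,  1, -1, 0, -1])
print(ternary_dot(x, w))            # 1 - 1 + 0 + 0 + 1 = 1
print(int(np.dot(x, w)))            # reference result, also 1
```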

Keywords
convolutional neural networks, ternary neural network, activation function, optimization
National Category
Engineering and Technology; Computer Systems
Identifiers
urn:nbn:se:mdh:diva-45042 (URN), 10.1109/DSD.2019.00052 (DOI), 000722275400043, 2-s2.0-85074915397 (Scopus ID)
Conference
22nd Euromicro Conference on Digital System Design DSD 2019, 28 Aug 2019, Chalkidiki, Greece
Projects
DPAC - Dependable Platforms for Autonomous systems and Control
DeepMaker: Deep Learning Accelerator on Commercial Programmable Devices
Available from: 2019-08-23. Created: 2019-08-23. Last updated: 2022-11-08. Bibliographically approved.
3. TAS: Ternarized Neural Architecture Search for Resource-Constrained Edge Devices
2022 (English). Conference paper, published (Refereed).
Abstract [en]

Ternary Neural Networks (TNNs) compress network weights and activation functions into a 2-bit representation, resulting in remarkable network compression and energy efficiency. However, there remains a significant accuracy gap between TNNs and their full-precision counterparts. Recent advances in Neural Architecture Search (NAS) promise opportunities for automated optimization of various deep learning tasks. Unfortunately, this area is unexplored for optimizing TNNs. This paper proposes TAS, a framework that drastically reduces the accuracy gap between TNNs and their full-precision counterparts by integrating quantization into the network design. We observed that directly applying NAS to the ternary domain leads to accuracy degradation, because the search settings are customized for full-precision networks. To address this problem, we propose (i) a new cell template for ternary networks with maximum gradient propagation; and (ii) a novel learnable quantizer that adaptively relaxes the ternarization mechanism based on the distribution of the weights and activation functions. Experimental results reveal that TAS delivers 2.64% higher accuracy and a 2.8x memory saving over competing methods with the same bit-width resolution on the CIFAR-10 dataset. These results suggest that TAS is an effective method that paves the way for the efficient design of the next generation of quantized neural networks.
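To give a sense of what it means to keep ternarization trainable inside the network, here is a generic sketch (assuming PyTorch) of a distribution-adaptive ternarizer used with a straight-through estimator; the 0.7 * mean(|w|) threshold and the STE are common techniques used here for illustration and are not the TAS quantizer itself.

```python
import torch
import torch.nn as nn

class TernarizeSTE(nn.Module):
    """Illustrative stand-in for a distribution-adaptive ternarizer.

    The threshold follows the current weight distribution, and a
    straight-through estimator (STE) lets gradients flow to the
    full-precision weights despite the hard quantization step.
    """

    def forward(self, w):
        # Threshold adapted to the weight distribution (assumed heuristic).
        delta = 0.7 * w.abs().mean()
        hard = torch.where(w > delta, torch.ones_like(w),
               torch.where(w < -delta, -torch.ones_like(w), torch.zeros_like(w)))
        # STE: the forward pass uses the ternary values, the backward pass
        # treats the quantization as identity with respect to w.
        return w + (hard - w).detach()

w = torch.randn(8, 8, requires_grad=True)
out = TernarizeSTE()(w)
out.sum().backward()
print(w.grad.abs().sum())   # non-zero: the full-precision weights stay trainable
```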

Keywords
Quantization, Ternary Neural Network, Neural Architecture Search, Embedded Systems
National Category
Engineering and Technology; Computer Systems
Identifiers
urn:nbn:se:mdh:diva-56761 (URN), 10.23919/date54114.2022.9774615 (DOI), 000819484300207, 2-s2.0-85130852561 (Scopus ID), 978-3-9819263-6-1 (ISBN)
Conference
Design, Automation and Test in Europe Conference (DATE) 2022, Antwerp, Belgium
Projects
DPAC - Dependable Platforms for Autonomous systems and Control
AutoDeep: Automatic Design of Safe, High-Performance and Compact Deep Learning Models for Autonomous Vehicles
Available from: 2021-12-16. Created: 2021-12-16. Last updated: 2024-01-04. Bibliographically approved.
4. FastStereoNet: A Fast Neural Architecture Search for Improving the Inference of Disparity Estimation on Resource-Limited Platforms
2022 (English). In: IEEE Transactions on Systems, Man, and Cybernetics: Systems, ISSN 2168-2216, E-ISSN 2168-2232, Vol. 52, no. 8, p. 5222-5234. Article in journal (Refereed). Published.
Abstract [en]

Convolutional neural networks (CNNs) provide the best accuracy for disparity estimation. However, CNNs are computationally expensive, making them unfavorable for resource-limited devices with real-time constraints. Recent advances in neural architecture search (NAS) promise opportunities for automated optimization of disparity estimation. However, the main challenge of NAS methods is the significant amount of computing time needed to explore a vast search space (e.g., 1.6x10^29 candidates) and the costly training of candidates. To reduce the NAS computational demand, many proxy-based NAS methods have been proposed. Despite their success, most of them are designed for comparatively small-scale learning tasks. In this article, we propose a fast NAS method, called FastStereoNet, to enable resource-aware NAS within an intractably large search space. FastStereoNet automatically searches for hardware-friendly CNN architectures based on late acceptance hill climbing (LAHC), followed by simulated annealing (SA). FastStereoNet also employs fine-tuning with a transferred-weights mechanism to improve the convergence of the search process. Together, these ideas provide competitive results in terms of search time and strike a balance between accuracy and efficiency. Compared to the state of the art, FastStereoNet provides a 5.25x reduction in search time and a 44.4x reduction in model size. These benefits are attained while yielding comparable accuracy, enabling seamless deployment of disparity estimation on resource-limited devices. Finally, FastStereoNet significantly improves the perception quality of disparity estimation deployed on a field-programmable gate array and an Intel Neural Compute Stick 2 accelerator with significantly less design effort.
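For readers unfamiliar with late acceptance hill climbing (LAHC), the sketch below shows the generic acceptance rule on a toy minimization problem; the evaluate and mutate functions are placeholders and this is not FastStereoNet's actual search loop.

```python
import random

def lahc_search(initial, evaluate, mutate, history_len=50, iterations=1000):
    """Generic late acceptance hill climbing (minimization).

    A candidate is accepted if it is no worse than the current solution OR
    no worse than the cost recorded `history_len` iterations earlier, which
    lets the search escape shallow local minima without a cooling schedule.
    """
    current, current_cost = initial, evaluate(initial)
    best, best_cost = current, current_cost
    history = [current_cost] * history_len      # circular cost history

    for i in range(iterations):
        candidate = mutate(current)
        cost = evaluate(candidate)
        slot = i % history_len
        if cost <= current_cost or cost <= history[slot]:
            current, current_cost = candidate, cost
            if cost < best_cost:
                best, best_cost = candidate, cost
        history[slot] = current_cost            # record cost of the incumbent
    return best, best_cost

# Toy usage: minimize (x - 3)^2 with random perturbations of the incumbent.
best, cost = lahc_search(
    initial=10.0,
    evaluate=lambda x: (x - 3.0) ** 2,
    mutate=lambda x: x + random.uniform(-1.0, 1.0),
)
print(best, cost)
```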

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2022
Keywords
Disparity estimation, machine vision, neural architecture search, optimization, transfer learning
National Category
Computer Systems
Identifiers
urn:nbn:se:mdh:diva-56844 (URN), 10.1109/TSMC.2021.3123136 (DOI), 000732342800001, 2-s2.0-85120087918 (Scopus ID)
Available from: 2021-12-30. Created: 2021-12-30. Last updated: 2022-11-08. Bibliographically approved.

Open Access in DiVA

fulltext (FULLTEXT02.pdf, 17120 kB, application/pdf)

Authority records

Loni, Mohammad

