DeepMaker: Customizing the Architecture of Convolutional Neural Networks for Resource-Constrained Platforms
Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems, MDH (Heterogeneous systems - hardware software co-design)
ORCID iD: 0000-0002-9704-7117
2020 (English). Licentiate thesis, comprehensive summary (Other academic)
Abstract [en]

Convolutional Neural Networks (CNNs) suffer from energy-hungry implementations because they require huge amounts of computation and significant memory. This problem becomes more pronounced with the proliferation of CNNs on resource-constrained platforms, e.g., in embedded systems. In this thesis, we focus on decreasing the computational cost of CNNs so that they become suitable for resource-constrained platforms. The thesis proposes two distinct methods to tackle these challenges: optimizing the CNN architecture while considering both network accuracy and network complexity, and proposing an optimized ternary neural network to compensate for the accuracy loss of network quantization methods. We evaluated the impact of our solutions on Commercial-Off-The-Shelf (COTS) platforms, where the results show considerable improvements in network accuracy and energy efficiency.
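To make the cost argument concrete, here is a minimal, purely illustrative sketch (not taken from the thesis) that counts the multiply-accumulate (MAC) operations and parameters of a single convolutional layer; the layer dimensions are hypothetical:

```python
# Illustrative only: MAC and parameter count of one convolutional layer,
# showing why CNN inference strains resource-constrained platforms.

def conv2d_cost(h_out, w_out, c_in, c_out, k):
    """Return (MACs, parameters) for a k x k convolution producing an
    h_out x w_out x c_out feature map from c_in input channels (no bias)."""
    macs = h_out * w_out * c_out * c_in * k * k  # one MAC per kernel tap per output element
    params = c_out * c_in * k * k
    return macs, params

# Hypothetical layer: 3x3 kernels, 256 -> 256 channels, 56x56 output map.
macs, params = conv2d_cost(56, 56, 256, 256, 3)
print(f"{macs / 1e9:.2f} GMACs, {params / 1e6:.2f} M parameters")  # ~1.85 GMACs, ~0.59 M
```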

Abstract [sv]

Convolutional Neural Networks (CNNs) suffer from energy-hungry implementations because they require enormous computational capacity and have significant memory consumption. This problem will become more pronounced as more and more CNNs are deployed on resource-constrained platforms in embedded computer systems. In this thesis, we focus on reducing the resource usage of CNNs, in terms of required computation and memory, to make them suitable for resource-constrained platforms. We propose two methods to address these challenges: optimizing the CNN architecture while balancing network accuracy against network complexity, and proposing an optimized ternary neural network to compensate for the accuracy losses that can arise from network quantization methods. We evaluated the impact of our solutions on commercial off-the-shelf (COTS) platforms, where the results show considerable improvements in network accuracy and energy efficiency.

Place, publisher, year, edition, pages
Västerås: Mälardalen University, 2020.
Series
Mälardalen University Press Licentiate Theses, ISSN 1651-9256 ; 299
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:mdh:diva-52113
ISBN: 978-91-7485-490-9 (print)
OAI: oai:DiVA.org:mdh-52113
DiVA id: diva2:1484552
Presentation
2020-12-04, U2-024 (+ Online/Zoom), Mälardalens högskola, Västerås, 11:30 (English)
Projects
DeepMaker: Deep Learning Accelerator on Commercial Programmable Devices
DPAC - Dependable Platforms for Autonomous systems and Control
FAST-ARTS: Fast and Sustainable Analysis Techniques for Advanced Real-Time Systems
Available from: 2020-11-10. Created: 2020-10-29. Last updated: 2020-11-13. Bibliographically approved.
List of papers
1. Designing Compact Convolutional Neural Network for Embedded Stereo Vision Systems
2018 (English). In: IEEE 12th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC-2018), 2018, pp. 244-251, article id 8540240. Conference paper, published (refereed).
National Category
Engineering and Technology
Identifiers
urn:nbn:se:mdh:diva-40892 (URN)
10.1109/MCSoC2018.2018.00049 (DOI)
000519938300035 (ISI)
2-s2.0-85059750226 (Scopus ID)
9781538666890 (ISBN)
Conference
IEEE 12th International Symposium on Embedded Multicore/Many-core Systems-on-Chip MCSoC-2018, 12 Sep 2018, Hanoi, Vietnam
Projects
DPAC - Dependable Platforms for Autonomous systems and Control
DeepMaker: Deep Learning Accelerator on Commercial Programmable Devices
Available from: 2018-09-18. Created: 2018-09-18. Last updated: 2022-11-08. Bibliographically approved.
2. NeuroPower: Designing Energy Efficient Convolutional Neural Network Architecture for Embedded Systems
2019 (English). In: Lecture Notes in Computer Science, vol. 11727, Munich, Germany: Springer, 2019, pp. 208-222. Conference paper, published (refereed).
Abstract [en]

Convolutional Neural Networks (CNNs) suffer from energy-hungry implementations due to their computation- and memory-intensive processing patterns. This problem is made even more significant by the proliferation of CNNs on embedded platforms. To overcome this problem, we offer NeuroPower, an automatic framework that designs a highly optimized and energy-efficient set of CNN architectures for embedded systems. NeuroPower explores and prunes the design space to find an improved set of neural architectures. Toward this aim, a multi-objective optimization strategy is integrated to solve the Neural Architecture Search (NAS) problem by near-optimally tuning network hyperparameters. The main objectives of the optimization algorithm are network accuracy and the number of parameters in the network. The evaluation results show the effectiveness of NeuroPower in energy consumption, compression rate and inference time compared to other cutting-edge approaches. In comparison with the best results on the CIFAR-10/CIFAR-100 datasets, a network generated by NeuroPower presents up to a 2.1x/1.56x compression rate, 1.59x/3.46x speedup and 1.52x/1.82x power saving while losing only 2.4%/-0.6% accuracy, respectively.
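As a hedged illustration of the multi-objective selection step such a NAS framework relies on (the paper's exact algorithm may differ), the sketch below keeps only the Pareto-optimal architectures under two minimized objectives, classification error and parameter count; the candidate tuples are invented:

```python
# Sketch: Pareto-front selection over (error, parameter count), both minimized.
# The candidate architectures below are made-up examples, not paper results.

def dominates(a, b):
    """a dominates b if a is no worse in every objective and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(candidates):
    """candidates: list of (error, n_params) tuples; return the non-dominated set."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates if other != c)]

archs = [(0.08, 1.2e6), (0.06, 3.4e6), (0.10, 0.6e6), (0.09, 2.0e6)]
print(pareto_front(archs))  # (0.09, 2.0e6) drops out: dominated by (0.08, 1.2e6)
```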

Place, publisher, year, edition, pages
Munich, Germany: Springer, 2019
Series
Lecture Notes in Computer Science, ISSN 0302-9743 ; 11727
Keywords
Convolutional neural networks (CNNs), Neural Architecture Search (NAS), Embedded Systems, Multi-Objective Optimization
National Category
Engineering and Technology, Computer Systems
Identifiers
urn:nbn:se:mdh:diva-45043 (URN)
10.1007/978-3-030-30487-4_17 (DOI)
000546494000017 (ISI)
2-s2.0-85072863572 (Scopus ID)
9783030304867 (ISBN)
Conference
The 28th International Conference on Artificial Neural Networks ICANN 2019, 17 Sep 2019, Munich, Germany
Projects
DPAC - Dependable Platforms for Autonomous systems and Control
DeepMaker: Deep Learning Accelerator on Commercial Programmable Devices
Available from: 2019-08-23. Created: 2019-08-23. Last updated: 2022-11-25. Bibliographically approved.
3. DeepMaker: A multi-objective optimization framework for deep neural networks in embedded systems
2020 (English). In: Microprocessors and Microsystems, ISSN 0141-9331, E-ISSN 1872-9436, vol. 73, article id 102989. Article in journal (refereed), published.
Abstract [en]

Deep Neural Networks (DNNs) are compute-intensive learning models with growing applicability in a wide range of domains. Due to their computational complexity, DNNs benefit from implementations that utilize custom hardware accelerators to meet performance and response-time constraints as well as classification-accuracy constraints. In this paper, we propose the DeepMaker framework, which aims to automatically design a set of highly robust DNN architectures for embedded devices as the processing units closest to the sensors. DeepMaker explores and prunes the design space to find improved neural architectures. Our proposed framework takes advantage of a multi-objective evolutionary approach that exploits a pruned design space inspired by a dense architecture. DeepMaker considers accuracy along with network size as two objectives to build a highly optimized network that fits within limited computational resource budgets while delivering an acceptable accuracy level. In comparison with the best result on the CIFAR-10 dataset, a network generated by DeepMaker presents up to a 26.4x compression rate while losing only 4% accuracy. In addition, DeepMaker maps the generated CNN onto programmable commodity devices, including an ARM processor, a high-performance CPU, a GPU, and an FPGA.
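A minimal sketch of what such a multi-objective evolutionary search loop could look like is given below; the hyperparameter encoding, mutation scheme, and scalarized fitness are assumptions for illustration, not the framework's actual implementation:

```python
# Hypothetical sketch of an evolutionary hyperparameter search balancing
# accuracy against network size; not DeepMaker's actual code.
import random

SEARCH_SPACE = {"depth": range(2, 20), "width": range(16, 257, 16), "kernel": (1, 3, 5)}

def random_arch():
    return {k: random.choice(list(v)) for k, v in SEARCH_SPACE.items()}

def mutate(arch):
    key = random.choice(list(SEARCH_SPACE))
    return {**arch, key: random.choice(list(SEARCH_SPACE[key]))}

def evolve(evaluate, generations=50, pop_size=16):
    """evaluate(arch) -> (accuracy, n_params); supplied by the caller,
    e.g. by training the candidate network for a few epochs."""
    def fitness(arch):
        acc, n_params = evaluate(arch)
        return acc - 1e-7 * n_params  # size penalty; the weight is an assumption
    population = [random_arch() for _ in range(pop_size)]
    for _ in range(generations):
        population += [mutate(a) for a in population]                  # offspring
        population = sorted(population, key=fitness, reverse=True)[:pop_size]
    return population[0]
```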

Place, publisher, year, edition, pages
Elsevier B.V., 2020
National Category
Computer Systems
Identifiers
urn:nbn:se:mdh:diva-46792 (URN)
10.1016/j.micpro.2020.102989 (DOI)
000520940000032 (ISI)
2-s2.0-85077516447 (Scopus ID)
Available from: 2020-01-23. Created: 2020-01-23. Last updated: 2022-11-25. Bibliographically approved.
4. TOT-Net: An Endeavor Toward Optimizing Ternary Neural Networks
2019 (English). In: 22nd Euromicro Conference on Digital System Design (DSD 2019), 2019, pp. 305-312, article id 8875067. Conference paper, published (refereed).
Abstract [en]

High computation demands and large memory requirements are the major implementation challenges of Convolutional Neural Networks (CNNs), especially for low-power and resource-limited embedded devices. Many binarized neural networks have recently been proposed to address these issues. Although they significantly decrease computation and memory footprint, they suffer from accuracy loss, especially on large datasets. In this paper, we propose TOT-Net, a ternarized neural network with [-1, 0, 1] values for both weights and activation functions that simultaneously achieves a higher level of accuracy and a lower computational load. First, TOT-Net introduces a simple bitwise logic for convolution computations to reduce the cost of multiply operations. Selecting a proper activation function and learning rate is influential for accuracy, but also difficult. As the second contribution, we propose optimized learning rates for different datasets; our findings reveal that 0.01 is a preferable learning rate for the studied datasets. Third, using an evolutionary optimization approach, we found novel piece-wise activation functions customized for TOT-Net. According to the experimental results, TOT-Net achieves 2.15%, 8.77%, and 5.7%/5.52% better accuracy compared to XNOR-Net on CIFAR-10, CIFAR-100, and ImageNet (top-5/top-1), respectively.
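The sketch below illustrates the kind of ternarization TOT-Net describes, mapping values to {-1, 0, +1} so that a dot product needs no multiplications. The symmetric threshold rule (0.7 times the mean absolute value) is the common Ternary Weight Networks heuristic and is an assumption here, not necessarily TOT-Net's exact rule:

```python
# Sketch of ternarization to {-1, 0, +1}; the threshold heuristic is assumed.
import numpy as np

def ternarize(w, delta=None):
    """Map a float tensor to {-1, 0, +1} with a symmetric threshold."""
    if delta is None:
        delta = 0.7 * np.abs(w).mean()  # TWN-style heuristic, an assumption here
    t = np.zeros_like(w, dtype=np.int8)
    t[w > delta] = 1
    t[w < -delta] = -1
    return t

w = np.random.randn(3, 3)
x = np.random.randn(3, 3)
tw = ternarize(w)
# With ternary weights, each "multiply" degenerates to add, subtract, or skip:
acc = x[tw == 1].sum() - x[tw == -1].sum()  # no multiplications needed
assert np.isclose(acc, float((x * tw).sum()))
```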

Keywords
convolutional neural networks, ternary neural network, activation function, optimization
National Category
Engineering and Technology, Computer Systems
Identifiers
urn:nbn:se:mdh:diva-45042 (URN)
10.1109/DSD.2019.00052 (DOI)
000722275400043 (ISI)
2-s2.0-85074915397 (Scopus ID)
Conference
22nd Euromicro Conference on Digital System Design DSD 2019, 28 Aug 2019, Chalkidiki, Greece
Projects
DPAC - Dependable Platforms for Autonomous systems and Control
DeepMaker: Deep Learning Accelerator on Commercial Programmable Devices
Available from: 2019-08-23. Created: 2019-08-23. Last updated: 2022-11-08. Bibliographically approved.
5. DenseDisp: Resource-Aware Disparity Map Estimation by Compressing Siamese Neural Architecture
2020 (English). In: IEEE World Congress on Computational Intelligence (WCCI 2020), Glasgow, United Kingdom, 2020. Conference paper, published (refereed).
Abstract [en]

Stereo vision cameras are flexible sensors because they provide heterogeneous information such as color, luminance, disparity map (depth), and the shape of objects. Today, Convolutional Neural Networks (CNNs) present the highest accuracy for disparity map estimation [1]. However, CNNs require considerable computing capacity to process billions of floating-point operations in real time. Besides, commercial stereo cameras produce very large images (e.g., 10 megapixels [2]), which imposes an additional computational cost on the system. The problem becomes more pronounced if we target resource-limited hardware for the implementation. In this paper, we propose DenseDisp, an automatic framework that designs a Siamese neural architecture for disparity map estimation in a reasonable time. DenseDisp leverages a meta-heuristic multi-objective exploration to discover hardware-friendly architectures by considering accuracy and network FLOPS as the optimization objectives. We explore the design space with four different fitness functions to improve the accuracy-FLOPS trade-off and the convergence time of DenseDisp. According to the experimental results, DenseDisp provides up to a 39.1x compression rate while losing around 5% accuracy compared to state-of-the-art results.
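To illustrate the Siamese matching principle behind disparity estimation (a toy stand-in, since DenseDisp's architecture is found by search rather than written by hand), the sketch below picks, for each pixel of an image row, the horizontal shift whose shared-weight features agree best:

```python
# Toy sketch of Siamese disparity matching: identical (shared-weight) features
# are compared across horizontal shifts; the dot product stands in for a
# learned similarity. Not DenseDisp's actual network.
import numpy as np

def disparity_row(left_feat, right_feat, max_disp):
    """left_feat, right_feat: (W, C) per-pixel feature vectors of one image row.
    Returns the best disparity per pixel."""
    width = left_feat.shape[0]
    best = np.zeros(width, dtype=int)
    for x in range(width):
        shifts = range(min(max_disp, x) + 1)  # stay inside the image
        scores = [left_feat[x] @ right_feat[x - d] for d in shifts]
        best[x] = int(np.argmax(scores))
    return best

# Example with random stand-in "features" for a 64-pixel row:
rng = np.random.default_rng(0)
left, right = rng.normal(size=(64, 8)), rng.normal(size=(64, 8))
print(disparity_row(left, right, max_disp=16)[:10])
```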

Place, publisher, year, edition, pages
Glasgow, United Kingdom, 2020
Keywords
Stereo Vision, Deep Learning, Multi-Objective, Optimization, Neural Architecture Search
National Category
Engineering and Technology, Computer Systems
Identifiers
urn:nbn:se:mdh:diva-49331 (URN)
000703998201002 (ISI)
2-s2.0-85055448153 (Scopus ID)
Conference
IEEE World Congress on Computational Intelligence (WCCI 2020), 19 Jul 2020, Glasgow, United Kingdom
Projects
DeepMaker: Deep Learning Accelerator on Commercial Programmable Devices
Available from: 2020-07-09. Created: 2020-07-09. Last updated: 2023-05-10. Bibliographically approved.

Open Access in DiVA

fulltext (2748 kB): 378 downloads
File name: FULLTEXT02.pdf
File size: 2748 kB
Checksum (SHA-512): ddea802ff14e3a780cf2f84b11837b66cf3a4bd109b04bdadc86e7c84059cdc582a8112878a3f3ad25bf84f5e3f530fd90300c3f305671025a9c3d627a01b1d6
Type: fulltext
Mimetype: application/pdf

Authority records

Loni, Mohammad
