Daneshtalab, Masoud
Publications (10 of 32)
Loni, M., Sinaei, S., Zoljodi, A., Daneshtalab, M. & Sjödin, M. (2020). DeepMaker: A multi-objective optimization framework for deep neural networks in embedded systems. Microprocessors and Microsystems, 73, Article ID 102989.
2020 (English). In: Microprocessors and Microsystems, ISSN 0141-9331, E-ISSN 1872-9436, Vol. 73, article id 102989. Article in journal (Refereed), Published
Abstract [en]

Deep Neural Networks (DNNs) are compute-intensive learning models with growing applicability in a wide range of domains. Due to their computational complexity, DNNs benefit from implementations that use custom hardware accelerators to meet performance, response time, and classification accuracy constraints. In this paper, we propose the DeepMaker framework, which aims to automatically design a set of highly robust DNN architectures for embedded devices, the processing units closest to the sensors. DeepMaker explores and prunes the design space to find improved neural architectures. Our proposed framework takes advantage of a multi-objective evolutionary approach that exploits a pruned design space inspired by a dense architecture. DeepMaker treats accuracy and network size as two objectives in order to build a highly optimized network that fits limited computational resource budgets while delivering an acceptable accuracy level. Compared with the best result on the CIFAR-10 dataset, a network generated by DeepMaker achieves up to a 26.4x compression rate while losing only 4% accuracy. In addition, DeepMaker maps the generated CNN onto programmable commodity devices, including an ARM processor, a high-performance CPU, a GPU, and an FPGA.
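A multi-objective evolutionary search over accuracy and network size usually keeps the candidates that no other candidate beats on both objectives at once. The sketch below shows only that Pareto-dominance filter, as a minimal illustration under stated assumptions: the `Candidate` type, the candidate names, and their accuracy/parameter values are hypothetical, and this is not DeepMaker's own code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    name: str        # hypothetical architecture label
    accuracy: float  # validation accuracy in [0, 1]; higher is better
    params: int      # number of trainable parameters; lower is better

def dominates(a: Candidate, b: Candidate) -> bool:
    """a dominates b if it is no worse on both objectives and strictly better on at least one."""
    no_worse = a.accuracy >= b.accuracy and a.params <= b.params
    strictly_better = a.accuracy > b.accuracy or a.params < b.params
    return no_worse and strictly_better

def pareto_front(population: list[Candidate]) -> list[Candidate]:
    """Keep the non-dominated candidates: the accuracy/size trade-off curve."""
    return [c for c in population
            if not any(dominates(other, c) for other in population if other is not c)]

population = [
    Candidate("net-a", 0.95, 3_000_000),
    Candidate("net-b", 0.91, 120_000),
    Candidate("net-c", 0.90, 400_000),  # less accurate AND larger than net-b
    Candidate("net-d", 0.93, 900_000),
]
for c in pareto_front(population):
    print(c.name, c.accuracy, c.params)  # net-a, net-b and net-d survive
```

In an evolutionary loop, a front like this would typically serve as the parent pool for the next generation, so small-but-accurate networks survive alongside large, highly accurate ones.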

Place, publisher, year, edition, pages
Elsevier B.V., 2020
National Category
Computer Systems
Identifiers
urn:nbn:se:mdh:diva-46792 (URN); 10.1016/j.micpro.2020.102989 (DOI); 2-s2.0-85077516447 (Scopus ID)
Available from: 2020-01-23. Created: 2020-01-23. Last updated: 2020-01-23. Bibliographically approved.
Mahdiani, H., Khadem, A., Ghanbari, A., Modarressi, M., Fattahi-Bayat, F. & Daneshtalab, M. (2020). ΔNN: Power-Efficient Neural Network Acceleration Using Differential Weights. IEEE Micro, 40(1), 67-74
2020 (English). In: IEEE Micro, ISSN 0272-1732, E-ISSN 1937-4143, Vol. 40, no 1, p. 67-74. Article in journal (Refereed), Published
Abstract [en]

The enormous and ever-increasing complexity of state-of-the-art neural networks has impeded the deployment of deep learning on resource-limited embedded and mobile devices. To reduce the complexity of neural networks, this article presents ΔNN, a power-efficient architecture that leverages the approximate value locality of neuron weights together with the algorithmic structure of neural networks. ΔNN stores each weight as its difference (Δ) to the nearest smaller weight: each weight reuses the calculations of the smaller weight, followed by a calculation on the Δ value to make up the difference. To further reduce complexity, we also round each Δ up or down to the nearest power of two. The experimental results show that ΔNN boosts average performance by 14%-37% and reduces average power consumption by 17%-49% compared with some state-of-the-art neural network designs.
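The differential-weight idea can be illustrated in a few lines: sort the weights, store each one as a power-of-two Δ to the next-smaller weight, and build every product by reusing the previous product plus one shifted copy of the input. This is a minimal sketch, assuming non-negative integer (quantized) weights and an integer input; the function names are hypothetical, and it models the arithmetic only, not the paper's hardware datapath.

```python
def nearest_power_of_two(d: int) -> int:
    """Round a non-negative integer to the closest power of two (ties round up); 0 stays 0."""
    if d == 0:
        return 0
    lower = 1 << (d.bit_length() - 1)  # largest power of two <= d
    upper = lower << 1                 # next power of two
    return lower if d - lower < upper - d else upper

def encode_deltas(weights: list[int]) -> tuple[list[int], list[int]]:
    """Sort the weights and store each one as a power-of-two delta to the next-smaller weight.
    order[k] is the original index of the k-th smallest weight; the smallest weight's
    delta is measured from zero."""
    order = sorted(range(len(weights)), key=lambda i: weights[i])
    deltas, prev = [], 0
    for i in order:
        deltas.append(nearest_power_of_two(weights[i] - prev))
        prev = weights[i]
    return order, deltas

def delta_products(x: int, order: list[int], deltas: list[int]) -> list[int]:
    """Approximate w_i * x for every weight: each product reuses the previous (smaller) one
    and adds x * delta, which is a single left shift because delta is a power of two."""
    products = [0] * len(order)
    acc = 0
    for i, d in zip(order, deltas):
        if d:
            acc += x << (d.bit_length() - 1)  # x * d with d == 2**k
        products[i] = acc
    return products

weights = [12, 3, 7, 8]
order, deltas = encode_deltas(weights)   # order [1, 2, 3, 0], deltas [4, 4, 1, 4]
print(delta_products(5, order, deltas))  # [65, 20, 40, 45]; exact products are [60, 15, 35, 40]
```

Because each Δ is rounded independently, the effective weights in this toy run are 13, 4, 8, and 9 instead of 12, 3, 7, and 8; how ΔNN bounds that accumulated error is not covered by the abstract.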

Place, publisher, year, edition, pages
IEEE Computer Society, 2020
National Category
Communication Systems
Identifiers
urn:nbn:se:mdh:diva-47029 (URN); 10.1109/MM.2019.2948345 (DOI); 000508573000010 (ISI); 2-s2.0-85073748116 (Scopus ID)
Available from: 2020-02-13. Created: 2020-02-13. Last updated: 2020-02-20. Bibliographically approved.
Majd, A., Loni, M., Sahebi, G., Daneshtalab, M. & Troubitsyna, E. (2019). A Cloud Based Super-Optimization Method to Parallelize the Sequential Code’s Nested Loops. In: IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip MCSoC-2019. Paper presented at IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip MCSoC-2019, 01 Oct 2019, Singapore.
2019 (English). In: IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip MCSoC-2019, 2019. Conference paper, Published paper (Refereed)
Abstract [en]

Advances in multi-core processor architecture make parallel computing ubiquitous. To achieve maximum utilization of multi-core processors, parallel programming techniques are required. However, parallel programming faces several challenges, which fall into three main groups. First, although recent advances in parallel programming languages (e.g., MPI and OpenCL) assist developers, parallel programming is still unappealing to most programmers. Second, a massive volume of legacy software and applications has been written in serial form, and converting millions of lines of serial code into parallel code is highly time-consuming and requires a huge verification effort. Third, producing software and applications in parallel form is very expensive, since it requires knowledge and expertise. Super-optimization, provided by super compilers, is the process of automatically determining dependent and independent instructions in order to find data dependencies and loop-free instruction sequences. The super compiler then runs these instructions on different processors in parallel wherever possible. Super-optimization is a feasible way to relieve programmers of the parallel programming workload. Since most of the complexity of sequential code lies in nested loops, we parallelize nested loops using the idea of super-optimization. One of the underlying stages of super-optimization is scheduling the tiled iteration space of the nested loops. Since this problem is NP-hard, traditional optimization methods are not feasible. In this paper, we propose a cloud-based super-optimization method, offered as Software-as-a-Service (SaaS), to reduce the cost of parallel programming. In addition, it increases the utilization of the multi-core processor's processing capacity.
As a result, an intermediate programmer can use the full processing capacity of his/her system without knowing anything about writing parallel code or super compiler functions, simply by sending the serial code to a cloud server and receiving the parallel version of the code back. In this paper, an evolutionary algorithm is leveraged to solve the tile scheduling problem. Our proposed super-optimization method will be offered as a service under a hybrid (public and private) deployment model.
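The abstract states that an evolutionary algorithm schedules the loop tiles but gives no encoding. The sketch below is a minimal (1+1) evolutionary algorithm under hypothetical assumptions: an n x n tile grid in which tile (i, j) depends on tiles (i-1, j) and (i, j-1), unit cost per tile, and a chromosome that maps each tile to a processor. It is not the paper's algorithm, and it leaves out the cloud service entirely.

```python
import random

Tile = tuple[int, int]

def makespan(assign: dict[Tile, int], procs: int, cost: int = 1) -> int:
    """Simulate a tiled 2D loop nest in which tile (i, j) depends on (i-1, j) and (i, j-1).
    Tiles are issued in wavefront order (i + j), which respects those dependencies; each
    processor runs its tiles one at a time, and every tile takes `cost` time units."""
    finish: dict[Tile, int] = {}
    free = [0] * procs
    for i, j in sorted(assign, key=lambda t: (t[0] + t[1], t[0])):
        ready = max(finish.get((i - 1, j), 0), finish.get((i, j - 1), 0))
        p = assign[(i, j)]
        start = max(ready, free[p])
        finish[(i, j)] = free[p] = start + cost
    return max(finish.values())

def evolve(n: int, procs: int, generations: int = 2000, seed: int = 0) -> tuple[dict[Tile, int], int]:
    """(1+1) EA: mutate one tile's processor and keep the child if its makespan is no worse."""
    rng = random.Random(seed)
    tiles = [(i, j) for i in range(n) for j in range(n)]
    best = {t: rng.randrange(procs) for t in tiles}
    best_cost = makespan(best, procs)
    for _ in range(generations):
        child = dict(best)
        child[rng.choice(tiles)] = rng.randrange(procs)  # single-gene mutation
        child_cost = makespan(child, procs)
        if child_cost <= best_cost:
            best, best_cost = child, child_cost
    return best, best_cost

schedule, cost = evolve(n=4, procs=4)
print("makespan:", cost)  # never below 2n - 1 = 7, the length of the dependency chain
```

On this toy instance, assigning each row of tiles to its own processor reaches the bound of 7, so the search result can be checked against a known optimum before trying harder, NP-hard instances.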

National Category
Engineering and Technology
Computer Systems
Identifiers
urn:nbn:se:mdh:diva-45148 (URN); 10.1109/MCSoC.2019.00047 (DOI); 2-s2.0-85076164097 (Scopus ID); 9781728148823 (ISBN)
Conference
IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip MCSoC-2019, 01 Oct 2019, Singapore
Projects
DeepMaker: Deep Learning Accelerator on Commercial Programmable Devices
Available from: 2019-09-05. Created: 2019-09-05. Last updated: 2020-02-20. Bibliographically approved.
Kakakhel, S. R., Westerlund, T., Daneshtalab, M., Zou, Z., Plosila, J. & Tenhunen, H. (2019). A qualitative comparison model for application layer IoT protocols. In: 2019 4th International Conference on Fog and Mobile Edge Computing, FMEC 2019. Paper presented at 2019 4th International Conference on Fog and Mobile Edge Computing, FMEC 2019 (pp. 210-215). Institute of Electrical and Electronics Engineers Inc.
2019 (English). In: 2019 4th International Conference on Fog and Mobile Edge Computing, FMEC 2019, Institute of Electrical and Electronics Engineers Inc., 2019, p. 210-215. Conference paper, Published paper (Refereed)
Abstract [en]

Protocols enable things to connect and communicate, thus making the Internet of Things possible. The performance of Internet of Things protocols, vital to their widespread use, has received much attention. However, one aspect essential to the real-world adoption of an IoT protocol is its feature set. Comparative analyses based on competing features and properties are rarely, if ever, discussed in the literature. In this paper, we define 19 attributes in 5 categories that are essential for IoT stakeholders to consider. These attributes are then used to contrast four IoT protocols: MQTT, HTTP, CoAP, and XMPP. Furthermore, we discuss scenarios in which an assessment based on comparative strengths and weaknesses would be beneficial. The provided comparison model can easily be extended to include protocols such as MQTT-SN, AMQP, and DDS.
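A comparison model like this becomes actionable once a stakeholder weights the categories that matter to them. The sketch below shows that mechanic as a weighted sum with an optional transport filter. Everything in it is hypothetical: the three category names and the 1-5 scores are placeholders chosen only to illustrate the computation, not the paper's 19 attributes or its assessments; the transports are the usual defaults, with HTTP assumed to be HTTP/1.1 over TCP.

```python
from typing import Optional

# Hypothetical categories and placeholder 1-5 scores (NOT the paper's attributes or results).
# Transport column: usual default transport; HTTP assumed to be HTTP/1.1 over TCP.
PROTOCOLS = {
    "MQTT": ("TCP", {"footprint": 4, "security": 3, "interoperability": 3}),
    "CoAP": ("UDP", {"footprint": 5, "security": 3, "interoperability": 3}),
    "HTTP": ("TCP", {"footprint": 2, "security": 4, "interoperability": 5}),
    "XMPP": ("TCP", {"footprint": 2, "security": 4, "interoperability": 4}),
}

def rank(weights: dict[str, float], transport: Optional[str] = None) -> list[tuple[str, float]]:
    """Weighted-sum score per protocol, best first; optionally keep a single transport."""
    results = []
    for name, (proto_transport, scores) in PROTOCOLS.items():
        if transport is not None and proto_transport != transport:
            continue
        results.append((name, sum(w * scores[cat] for cat, w in weights.items())))
    return sorted(results, key=lambda r: r[1], reverse=True)

# A hypothetical stakeholder running constrained devices cares most about footprint.
constrained = {"footprint": 0.6, "security": 0.2, "interoperability": 0.2}
print([name for name, _ in rank(constrained)])  # ['CoAP', 'MQTT', 'HTTP', 'XMPP']
```

Extending the model to MQTT-SN, AMQP, or DDS would then amount to adding rows to the table, which matches the extensibility the abstract claims.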

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers Inc., 2019
Keywords
CoAP, HTTP, IoT Protocols, MQTT, qualitative comparison, XMPP
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:mdh:diva-45367 (URN); 10.1109/FMEC.2019.8795324 (DOI); 000503441300031 (ISI); 2-s2.0-85071699598 (Scopus ID); 9781728117966 (ISBN)
Conference
2019 4th International Conference on Fog and Mobile Edge Computing, FMEC 2019
Available from: 2019-10-03. Created: 2019-10-03. Last updated: 2020-01-16. Bibliographically approved.
Yazdanpanah, F., AfsharMazayejani, R., Alaei, M., Rezaei, A. & Daneshtalab, M. (2019). An energy-efficient partition-based XYZ-planar routing algorithm for a wireless network-on-chip. Journal of Supercomputing, 75(2), 837-861
2019 (English). In: Journal of Supercomputing, ISSN 0920-8542, E-ISSN 1573-0484, Vol. 75, no 2, p. 837-861. Article in journal (Refereed), Published
Abstract [en]

In current many-core architectures, networks-on-chip (NoCs) are efficiently used as communication backbones that enable massive parallelism and a high degree of on-chip integration. Despite their advantages, conventional NoCs are limited by their wired multi-hop links, which incur long delays and high power consumption, especially in large systems. To overcome these limitations, various solutions, such as wireless interconnects, have been proposed; long-range, high-bandwidth, low-power wireless links can solve the problems of wired links. Meanwhile, the grid-like mesh is the most stable topology in conventional NoC designs, which is why most wireless network-on-chip (WNoC) architectures are based on it. The goals of this article are to challenge the mesh topology and to demonstrate the efficiency of honeycomb-based WNoC architectures. In this article, we propose HoneyWiN, a hybrid wired/wireless NoC architecture with a honeycomb topology. We also propose a partition-based XYZ-planar routing algorithm for energy conservation. To demonstrate the advantages of the proposed architecture, we first carry out an analytical comparison of HoneyWiN with a mesh-based WNoC as the baseline architecture. For this comparison, we implement our partition-based routing algorithm in a 2-axis coordinate system on the baseline architecture. Simulation results show that HoneyWiN reduces energy consumption by about 17% while increasing throughput by 10% compared with the mesh-based WNoC. HoneyWiN is then compared with four state-of-the-art mesh-based NoC architectures. In all evaluations, HoneyWiN provides higher performance in terms of delay, throughput, and energy consumption. Overall, the results indicate that HoneyWiN is very effective in improving throughput, increasing speed, and reducing energy consumption.
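The mesh baseline above routes in a 2-axis coordinate system. The abstract does not detail the partitioning or the wireless hops, so the sketch below substitutes plain dimension-order (XY) routing on a 2D mesh, the textbook 2-axis scheme, to show what routing in such a coordinate system looks like. It is a swapped-in standard technique, not HoneyWiN's partition-based XYZ-planar algorithm, and `xy_route` is a hypothetical name.

```python
def xy_route(src: tuple[int, int], dst: tuple[int, int]) -> list[tuple[int, int]]:
    """Dimension-order (XY) routing on a 2D mesh: move along X until the column matches,
    then along Y. Returns every router visited, source and destination included."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path

print(xy_route((0, 0), (2, 1)))  # [(0, 0), (1, 0), (2, 0), (2, 1)]
```

Because a packet never turns from Y back to X, XY routing is deadlock-free on a mesh and always takes a minimal (Manhattan-distance) path; HoneyWiN's partitions, its third axis, and its wireless links are not modeled here.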

Place, publisher, year, edition, pages
Springer, 2019
Keywords
Wireless NoC, Honeycomb topology, XYZ-planar routing, Partitioning
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:mdh:diva-42949 (URN); 10.1007/s11227-018-2617-x (DOI); 000460063500019 (ISI); 2-s2.0-85053846712 (Scopus ID)
Available from: 2019-03-22. Created: 2019-03-22. Last updated: 2019-07-01. Bibliographically approved.
Loni, M., Hamouachy, F., Casarrubios, C., Daneshtalab, M. & Nolin, M. (2019). AutoRIO: An Indoor Testbed for Developing Autonomous Vehicles. In: International Japan-Africa Conference on Electronics, Communications and Computations JAC-ECC. Paper presented at International Japan-Africa Conference on Electronics, Communications and Computations JAC-ECC, 16 Dec 2018, Alexandria, Egypt (pp. 69-72).
2019 (English). In: International Japan-Africa Conference on Electronics, Communications and Computations JAC-ECC, 2019, p. 69-72. Conference paper, Published paper (Refereed)
Abstract [en]

Autonomous vehicles have a great influence on our lives. They are more convenient and more energy efficient, and they provide a higher level of safety and cheaper driving solutions. In addition, reducing CO2 emissions and the risk of vehicular accidents are further benefits of autonomous vehicles. However, building a fully autonomous system is challenging, and the proposed solutions are still new. A testbed for evaluating new algorithms helps researchers and hardware developers verify the real impact of their solutions. Such a testing environment is a low-cost infrastructure that shortens the time-to-market of novel ideas. In this paper, we propose AutoRIO, a cutting-edge indoor testbed for developing autonomous vehicles.

National Category
Engineering and Technology
Computer Systems
Identifiers
urn:nbn:se:mdh:diva-42236 (URN); 10.1109/JEC-ECC.2018.8679543 (DOI); 000465120800017 (ISI); 2-s2.0-85064611063 (Scopus ID); 9781538692301 (ISBN)
Conference
International Japan-Africa Conference on Electronics, Communications and Computations JAC-ECC, 16 Dec 2018, Alexandria, Egypt
Projects
DPAC - Dependable Platforms for Autonomous systems and Control
Available from: 2018-12-28. Created: 2018-12-28. Last updated: 2019-05-09. Bibliographically approved.
Baloch, N. K., Baig, M. I. & Daneshtalab, M. (2019). Defender: A Low Overhead and Efficient Fault-Tolerant Mechanism for Reliable on-Chip Router. IEEE Access, 7, 142843-142854
2019 (English). In: IEEE Access, E-ISSN 2169-3536, Vol. 7, p. 142843-142854. Article in journal (Refereed), Published
Abstract [en]

The ever-shrinking size of transistors has made Networks-on-Chip (NoCs) susceptible to faults. A single error in the NoC can disrupt the entire communication. In this paper, we introduce Defender, a fault-tolerant router architecture capable of tolerating permanent faults in all parts of the router. We apply structural modifications to the baseline router design to achieve fault tolerance. In Defender, we provide fault tolerance to the input ports and the routing computation unit by grouping neighboring ports together. A default-winner strategy provides fault resilience to the virtual channel arbiters and switch allocators. Multiple routes to the crossbar are provided to tolerate faults. Defender provides improved fault tolerance to all router stages compared with currently prevailing fault-tolerant router architectures. Reliability analysis using the silicon protection factor (SPF) and Mean Time to Failure (MTTF) metrics confirms that Defender is 10.78 times more reliable than the unprotected baseline router, and also more reliable than current state-of-the-art architectures.
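The abstract relies on SPF and MTTF without defining them. The sketch below uses the definitions commonly found in the NoC fault-tolerance literature, which is an assumption about how Defender computes them: SPF as the mean number of faults needed to make the router fail, divided by the protected design's area relative to the unprotected baseline, and MTTF for a series system with constant (exponential) failure rates. All numbers are hypothetical, not results from the paper.

```python
def silicon_protection_factor(mean_faults_to_failure: float,
                              area_protected: float, area_baseline: float) -> float:
    """SPF: mean number of faults needed to cause failure, divided by the relative area
    (area overhead) of the protected design with respect to the unprotected baseline."""
    return mean_faults_to_failure / (area_protected / area_baseline)

def series_mttf(failure_rates: list[float]) -> float:
    """MTTF when every component must work and each fails at a constant rate:
    the rates add up, and MTTF is the reciprocal of the total rate."""
    return 1.0 / sum(failure_rates)

# Hypothetical numbers for illustration only.
print(silicon_protection_factor(1, 1.0, 1.0))    # unprotected: the first fault is fatal -> 1.0
print(silicon_protection_factor(12, 1.25, 1.0))  # 12 faults on average at 25% more area -> 9.6
print(series_mttf([0.5, 0.25, 0.25]))            # total rate 1.0 per time unit -> 1.0
```

Normalizing by area is what keeps SPF honest: a design that simply duplicates every block tolerates more faults but also pays for them in silicon, and the ratio reflects both.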

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers Inc., 2019
Keywords
Circuit faults, Fault tolerance, Fault tolerant systems, Routing, Computer architecture, Switches, Network-on-Chip, router architecture, permanent fault tolerance, silicon protection factor, mean time to failure
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:mdh:diva-46346 (URN); 10.1109/ACCESS.2019.2944490 (DOI); 000497156000218 (ISI); 2-s2.0-85077807614 (Scopus ID)
Available from: 2019-12-13. Created: 2019-12-13. Last updated: 2020-01-23. Bibliographically approved.
Ghaderi, A., Daneshtalab, M., Ashjaei, S. M., Loni, M., Mubeen, S. & Sjödin, M. (2019). Design challenges in hardware development of time-sensitive networking: A research plan. In: CEUR Workshop Proceedings, Volume 2457. Paper presented at 2019 Cyber-Physical Systems PhD Workshop, CPSWS 2019; Alghero; Italy; 23 September 2019. CEUR-WS, 2457
2019 (English). In: CEUR Workshop Proceedings, Volume 2457, CEUR-WS, 2019, Vol. 2457. Conference paper, Published paper (Refereed)
Abstract [en]

Time-Sensitive Networking (TSN) is a set of ongoing IEEE standardization projects that aim to guarantee timeliness and low-latency communication over switched Ethernet for industrial applications. The strong demand comes mainly from industries that require intensive data transmission, such as modern vehicles, in which cameras, lidars, and other high-bandwidth sensors are connected. The TSN standards evolve over time, so the hardware must change as the standards are modified. In addition, high-performance hardware is required to obtain the full benefit of the standards. In this paper, we present a research plan for developing novel techniques to support a parameterized, modular hardware IP core of a multi-stage TSN switch fabric in VHSIC (Very High Speed Integrated Circuit) Hardware Description Language (VHDL), which can be deployed on any Field-Programmable Gate Array (FPGA) device. We present the challenges on the way towards this goal.
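One TSN mechanism that a switch fabric of this kind typically has to implement is the time-aware shaper of IEEE 802.1Qbv, in which a cyclic gate control list decides which traffic classes may transmit at each instant. The abstract does not say which TSN features the planned IP core covers, so the sketch below is only an illustrative software model of the gate lookup, assuming that the cycle time equals the sum of the entry intervals (in the standard it is configured separately) and ignoring guard bands and configuration changes. `GateEntry`, `open_classes`, and the schedule values are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GateEntry:
    gate_mask: int    # bit k set => the gate of traffic class k is open
    interval_ns: int  # how long this entry stays in force

def open_classes(gcl: list[GateEntry], t_ns: int, base_time_ns: int = 0,
                 num_classes: int = 8) -> list[int]:
    """Return the traffic classes whose gates are open at time t_ns.
    The list repeats with a cycle equal to the sum of its intervals (an assumption)."""
    cycle = sum(e.interval_ns for e in gcl)
    offset = (t_ns - base_time_ns) % cycle
    for entry in gcl:
        if offset < entry.interval_ns:
            return [k for k in range(num_classes) if (entry.gate_mask >> k) & 1]
        offset -= entry.interval_ns
    raise AssertionError("unreachable: offset always falls inside one cycle")

# Hypothetical 1 ms cycle: 200 us reserved for class 7, then 800 us for classes 0-6.
gcl = [GateEntry(0b1000_0000, 200_000), GateEntry(0b0111_1111, 800_000)]
print(open_classes(gcl, 150_000))    # [7]
print(open_classes(gcl, 250_000))    # [0, 1, 2, 3, 4, 5, 6]
print(open_classes(gcl, 1_100_000))  # [7], second cycle
```

Parameters such as `num_classes` and the list length are the kind of knobs a parameterized hardware core would expose as generics, which is one reason evolving standards push toward a modular design.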

Place, publisher, year, edition, pages
CEUR-WS, 2019
Series
CEUR Workshop Proceedings, ISSN 1613-0073 ; 2457
Keywords
FPGA, Memory management, Predictability, Time-sensitive network, Cyber Physical System, Embedded systems, Field programmable gate arrays (FPGA), Integrated circuit design, Vehicle transmissions, Design challenges, Hardware development, High-performance hardware, Low-latency communication, Switched ethernet, Very high speed integrated circuits, Computer hardware description languages
National Category
Computer Engineering Embedded Systems
Identifiers
urn:nbn:se:mdh:diva-45837 (URN); 2-s2.0-85073187187 (Scopus ID)
Conference
2019 Cyber-Physical Systems PhD Workshop, CPSWS 2019; Alghero; Italy; 23 September 2019
Available from: 2019-10-25. Created: 2019-10-25. Last updated: 2019-12-18. Bibliographically approved.
Salimi, M., Majd, A., Loni, M., Seceleanu, T., Seceleanu, C., Sirjani, M., . . . Troubitsyna, E. (2019). Multi-objective optimization of real-time task scheduling problem for distributed environments. In: ACM International Conference Proceeding Series. Paper presented at 6th Conference on the Engineering of Computer-Based Systems, ECBS 2019, 2 September 2019 through 3 September 2019. Association for Computing Machinery, Article ID a13.
2019 (English). In: ACM International Conference Proceeding Series, Association for Computing Machinery, 2019, article id a13. Conference paper, Published paper (Refereed)
Abstract [en]

Real-world applications are composed of multiple tasks that usually have intricate data dependencies. To exploit distributed processing platforms, task allocation and scheduling (that is, assigning tasks to processing units and ordering data transfers between processing units) plays a vital role. However, optimally scheduling tasks on processing units and finding an optimized network topology is an NP-complete problem. The problem becomes more complicated when tasks have real-time deadlines. Exploring the whole search space to find the optimal solution is not feasible in a reasonable amount of time; therefore, meta-heuristics are often used to find a near-optimal solution. Here, we propose a multi-population evolutionary approach for near-optimal scheduling that guarantees the end-to-end deadlines of tasks in distributed processing environments. We analyze two exploration scenarios: single-objective and multi-objective exploration. The main goal of the single-objective exploration algorithm is to minimize the number of processing units needed for all tasks, whereas the multi-objective optimization simultaneously optimizes two conflicting objectives: the total number of processing units and the end-to-end finishing time of all jobs. The potential of the proposed approach is demonstrated by experiments based on a use case that maps a number of industrial automation jobs, each consisting of several tasks, onto a distributed environment.

Place, publisher, year, edition, pages
Association for Computing Machinery, 2019
Keywords
Distributed Task Scheduling, Evolutionary Computing, Multi-Objective Optimization, Real-Time Processing, Automation, Computational complexity, Data handling, Data transfer, Finishing, Image coding, Job shop scheduling, Multitasking, Optimal systems, Scheduling, Scheduling algorithms, Conflicting objectives, Distributed environments, Distributed processing, Distributed tasks, Industrial automation system, Realtime processing, Task allocation and scheduling, Multiobjective optimization
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:mdh:diva-46527 (URN); 10.1145/3352700.3352713 (DOI); 2-s2.0-85075887884 (Scopus ID); 9781450376365 (ISBN)
Conference
6th Conference on the Engineering of Computer-Based Systems, ECBS 2019, 2 September 2019 through 3 September 2019
Available from: 2019-12-17. Created: 2019-12-17. Last updated: 2019-12-19. Bibliographically approved.
Loni, M., Zoljodi, A., Seenan, S., Daneshtalab, M. & Nolin, M. (2019). NeuroPower: Designing Energy Efficient Convolutional Neural Network Architecture for Embedded Systems. In: Lecture Notes in Computer Science, Volume 11727. Paper presented at The 28th International Conference on Artificial Neural Networks ICANN 2019, 17 Sep 2019, Munich, Germany (pp. 208-222). Munich, Germany: Springer
2019 (English). In: Lecture Notes in Computer Science, Volume 11727, Munich, Germany: Springer, 2019, p. 208-222. Conference paper, Published paper (Refereed)
Abstract [en]

Convolutional Neural Networks (CNNs) suffer from energy-hungry implementations due to their computation- and memory-intensive processing patterns. This problem becomes even more significant with the proliferation of CNNs on embedded platforms. To overcome it, we offer NeuroPower, an automatic framework that designs a highly optimized and energy-efficient set of CNN architectures for embedded systems. NeuroPower explores and prunes the design space to find an improved set of neural architectures. To this end, a multi-objective optimization strategy is integrated to solve the Neural Architecture Search (NAS) problem by near-optimally tuning network hyperparameters. The main objectives of the optimization algorithm are network accuracy and the number of parameters in the network. The evaluation results show the effectiveness of NeuroPower in terms of energy consumption, compression rate, and inference time compared with other cutting-edge approaches. Compared with the best results on the CIFAR-10/CIFAR-100 datasets, a network generated by NeuroPower achieves up to a 2.1x/1.56x compression rate, a 1.59x/3.46x speedup, and a 1.52x/1.82x power saving while losing 2.4%/-0.6% accuracy, respectively.
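Since the number of parameters is one of NeuroPower's two objectives, every candidate architecture needs a cheap parameter count. The sketch below computes it for a hypothetical search-space encoding (a stack of k x k convolutions, global average pooling, one dense classifier) and derives a compression rate, assuming that term means the ratio of baseline to candidate parameter counts. The encoding, the channel widths, and the function names are illustrative assumptions, not NeuroPower's search space.

```python
def conv_params(k: int, c_in: int, c_out: int, bias: bool = True) -> int:
    """Parameters of a k x k convolution: one k*k*c_in kernel per output channel, plus biases."""
    return k * k * c_in * c_out + (c_out if bias else 0)

def dense_params(n_in: int, n_out: int, bias: bool = True) -> int:
    """Parameters of a fully connected layer."""
    return n_in * n_out + (n_out if bias else 0)

def count_params(channels: list[int], kernel: int, n_classes: int, in_channels: int = 3) -> int:
    """Hypothetical encoding: kernel x kernel conv layers with the given output channels,
    global average pooling (no parameters), then a single dense classifier."""
    total, c_in = 0, in_channels
    for c_out in channels:
        total += conv_params(kernel, c_in, c_out)
        c_in = c_out
    return total + dense_params(c_in, n_classes)

baseline = count_params([64, 128, 256], kernel=3, n_classes=10)  # 373386
candidate = count_params([32, 64, 128], kernel=3, n_classes=10)  # 94538
print(f"compression rate: {baseline / candidate:.2f}x")
```

Halving every layer width shrinks the convolution weights roughly fourfold because each layer's count scales with the product of its input and output channels; a NAS objective built on such a count rewards exactly that kind of narrowing, which accuracy then has to balance.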

Place, publisher, year, edition, pages
Munich, Germany: Springer, 2019
Series
Lecture Notes in Computer Science, ISSN 0302-9743 ; 11727
Keywords
Convolutional neural networks (CNNs), Neural Architecture Search (NAS), Embedded Systems, Multi-Objective Optimization
National Category
Engineering and Technology Computer Systems
Identifiers
urn:nbn:se:mdh:diva-45043 (URN); 10.1007/978-3-030-30487-4_17 (DOI); 2-s2.0-85072863572 (Scopus ID); 9783030304867 (ISBN)
Conference
The 28th International Conference on Artificial Neural Networks ICANN 2019, 17 Sep 2019, Munich, Germany
Projects
DPAC - Dependable Platforms for Autonomous systems and Control
DeepMaker: Deep Learning Accelerator on Commercial Programmable Devices
Available from: 2019-08-23. Created: 2019-08-23. Last updated: 2019-10-17. Bibliographically approved.