Daneshtalab, Masoud
Publications (10 of 27)
Majd, A., Loni, M., Sahebi, G., Daneshtalab, M. & Troubitsyna, E. (2019). A Cloud Based Super-Optimization Method to Parallelize the Sequential Code's Nested Loops. In: IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip MCSoC-2019. Paper presented at IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip MCSoC-2019, 01 Oct 2019, Singapore.
A Cloud Based Super-Optimization Method to Parallelize the Sequential Code's Nested Loops
2019 (English). In: IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip MCSoC-2019, 2019. Conference paper, Published paper (Refereed)
Abstract [en]

Advances in hardware architecture regarding multi-core processors have made parallel computing ubiquitous. To achieve the maximum utilization of multi-core processors, parallel programming techniques are required. However, several challenges stand in the way of parallel programming, and they fall into three major groups. First, although recent advancements in parallel programming frameworks (e.g., MPI, OpenCL) assist developers, parallel programming is still undesirable for most programmers. Second, a massive volume of legacy software and applications has been written in serial mode, and converting millions of lines of serial code to parallel code is highly time-consuming and requires a huge verification effort. Third, producing software and applications in parallel mode is very expensive, since it demands knowledge and expertise. Super-optimization, provided by super compilers, is the process of automatically determining the dependent and independent instructions in order to find data dependencies and loop-free sequences of instructions. The super compiler then runs these instructions on different processors in parallel, where possible. Super-optimization is a feasible way to relieve the programmer of the parallel-programming workload. Since most of the complexity of sequential code lies in nested loops, we parallelize the nested loops using the idea of super-optimization. One of the underlying stages in super-optimization is scheduling the tiled space of the iterated nested loops. Since this problem is NP-hard, traditional optimization methods are not feasible. In this paper, we propose a cloud-based super-optimization method offered as Software-as-a-Service (SaaS) to reduce the cost of parallel programming. In addition, it increases the utilization of the processing capacity of the multi-core processor.
As a result, an intermediate programmer can use the whole processing capacity of his/her system, without knowing anything about writing parallel code or super compiler internals, by sending serial code to a cloud server and receiving the parallel version back. In this paper, an evolutionary algorithm is leveraged to solve the tile scheduling problem. Our proposed super-optimization method is served as software and provided as a hybrid (public and private) deployment model.
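The tile-scheduling problem described in this abstract, grouping tiles of a nested loop so that independent tiles can run on different cores, can be illustrated with a minimal wavefront sketch. The paper itself uses an evolutionary scheduler; the function below is a hypothetical illustration under the common assumption that tile (i, j) depends on tiles (i-1, j) and (i, j-1), not the authors' method.

```python
def wavefront_schedule(n_tiles_i, n_tiles_j):
    """Group tiles (i, j) into wavefronts w = i + j; all tiles in one
    wavefront are mutually independent and may run on different cores."""
    waves = {}
    for i in range(n_tiles_i):
        for j in range(n_tiles_j):
            waves.setdefault(i + j, []).append((i, j))
    return [waves[w] for w in sorted(waves)]

schedule = wavefront_schedule(3, 3)
# 5 wavefronts; the middle one holds 3 tiles that can run in parallel
```

An evolutionary scheduler would search over assignments of these wavefront tiles to processors, which is where the NP-hardness mentioned above enters.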

National Category
Engineering and Technology; Computer Systems
Identifiers
urn:nbn:se:mdh:diva-45148 (URN)
Conference
IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip MCSoC-2019, 01 Oct 2019, Singapore
Projects
DeepMaker: Deep Learning Accelerator on Commercial Programmable Devices
Available from: 2019-09-05 Created: 2019-09-05 Last updated: 2019-09-05. Bibliographically approved
Yazdanpanah, F., AfsharMazayejani, R., Alaei, M., Rezaei, A. & Daneshtalab, M. (2019). An energy-efficient partition-based XYZ-planar routing algorithm for a wireless network-on-chip. Journal of Supercomputing, 75(2), 837-861
An energy-efficient partition-based XYZ-planar routing algorithm for a wireless network-on-chip
2019 (English). In: Journal of Supercomputing, ISSN 0920-8542, E-ISSN 1573-0484, Vol. 75, no 2, p. 837-861. Article in journal (Refereed) Published
Abstract [en]

In current many-core architectures, networks-on-chip (NoCs) are efficiently utilized as communication backbones, enabling massive parallelism and a high degree of integration on a chip. Despite the advantages of conventional NoCs, wired multi-hop links limit their performance through long delays and high power consumption, especially in large systems. To overcome these limitations, different solutions such as wireless interconnections have been proposed. Utilizing long-range, high-bandwidth, and low-power wireless links can solve the problems associated with wired links. Meanwhile, the grid-like mesh is the most stable topology in conventional NoC designs, which is why most wireless network-on-chip (WNoC) architectures have been designed based on this topology. The goals of this article are to challenge the mesh topology and to demonstrate the efficiency of honeycomb-based WNoC architectures. In this article, we propose HoneyWiN, a hybrid wired/wireless NoC architecture with a honeycomb topology, along with a partition-based XYZ-planar routing algorithm for energy conservation. To demonstrate the advantages of the proposed architecture, we first carry out an analytical comparison of HoneyWiN with a mesh-based WNoC as the baseline architecture, implementing our partition-based routing algorithm in the form of a 2-axis coordinate system in the baseline. Simulation results show that HoneyWiN reduces energy consumption by about 17% while increasing throughput by 10% compared to the mesh-based WNoC. HoneyWiN is then compared with four state-of-the-art mesh-based NoC architectures, and in all of the evaluations it provides higher performance in terms of delay, throughput, and energy consumption. Overall, the results indicate that HoneyWiN is very effective at improving throughput, increasing speed, and reducing energy consumption.
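As a rough illustration of the dimension-ordered routing family that XYZ-planar routing belongs to, the sketch below resolves the X offset first, then Y, then Z on a 3D grid. This is a generic textbook scheme under assumed mesh coordinates, not HoneyWiN's partition-based algorithm.

```python
def xyz_route(src, dst):
    """Hop-by-hop path of dimension-ordered routing: resolve X, then Y,
    then Z. A generic sketch, not HoneyWiN's partition-based variant."""
    path = [src]
    cur = list(src)
    for dim in range(3):  # 0 = X, 1 = Y, 2 = Z
        step = 1 if dst[dim] > cur[dim] else -1
        while cur[dim] != dst[dim]:
            cur[dim] += step
            path.append(tuple(cur))
    return path

# route from (0,0,0) to (2,1,1): two X hops, then one Y hop, then one Z hop
```

Ordering the dimensions this way removes certain turn dependencies, which is one standard way such algorithms stay deadlock-free.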

Place, publisher, year, edition, pages
Springer, 2019
Keywords
Wireless NoC, Honeycomb topology, XYZ-planar routing, Partitioning
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:mdh:diva-42949 (URN); 10.1007/s11227-018-2617-x (DOI); 000460063500019 (); 2-s2.0-85053846712 (Scopus ID)
Available from: 2019-03-22 Created: 2019-03-22 Last updated: 2019-07-01. Bibliographically approved
Loni, M., Hamouachy, F., Casarrubios, C., Daneshtalab, M. & Nolin, M. (2019). AutoRIO: An Indoor Testbed for Developing Autonomous Vehicles. In: International Japan-Africa Conference on Electronics, Communications and Computations JAC-ECC. Paper presented at International Japan-Africa Conference on Electronics, Communications and Computations JAC-ECC, 16 Dec 2018, Alexandria, Egypt (pp. 69-72).
AutoRIO: An Indoor Testbed for Developing Autonomous Vehicles
2019 (English). In: International Japan-Africa Conference on Electronics, Communications and Computations JAC-ECC, 2019, p. 69-72. Conference paper, Published paper (Refereed)
Abstract [en]

Autonomous vehicles have a great influence on our lives. These vehicles are more convenient and more energy efficient, providing a higher safety level and cheaper driving solutions. In addition, decreasing CO2 emissions and the risk of vehicular accidents are further benefits of autonomous vehicles. However, building a fully autonomous system is challenging, and the proposed solutions are new. Providing a testbed for evaluating new algorithms helps researchers and hardware developers verify the real impact of their solutions; such a testing environment is a low-cost infrastructure that shortens the time-to-market of novel ideas. In this paper, we propose AutoRIO, a cutting-edge indoor testbed for developing autonomous vehicles.

National Category
Engineering and Technology; Computer Systems
Identifiers
urn:nbn:se:mdh:diva-42236 (URN); 10.1109/JEC-ECC.2018.8679543 (DOI); 000465120800017 (); 2-s2.0-85064611063 (Scopus ID); 9781538692301 (ISBN)
Conference
International Japan-Africa Conference on Electronics, Communications and Computations JAC-ECC, 16 Dec 2018, Alexandria, Egypt
Projects
DPAC - Dependable Platforms for Autonomous systems and Control
Available from: 2018-12-28 Created: 2018-12-28 Last updated: 2019-05-09. Bibliographically approved
Salimi, M., Majd, A., Loni, M., Seceleanu, T., Seceleanu, C., Sirjani, M., . . . Troubitsyna, E. (2019). Multi-objective Optimization of Real-Time Task Scheduling Problem for Distributed Environments. In: 6th Conference on the Engineering of Computer Based Systems ECBS 2019. Paper presented at 6th Conference on the Engineering of Computer Based Systems ECBS 2019, 02 Sep 2019, Bucharest, Romania.
Multi-objective Optimization of Real-Time Task Scheduling Problem for Distributed Environments
2019 (English). In: 6th Conference on the Engineering of Computer Based Systems ECBS 2019, 2019. Conference paper, Published paper (Refereed)
National Category
Engineering and Technology; Computer Systems
Identifiers
urn:nbn:se:mdh:diva-45052 (URN); 978-1-4503-7636-5 (ISBN)
Conference
6th Conference on the Engineering of Computer Based Systems ECBS 2019, 02 Sep 2019, Bucharest, Romania
Projects
DPAC - Dependable Platforms for Autonomous systems and Control
Available from: 2019-08-23 Created: 2019-08-23 Last updated: 2019-08-23. Bibliographically approved
Loni, M., Zoljodi, A., Seenan, S., Daneshtalab, M. & Nolin, M. (2019). NeuroPower: Designing Energy Efficient Convolutional Neural Network Architecture for Embedded Systems. In: The 28th International Conference on Artificial Neural Networks ICANN 2019. Paper presented at The 28th International Conference on Artificial Neural Networks ICANN 2019, 17 Sep 2019, Munich, Germany. Munich, Germany: Springer
NeuroPower: Designing Energy Efficient Convolutional Neural Network Architecture for Embedded Systems
2019 (English). In: The 28th International Conference on Artificial Neural Networks ICANN 2019, Munich, Germany: Springer, 2019. Conference paper, Published paper (Refereed)
Abstract [en]

Convolutional Neural Networks (CNNs) suffer from energy-hungry implementations due to their computation- and memory-intensive processing patterns. This problem is made even more significant by the proliferation of CNNs on embedded platforms. To overcome this problem, we offer NeuroPower, an automatic framework that designs a highly optimized and energy-efficient set of CNN architectures for embedded systems. NeuroPower explores and prunes the design space to find an improved set of neural architectures. Toward this aim, a multi-objective optimization strategy is integrated to solve the Neural Architecture Search (NAS) problem by near-optimally tuning network hyperparameters. The main objectives of the optimization algorithm are network accuracy and the number of parameters in the network. The evaluation results show the effectiveness of NeuroPower in energy consumption, compression rate and inference time compared to other cutting-edge approaches. Compared with the best results on the CIFAR-10/CIFAR-100 datasets, a network generated by NeuroPower presents up to 2.1x/1.56x compression rate, 1.59x/3.46x speedup and 1.52x/1.82x power saving while losing only 2.4%/-0.6% accuracy, respectively.
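The two objectives named in this abstract (accuracy up, parameter count down) define a Pareto front over candidate architectures. The sketch below shows the dominance test such a multi-objective search relies on; the function names and numbers are hypothetical illustrations, not NeuroPower's API.

```python
def dominates(a, b):
    """a dominates b if a is no worse in both objectives and strictly
    better in at least one. Tuples are (accuracy %, parameter count)."""
    acc_a, par_a = a
    acc_b, par_b = b
    return (acc_a >= acc_b and par_a <= par_b) and (acc_a > acc_b or par_a < par_b)

def pareto_front(candidates):
    """Keep only architectures not dominated by any other candidate."""
    return [c for c in candidates
            if not any(dominates(o, c) for o in candidates if o != c)]

# (accuracy %, millions of parameters) for hypothetical architectures
archs = [(93.0, 5.2), (91.0, 1.1), (90.0, 6.0), (92.5, 0.9)]
front = pareto_front(archs)
# (91.0, 1.1) and (90.0, 6.0) are dominated and drop out of the front
```

An evolutionary NAS loop repeatedly mutates architectures and keeps the non-dominated set, trading accuracy against model size exactly as this filter does.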

Place, publisher, year, edition, pages
Munich, Germany: Springer, 2019
Keywords
Convolutional neural networks (CNNs), Neural Architecture Search (NAS), Embedded Systems, Multi-Objective Optimization
National Category
Engineering and Technology; Computer Systems
Identifiers
urn:nbn:se:mdh:diva-45043 (URN)
Conference
The 28th International Conference on Artificial Neural Networks ICANN 2019, 17 Sep 2019, Munich, Germany
Projects
DPAC - Dependable Platforms for Autonomous systems and Control
DeepMaker: Deep Learning Accelerator on Commercial Programmable Devices
Available from: 2019-08-23 Created: 2019-08-23 Last updated: 2019-08-23. Bibliographically approved
Maleki, N., Loni, M., Daneshtalab, M., Conti, M. & Fotouhi, H. (2019). SoFA: A Spark-oriented Fog Architecture. In: IEEE 45th Annual Conference of the Industrial Electronics Society IECON'19. Paper presented at IEEE 45th Annual Conference of the Industrial Electronics Society IECON'19, 14 Oct 2019, Lisbon, Portugal.
SoFA: A Spark-oriented Fog Architecture
2019 (English). In: IEEE 45th Annual Conference of the Industrial Electronics Society IECON'19, 2019. Conference paper, Published paper (Refereed)
Abstract [en]

Fog computing offers a wide range of service-level benefits, including low bandwidth usage, low response time, support for heterogeneous applications, and high energy efficiency. Real-time embedded applications could therefore benefit from Fog infrastructure. However, providing high system utilization is an important challenge in Fog computing, especially when processing embedded applications. In addition, although Fog computing extends cloud computing with better energy efficiency, it still suffers from considerable energy consumption, which is a limitation for embedded systems. To overcome these limitations, in this paper we propose SoFA, a Spark-oriented Fog architecture that leverages Spark functionalities to provide higher system utilization, energy efficiency, and scalability. Unlike common Fog computing platforms, where edge devices are only responsible for processing data received from their own IoT nodes, SoFA leverages the remaining processing capacity of all other edge devices. To this end, SoFA provides a distributed processing paradigm, with the help of Spark, that utilizes the whole processing capacity of all available edge devices, increasing energy efficiency and system utilization. In other words, SoFA proposes a near-sensor processing solution in which the edge devices act as the Fog nodes. In addition, SoFA provides scalability by taking advantage of Spark functionalities. According to the experimental results, SoFA is a power-efficient and scalable solution desirable for embedded platforms, providing up to 3.1x energy efficiency for the Word-Count benchmark compared to the common Fog processing platform.
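The Word-Count benchmark mentioned above has the classic map/reduce shape that Spark distributes across nodes (in PySpark it would be a flatMap/reduceByKey pipeline). The dependency-free Python stand-in below only mimics that dataflow on local lists; the partitioning here is an illustrative assumption, not SoFA's deployment.

```python
from collections import Counter
from functools import reduce

def word_count(partitions):
    """Map each line of each partition to local word counts, then reduce
    by merging: the same two-phase shape Spark runs across Fog nodes."""
    local = [Counter(line.split()) for part in partitions for line in part]
    return reduce(lambda a, b: a + b, local, Counter())

parts = [["fog fog spark"], ["spark edge fog"]]  # two "edge device" partitions
counts = word_count(parts)
# counts["fog"] == 3, counts["spark"] == 2, counts["edge"] == 1
```

Because the map phase is independent per partition, adding edge devices adds map capacity, which is the scalability argument the abstract makes.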

Keywords
Fog Computing, Distributed Processing, Spark, Programming, IoT, Energy Efficiency
National Category
Engineering and Technology; Computer Systems
Identifiers
urn:nbn:se:mdh:diva-45053 (URN)
Conference
IEEE 45th Annual Conference of the Industrial Electronics Society IECON'19, 14 Oct 2019, Lisbon, Portugal
Projects
Future factories in the Cloud
DeepMaker: Deep Learning Accelerator on Commercial Programmable Devices
MobiFog: mobility management in Fog-assisted IoT networks
Health5G: Future eHealth powered by 5G
FlexiHealth: flexible softwarized networks for digital healthcare
Available from: 2019-08-22 Created: 2019-08-22 Last updated: 2019-08-22. Bibliographically approved
Nazari, N., Loni, M., E. Salehi, M., Daneshtalab, M. & Nolin, M. (2019). TOT-Net: An Endeavor Toward Optimizing Ternary Neural Networks. In: 22nd Euromicro Conference on Digital System Design DSD 2019. Paper presented at 22nd Euromicro Conference on Digital System Design DSD 2019, 28 Aug 2019, Chalkidiki, Greece.
TOT-Net: An Endeavor Toward Optimizing Ternary Neural Networks
2019 (English). In: 22nd Euromicro Conference on Digital System Design DSD 2019, 2019. Conference paper, Published paper (Refereed)
Abstract [en]

High computation demands and big memory requirements are the major implementation challenges of Convolutional Neural Networks (CNNs), especially for low-power and resource-limited embedded devices. Many binarized neural networks have recently been proposed to address these issues. Although they significantly decrease computation and memory footprint, they suffer from accuracy loss, especially on large datasets. In this paper, we propose TOT-Net, a ternarized neural network with [-1, 0, 1] values for both weights and activation functions that simultaneously achieves a higher level of accuracy and a lower computational load. First, TOT-Net introduces a simple bitwise logic for convolution computations to reduce the cost of multiply operations. Selecting a proper activation function and learning rate is influential for accuracy, but also difficult. As the second contribution, we propose a novel piece-wise activation function and optimized learning rates for different datasets. Our findings reveal that 0.01 is a preferable learning rate for the studied datasets. Third, using an evolutionary optimization approach, we found novel piece-wise activation functions customized for TOT-Net. According to the experimental results, TOT-Net achieves 2.15%, 8.77%, and 5.7%/5.52% better accuracy compared to XNOR-Net on CIFAR-10, CIFAR-100, and ImageNet (top-5/top-1), respectively.
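The core quantization step of a ternary network, mapping real-valued weights to {-1, 0, +1}, can be sketched with a symmetric threshold. TOT-Net's actual quantization rule and its bitwise convolution logic are more involved; the function name and threshold value below are illustrative assumptions.

```python
def ternarize(weights, threshold):
    """Quantize real-valued weights to {-1, 0, +1}: values inside the
    symmetric band (-threshold, threshold) become 0, the rest keep sign."""
    return [0 if abs(w) < threshold else (1 if w > 0 else -1)
            for w in weights]

q = ternarize([0.8, -0.05, -0.6, 0.02, 0.3], threshold=0.1)
# q == [1, 0, -1, 0, 1]
```

With weights restricted to three values, each multiply in a convolution collapses to a sign flip or a skip, which is what makes the bitwise logic mentioned above possible.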

Keywords
convolutional neural networks, ternary neural network, activation function, optimization
National Category
Engineering and Technology; Computer Systems
Identifiers
urn:nbn:se:mdh:diva-45042 (URN)
Conference
22nd Euromicro Conference on Digital System Design DSD 2019, 28 Aug 2019, Chalkidiki, Greece
Projects
DPAC - Dependable Platforms for Autonomous systems and Control
DeepMaker: Deep Learning Accelerator on Commercial Programmable Devices
Available from: 2019-08-23 Created: 2019-08-23 Last updated: 2019-08-23. Bibliographically approved
Akbari, N., Modarressi, M., Daneshtalab, M. & Loni, M. (2018). A Customized Processing-in-Memory Architecture for Biological Sequence Alignment. In: Proceedings of the International Conference on Application-Specific Systems, Architectures and Processors. Paper presented at 29th IEEE International Conference on Application-Specific Systems, Architectures and Processors, ASAP 2018, 10 July 2018 through 12 July 2018. Institute of Electrical and Electronics Engineers Inc., Article ID 8445124.
A Customized Processing-in-Memory Architecture for Biological Sequence Alignment
2018 (English). In: Proceedings of the International Conference on Application-Specific Systems, Architectures and Processors, Institute of Electrical and Electronics Engineers Inc., 2018, article id 8445124. Conference paper, Published paper (Refereed)
Abstract [en]

Sequence alignment is the most widely used operation in bioinformatics. With the exponential growth of biological sequence databases, searching a database for the optimal alignment of a query sequence (which can be on the order of hundreds of millions of characters long) requires excessive processing power and memory bandwidth. Sequence alignment algorithms can potentially benefit from the processing power of massively parallel processors due to their simple arithmetic operations, coupled with the inherent fine-grained and coarse-grained parallelism they exhibit. However, the limited memory bandwidth of conventional computing systems prevents exploiting the maximum achievable speedup. In this paper, we propose a processing-in-memory architecture as a viable solution to the excessive memory bandwidth demand of bioinformatics applications. The design is composed of a set of simple and lightweight processing elements, customized to the sequence alignment algorithm, integrated at the logic layer of an emerging 3D DRAM architecture. Experimental results show that the proposed architecture yields up to 2.4x speedup and a 41% reduction in power consumption compared to a processor-side parallel implementation.
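The alignment kernel that processing-in-memory designs of this kind accelerate is a dynamic-programming recurrence; cells on the same anti-diagonal are mutually independent, which is the fine-grained parallelism lightweight processing elements can exploit. Below is a minimal local-alignment (Smith-Waterman-style) score sketch with illustrative scoring parameters, not the paper's customized hardware datapath.

```python
def sw_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score via the Smith-Waterman recurrence:
    h[i][j] = max(0, diagonal + substitution, up + gap, left + gap)."""
    h = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            h[i][j] = max(0, h[i - 1][j - 1] + s,
                          h[i - 1][j] + gap, h[i][j - 1] + gap)
            best = max(best, h[i][j])
    return best

# sw_score("ACGT", "ACGT") == 8: four matches at +2 each
```

Each cell reads only its left, upper, and diagonal neighbours, so the memory traffic per cell is tiny and bandwidth, not arithmetic, dominates, which motivates moving the computation next to DRAM.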

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers Inc., 2018
Keywords
Accelerator, Processing-in-memory, Sequence Alignment, Alignment, Bandwidth, Bioinformatics, Computation theory, Dynamic random access storage, Parallel processing systems, Particle accelerators, Query processing, 3d dram architectures, Bioinformatics applications, Biological sequence alignment, Massive parallel processors, Parallel implementations, Processing in memory, Proposed architectures, Sequence alignments, Memory architecture
National Category
Embedded Systems
Identifiers
urn:nbn:se:mdh:diva-41018 (URN); 10.1109/ASAP.2018.8445124 (DOI); 000447635800027 (); 2-s2.0-85053445393 (Scopus ID); 9781538674796 (ISBN)
Conference
29th IEEE International Conference on Application-Specific Systems, Architectures and Processors, ASAP 2018, 10 July 2018 through 12 July 2018
Available from: 2018-09-27 Created: 2018-09-27 Last updated: 2018-12-27. Bibliographically approved
Ebrahimi, M. & Daneshtalab, M. (2018). A General Methodology on Designing Acyclic Channel Dependency Graphs in Interconnection Networks. IEEE Micro, 38(3), 79-85
A General Methodology on Designing Acyclic Channel Dependency Graphs in Interconnection Networks
2018 (English). In: IEEE Micro, ISSN 0272-1732, E-ISSN 1937-4143, Vol. 38, no 3, p. 79-85. Article in journal (Refereed) Published
Abstract [en]

For the past three decades, interconnection networks have been developed based on two major theories, one by Dally and the other by Duato. In this article, we introduce EbDa, with a simplified theoretical basis, which directly allows for designing an acyclic channel dependency graph and for verifying routing algorithms' freedom from deadlock. EbDa is composed of three theorems that enable extracting all allowable turns without dealing with turn models.
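The property EbDa targets, an acyclic channel dependency graph, is the classical sufficient condition (due to Dally) for deadlock-free routing: if no cycle exists among channel dependencies, packets cannot wait on one another in a ring. Checking a given graph is a standard depth-first search, sketched below on a hypothetical adjacency-list encoding of the graph.

```python
def is_acyclic(cdg):
    """Return True if the channel dependency graph (channel -> list of
    channels it may wait on) has no cycle, i.e. the routing function is
    deadlock-free by Dally's acyclicity condition."""
    WHITE, GRAY, BLACK = 0, 1, 2
    nodes = set(cdg) | {n for outs in cdg.values() for n in outs}
    color = {c: WHITE for c in nodes}

    def dfs(c):
        color[c] = GRAY
        for nxt in cdg.get(c, []):
            if color[nxt] == GRAY:  # back edge: a dependency cycle
                return False
            if color[nxt] == WHITE and not dfs(nxt):
                return False
        color[c] = BLACK
        return True

    return all(color[c] != WHITE or dfs(c) for c in nodes)
```

Turn-model restrictions (and EbDa's theorems) are ways of constructing a graph that passes this check by design, rather than verifying it after the fact.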

Place, publisher, year, edition, pages
IEEE Computer Society, 2018
Keywords
channel dependency graph, deadlock avoidance, hardware, interconnection networks, routing algorithms, Computer hardware, Program processors, Dependency graphs, General methodologies, Turn model, Interconnection networks (circuit switching)
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:mdh:diva-39365 (URN); 10.1109/MM.2018.032271064 (DOI); 000432316500010 (); 2-s2.0-85046996689 (Scopus ID)
Available from: 2018-05-31 Created: 2018-05-31 Last updated: 2018-10-01. Bibliographically approved
Loni, M., Daneshtalab, M. & Sjödin, M. (2018). ADONN: Adaptive Design of Optimized Deep Neural Networks for Embedded Systems. In: 21st Euromicro Conference on Digital System Design DSD'18. Paper presented at 21st Euromicro Conference on Digital System Design DSD'18, 29 Aug 2018, Prague, Czech Republic (pp. 397-404), Article ID 8491845.
ADONN: Adaptive Design of Optimized Deep Neural Networks for Embedded Systems
2018 (English). In: 21st Euromicro Conference on Digital System Design DSD'18, 2018, p. 397-404, article id 8491845. Conference paper, Published paper (Refereed)
Abstract [en]

Nowadays, many modern applications, e.g., autonomous systems and cloud data services, need to capture and process large amounts of raw data at runtime, which ultimately necessitates a high-performance computing model. Deep Neural Networks (DNNs) have already revealed their learning capabilities in runtime data processing for modern applications. However, DNNs are becoming deeper and more sophisticated to gain higher accuracy, which requires remarkable computing capacity. Relying on high-performance cloud infrastructure as a supplier of the required computational throughput is often not feasible. Instead, we aim for a near-sensor processing solution that lowers the need for network bandwidth and increases privacy and power efficiency, as well as guaranteeing worst-case response times. Toward this goal, we introduce the ADONN framework, which aims to automatically design a highly robust DNN architecture for embedded devices as the processing unit closest to the sensors. ADONN adroitly searches the design space to find improved neural architectures. Our proposed framework takes advantage of a multi-objective evolutionary approach, which exploits a pruned design space inspired by a dense architecture. Unlike recent works that have mainly tried to generate highly accurate networks, ADONN also considers network size as a second objective, to build a highly optimized network that fits limited computational resource budgets while delivering a comparable accuracy level. Compared with the best result on the CIFAR-10 dataset, a network generated by ADONN presents up to a 26.4x compression rate while losing only 4% accuracy. In addition, ADONN maps the generated DNN onto commodity programmable devices including ARM processors, high-performance CPUs, GPUs, and FPGAs.

Keywords
Neural Architectural Search, Approximation Computing, Neural Processing Unit, Multi-Objective Optimization
National Category
Computer Systems
Identifiers
urn:nbn:se:mdh:diva-40876 (URN); 10.1109/DSD.2018.00074 (DOI); 2-s2.0-85056450132 (Scopus ID); 9781538673768 (ISBN)
Conference
21st Euromicro Conference on Digital System Design DSD'18, 29 Aug 2018, Prague, Czech Republic
Projects
DPAC - Dependable Platforms for Autonomous systems and Control
FAST-ARTS: Fast and Sustainable Analysis Techniques for Advanced Real-Time Systems
DeepMaker: Deep Learning Accelerator on Commercial Programmable Devices
Available from: 2018-09-20 Created: 2018-09-20 Last updated: 2018-11-29. Bibliographically approved