Daneshtalab, Masoud
Publications (10 of 30)
Majd, A., Loni, M., Sahebi, G., Daneshtalab, M. & Troubitsyna, E. (2019). A Cloud Based Super-Optimization Method to Parallelize the Sequential Code’s Nested Loops. In: IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip MCSoC-2019. Paper presented at IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip MCSoC-2019, 01 Oct 2019, Singapore.
A Cloud Based Super-Optimization Method to Parallelize the Sequential Code’s Nested Loops
2019 (English). In: IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip MCSoC-2019, 2019. Conference paper, Published paper (Refereed)
Abstract [en]

Advances in multi-core processor architecture have made parallel computing ubiquitous. Achieving maximum utilization of multi-core processors requires parallel programming techniques, but parallel programming faces several challenges, which fall into three major groups. First, although recent advances in parallel programming languages and frameworks (e.g., MPI, OpenCL) assist developers, parallel programming remains unattractive to most programmers. Second, a massive volume of legacy software and applications has been written in serial mode, and converting millions of lines of serial code to parallel code is highly time-consuming and requires a huge verification effort. Third, producing software and applications in parallel mode is very expensive, since it demands specialized knowledge and expertise. Super-optimization, provided by super compilers, is the process of automatically determining dependent and independent instructions in order to find data dependencies and loop-free sequences of instructions; the super compiler then runs these instructions on different processors in parallel where possible. Super-optimization is thus a feasible way to relieve the programmer of the parallel-programming workload. Since most of the complexity of sequential code lies in nested loops, we parallelize nested loops using the idea of super-optimization. One of the underlying stages of super-optimization is scheduling the tiled space for iterating nested loops; since this problem is NP-hard, traditional optimization methods are not feasible. In this paper, we propose a cloud-based super-optimization method, offered as Software-as-a-Service (SaaS), to reduce the cost of parallel programming and to increase the utilization of the processing capacity of multi-core processors.
As a result, an intermediate programmer can use the whole processing capacity of his or her system, without knowing anything about writing parallel code or super compiler internals, by sending serial code to a cloud server and receiving the parallel version of the code back. An evolutionary algorithm is leveraged to solve the tile scheduling problem. The proposed super-optimization method is served as software and provided under a hybrid (public and private) deployment model.
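The tiling step the abstract refers to can be illustrated with a minimal sketch (the function and tile-size names here are illustrative, not from the paper): a nested loop's iteration space is split into tiles, and tiles with no data dependencies between them can then be scheduled on separate cores.

```python
# Minimal sketch of loop tiling: the iteration space of a 2-level
# nested loop is covered by square tiles; dependency-free tiles
# could then be scheduled on different cores in parallel.
def tile_iteration_space(n_i, n_j, tile):
    """Return the list of tiles covering an n_i x n_j iteration space.

    Each tile is a pair of half-open ranges ((i0, i1), (j0, j1)).
    """
    tiles = []
    for bi in range(0, n_i, tile):
        for bj in range(0, n_j, tile):
            tiles.append(((bi, min(bi + tile, n_i)),
                          (bj, min(bj + tile, n_j))))
    return tiles

# An 8x8 iteration space with 4x4 tiles yields 4 tiles.
tiles = tile_iteration_space(8, 8, 4)
```

Scheduling these tiles onto processors, as the abstract notes, is the NP-hard part that the paper's evolutionary algorithm addresses.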

National Category
Engineering and Technology, Computer Systems
Identifiers
urn:nbn:se:mdh:diva-45148 (URN)
Conference
IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip MCSoC-2019, 01 Oct 2019, Singapore
Projects
DeepMaker: Deep Learning Accelerator on Commercial Programmable Devices
Available from: 2019-09-05 Created: 2019-09-05 Last updated: 2019-09-05. Bibliographically approved
Kakakhel, S. R., Westerlund, T., Daneshtalab, M., Zou, Z., Plosila, J. & Tenhunen, H. (2019). A qualitative comparison model for application layer IoT protocols. In: 2019 4th International Conference on Fog and Mobile Edge Computing, FMEC 2019: . Paper presented at 2019 4th International Conference on Fog and Mobile Edge Computing, FMEC 2019 (pp. 210-215). Institute of Electrical and Electronics Engineers Inc.
A qualitative comparison model for application layer IoT protocols
2019 (English). In: 2019 4th International Conference on Fog and Mobile Edge Computing, FMEC 2019, Institute of Electrical and Electronics Engineers Inc., 2019, p. 210-215. Conference paper, Published paper (Refereed)
Abstract [en]

Protocols enable things to connect and communicate, making the Internet of Things possible. The performance of Internet of Things protocols, vital to their widespread utilization, has received much attention. However, another aspect of IoT protocols, essential to their adoption in the real world, is a protocol's feature set. Comparative analysis based on competing features and properties is rarely, if ever, discussed in the literature. In this paper, we define 19 attributes in 5 categories that are essential for IoT stakeholders to consider. These attributes are then used to contrast four IoT protocols: MQTT, HTTP, CoAP and XMPP. Furthermore, we discuss scenarios where an assessment based on comparative strengths and weaknesses would be beneficial. The provided comparison model can easily be extended to include protocols such as MQTT-SN, AMQP and DDS.
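A minimal sketch of what such an attribute-based comparison model might look like follows; the attributes below are illustrative placeholders (only a few well-known protocol properties), not the paper's 19 attributes.

```python
# Hypothetical sketch of an attribute-based comparison model:
# each protocol is described by named attributes, and stakeholders
# can filter protocols by a required attribute value.
PROTOCOLS = {
    "MQTT": {"transport": "TCP", "pub_sub": True,  "request_response": False},
    "CoAP": {"transport": "UDP", "pub_sub": False, "request_response": True},
    "HTTP": {"transport": "TCP", "pub_sub": False, "request_response": True},
    "XMPP": {"transport": "TCP", "pub_sub": True,  "request_response": True},
}

def protocols_with(attribute, value):
    """Return protocols whose attribute matches the required value."""
    return sorted(p for p, attrs in PROTOCOLS.items()
                  if attrs.get(attribute) == value)
```

Extending the model to MQTT-SN, AMQP or DDS, as the abstract suggests, would amount to adding rows to the table.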

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers Inc., 2019
Keywords
CoAP, HTTP, IoT Protocols, MQTT, qualitative comparison, XMPP
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:mdh:diva-45367 (URN)
10.1109/FMEC.2019.8795324 (DOI)
2-s2.0-85071699598 (Scopus ID)
9781728117966 (ISBN)
Conference
2019 4th International Conference on Fog and Mobile Edge Computing, FMEC 2019
Available from: 2019-10-03 Created: 2019-10-03 Last updated: 2019-10-03. Bibliographically approved
Yazdanpanah, F., AfsharMazayejani, R., Alaei, M., Rezaei, A. & Daneshtalab, M. (2019). An energy-efficient partition-based XYZ-planar routing algorithm for a wireless network-on-chip. Journal of Supercomputing, 75(2), 837-861
An energy-efficient partition-based XYZ-planar routing algorithm for a wireless network-on-chip
2019 (English). In: Journal of Supercomputing, ISSN 0920-8542, E-ISSN 1573-0484, Vol. 75, no 2, p. 837-861. Article in journal (Refereed), Published
Abstract [en]

In current many-core architectures, networks-on-chip (NoCs) are efficiently utilized as communication backbones, enabling massive parallelism and a high degree of integration on a chip. Despite the advantages of conventional NoCs, wired multi-hop links limit their performance through long delays and high power consumption, especially in large systems. To overcome these limitations, different solutions such as wireless interconnections have been proposed: long-range, high-bandwidth, low-power wireless links can resolve the problems associated with wired links. Meanwhile, the grid-like mesh is the most stable topology in conventional NoC design, which is why most wireless network-on-chip (WNoC) architectures are based on it. The goals of this article are to challenge the mesh topology and to demonstrate the efficiency of honeycomb-based WNoC architectures. We propose HoneyWiN, a hybrid wired/wireless NoC architecture with a honeycomb topology, along with a partition-based XYZ-planar routing algorithm for energy conservation. To demonstrate the advantages of the proposed architecture, we first carry out an analytical comparison of HoneyWiN with a mesh-based WNoC as the baseline architecture, implementing our partition-based routing algorithm as a two-axis coordinate system in the baseline. Simulation results show that HoneyWiN reduces energy consumption by about 17% while increasing throughput by 10% compared to the mesh-based WNoC. HoneyWiN is then compared with four state-of-the-art mesh-based NoC architectures and provides higher performance in terms of delay, throughput and energy consumption in all evaluations. Overall, the results indicate that HoneyWiN is very effective in improving throughput, increasing speed and reducing energy consumption.
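For context, the mesh baseline the abstract compares against conventionally uses dimension-ordered (XY) routing, where a packet travels fully along one axis and then the other. The sketch below shows only that generic baseline idea; HoneyWiN's partition-based XYZ-planar algorithm over the honeycomb topology is not reproduced here.

```python
# Sketch of conventional dimension-ordered (XY) routing on a 2D mesh:
# route fully along the X axis first, then along the Y axis.
# This is the classic deadlock-free baseline for mesh NoCs.
def xy_route(src, dst):
    """Return the hop-by-hop path from src to dst on a 2D mesh."""
    (x, y), (dx, dy) = src, dst
    path = [(x, y)]
    while x != dx:                 # traverse X dimension first
        x += 1 if dx > x else -1
        path.append((x, y))
    while y != dy:                 # then traverse Y dimension
        y += 1 if dy > y else -1
        path.append((x, y))
    return path
```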

Place, publisher, year, edition, pages
Springer, 2019
Keywords
Wireless NoC, Honeycomb topology, XYZ-planar routing, Partitioning
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:mdh:diva-42949 (URN)
10.1007/s11227-018-2617-x (DOI)
000460063500019 ()
2-s2.0-85053846712 (Scopus ID)
Available from: 2019-03-22 Created: 2019-03-22 Last updated: 2019-07-01. Bibliographically approved
Loni, M., Hamouachy, F., Casarrubios, C., Daneshtalab, M. & Nolin, M. (2019). AutoRIO: An Indoor Testbed for Developing Autonomous Vehicles. In: International Japan-Africa Conference on Electronics, Communications and Computations JAC-ECC: . Paper presented at International Japan-Africa Conference on Electronics, Communications and Computations JAC-ECC, 16 Dec 2018, Alexandria, Egypt (pp. 69-72).
AutoRIO: An Indoor Testbed for Developing Autonomous Vehicles
2019 (English). In: International Japan-Africa Conference on Electronics, Communications and Computations JAC-ECC, 2019, p. 69-72. Conference paper, Published paper (Refereed)
Abstract [en]

Autonomous vehicles have a great influence on our lives. They are more convenient and more energy-efficient, and they provide higher safety levels and cheaper driving solutions. Reduced CO2 emissions and a lower risk of vehicular accidents are further benefits of autonomous vehicles. However, building a fully autonomous system is challenging, and the proposed solutions are still new. A testbed for evaluating new algorithms helps researchers and hardware developers verify the real impact of their solutions; such a testing environment is a low-cost infrastructure that shortens the time-to-market of novel ideas. In this paper, we propose AutoRIO, a cutting-edge indoor testbed for developing autonomous vehicles.

National Category
Engineering and Technology, Computer Systems
Identifiers
urn:nbn:se:mdh:diva-42236 (URN)
10.1109/JEC-ECC.2018.8679543 (DOI)
000465120800017 ()
2-s2.0-85064611063 (Scopus ID)
9781538692301 (ISBN)
Conference
International Japan-Africa Conference on Electronics, Communications and Computations JAC-ECC, 16 Dec 2018, Alexandria, Egypt
Projects
DPAC - Dependable Platforms for Autonomous systems and Control
Available from: 2018-12-28 Created: 2018-12-28 Last updated: 2019-05-09. Bibliographically approved
Ghaderi, A., Daneshtalab, M., Ashjaei, S. M., Loni, M., Mubeen, S. & Sjödin, M. (2019). Design challenges in hardware development of time-sensitive networking: A research plan. In: CEUR Workshop Proceedings: . Paper presented at 2019 Cyber-Physical Systems PhD Workshop, CPSWS 2019; Alghero; Italy; 23 September 2019. CEUR-WS, 2457
Design challenges in hardware development of time-sensitive networking: A research plan
2019 (English). In: CEUR Workshop Proceedings, CEUR-WS, 2019, Vol. 2457. Conference paper, Published paper (Refereed)
Abstract [en]

Time-Sensitive Networking (TSN) is a set of ongoing projects within IEEE standardization to guarantee timeliness and low-latency communication over switched Ethernet for industrial applications. Demand comes mainly from industries that require intensive data transmission, such as modern vehicles in which cameras, lidars and other high-bandwidth sensors are connected. The TSN standards evolve over time, so the hardware needs to change with each modification; in addition, high-performance hardware is required to benefit fully from the standards. In this paper, we present a research plan for developing novel techniques to support a parameterized and modular hardware IP core of a multi-stage TSN switch fabric in the VHSIC (Very High Speed Integrated Circuit) Hardware Description Language (VHDL), deployable on any Field-Programmable Gate Array (FPGA) device. We also present the challenges on the way toward this goal.

Place, publisher, year, edition, pages
CEUR-WS, 2019
Keywords
FPGA, Memory management, Predictability, Time-sensitive network, Cyber Physical System, Embedded systems, Field programmable gate arrays (FPGA), Integrated circuit design, Vehicle transmissions, Design challenges, Hardware development, High-performance hardware, Low-latency communication, Switched ethernet, Very high speed integrated circuits, Computer hardware description languages
National Category
Computer Engineering, Embedded Systems
Identifiers
urn:nbn:se:mdh:diva-45837 (URN)
2-s2.0-85073187187 (Scopus ID)
Conference
2019 Cyber-Physical Systems PhD Workshop, CPSWS 2019; Alghero; Italy; 23 September 2019
Available from: 2019-10-25 Created: 2019-10-25 Last updated: 2019-10-25
Salimi, M., Majd, A., Loni, M., Seceleanu, T., Seceleanu, C., Sirjani, M., . . . Troubitsyna, E. (2019). Multi-objective Optimization of Real-Time Task Scheduling Problem for Distributed Environments. In: 6th Conference on the Engineering of Computer Based Systems ECBS 2019: . Paper presented at 6th Conference on the Engineering of Computer Based Systems ECBS 2019, 02 Sep 2019, Bucharest, Romania.
Multi-objective Optimization of Real-Time Task Scheduling Problem for Distributed Environments
2019 (English). In: 6th Conference on the Engineering of Computer Based Systems ECBS 2019, 2019. Conference paper, Published paper (Refereed)
National Category
Engineering and Technology, Computer Systems
Identifiers
urn:nbn:se:mdh:diva-45052 (URN)
978-1-4503-7636-5 (ISBN)
Conference
6th Conference on the Engineering of Computer Based Systems ECBS 2019, 02 Sep 2019, Bucharest, Romania
Projects
DPAC - Dependable Platforms for Autonomous systems and Control
Available from: 2019-08-23 Created: 2019-08-23 Last updated: 2019-08-23. Bibliographically approved
Loni, M., Zoljodi, A., Seenan, S., Daneshtalab, M. & Nolin, M. (2019). NeuroPower: Designing Energy Efficient Convolutional Neural Network Architecture for Embedded Systems. In: Lecture Notes in Computer Science, Volume 11727: . Paper presented at The 28th International Conference on Artificial Neural Networks ICANN 2019, 17 Sep 2019, Munich, Germany (pp. 208-222). Munich, Germany: Springer
NeuroPower: Designing Energy Efficient Convolutional Neural Network Architecture for Embedded Systems
2019 (English). In: Lecture Notes in Computer Science, Volume 11727, Munich, Germany: Springer, 2019, p. 208-222. Conference paper, Published paper (Refereed)
Abstract [en]

Convolutional Neural Networks (CNNs) suffer from energy-hungry implementations due to their computation- and memory-intensive processing patterns, a problem made even more significant by the proliferation of CNNs on embedded platforms. To overcome this problem, we offer NeuroPower, an automatic framework that designs a highly optimized and energy-efficient set of CNN architectures for embedded systems. NeuroPower explores and prunes the design space to find an improved set of neural architectures. Toward this aim, a multi-objective optimization strategy is integrated to solve the Neural Architecture Search (NAS) problem by near-optimally tuning network hyperparameters. The main objectives of the optimization algorithm are network accuracy and the number of parameters in the network. The evaluation results show the effectiveness of NeuroPower in energy consumption, compaction rate and inference time compared to other cutting-edge approaches. Compared with the best results on the CIFAR-10/CIFAR-100 datasets, a network generated by NeuroPower provides up to a 2.1x/1.56x compression rate, 1.59x/3.46x speedup and 1.52x/1.82x power saving while losing only 2.4%/-0.6% accuracy, respectively.
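The two-objective criterion the abstract names (maximize accuracy, minimize parameter count) can be sketched as a simple Pareto-front filter. This is a generic illustration of the selection criterion, not NeuroPower's actual search algorithm, and the candidate tuples are invented.

```python
# Sketch of two-objective selection: keep only candidates that are
# Pareto-optimal in (accuracy, parameter count), where higher accuracy
# and fewer parameters are both better.
def pareto_front(candidates):
    """candidates: list of (accuracy, n_params) tuples.

    A candidate is dominated if some other candidate has at least
    its accuracy AND at most its parameter count (and is not identical).
    """
    front = []
    for acc, params in candidates:
        dominated = any(a >= acc and p <= params and (a, p) != (acc, params)
                        for a, p in candidates)
        if not dominated:
            front.append((acc, params))
    return sorted(front)

# The 0.85-accuracy, 9M-parameter network is dominated and dropped.
front = pareto_front([(0.90, 5e6), (0.92, 8e6), (0.85, 9e6)])
```

A full NAS loop would repeatedly generate candidates, evaluate them, and keep refining this front.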

Place, publisher, year, edition, pages
Munich, Germany: Springer, 2019
Series
Lecture Notes in Computer Science, ISSN 0302-9743 ; 11727
Keywords
Convolutional neural networks (CNNs), Neural Architecture Search (NAS), Embedded Systems, Multi-Objective Optimization
National Category
Engineering and Technology, Computer Systems
Identifiers
urn:nbn:se:mdh:diva-45043 (URN)
10.1007/978-3-030-30487-4_17 (DOI)
2-s2.0-85072863572 (Scopus ID)
9783030304867 (ISBN)
Conference
The 28th International Conference on Artificial Neural Networks ICANN 2019, 17 Sep 2019, Munich, Germany
Projects
DPAC - Dependable Platforms for Autonomous systems and Control
DeepMaker: Deep Learning Accelerator on Commercial Programmable Devices
Available from: 2019-08-23 Created: 2019-08-23 Last updated: 2019-10-17. Bibliographically approved
Maleki, N., Loni, M., Daneshtalab, M., Conti, M. & Fotouhi, H. (2019). SoFA: A Spark-oriented Fog Architecture. In: IEEE 45th Annual Conference of the Industrial Electronics Society IECON'19: . Paper presented at IEEE 45th Annual Conference of the Industrial Electronics Society IECON'19, 14 Oct 2019, Lisbon, Portugal.
SoFA: A Spark-oriented Fog Architecture
2019 (English). In: IEEE 45th Annual Conference of the Industrial Electronics Society IECON'19, 2019. Conference paper, Published paper (Refereed)
Abstract [en]

Fog computing offers a wide range of service levels, including low bandwidth usage, low response time, support for heterogeneous applications, and high energy efficiency; real-time embedded applications could therefore benefit from Fog infrastructure. However, providing high system utilization is an important challenge of Fog computing, especially for processing embedded applications. In addition, although Fog computing extends cloud computing with better energy efficiency, it still suffers from considerable energy consumption, which is a limitation for embedded systems. To overcome these limitations, in this paper we propose SoFA, a Spark-oriented Fog architecture that leverages Spark functionality to provide higher system utilization, energy efficiency, and scalability. Whereas in common Fog computing platforms edge devices are only responsible for processing data received from their own IoT nodes, SoFA also leverages the remaining processing capacity of all other edge devices. To this end, SoFA provides a distributed processing paradigm, with the help of Spark, that utilizes the whole processing capacity of all available edge devices, increasing energy efficiency and system utilization. In other words, SoFA proposes a near-sensor processing solution in which the edge devices act as the Fog nodes. SoFA also gains scalability from Spark functionality. According to the experimental results, SoFA is a power-efficient and scalable solution desirable for embedded platforms, providing up to 3.1x energy efficiency for the Word-Count benchmark compared to a common Fog processing platform.
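The Word-Count benchmark mentioned in the evaluation follows a simple map-and-merge pattern. The sketch below illustrates that pattern in plain Python rather than through Spark's API; each list element stands in for the text chunk a single edge node would process locally before the partial counts are merged.

```python
from collections import Counter

# Sketch of the Word-Count workload in map/merge style:
# each partition is counted locally (as one Fog/edge node would),
# then the partial counts are merged into a global result.
def word_count(partitions):
    """Count words across text partitions and return a dict."""
    total = Counter()
    for chunk in partitions:          # one chunk per edge node
        total.update(chunk.split())   # local count, then merge
    return dict(total)
```

In SoFA, Spark distributes exactly this kind of per-partition work across the available edge devices.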

Keywords
Fog Computing, Distributed Processing, Spark, Programming, IoT, Energy Efficiency
National Category
Engineering and Technology, Computer Systems
Identifiers
urn:nbn:se:mdh:diva-45053 (URN)
Conference
IEEE 45th Annual Conference of the Industrial Electronics Society IECON'19, 14 Oct 2019, Lisbon, Portugal
Projects
Future factories in the Cloud
DeepMaker: Deep Learning Accelerator on Commercial Programmable Devices
MobiFog: mobility management in Fog-assisted IoT networks
Health5G: Future eHealth powered by 5G
FlexiHealth: flexible softwarized networks for digital healthcare
Available from: 2019-08-22 Created: 2019-08-22 Last updated: 2019-08-22. Bibliographically approved
Nazari, N., Loni, M., E. Salehi, M., Daneshtalab, M. & Sjödin, M. (2019). TOT-Net: An Endeavor Toward Optimizing Ternary Neural Networks. In: 22nd Euromicro Conference on Digital System Design DSD 2019. Paper presented at 22nd Euromicro Conference on Digital System Design DSD 2019, 28 Aug 2019, Chalkidiki, Greece (pp. 305-312), Article ID 8875067.
TOT-Net: An Endeavor Toward Optimizing Ternary Neural Networks
2019 (English). In: 22nd Euromicro Conference on Digital System Design DSD 2019, 2019, p. 305-312, article id 8875067. Conference paper, Published paper (Refereed)
Abstract [en]

High computation demands and large memory requirements are the major implementation challenges of Convolutional Neural Networks (CNNs), especially for low-power, resource-limited embedded devices. Many binarized neural networks have recently been proposed to address these issues. Although they significantly decrease computation and memory footprint, they suffer from accuracy loss, especially on large datasets. In this paper, we propose TOT-Net, a ternarized neural network with [-1, 0, 1] values for both weights and activation functions that simultaneously achieves higher accuracy and a lower computational load. First, TOT-Net introduces simple bitwise logic for convolution computations to reduce the cost of multiply operations. Selecting a proper activation function and learning rate is influential for accuracy, but also difficult. As the second contribution, we propose a novel piece-wise activation function and optimized learning rates for different datasets; our findings reveal that 0.01 is a preferable learning rate for the studied datasets. Third, using an evolutionary optimization approach, we found novel piece-wise activation functions customized for TOT-Net. According to the experimental results, TOT-Net achieves 2.15%, 8.77%, and 5.7%/5.52% better accuracy than XNOR-Net on CIFAR-10, CIFAR-100, and ImageNet (top-5/top-1), respectively.
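Ternarization itself can be sketched with a common threshold heuristic that maps each weight to {-1, 0, 1}. Note this is a generic quantizer for illustration, not necessarily TOT-Net's exact rule, and the threshold value below is arbitrary.

```python
# Sketch of threshold-based ternarization: weights with small
# magnitude become 0, the rest keep only their sign. The resulting
# {-1, 0, 1} values let multiplications be replaced by sign flips
# and skips (or bitwise logic, as in ternary networks).
def ternarize(weights, delta):
    """Map each weight to -1, 0, or +1 using threshold delta."""
    return [0 if abs(w) <= delta else (1 if w > 0 else -1)
            for w in weights]

# Example: large positive -> 1, large negative -> -1, small -> 0.
ternary = ternarize([0.8, -0.3, 0.05], 0.1)
```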

Keywords
convolutional neural networks, ternary neural network, activation function, optimization
National Category
Engineering and Technology, Computer Systems
Identifiers
urn:nbn:se:mdh:diva-45042 (URN)
2-s2.0-85074915397 (Scopus ID)
Conference
22nd Euromicro Conference on Digital System Design DSD 2019, 28 Aug 2019, Chalkidiki, Greece
Projects
DPAC - Dependable Platforms for Autonomous systems and Control
DeepMaker: Deep Learning Accelerator on Commercial Programmable Devices
Available from: 2019-08-23 Created: 2019-08-23 Last updated: 2019-11-21. Bibliographically approved
Akbari, N., Modarressi, M., Daneshtalab, M. & Loni, M. (2018). A Customized Processing-in-Memory Architecture for Biological Sequence Alignment. In: Proceedings of the International Conference on Application-Specific Systems, Architectures and Processors: . Paper presented at 29th IEEE International Conference on Application-Specific Systems, Architectures and Processors, ASAP 2018, 10 July 2018 through 12 July 2018. Institute of Electrical and Electronics Engineers Inc., Article ID 8445124.
A Customized Processing-in-Memory Architecture for Biological Sequence Alignment
2018 (English). In: Proceedings of the International Conference on Application-Specific Systems, Architectures and Processors, Institute of Electrical and Electronics Engineers Inc., 2018, article id 8445124. Conference paper, Published paper (Refereed)
Abstract [en]

Sequence alignment is the most widely used operation in bioinformatics. With the exponential growth of biological sequence databases, searching a database for the optimal alignment of a query sequence (which can be on the order of hundreds of millions of characters long) requires excessive processing power and memory bandwidth. Sequence alignment algorithms can potentially benefit from the processing power of massively parallel processors due to their simple arithmetic operations, coupled with the inherent fine-grained and coarse-grained parallelism they exhibit. However, the limited memory bandwidth of conventional computing systems prevents the maximum achievable speedup from being exploited. In this paper, we propose a processing-in-memory architecture as a viable solution to the excessive memory bandwidth demand of bioinformatics applications. The design is composed of a set of simple, lightweight processing elements, customized to the sequence alignment algorithm and integrated at the logic layer of an emerging 3D DRAM architecture. Experimental results show that the proposed architecture achieves up to 2.4x speedup and a 41% reduction in power consumption compared to a processor-side parallel implementation.
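The "simple arithmetic operations" the abstract refers to are the dynamic-programming recurrence of local sequence alignment. A minimal Smith-Waterman scoring sketch follows (the scoring constants are illustrative defaults, not the paper's parameters); each cell depends only on three neighbors, which is what makes the computation map well onto many lightweight processing elements.

```python
# Sketch of the Smith-Waterman local-alignment recurrence:
# H[i][j] = max(0, diagonal + match/mismatch, up + gap, left + gap).
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between strings a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # DP matrix, zero boundary
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i-1][j-1] + (match if a[i-1] == b[j-1] else mismatch)
            H[i][j] = max(0, diag, H[i-1][j] + gap, H[i][j-1] + gap)
            best = max(best, H[i][j])      # local: track global best cell
    return best
```

Cells along an anti-diagonal are mutually independent, so they can be computed in parallel, which is the fine-grained parallelism the architecture exploits.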

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers Inc., 2018
Keywords
Accelerator, Processing-in-memory, Sequence Alignment, Alignment, Bandwidth, Bioinformatics, Computation theory, Dynamic random access storage, Parallel processing systems, Particle accelerators, Query processing, 3d dram architectures, Bioinformatics applications, Biological sequence alignment, Massive parallel processors, Parallel implementations, Processing in memory, Proposed architectures, Sequence alignments, Memory architecture
National Category
Embedded Systems
Identifiers
urn:nbn:se:mdh:diva-41018 (URN)
10.1109/ASAP.2018.8445124 (DOI)
000447635800027 ()
2-s2.0-85053445393 (Scopus ID)
9781538674796 (ISBN)
Conference
29th IEEE International Conference on Application-Specific Systems, Architectures and Processors, ASAP 2018, 10 July 2018 through 12 July 2018
Available from: 2018-09-27 Created: 2018-09-27 Last updated: 2018-12-27. Bibliographically approved