https://www.mdu.se/

  • 1.
    Afsharmazayejani, R.
    et al.
    Shahid Bahonar University of Kerman, Kerman, Iran.
    Yazdanpanah, F.
    Vali-e-Asr University, Rafsanjan, Iran.
    Rezaei, A.
    Northwestern University, Evanston, United States.
    Alaei, M.
    Vali-e-Asr University, Rafsanjan, Iran.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    HoneyWiN: Novel honeycomb-based wireless NoC architecture in many-core era. 2018. In: Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349, Vol. 10824 LNCS, p. 304-316. Article in journal (Refereed)
    Abstract [en]

    Although NoC-based systems with many cores are commercially available, their multi-hop nature has become a bottleneck for scaling performance and energy consumption. Alternatively, hybrid wireless NoC provides a bypass by exploiting single-hop express links for long-distance communications. There is also a common wisdom that the grid-like mesh is the most stable topology in conventional designs, which is why almost all emerging architectures have relied on this topology as well. In this paper, we first challenge the efficiency of the grid-like mesh in emerging systems. Then, we propose HoneyWiN, a hybrid reconfigurable wireless NoC architecture that relies on the honeycomb topology. The simulation results show that, on average, HoneyWiN saves 17% of energy consumption while increasing network throughput by 10% compared to its wireless mesh counterpart.

  • 2.
    Ahmadilivani, M. H.
    et al.
    Tallinn University of Technology, Tallinn, Estonia.
    Raik, J.
    Tallinn University of Technology, Tallinn, Estonia.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems. Tallinn University of Technology, Tallinn, Estonia.
    Kuusik, A.
    Tallinn University of Technology, Tallinn, Estonia.
    Analysis and Improvement of Resilience for Long Short-Term Memory Neural Networks. 2023. In: Proc. IEEE Int. Symp. Defect Fault Toler. VLSI Nanotechnol. Syst., DFT, Institute of Electrical and Electronics Engineers Inc., 2023. Conference paper (Refereed)
    Abstract [en]

    The reliability of Artificial Neural Networks (ANNs) has emerged as a prominent research topic due to their increasing utilization in safety-critical applications. Long Short-Term Memory (LSTM) ANNs have demonstrated significant advantages in healthcare applications, primarily attributed to their robust processing of time-series data and memory-facilitated capabilities. This paper, for the first time, presents a comprehensive and fine-grain analysis of the resilience of LSTM-based ANNs in the context of gait analysis using fault injection into weights. Additionally, we improve their resilience by replacing faulty weights with zero, enabling ANNs to withstand environments that are up to 20 times harsher while experiencing up to 7 times fewer critical faults than an unprotected ANN.
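
    The weight-level fault injection and zero-replacement idea in this abstract can be illustrated with a minimal sketch. It assumes float32 weights, a single bit-flip fault model, and a simple range check to decide which weights count as faulty; the abstract does not state how faulty weights are detected, so the `bound` parameter and both function names are hypothetical.

```python
import struct


def flip_bit(value: float, bit: int) -> float:
    """Flip one bit (0 = LSB, 31 = sign bit) of a weight stored as float32."""
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    (faulty,) = struct.unpack("<f", struct.pack("<I", raw ^ (1 << bit)))
    return faulty


def protect(weights, bound):
    """Replace every weight outside [-bound, bound] (including NaN) with zero.

    The range check is an assumed detection rule, not the paper's method.
    """
    return [w if abs(w) <= bound else 0.0 for w in weights]


if __name__ == "__main__":
    weights = [0.5, -0.25, 0.125]
    # Inject a fault into the high exponent bit of the first weight.
    weights[0] = flip_bit(weights[0], 30)
    print(protect(weights, bound=1.0))
```

    Flipping a high exponent bit turns a small weight into a very large value, which is why a plain range check is enough to catch it in this toy setting.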

  • 3.
    Ahmadilivani, M. H.
    et al.
    Tallinn University of Technology, Estonia.
    Taheri, M.
    Tallinn University of Technology, Estonia.
    Raik, J.
    Tallinn University of Technology, Estonia.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems. Tallinn University of Technology, Estonia.
    Jenihhin, M.
    Tallinn University of Technology, Estonia.
    A Systematic Literature Review on Hardware Reliability Assessment Methods for Deep Neural Networks. 2024. In: ACM Computing Surveys, ISSN 0360-0300, E-ISSN 1557-7341, Vol. 56, no 6, article id 141. Article in journal (Refereed)
    Abstract [en]

    Artificial Intelligence (AI) and, in particular, Machine Learning (ML), have emerged to be utilized in various applications due to their capability to learn how to solve complex problems. Over the past decade, rapid advances in ML have presented Deep Neural Networks (DNNs) consisting of a large number of neurons and layers. DNN Hardware Accelerators (DHAs) are leveraged to deploy DNNs in the target applications. Safety-critical applications, where hardware faults/errors would result in catastrophic consequences, also benefit from DHAs. Therefore, the reliability of DNNs is an essential subject of research. In recent years, several studies have been published accordingly to assess the reliability of DNNs. In this regard, various reliability assessment methods have been proposed on a variety of platforms and applications. Hence, there is a need to summarize the state-of-the-art to identify the gaps in the study of the reliability of DNNs. In this work, we conduct a Systematic Literature Review (SLR) on the reliability assessment methods of DNNs to collect relevant research works as much as possible, present a categorization of them, and address the open challenges. Through this SLR, three kinds of methods for reliability assessment of DNNs are identified, including Fault Injection (FI), Analytical, and Hybrid methods. Since the majority of works assess the DNN reliability by FI, we characterize different approaches and platforms of the FI method comprehensively. Moreover, Analytical and Hybrid methods are propounded. Thus, different reliability assessment methods for DNNs have been elaborated on their conducted DNN platforms and reliability evaluation metrics. Finally, we highlight the advantages and disadvantages of the identified methods and address the open challenges in the research area. 
    We have concluded that Analytical and Hybrid methods are light-weight yet sufficiently accurate and have the potential to be extended in future research and to be utilized in establishing novel DNN reliability assessment frameworks.

  • 4.
    Ahmadilivani, M. H.
    et al.
    Tallinn University of Technology, Tallinn, Estonia.
    Taheri, M.
    Tallinn University of Technology, Tallinn, Estonia.
    Raik, J.
    Tallinn University of Technology, Tallinn, Estonia.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems. Tallinn University of Technology, Tallinn, Estonia.
    Jenihhin, M.
    Tallinn University of Technology, Tallinn, Estonia.
    Enhancing Fault Resilience of QNNs by Selective Neuron Splitting. 2023. In: AICAS 2023 - IEEE International Conference on Artificial Intelligence Circuits and Systems, Proceeding, Institute of Electrical and Electronics Engineers Inc., 2023. Conference paper (Refereed)
    Abstract [en]

    The superior performance of Deep Neural Networks (DNNs) has led to their application in various aspects of human life. Safety-critical applications are no exception and impose rigorous reliability requirements on DNNs. Quantized Neural Networks (QNNs) have emerged to tackle the complexity of DNN accelerators; however, they are more prone to reliability issues. In this paper, a recent analytical resilience assessment method is adapted for QNNs to identify critical neurons based on a Neuron Vulnerability Factor (NVF). Thereafter, a novel method for splitting the critical neurons is proposed that enables the design of a Lightweight Correction Unit (LCU) in the accelerator without redesigning its computational part. The method is validated by experiments on different QNNs and datasets. The results demonstrate that the proposed method for correcting the faults has half the overhead of a selective Triple Modular Redundancy (TMR) while achieving a similar level of fault resiliency.
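
    For readers unfamiliar with the TMR baseline mentioned in this abstract, the following is a minimal sketch of a bitwise majority voter over three redundant copies of a quantized neuron output. It illustrates generic TMR only, not the paper's neuron-splitting method or its LCU; the function name is hypothetical.

```python
def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise majority of three redundant integer results.

    Each output bit takes the value held by at least two of the inputs,
    so a fault confined to one copy is masked.
    """
    return (a & b) | (a & c) | (b & c)


if __name__ == "__main__":
    golden = 0b1010
    faulty = golden ^ 0b1000  # one copy suffers a single bit flip
    print(bin(tmr_vote(golden, golden, faulty)))
```

    Selective TMR would apply this voting only to the neurons judged critical, which is where its overhead comes from.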

  • 5.
    Ahmadilivani, Mohammad Hasan
    et al.
    Tallinn Univ Technol, Tallinn, Estonia..
    Taheri, Mahdi
    Tallinn Univ Technol, Tallinn, Estonia..
    Raik, Jaan
    Tallinn Univ Technol, Tallinn, Estonia..
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems. Tallinn Univ Technol, Tallinn, Estonia.;Mälardalen Univ, Västerås, Sweden..
    Jenihhin, Maksim
    Tallinn Univ Technol, Tallinn, Estonia..
    DeepVigor: VulnerabIlity Value RanGes and FactORs for DNNs' Reliability Assessment. 2023. In: 2023 IEEE EUROPEAN TEST SYMPOSIUM, ETS, IEEE, 2023. Conference paper (Refereed)
    Abstract [en]

    Deep Neural Networks (DNNs) and their accelerators are being deployed ever more frequently in safety-critical applications, leading to increasing reliability concerns. A traditional and accurate method for assessing DNNs' reliability has been fault injection, which, however, suffers from prohibitive time complexity. While analytical and hybrid fault-injection/analytical-based methods have been proposed, they are either inaccurate or specific to particular accelerator architectures. In this work, we propose a novel accurate, fine-grain, metric-oriented, and accelerator-agnostic method called DeepVigor that provides vulnerability value ranges for DNN neurons' outputs. An outcome of DeepVigor is an analytical model representing vulnerable and non-vulnerable ranges for each neuron that can be exploited to develop different techniques for improving DNNs' reliability. Moreover, DeepVigor provides reliability assessment metrics based on vulnerability factors for bits, neurons, and layers using the vulnerability ranges. The proposed method is not only faster than fault injection but also provides extensive and accurate information about the reliability of DNNs, independent of the accelerator. The experimental evaluations in the paper indicate that the proposed vulnerability ranges are 99.9% to 100% accurate even when evaluated on previously unseen test data. Also, it is shown that the obtained vulnerability factors represent the criticality of bits, neurons, and layers proficiently. DeepVigor is implemented in the PyTorch framework and validated on complex DNN benchmarks.

  • 6.
    Ahmadilivani, Mohammed. H.
    et al.
    Tallinn University of Technology, Estonia.
    Barbareschi, Mario
    University of Naples Federico II, Italy.
    Barone, Salvatore
    University of Naples Federico II, Italy.
    Bosio, Alberto
    Ecole Centrale de Lyon, France.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems. Tallinn University of Technology, Estonia.
    Torca, Salvatore. D.
    University of Naples Federico II, Italy.
    Gavarini, Gabriele
    Politecnico di Torino, Italy.
    Jenihhin, Maksim
    Tallinn University of Technology, Estonia.
    Raik, Jaan
    Tallinn University of Technology, Estonia.
    Ruospo, Annachiara
    Politecnico di Torino, Italy.
    Sanchez, Ernesto
    Politecnico di Torino, Italy.
    Taheri, Mahdi
    Tallinn University of Technology, Estonia.
    Special Session: Approximation and Fault Resiliency of DNN Accelerators. 2023. In: Proceedings of the IEEE VLSI Test Symposium, IEEE Computer Society, 2023, Vol. April. Conference paper (Refereed)
    Abstract [en]

    Deep Learning, and in particular Deep Neural Networks (DNNs), is nowadays widely used in many scenarios, including safety-critical applications such as autonomous driving. In this context, besides energy efficiency and performance, reliability plays a crucial role since a system failure can jeopardize human life. As with any other device, the reliability of hardware architectures running DNNs has to be evaluated, usually through costly fault injection campaigns. This paper explores approximation and fault resiliency of DNN accelerators. We propose to use approximate (AxC) arithmetic circuits to agilely emulate errors in hardware without performing fault injection on the DNN. To allow fast evaluation of AxC DNNs, we developed an efficient GPU-based simulation framework. Further, we propose a fine-grain analysis of fault resiliency by examining fault propagation and masking in networks.

  • 7.
    Akbari, N.
    et al.
    University of Tehran, Tehran, Iran.
    Modarressi, M.
    University of Tehran, Tehran, Iran.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems. Royal Institute of Technology (KTH), Sweden.
    Loni, Mohammad
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems. Royal Institute of Technology (KTH), Sweden.
    A Customized Processing-in-Memory Architecture for Biological Sequence Alignment. 2018. In: Proceedings of the International Conference on Application-Specific Systems, Architectures and Processors, Institute of Electrical and Electronics Engineers Inc., 2018, article id 8445124. Conference paper (Refereed)
    Abstract [en]

    Sequence alignment is the most widely used operation in bioinformatics. With the exponential growth of the biological sequence databases, searching a database to find the optimal alignment for a query sequence (which can be on the order of hundreds of millions of characters long) would require excessive processing power and memory bandwidth. Sequence alignment algorithms can potentially benefit from the processing power of massively parallel processors due to their simple arithmetic operations, coupled with the inherent fine-grained and coarse-grained parallelism that they exhibit. However, the limited memory bandwidth in conventional computing systems prevents exploiting the maximum achievable speedup. In this paper, we propose a processing-in-memory architecture as a viable solution for the excessive memory bandwidth demand of bioinformatics applications. The design is composed of a set of simple and lightweight processing elements, customized to the sequence alignment algorithm, integrated at the logic layer of an emerging 3D DRAM architecture. Experimental results show that the proposed architecture results in up to 2.4x speedup and 41% reduction in power consumption, compared to a processor-side parallel implementation.
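
    To make the "simple arithmetic operations" of sequence alignment concrete, here is a minimal sketch of a local-alignment score computation in the style of Smith-Waterman. The abstract does not name the specific algorithm, so the choice of Smith-Waterman, the linear gap penalty, and the scoring values are all assumptions for illustration.

```python
def smith_waterman_score(query, ref, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between two sequences.

    Only two rows of the dynamic-programming matrix are kept, since each
    cell depends on its left, upper, and upper-left neighbours only.
    """
    prev = [0] * (len(ref) + 1)
    best = 0
    for q in query:
        curr = [0]
        for j, r in enumerate(ref, start=1):
            diag = prev[j - 1] + (match if q == r else mismatch)
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("GATTACA", "TTAC"))
```

    Every cell needs just a few additions and comparisons, and cells on the same anti-diagonal do not depend on each other, which is the kind of fine-grained parallelism that lightweight processing elements can exploit.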

  • 8.
    Amin, Yoosefi
    et al.
    Mälardalen University, School of Innovation, Design and Engineering. School of Electrical and Computer Engineering, University of Tehran, Tehran, Iran.
    Mousavi, Hamid
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems. School of Computer Systems, Tallinn University of Technology, Tallinn, Estonia.
    Kargahi, M.
    School of Electrical and Computer Engineering, University of Tehran, Tehran, Iran.
    Efficient On-device Transfer Learning using Activation Memory Reduction. 2023. In: Int. Conf. Fog Mob. Edge Comput., FMEC, Institute of Electrical and Electronics Engineers Inc., 2023, p. 210-215. Conference paper (Refereed)
    Abstract [en]

    On-device transfer learning suggests fine-tuning pretrained neural networks on new input data directly on edge devices. The memory limitation of edge devices necessitates using memory-efficient fine-tuning methods. Fine-tuning involves two primary phases: the forward-pass phase and the backward-pass phase. The forward-pass phase generates output activations, and the backward-pass phase computes gradients and updates the parameters accordingly. Although the forward-pass phase demands temporary memory to store a layer’s input and output activations, the backward-pass phase may require storing the output activations from all layers to compute gradients. This fact introduces the memory cost of the backward-pass phase as the main contributor to the huge training memory demands of deep neural networks (DNNs), which has been the focus of many studies. However, little attention has been paid to how the temporary activation memory involved in the forward-pass phase may also act as the memory bottleneck, which is the main focus of this paper. This paper aims to mitigate this memory bottleneck by pruning unimportant channels from layers that require significant temporary activation memory. Experimental results demonstrate how the proposed method effectively reduces peak activation memory and total memory costs of MobileNetV2 by 65% and 59%, respectively, at the cost of a 3% accuracy drop.
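
    The forward-pass bottleneck described in this abstract can be illustrated with a small back-of-the-envelope sketch. It assumes a purely sequential network in which each layer needs its input and output activations in memory at the same time, with float32 values; the layer shapes and function names below are hypothetical and are not taken from MobileNetV2 or from the paper.

```python
def activation_bytes(channels, height, width, bytes_per_value=4):
    """Size of one activation tensor of shape (C, H, W) in bytes."""
    return channels * height * width * bytes_per_value


def peak_forward_memory(shapes):
    """Peak temporary memory over all layers of a sequential network.

    shapes[i] is the input of layer i and shapes[i + 1] is its output;
    each layer holds both tensors at once.
    """
    return max(
        activation_bytes(*inp) + activation_bytes(*out)
        for inp, out in zip(shapes, shapes[1:])
    )


if __name__ == "__main__":
    # Hypothetical activation shapes (C, H, W) along a small network.
    original = [(3, 224, 224), (32, 112, 112), (96, 112, 112), (24, 56, 56)]
    # Prune half of the channels of the widest intermediate activation.
    pruned = [(3, 224, 224), (32, 112, 112), (48, 112, 112), (24, 56, 56)]
    print(peak_forward_memory(original), peak_forward_memory(pruned))
```

    Because the peak is set by a single layer, pruning channels only where the temporary activations are largest lowers the peak without touching the rest of the network.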

  • 9.
    Asadi, M.
    et al.
    Department of Electrical Engineering, Tarbiat Modares University, Tehran, Iran.
    Poursalim, F.
    Shiraz University of Medical Science, Shiraz, Iran.
    Loni, Mohammad
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Sjödin, Mikael
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Gharehbaghi, A.
    Department of Biomedical Engineering, Linköping University, Linköping, Sweden.
    Accurate detection of paroxysmal atrial fibrillation with certified-GAN and neural architecture search. 2023. In: Scientific Reports, E-ISSN 2045-2322, Vol. 13, no 1. Article in journal (Refereed)
    Abstract [en]

    This paper presents a novel machine learning framework for detecting PxAF, a pathological characteristic of electrocardiogram (ECG) that can lead to fatal conditions such as heart attack. To enhance the learning process, the framework involves a generative adversarial network (GAN) along with a neural architecture search (NAS) in the data preparation and classifier optimization phases. The GAN is innovatively invoked to overcome the class imbalance of the training data by producing the synthetic ECG for PxAF class in a certified manner. The effect of the certified GAN is statistically validated. Instead of using a general-purpose classifier, the NAS automatically designs a highly accurate convolutional neural network architecture customized for the PxAF classification task. Experimental results show that the accuracy of the proposed framework exhibits a high value of 99.0% which not only enhances state-of-the-art by up to 5.1%, but also improves the classification performance of the two widely-accepted baseline methods, ResNet-18, and Auto-Sklearn, by [Formula: see text] and [Formula: see text].

  • 10.
    Asghari, S. A.
    et al.
    Kharazmi Univ, Dept Elect & Comp Engn, Tehran, Iran.
    Marvasti, M.B
    Kharazmi Univ, Dept Elect & Comp Engn, Tehran, Iran.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    A software implemented comprehensive soft error detection method for embedded systems. 2020. In: Microprocessors and microsystems, ISSN 0141-9331, E-ISSN 1872-9436, Vol. 77, article id 103161. Article in journal (Refereed)
    Abstract [en]

    This paper presents a comprehensive software-based technique that is capable of detecting soft errors in embedded systems. Soft errors can be categorized into Control Flow Errors (CFEs) and data errors. CFEs erroneously change the flow of the program, while data errors change its results. In this paper, a new comprehensive method is presented to detect both (based on a combination of the authors’ previous works). In order to evaluate the proposed method, a new factor is defined that considers three main parameters simultaneously, namely fault coverage, memory overhead, and performance overhead. Since these parameters are very important in safety-critical applications, they should be improved concurrently. The experimental results on SPEC2000 benchmarks show that the Evaluation Factor of the proposed method is 50% better than the Relationship Signatures for Control Flow Checking with Data Validation (RSCFCDV) methods, which are suggested in the literature.

  • 11.
    Ashjaei, Mohammad
    et al.
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Mubeen, Saad
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Casamayor, Victor
    Technical University of Vienna, Austria.
    Nelissen, Geoffrey
    Eindhoven University of Technology, Netherlands.
    Towards a Predictable and Cognitive Edge-Cloud Architecture for Industrial Systems. 2022. In: Proceedings of RAGE 2022, 2022. Conference paper (Refereed)
  • 12.
    Ashjaei, Seyed Mohammad Hossein
    et al.
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Lo Bello, L.
    University of Catania, Italy.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Patti, G.
    University of Catania, Italy.
    Saponara, S.
    University of Pisa, Italy.
    Mubeen, Saad
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Time-Sensitive Networking in automotive embedded systems: State of the art and research opportunities. 2021. In: Journal of systems architecture, ISSN 1383-7621, E-ISSN 1873-6165, Vol. 117, article id 102137. Article in journal (Refereed)
    Abstract [en]

    The functionality advancements and novel customer features that are currently found in modern automotive systems require high-bandwidth and low-latency in-vehicle communications, which become even more compelling for autonomous vehicles. In a recent effort to meet these requirements, the IEEE Time-Sensitive Networking (TSN) task group has developed a set of standards that introduce novel features in Switched Ethernet. TSN standards offer, for example, a common notion of time through accurate and reliable clock synchronization, delay bounds for real-time traffic, time-driven transmissions, improved reliability, and much more. In order to fully utilize the potential of these novel protocols in the automotive domain, TSN should be seamlessly integrated into the state-of-the-art and state-of-practice model-based development processes for automotive embedded systems. Some of the core phases in these processes include software architecture modeling, timing predictability verification, simulation, and hardware realization and deployment. Moreover, throughout the development of automotive embedded systems, the safety and security requirements specified on these systems need to be duly taken into account. In this context, this work provides an overview of TSN in automotive applications and discusses the recent technological developments relevant to the adoption of TSN in automotive embedded systems. The work also points at the open challenges and future research directions. 

  • 13.
    Baloch, Naveed Khan
    et al.
    Univ Engn & Technol Taxila, Comp Engn Dept, Taxila 47040, Pakistan..
    Baig, Muhammad Iram
    Univ Engn & Technol Taxila, Elect Engn Dept, Taxila 47040, Pakistan..
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Defender: A Low Overhead and Efficient Fault-Tolerant Mechanism for Reliable on-Chip Router. 2019. In: IEEE Access, E-ISSN 2169-3536, Vol. 7, p. 142843-142854. Article in journal (Refereed)
    Abstract [en]

    The ever-shrinking size of a transistor has made Network on Chip (NoC) susceptible to faults. A single error in the NoC can disrupt the entire communication. In this paper, we introduce Defender, a fault-tolerant router architecture that is capable of tolerating permanent faults in all parts of the router. We employ structural modifications in the baseline router design to achieve fault tolerance. Defender provides fault tolerance to the input ports and the routing computation unit by grouping neighboring ports together. A default winner strategy is used to provide fault resilience to the virtual channel arbiters and switch allocators. Multiple routes are provided to the crossbar to tolerate faults. Defender provides improved fault tolerance to all stages of the router compared to the currently prevailing fault-tolerant router architectures. Reliability analysis using silicon protection factor (SPF) and Mean Time to Failure (MTTF) metrics confirms that Defender is 10.78 times more reliable than the baseline unprotected router and more reliable than the current state-of-the-art architectures.

  • 14.
    Berisa, Aldin
    et al.
    Mälardalen University, School of Innovation, Design and Engineering.
    Ashjaei, Seyed Mohammad Hossein
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Sjödin, Mikael
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Mubeen, Saad
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Investigating and Analyzing CAN-to-TSN Gateway Forwarding Techniques. 2023. In: Proc. - IEEE Int. Symp. Real-Time Distrib. Comput., ISORC, Institute of Electrical and Electronics Engineers Inc., 2023, p. 136-145. Conference paper (Refereed)
    Abstract [en]

    Controller Area Network (CAN) and Ethernet networks are expected to co-exist in the automotive industry, as Ethernet provides high-bandwidth communication, while CAN is a legacy cost-effective solution. Due to the shortcomings of conventional switched Ethernet, such as the lack of determinism, the IEEE Time Sensitive Networking (TSN) task group developed a set of standards to enhance the switched Ethernet technology, providing low-jitter and deterministic communication. Considering these two network domains, we investigate various design approaches for a gateway that connects a CAN domain to a TSN domain. We present three gateway forwarding techniques and develop end-to-end delay analysis methods for them. By applying the analysis methods to synthetic use cases, we show that the intuitive existing approach of encapsulating multiple CAN frames into a single Ethernet frame is not necessarily an efficient solution. In fact, we demonstrate several cases where it is preferable to encapsulate only one CAN frame into a TSN frame, in particular when a high-speed TSN network is used. The results have a significant impact on developing such gateways, as the implementation of one-to-one frame encapsulation is considerably simpler than other, more complex gateway-forwarding techniques.

  • 15.
    Berisa, Aldin
    et al.
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Panjevic, A.
    Arcticus Systems, Järfälla, Sweden.
    Kovac, I.
    Arcticus Systems, Järfälla, Sweden.
    Lyngbäck, H.
    HIAB, Hudiksvall, Sweden.
    Ashjaei, Seyed Mohammad Hossein
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Sjödin, Mikael
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Mubeen, Saad
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Comparative Evaluation of Various Generations of Controller Area Network Based on Timing Analysis. 2023. In: IEEE Int. Conf. Emerging Technol. Factory Autom., ETFA, Institute of Electrical and Electronics Engineers Inc., 2023. Conference paper (Refereed)
    Abstract [en]

    This paper performs a comparative evaluation of various generations of Controller Area Network (CAN), including the classical CAN, CAN Flexible Data-Rate (FD), and CAN Extra Long (XL). We utilize response-time analysis for the evaluation. In this regard, we identify that the state of the art lacks a response-time analysis for CAN XL. Hence, we discuss the worst-case transmission time calculations for CAN XL frames and incorporate them into the existing analysis for CAN to support response-time analysis of CAN XL frames. Using the extended analysis, we perform a comparative evaluation of the three generations of CAN by analyzing an automotive industrial use case. In essence, we show that using CAN FD is more advantageous than the classical CAN and CAN XL when using frames with payloads of up to 8 bytes, despite the fact that CAN XL supports higher bit rates. For frames with 12-64 byte payloads, CAN FD performs better than CAN XL when running at the same bit rate, but CAN XL performs better when running at a higher bit rate. Additionally, we discovered that CAN XL performs better than the classical CAN and CAN FD when the frame payload is over 64 bytes, even if it runs at the same or higher bit rates than CAN FD.
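
    As background for the worst-case transmission times mentioned in this abstract, the sketch below computes the commonly used bit-stuffing bound for a classical CAN frame only. It assumes the widely cited formulation with 34 stuffable control bits for 11-bit identifiers (54 for 29-bit identifiers) plus 13 bits that are not subject to stuffing; the corresponding calculations for CAN FD and CAN XL differ and are not reproduced here. The function names are hypothetical.

```python
def can_frame_bits(payload_bytes, extended_id=False):
    """Worst-case length in bits of a classical CAN frame, including stuff bits."""
    if not 0 <= payload_bytes <= 8:
        raise ValueError("classical CAN carries 0 to 8 payload bytes")
    g = 54 if extended_id else 34          # control bits exposed to bit stuffing
    stuffable = g + 8 * payload_bytes      # stuffable control bits plus payload bits
    worst_case_stuff_bits = (stuffable - 1) // 4
    return stuffable + 13 + worst_case_stuff_bits


def can_transmission_time(payload_bytes, bit_rate, extended_id=False):
    """Worst-case transmission time in seconds at the given bit rate (bit/s)."""
    return can_frame_bits(payload_bytes, extended_id) / bit_rate


if __name__ == "__main__":
    # An 8-byte frame with an 11-bit identifier on a 500 kbit/s bus.
    print(can_frame_bits(8), can_transmission_time(8, 500_000))
```

    A response-time analysis would then use this per-frame time as the transmission cost of each message when accounting for blocking and higher-priority interference on the bus.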

  • 16.
    Berisa, Aldin
    et al.
    Mälardalen University.
    Zhao, L.
    Beihang University, Beijing, China.
    Craciunas, S. S.
    TTTech Computertechnik Ag, Vienna, Austria.
    Ashjaei, Seyed Mohammad Hossein
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Mubeen, Saad
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Sjödin, Mikael
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    AVB-aware Routing and Scheduling for Critical Traffic in Time-sensitive Networks with Preemption. 2022. In: ACM International Conference Proceeding Series, Association for Computing Machinery, 2022, p. 207-218. Conference paper (Refereed)
    Abstract [en]

    The Time-Sensitive Network (TSN) amendments and protocols add capabilities on top of standard 802.1 Ethernet for guaranteeing the timeliness of both (isochronous) scheduled traffic (ST) and shaped (audio-video) communication (AVB) in distributed applications. ST streams are guaranteed via an offline computed schedule controlling the time-aware gate mechanism of IEEE 802.1Qbv, while AVB real-time streams are shaped via a credit-based shaper (CBS) and scheduler with lower-priority than ST. Although the two traffic classes use different TSN mechanisms, they are interrelated as the ST traffic class schedule influences the latency of AVB traffic. In this paper, we propose a method for the integration of the ST schedule synthesis with an analysis for the AVB class featuring IEEE 802.1Qbu frame preemption under different configurations to reduce the interference between the two classes. We first present a new worst-case response-time (WCRT) analysis for the AVB traffic class in TSN networks with preemption, considering an arbitrary number of AVB queues and different configurations for the CBS credit behavior. Then, we integrate the creation of ST schedule tables with the schedulability analysis of AVB traffic using a heuristic algorithm featuring frame preemption and a novel routing mechanism aimed at maximizing AVB schedulability. Finally, we evaluate our approach using both real-world and synthetic use cases showing the efficiency both in terms of schedule creation runtime and in terms of increasing the schedulability of lower-priority AVB traffic.

  • 17.
    Bidgoli, Ali M.
    et al.
    University of Tehran, Iran.
    Fattahi, Sepideh
    University of Tehran, Iran.
    Rezaei, Seyyed H. S.
    University of Tehran, Iran.
    Modarressi, Mehdi
    University of Tehran, Iran; Institute for Research in Fundamental Sciences (IPM), School of Computer Science, Iran.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems. Tallinn University of Technology, Tallinn, Estonia.
    NeuroPIM: Flexible Neural Accelerator for Processing-in-Memory Architectures. 2023. In: Proceedings - 2023 26th International Symposium on Design and Diagnostics of Electronic Circuits and Systems, DDECS 2023, Institute of Electrical and Electronics Engineers Inc., 2023, p. 51-56. Conference paper (Refereed)
    Abstract [en]

    The performance of microprocessors under many modern workloads is mainly limited by the off-chip memory bandwidth. The emerging processing-in-memory paradigm presents a unique opportunity to reduce data movement overheads by moving computation closer to memory. State-of-the-art processing-in-memory proposals stack a logic layer on top of one or multiple memory layers in a 3D fashion and leverage the logic layer to build near-memory processing units. Such processing units are either application-specific accelerators or general-purpose cores. In this paper, we present NeuroPIM, a new processing-in-memory architecture that uses a neural network as the memory-side general-purpose accelerator. This design is mainly motivated by the observation that in many real-world applications, some program regions, or even the entire program, can be replaced by a neural network that is trained to approximate the program's output. NeuroPIM benefits from both the flexibility of general-purpose processors and the superior performance of application-specific accelerators. Experimental results show that NeuroPIM provides up to 41% speedup over a processor-side neural network accelerator and up to 8x speedup over a general-purpose processor.

  • 18.
    Dabiri, Bita
    et al.
    College of Engineering University of Tehran, Tehran, Iran.
    Modarressi, Mehdi
    College of Engineering University of Tehran and School of Computer Science, IPM, Tehran, Iran.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems. Tallinn University of Technology, Estonia.
    Network-on-ReRAM for Scalable Processing-in-Memory Architecture Design. 2021. In: Proceedings - 2021 24th Euromicro Conference on Digital System Design, DSD 2021, 2021, p. 143-149. Conference paper (Refereed)
    Abstract [en]

    The non-volatile metal-oxide resistive random access memory (ReRAM) is an emerging alternative to current memory technologies. The unique capability of ReRAM to perform analog and digital arithmetic and logic operations has enabled this technology to incorporate both computation and memory capabilities on the same unit. Due to this interesting property, there has been a growing trend in recent years to implement emerging data-intensive applications on ReRAM structures. A typical ReRAM-based processing-in-memory architecture may consist of tens to hundreds of ReRAM units (mats) that can either store or process data. To support such a large-scale ReRAM structure, this paper proposes a scalable network-on-ReRAM architecture. The proposed network employs a novel associative router architecture, designed based on ReRAM-based content-addressable memories. With its in-memory packet processing capability, this router yields higher throughput and resource utilization levels than a conventional router. This router is technology-compatible with ReRAM and, as our evaluations show, employing it to build a network-on-ReRAM makes the emerging ReRAM-based processing-in-memory architectures more scalable and performance-efficient.

  • 19.
    Daneshtalab, Masoud
    et al.
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems. Tallinn University of Technology (TalTech), Estonia.
    Modarressi, M.
    The Department of Electrical and Computer Engineering, University of Tehran, Iran.
    Hardware architectures for deep learning. 2020. Collection (editor) (Other academic)
    Abstract [en]

    This book presents and discusses innovative ideas in the design, modelling, implementation, and optimization of hardware platforms for neural networks. The rapid growth of server, desktop, and embedded applications based on deep learning has brought about a renaissance in interest in neural networks, with applications including image and speech processing, data analytics, robotics, healthcare monitoring, and IoT solutions. Efficient implementation of neural networks to support complex deep learning-based applications is a complex challenge for embedded and mobile computing platforms with limited computational/storage resources and a tight power budget. Even for cloud-scale systems it is critical to select the right hardware configuration based on the neural network complexity and system constraints in order to increase power- and performance-efficiency. Hardware Architectures for Deep Learning provides an overview of this new field, from principles to applications, for researchers, postgraduate students and engineers who work on learning-based services and hardware platforms. 

  • 20.
    Ebrahimi, M.
    et al.
    KTH Royal Institute of Technology, Sweden.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    A General Methodology on Designing Acyclic Channel Dependency Graphs in Interconnection Networks. 2018. In: IEEE Micro, ISSN 0272-1732, E-ISSN 1937-4143, Vol. 38, no 3, p. 79-85. Article in journal (Refereed)
    Abstract [en]

    For the past three decades, the interconnection network has been developed based on two major theories, one by Dally and the other by Duato. In this article, we introduce EbDa with a simplified theoretical basis, which directly allows for designing an acyclic channel dependency graph and verifying algorithms on their freedom from deadlock. EbDa is composed of three theorems that enable extracting all allowable turns without dealing with turn models.

  • 21.
    Ebrahimi, M.
    et al.
    Royal Institute of Technology, Sweden.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems. Royal Institute of Technology, Sweden.
    EbDa: A new theory on design and verification of deadlock-free interconnection networks. 2017. In: Proceedings - International Symposium on Computer Architecture, Institute of Electrical and Electronics Engineers Inc., 2017, p. 703-715. Conference paper (Refereed)
    Abstract [en]

    Freedom from deadlock is one of the most important issues when designing routing algorithms in on-chip/off-chip networks. Many works have been developed upon Dally's theory proving that a network is deadlock-free if there is no cyclic dependency on the channel dependency graph. However, finding such an acyclic graph has been very challenging, which limits Dally's theory to networks with a low number of channels. In this paper, we introduce three theorems that directly lead to routing algorithms with an acyclic channel dependency graph. We also propose the partitioning methodology, enabling a design to reach the maximum adaptiveness for the n-dimensional mesh and k-ary n-cube topologies with any given number of channels. In addition, deadlock-free routing algorithms can be derived ranging from maximally fully adaptive routing down to deterministic routing. The proposed theorems can drastically reduce the difficulties of designing deadlock-free routing algorithms.
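
The condition attributed to Dally in the abstract above (no cycle in the channel dependency graph) can be illustrated with a small sketch. This is not the EbDa method itself; it only checks a given dependency graph for cycles with a depth-first search. The function name `has_cycle`, the channel labels `c0` to `c3`, and the four-node unidirectional ring are hypothetical choices made for illustration.

```python
from collections import defaultdict


def has_cycle(dependencies):
    """Return True if the directed graph given as (held, requested) pairs has a cycle."""
    graph = defaultdict(list)
    channels = set()
    for held, requested in dependencies:
        graph[held].append(requested)
        channels.update((held, requested))

    # Classic three-colour depth-first search.
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {c: WHITE for c in channels}

    def visit(channel):
        colour[channel] = GREY
        for nxt in graph[channel]:
            if colour[nxt] == GREY:
                return True  # back edge: a cyclic dependency exists
            if colour[nxt] == WHITE and visit(nxt):
                return True
        colour[channel] = BLACK
        return False

    return any(colour[c] == WHITE and visit(c) for c in list(channels))


# Hypothetical 4-node unidirectional ring: a packet holding channel c_i may
# request channel c_(i+1 mod 4), so the dependencies form a cycle.
ring = [("c0", "c1"), ("c1", "c2"), ("c2", "c3"), ("c3", "c0")]

# Assumed routing restriction for illustration: packets never continue from c3
# to c0, which removes one dependency and breaks the cycle.
restricted = [("c0", "c1"), ("c1", "c2"), ("c2", "c3")]

print(has_cycle(ring))        # cyclic dependency graph
print(has_cycle(restricted))  # acyclic dependency graph
```

A check like this only verifies a graph that has already been constructed; the abstract's point is that constructing an acyclic graph in the first place is the hard part for networks with many channels.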

  • 22.
    Ebrahimi, Masoumeh
    et al.
    KTH Royal Inst Technol, Stockholm, Sweden.
    Weldezion, Awet Yemane
    Hangofay AB, Stockholm, Sweden.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    NoD: Network-on-Die as a Standalone NoC for Heterogeneous Many-core Systems in 2.5D ICs. 2017. In: 2017 19TH INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND DIGITAL SYSTEMS (CADS), 2017, p. 28-33. Conference paper (Refereed)
    Abstract [en]

    Due to the high cost of 3D IC process technology, the semiconductor industry is targeting 2.5D ICs with interposer as a fast and low-cost alternative to integrate dissimilar technologies. In this paper, we propose an independent network-on-chip die, called Network-on-Die (NoD), for 2.5D ICs that operates as a communication backbone for heterogeneous many-core systems on interposer. NoD is responsible for routing packets from a source router to a destination router, and the connections between routers and cores pass through the interposer. This technique eliminates the complexity of the routing algorithms in heterogeneous systems by turning the irregular form of NoC in 2.5D ICs into a regular/optimized one in NoD. The performance evaluation is verified through RTL simulations for a heterogeneous many-core system of varying die sizes and with asymmetric shapes. We provide the theoretical justification for our simulation results.

  • 23.
    Ebrahimi, Zahra
    et al.
    Shahrood University of Technology, Shahroud, Iran.
    Loni, Mohammad
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Ghareh Baghi, Arash
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    A Review on Deep Learning Methods for ECG Arrhythmia Classification. 2020. In: Expert systems with applications, ISSN 0957-4174, E-ISSN 1873-6793, Vol. 7, article id 100033. Article in journal (Refereed)
    Abstract [en]

    Deep Learning (DL) has recently become a topic of study in different applications including healthcare, in which timely detection of anomalies on the Electrocardiogram (ECG) can play a vital role in patient monitoring. This paper presents a comprehensive review of recent DL methods applied to the ECG signal for classification purposes. This study considers various types of DL methods such as Convolutional Neural Network (CNN), Deep Belief Network (DBN), Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU). Of the 75 studies reported in 2017 and 2018, CNN is dominantly observed as the suitable technique for feature extraction, seen in 52% of the studies. DL methods showed high accuracy in correct classification of Atrial Fibrillation (AF) (100%), Supraventricular Ectopic Beats (SVEB) (99.8%), and Ventricular Ectopic Beats (VEB) (99.7%) using the GRU/LSTM, CNN, and LSTM, respectively.

  • 24.
    Fallah, M. K.
    et al.
    GC Shahid Beheshti University, Tehran, Iran.
    Mirhosseini, M.
    GC Shahid Beheshti University, Tehran, Iran.
    Fazlali, M.
    GC Shahid Beheshti University, Tehran, Iran.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Scalable parallel genetic algorithm for solving large integer linear programming models derived from behavioral synthesis. 2020. In: Proceedings - 2020 28th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2020, Institute of Electrical and Electronics Engineers Inc., 2020, p. 390-394, article id 9092208. Conference paper (Other academic)
    Abstract [en]

    Solving Integer Linear Programming (ILP) models generally lies in the category of NP-hard problems. Therefore, as the size of ILP models grows, the efficiency of exact algorithms for solving the models decreases significantly, and for large models it is not possible to obtain a result. Genetic Algorithm (GA) is a metaheuristic method capable of adjusting and redesigning parameters and operations according to the characteristics of ILP models. Still, GA has a huge search space for large models, and parallelization is a suitable technique to tackle this problem. This paper presents a scalable parallel GA to solve large ILP models derived from behavioral synthesis of digital circuits. We show that although models have non-binary variables, only binary variables are sufficient for coding chromosomes. We also use 'unknown' values for some genes to decrease the likelihood of inconsistency in the encoded constraints. Our experiments verify the efficiency and scalability of the proposed algorithm on multicore platforms. The proposed method outperforms IBM ILOG CPLEX 12.6 and the MI-LXPM algorithm where the ILP models include 550 to 2258 int / binary decision variables. Also, the results indicate that the saturation point of using parallel processing elements for solving the large ILP models is at least 60.

  • 25.
    Fallah, Mohammad K.
    et al.
    GC Shahid Beheshti Univ, Fac Math Sci, Dept Data & Comp Sci, Tehran, Iran.
    Fazlali, Mahmood
    GC Shahid Beheshti Univ, Fac Math Sci, Dept Data & Comp Sci, Tehran, Iran.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    A symbiosis between population based incremental learning and LP-relaxation based parallel genetic algorithm for solving integer linear programming models. In: Computing, ISSN 0010-485X, E-ISSN 1436-5057. Article in journal (Refereed)
    Abstract [en]

    Solving Integer Linear Programming (ILP) models generally lies in the category of NP-hard problems, and finding the optimal answer for large models is a computational challenge. Genetic algorithms are a family of metaheuristic algorithms capable of adjusting and redesigning parameters and operations according to the characteristics of ILP models. However, the genetic algorithm still performs a lot of operations to solve large models, and parallel processing is a suitable technique to tackle this problem. This paper introduces an LP-Relaxation based parallel genetic algorithm that uses a population-based incremental learning technique to present an expandable solver for large ILP models derived from a behavioral synthesis of digital circuits. In the proposed algorithm, each chromosome provides a state subspace of possible solutions, and each generation is produced based on a probability vector as well as elitism. Our experiments verify the efficiency of the proposed algorithm on multicore platforms, as it outperformed four previous genetic algorithms for solving mixed integer programming problems. The proposed genetic algorithm solved 20 ILP models that include up to 5183 int / binary decision variables in less than 20 min using four 16-core AMD Opteron 6386 SE processors. Also, the results indicate that for models with more than 4000 variables, the speedup and the efficiency of the proposed parallel genetic algorithm on 60 CPU cores are more than 18X and 30%, respectively.
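
The speedup and efficiency figures quoted above are related by the usual definition of parallel efficiency, namely speedup divided by the number of processing elements. The short sketch below works through that arithmetic; it assumes this standard definition (the abstract does not state which one the authors use), and the function name `parallel_efficiency` is hypothetical.

```python
def parallel_efficiency(speedup, cores):
    """Parallel efficiency under the standard definition: speedup / number of cores."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return speedup / cores


# Figures quoted in the abstract: more than 18X speedup on 60 CPU cores.
efficiency = parallel_efficiency(18.0, 60)
print(f"efficiency on 60 cores: {efficiency:.0%}")
```

Under this definition a speedup of 18 on 60 cores corresponds to an efficiency of 18/60, which is consistent with the "more than 30%" reported.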

  • 26.
    Firuzan, A.
    et al.
    Islamic Azad University, Tehran, Iran.
    Modarressi, M.
    University of Tehran and IPM School of Computer Science, Tehran, Iran.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Reshadi, M.
    Islamic Azad University, Tehran, Iran.
    Reconfigurable Network-on-Chip for 3D Neural Network Accelerators. 2018. In: 2018 12th IEEE/ACM International Symposium on Networks-on-Chip, NOCS 2018, Institute of Electrical and Electronics Engineers Inc., 2018. Conference paper (Refereed)
    Abstract [en]

    Parallel hardware accelerators for large-scale neural networks typically consist of several processing nodes, arranged as a multi- or many-core system-on-chip, connected by a network-on-chip (NoC). Recent proposals also benefit from the emerging 3D memory-on-logic architectures to provide sufficient bandwidth for neural networks. Handling the heavy traffic between neurons and memory and also the multicast-based inter-neuron traffic, which often varies over time, is the most challenging design consideration for the networks-on-chip in such accelerators. To address these issues, a reconfigurable network-on-chip architecture for 3D memory-on-logic neural network accelerators is presented in this paper. The reconfigurable NoC can adapt its topology to the on-chip traffic patterns. It can also be configured as a tree-like structure to support multicast-based neuron-to-neuron and memory-to-neuron traffic of neural networks. The evaluation results show that the proposed architecture can better manage the multicast-based traffic of neural networks than some state-of-the-art topologies and considerably increase throughput and power efficiency.

  • 27.
    Forsberg, Håkan
    et al.
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Linden, J.
    Gripen C/D Saab Aeronautics.
    Hjorth, Johan
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Manefjord, T.
    Avionics Systems Saab.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Challenges in using neural networks in safety-critical applications. 2020. In: AIAA/IEEE Digital Avionics Systems Conference - Proceedings, Institute of Electrical and Electronics Engineers Inc., 2020. Conference paper (Refereed)
    Abstract [en]

    In this paper, we discuss challenges when using neural networks (NNs) in safety-critical applications. We address the challenges one by one, with aviation safety in mind. We then introduce a possible implementation to overcome the challenges. Only a small portion of the solution has been implemented physically and much work is considered as future work. Our current understanding is that a real implementation in a safety-critical system would be extremely difficult: firstly, in designing the intended function of the NN, and secondly, in designing the monitors needed to achieve a deterministic and fail-safe behavior of the system. We conclude that only the most valuable implementations of NNs should be considered as meaningful to implement in safety-critical systems.

  • 28.
    Ghaderi, Adnan
    et al.
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Ashjaei, Seyed Mohammad Hossein
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Loni, Mohammad
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Mubeen, Saad
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Sjödin, Mikael
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Design challenges in hardware development of time-sensitive networking: A research plan. 2019. In: CEUR Workshop Proceedings, Volume 2457, CEUR-WS, 2019, Vol. 2457. Conference paper (Refereed)
    Abstract [en]

    Time-Sensitive Networking (TSN) is a set of ongoing projects within the IEEE standardization to guarantee timeliness and low-latency communication based on switched Ethernet for industrial applications. The huge demand is mainly coming from industries where intensive data transmission is required, such as in modern vehicles where cameras, lidars and high-bandwidth modern sensors are connected. The TSN standards are evolving over time, hence the hardware needs to change to follow the modifications. In addition, high-performance hardware is required to obtain the full benefit of the standards. In this paper, we present a research plan for developing novel techniques to support a parameterized and modular hardware IP core of the multi-stage TSN switch fabric in VHSIC (Very High Speed Integrated Circuit) Hardware Description Language (VHDL), which can be deployed on any Field-Programmable Gate Array (FPGA) device. We present the challenges on the way towards the mentioned goal.

  • 29.
    Hojabr, R.
    et al.
    School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran.
    Khonsari, A.
    School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran.
    Modarressi, M.
    School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Feedforward neural networks on massively parallel architectures. 2020. In: Hardware Architectures for Deep Learning, Institution of Engineering and Technology, 2020, p. 53-76. Chapter in book (Other academic)
    Abstract [en]

    In this chapter, we present ClosNN, a specialized NoC for NNs based on the well-known Clos topology. Clos is perhaps the most popular Multistage Interconnection Network (MIN) topology. Clos is used commonly as a base of switching infrastructures in various commercial telecommunication and network routers and switches.

  • 30.
    Houtan, Bahar
    et al.
    Mälardalen University, School of Innovation, Design and Engineering, Innovation and Product Realisation. Mälardalen University.
    Ashjaei, Seyed Mohammad Hossein
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Sjödin, Mikael
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Afshar, S.
    Volvo Construction Equipment, Eskilstuna, Sweden.
    Mubeen, Saad
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Schedulability Analysis of Best-Effort Traffic in TSN Networks. 2021. In: IEEE International Conference on Emerging Technologies and Factory Automation, ETFA, Institute of Electrical and Electronics Engineers (IEEE), 2021. Conference paper (Other academic)
    Abstract [en]

    This paper presents a schedulability analysis for the Best-Effort (BE) traffic class within Time-Sensitive Networking (TSN) networks. The presented analysis considers several features in the TSN standards, including the Credit-Based Shaper (CBS), the Time-Aware Shaper (TAS), and frame preemption. Although the BE class in TSN is primarily used for traffic with no strict timing requirements, some industrial applications prefer to utilize this class for non-hard real-time traffic instead of classes that use the CBS. The reason mainly lies in the fact that the complexity of TSN configuration becomes significantly high when the time-triggered traffic via the TAS and other classes via the CBS are used together. We demonstrate the applicability of the presented analysis on a vehicular application use case. We show that a network designer can get information on the schedulability of the BE traffic, based on which the network configuration can be further refined with respect to the application requirements.

  • 31.
    Houtan, Bahar
    et al.
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Ashjaei, Seyed Mohammad Hossein
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Sjödin, Mikael
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Mubeen, Saad
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Developing Predictable Vehicular Embedded Systems Utilizing Time-Sensitive Networking–A Research Plan. 2019. In: 15th Swedish National Computer Networking Workshop (SNCNW'19) SNCNW 2019, 2019. Conference paper (Refereed)
  • 32.
    Houtan, Bahar
    et al.
    Mälardalen University, School of Innovation, Design and Engineering, Innovation and Product Realisation.
    Ashjaei, Seyed Mohammad Hossein
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Sjödin, Mikael
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Mubeen, Saad
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Supporting end-to-end data propagation delay analysis for TSN-based distributed vehicular embedded systems. 2023. In: Journal of systems architecture, ISSN 1383-7621, E-ISSN 1873-6165, Vol. 141, article id 102911. Article in journal (Refereed)
    Abstract [en]

    In this paper, we identify that the existing end-to-end data propagation delay analysis for distributed embedded systems can calculate pessimistic (over-estimated) analysis results when the nodes are synchronized. This is particularly the case for the Scheduled Traffic (ST) class in Time-Sensitive Networking (TSN), which is scheduled offline according to the IEEE 802.1Qbv standard and in which the nodes are synchronized according to the IEEE 802.1AS standard. We present a comprehensive system model for distributed embedded systems that incorporates all of the above-mentioned aspects as well as all traffic classes in TSN. We extend the analysis to support both synchronization and non-synchronization among the ECUs as well as offline schedules on the networks. The extended analysis can now be used to analyze all traffic classes in TSN when the nodes are synchronized without introducing any pessimism in the analysis results. We evaluate the proposed model and the extended analysis on a vehicular industrial use case.

  • 33.
    Houtan, Bahar
    et al.
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Ashjaei, Seyed Mohammad Hossein
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Sjödin, Mikael
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Mubeen, Saad
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Supporting End-to-end Data-propagation Delay Analysis for TSN Networks. 2021. Report (Other academic)
    Abstract [en]

    End-to-end data-propagation delay analysis allows verification of important timing constraints, such as age and reaction, that are often specified on chains of tasks and messages in real-time systems. We identify that the existing analysis does not support distributed task chains that include the Time-Sensitive Networking (TSN) messages. To this end, this paper extends the existing analysis to allow the end-to-end timing analysis of distributed task chains that include TSN messages. The extended analysis supports all types of traffic in TSN, including the Scheduled Traffic (ST), Audio Video Bridging (AVB), and Best Effort (BE) traffic. Furthermore, the extended analysis accounts for the synchronization among the end stations that are connected via TSN. The applicability of the analysis is demonstrated using an automotive application case study.

  • 34.
    Houtan, Bahar
    et al.
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Ashjaei, Seyed Mohammad Hossein
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Sjödin, Mikael
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Mubeen, Saad
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Synthesising Schedules to Improve QoS of Best-effort Traffic in TSN Networks. 2021. In: 29th International Conference on Real-Time Networks and Systems (RTNS'21) RTNS 2021, 2021, p. 68-77. Conference paper (Refereed)
    Abstract [en]

    The IEEE Time-Sensitive Networking (TSN) standards' amendment 802.1Qbv provides real-time guarantees for Scheduled Traffic (ST) streams by the Time Aware Shaper (TAS) mechanism. In this paper, we develop offline schedule optimization objective functions to configure the TAS for ST streams, which can be effective in achieving a high Quality of Service (QoS) for lower-priority Best-Effort (BE) traffic. This becomes useful if real-time streams from legacy protocols are configured to be carried by the BE class or if the BE class is used for value-added (but non-critical) services. We present three alternative objective functions, namely Maximization, Sparse and Evenly Sparse, followed by a set of constraints on ST streams. Based on simulated stream traces in the OMNeT++/INET TSN NeSTiNg simulator, we compare our proposed schemes with the most commonly applied objective function in terms of overall maximum end-to-end delay and deadline misses of BE streams. The results confirm that changing the schedule synthesis objective to our proposed schemes ensures timely delivery and lower end-to-end delays in BE streams.

  • 35.
    Houtan, Bahar
    et al.
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Ashjaei, Seyed Mohammad Hossein
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Sjödin, Mikael
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Mubeen, Saad
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Work in progress: Investigating the effects of high priority traffic on the best effort traffic in TSN networks. 2019. In: Proceedings - Real-Time Systems Symposium, Institute of Electrical and Electronics Engineers Inc., 2019, p. 556-559, article id 9052124. Conference paper (Refereed)
    Abstract [en]

    This paper investigates the effects of various parameters of high priority traffic classes on the Best Effort (BE) traffic in the networks based on the IEEE Time Sensitive Networking (TSN) standards. In this regard, the paper discusses ongoing work and presents preliminary results using a TSN simulator. The results indicate that several parameters of the high priority traffic such as periods, offsets and preemption modes can have a significant impact on the quality of service (e.g., guaranteed message delivery and message delays) of the BE traffic.

  • 36.
    Houtan, Bahar
    et al.
    Mälardalen University, School of Innovation, Design and Engineering, Innovation and Product Realisation.
    Aybek, M. O.
    Arcticus Systems, Järfälla, Sweden.
    Ashjaei, Seyed Mohammad Hossein
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Sjödin, Mikael
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Lundbäck, J.
    Arcticus Systems, Järfälla, Sweden.
    Mubeen, Saad
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    End-to-end Timing Modeling and Analysis of TSN in Component-Based Vehicular Software. 2023. In: Proc. - IEEE Int. Symp. Real-Time Distrib. Comput., ISORC, Institute of Electrical and Electronics Engineers Inc., 2023, p. 126-135. Conference paper (Refereed)
    Abstract [en]

    In this paper, we present an end-to-end timing model to capture timing information from software architectures of distributed embedded systems that use network communication based on the Time-Sensitive Networking (TSN) standards. Such a model is required as an input to perform end-to-end timing analysis of these systems. Furthermore, we present a methodology that aims at automated extraction of instances of the end-to-end timing model from component-based software architectures of the systems and the TSN network configurations. As a proof of concept, we implement the proposed end-to-end timing model and the extraction methodology in the Rubus Component Model (RCM) and its tool chain Rubus-ICE that are used in the vehicle industry. We demonstrate the usability of the proposed model and methodology by modeling a vehicular industrial use case and performing its timing analysis.

  • 37.
    Houtan, Bahar
    et al.
    Mälardalen University, School of Innovation, Design and Engineering, Innovation and Product Realisation.
    Aybek, M. O.
    Arcticus Systems, Järfälla, Sweden.
    Ashjaei, Seyed Mohammad Hossein
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Sjödin, Mikael
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Mubeen, Saad
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    End-to-end Timing Model Extraction from TSN-Aware Distributed Vehicle Software. 2022. In: Proc. - Euromicro Conf. Softw. Eng. Adv. Appl., SEAA, Institute of Electrical and Electronics Engineers Inc., 2022, p. 366-369. Conference paper (Refereed)
    Abstract [en]

    Extraction of end-to-end timing information from software architectures of vehicular systems to support their timing analysis is a daunting challenge. To address this challenge, this paper presents a systematic method to extract this information from vehicular software architectures that can be distributed over several electronic control units connected by Time-Sensitive Networking (TSN) networks. As a proof of concept, the proposed extraction method is applied to an industrial component model, namely the Rubus Component Model (RCM), and its toolchain. Furthermore, the usability of the proposed method is demonstrated in an industrial use case from the vehicular domain.

  • 38.
    Houtan, Bahar
    et al.
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Bergström, Albert
    Mälardalen University.
    Ashjaei, Seyed Mohammad Hossein
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Sjödin, Mikael
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Mubeen, Saad
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    An Automated Configuration Framework for TSN Networks. 2021. In: 22nd IEEE International Conference on Industrial Technology (ICIT'21) ICIT 2021, 2021, p. 771-778. Conference paper (Refereed)
    Abstract [en]

    Designing and simulating large networks, based on the Time-Sensitive Networking (TSN) standards, require complex and demanding configuration at the design and pre-simulation phases. The existing configuration and simulation frameworks support only the manual configuration of TSN networks. This hampers the applicability of these frameworks to large-sized TSN networks, especially in complex industrial embedded system applications. This paper proposes a modular framework to automate offline scheduling in TSN networks to facilitate the design time and pre-simulation automated network configurations as well as interpretation of the simulations. To demonstrate and evaluate the applicability of the proposed framework, a large TSN network is automatically configured and its performance is evaluated by measuring end-to-end delays of time-critical flows in a state-of-the-art simulation framework, namely NeSTiNg.

  • 39.
    Kakakhel, S. R. U.
    et al.
    Department of Future Technologies, University of Turku, Turku, Finland.
    Westerlund, T.
    Department of Future Technologies, University of Turku, Turku, Finland.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Zou, Z.
    Micro-Nano System Center, Fudan University, Shanghai, China.
    Plosila, J.
    Department of Future Technologies, University of Turku, Turku, Finland.
    Tenhunen, H.
    Department of Future Technologies, University of Turku, Turku, Finland.
    A qualitative comparison model for application layer IoT protocols. 2019. In: 2019 4th International Conference on Fog and Mobile Edge Computing, FMEC 2019, Institute of Electrical and Electronics Engineers Inc., 2019, p. 210-215. Conference paper (Refereed)
    Abstract [en]

    Protocols enable things to connect and communicate, thus making the Internet of Things possible. The performance aspect of Internet of Things protocols, vital to their widespread utilization, has received much attention. However, one aspect of IoT protocols that is essential to their adoption in the real world is a protocol's feature set. Comparative analyses based on competing features and properties are rarely, if ever, discussed in the literature. In this paper, we define 19 attributes in 5 categories that are essential for IoT stakeholders to consider. These attributes are then used to contrast four IoT protocols: MQTT, HTTP, CoAP and XMPP. Furthermore, we discuss scenarios where an assessment based on comparative strengths and weaknesses would be beneficial. The provided comparison model can be easily extended to include protocols like MQTT-SN, AMQP and DDS.

  • 40.
    Linden, Joakim
    et al.
    Saab Aeronaut, Jarfalla, Sweden.
    Forsberg, Håkan
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Haddad, Josef
    Saab Aeronaut, Jarfalla, Sweden.
    Tagebrand, Emil
    Saab Aeronaut, Jarfalla, Sweden.
    Cedernaes, Erasmus
    Saab Aeronaut, Jarfalla, Sweden.
    Ek, Emil Gustafsson
    Saab Aeronaut, Jarfalla, Sweden.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Curating Datasets for Visual Runway Detection. 2021. In: 2021 IEEE/AIAA 40TH DIGITAL AVIONICS SYSTEMS CONFERENCE (DASC), IEEE, 2021. Conference paper (Refereed)
    Abstract [en]

    In Machine Learning systems, several factors impact the performance of a trained model. The most important ones include model architecture, the amount of training time, and the dataset size and diversity. In the realm of safety-critical machine learning, the datasets used need to reflect the environment in which the system is intended to operate, in order to minimize the generalization gap between trained and real-world inputs. Datasets should be thoroughly prepared, and requirements on the properties and characteristics of the collected data need to be specified. In our work we present a case study in which a synthetic dataset is generated based on real-world flight data from the ADS-B system, containing thousands of approaches to several airports, to identify real-world statistical distributions of relevant variables to vary within our dataset sampling space. We also investigate the effects of training a model on synthetic data to different extents, including training on translated image sets (using domain adaptation). Our results indicate airport location to be the most critical parameter to vary. We also conclude that all experiments benefited in performance from pre-training on synthetic data rather than using only real data; however, this did not hold true in general for domain adaptation-translated images.

  • 41.
    Lindén, Joakim
    et al.
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems. Saab AB, Linköping, Sweden.
    Forsberg, Håkan
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Söderquist, I.
    Royal Institute of Technology, Stockholm, Sweden; Saab AB, Linköping, Sweden.
    Evaluating the Robustness of ML Models to Out-of-Distribution Data Through Similarity Analysis. 2023. In: Commun. Comput. Info. Sci., Springer Science and Business Media Deutschland GmbH, 2023, p. 348-359. Conference paper (Refereed)
    Abstract [en]

    In Machine Learning systems, several factors impact the performance of a trained model. The most important ones include model architecture, the amount of training time, and the dataset size and diversity. We present a method for analyzing datasets from a use-case scenario perspective, detecting and quantifying out-of-distribution (OOD) data at the dataset level. Our main contribution is the novel use of similarity metrics for evaluating the robustness of a model by introducing relative Fréchet Inception Distance (FID) and relative Kernel Inception Distance (KID) measures. These measures are relative to a baseline in-distribution dataset and are used to estimate how the model will perform on OOD data (i.e., to estimate the drop in model accuracy). We find a correlation between our proposed relative FID/relative KID measure and the drop in Average Precision (AP) accuracy on unseen data.

  • 42.
    Loni, Mohammad
    et al.
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Ahlberg, Carl
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Ekström, Mikael
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Sjödin, Mikael
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Embedded Acceleration of Image Classification Applications for Stereo Vision Systems. 2018. In: Design, Automation & Test in Europe Conference & Exhibition DATE'18, 2018. Conference paper (Other academic)
  • 43.
    Loni, Mohammad
    et al.
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Sjödin, Mikael
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    ADONN: Adaptive design of optimized deep neural networks for embedded systems. 2018. In: Proceedings - 21st Euromicro Conference on Digital System Design, DSD 2018, Institute of Electrical and Electronics Engineers Inc., 2018, p. 397-404. Conference paper (Refereed)
    Abstract [en]

    Many modern applications, e.g., autonomous systems and cloud data services, need to capture and process a large amount of raw data at runtime, which ultimately necessitates a high-performance computing model. Deep Neural Networks (DNNs) have already demonstrated their learning capabilities in runtime data processing for modern applications. However, DNNs are becoming deeper and more sophisticated models to gain higher accuracy, which requires remarkable computing capacity. Relying on high-performance cloud infrastructure to supply the required computational throughput is often not feasible. Instead, we intend to find a near-sensor processing solution that lowers the need for network bandwidth and increases privacy and power efficiency, while guaranteeing worst-case response times. Toward this goal, we introduce the ADONN framework, which aims to automatically design a highly robust DNN architecture for embedded devices as the closest processing unit to the sensors. ADONN searches the design space to find improved neural architectures. Our proposed framework takes advantage of a multi-objective evolutionary approach, which exploits a pruned design space inspired by a dense architecture. Unlike recent works that have mainly tried to generate highly accurate networks, ADONN also considers the network size as a second objective to build a highly optimized network that fits within limited computational resource budgets while delivering a comparable accuracy level. In comparison with the best result on the CIFAR-10 dataset, a network generated by ADONN presents up to a 26.4x compression rate while losing only 4% accuracy. In addition, ADONN maps the generated DNN onto commodity programmable devices, including ARM Processor, High-Performance CPU, GPU, and FPGA.

  • 44.
    Loni, Mohammad
    et al.
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Hamouachy, Fadouao
    Casarrubios, Clémentine
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Nolin, Mikael
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    AutoRIO: An Indoor Testbed for Developing Autonomous Vehicles. 2018. In: International Japan-Africa Conference on Electronics, Communications and Computations JAC-ECC, 2018, p. 69-72. Conference paper (Refereed)
    Abstract [en]

    Autonomous vehicles have a great influence on our lives. These vehicles are more convenient and more energy efficient, and they provide a higher safety level and cheaper driving solutions. In addition, reducing CO2 emissions and the risk of vehicular accidents are other benefits of autonomous vehicles. However, deploying a fully autonomous system is challenging, and the proposed solutions are still new. Providing a testbed for evaluating new algorithms helps researchers and hardware developers verify the real impact of their solutions. Such a testing environment is a low-cost infrastructure that can shorten the time-to-market of novel ideas. In this paper, we propose AutoRIO, a cutting-edge indoor testbed for developing autonomous vehicles.

  • 45.
    Loni, Mohammad
    et al.
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Majd, Amin
    Åbo Akademi University, Turku, Finland.
    Loni, Abdolah
    KTH Royal Institute of Technology, Stockholm, Sweden.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Nolin, Mikael
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Troubitsyna, Elena
    KTH Royal Institute of Technology, Stockholm, Sweden.
    Designing Compact Convolutional Neural Network for Embedded Stereo Vision Systems. 2018. In: IEEE 12th International Symposium on Embedded Multicore/Many-core Systems-on-Chip MCSoC-2018, 2018, p. 244-251, article id 8540240. Conference paper (Refereed)
  • 46.
    Loni, Mohammad
    et al.
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Mousavi, Hamid
    Mälardalen University.
    Riazati, Mohammad
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Sjödin, Mikael
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    TAS: Ternarized Neural Architecture Search for Resource-Constrained Edge Devices. 2022. Conference paper (Refereed)
    Abstract [en]

    Ternary Neural Networks (TNNs) compress network weights and activation functions into a 2-bit representation, resulting in remarkable network compression and energy efficiency. However, there remains a significant gap in accuracy between TNNs and their full-precision counterparts. Recent advances in Neural Architecture Search (NAS) promise opportunities in automated optimization for various deep learning tasks. Unfortunately, this area is unexplored for optimizing TNNs. This paper proposes TAS, a framework that drastically reduces the accuracy gap between TNNs and their full-precision counterparts by integrating quantization into the network design. We observed that directly applying NAS to the ternary domain causes accuracy degradation, as the search settings are customized for full-precision networks. To address this problem, we propose (i) a new cell template for ternary networks with maximum gradient propagation; and (ii) a novel learnable quantizer that adaptively relaxes the ternarization mechanism from the distribution of the weights and activation functions. Experimental results reveal that TAS delivers 2.64% higher accuracy and 2.8x memory saving over competing methods with the same bit-width resolution on the CIFAR-10 dataset. These results suggest that TAS is an effective method that paves the way for the efficient design of the next generation of quantized neural networks.

  • 47.
    Loni, Mohammad
    et al.
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Sinaei, Sima
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Zoljodi, A.
    Shiraz University of Technology, Shiraz, Iran.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Sjödin, Mikael
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    DeepMaker: A multi-objective optimization framework for deep neural networks in embedded systems. 2020. In: Microprocessors and microsystems, ISSN 0141-9331, E-ISSN 1872-9436, Vol. 73, article id 102989. Article in journal (Refereed)
    Abstract [en]

    Deep Neural Networks (DNNs) are compute-intensive learning models with growing applicability in a wide range of domains. Due to their computational complexity, DNNs benefit from implementations that utilize custom hardware accelerators to meet performance and response time as well as classification accuracy constraints. In this paper, we propose the DeepMaker framework, which aims to automatically design a set of highly robust DNN architectures for embedded devices as the closest processing unit to the sensors. DeepMaker explores and prunes the design space to find improved neural architectures. Our proposed framework takes advantage of a multi-objective evolutionary approach that exploits a pruned design space inspired by a dense architecture. DeepMaker considers the accuracy along with the network size as two objectives to build a highly optimized network that fits within limited computational resource budgets while delivering an acceptable accuracy level. In comparison with the best result on the CIFAR-10 dataset, a network generated by DeepMaker presents up to a 26.4x compression rate while losing only 4% accuracy. Besides, DeepMaker maps the generated CNN onto programmable commodity devices, including ARM Processor, High-Performance CPU, GPU, and FPGA.

  • 48.
    Loni, Mohammad
    et al.
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Zoljodi, Ali
    Shiraz Univ Technol, Shiraz, Iran.
    Maier, Daniel
    Technische Universität Berlin, Germany.
    Majd, Amin
    Abo Akad Univ, Dept Informat Technol, Turku, Finland.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Sjödin, Mikael
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems. ES (Embedded Systems).
    Juurlink, Ben
    Technische Universität Berlin, Germany.
    Akbari, Reza
    Shiraz Univ Technol, Shiraz, Iran.
    DenseDisp: Resource-Aware Disparity Map Estimation by Compressing Siamese Neural Architecture. 2020. In: IEEE WORLD CONGRESS ON COMPUTATIONAL INTELLIGENCE (WCCI) 2020 IEEE WCCI, Glasgow, United Kingdom, 2020. Conference paper (Refereed)
    Abstract [en]

    Stereo vision cameras are flexible sensors because they provide heterogeneous information such as color, luminance, disparity map (depth), and shape of the objects. Today, Convolutional Neural Networks (CNNs) present the highest accuracy for disparity map estimation [1]. However, CNNs require considerable computing capacity to process billions of floating-point operations in real time. Besides, commercial stereo cameras produce very large images (e.g., 10 Megapixels [2]), which impose an additional computational cost on the system. The problem becomes more pronounced if we target resource-limited hardware for the implementation. In this paper, we propose DenseDisp, an automatic framework that designs a Siamese neural architecture for disparity map estimation in a reasonable time. DenseDisp leverages a meta-heuristic multi-objective exploration to discover hardware-friendly architectures by considering accuracy and network FLOPS as the optimization objectives. We explore the design space with four different fitness functions to improve the accuracy-FLOPS trade-off and the convergence time of DenseDisp. According to the experimental results, DenseDisp provides up to a 39.1x compression rate while losing around 5% accuracy compared to the state-of-the-art results.

  • 49.
    Loni, Mohammad
    et al.
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Zoljodi, Ali
    Shiraz University of Technology, Shiraz, Iran.
    Sinaei, Sima
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Nolin, Mikael
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    NeuroPower: Designing Energy Efficient Convolutional Neural Network Architecture for Embedded Systems. 2019. In: Lecture Notes in Computer Science, Volume 11727, Munich, Germany: Springer, 2019, p. 208-222. Conference paper (Refereed)
    Abstract [en]

    Convolutional Neural Networks (CNNs) suffer from energy-hungry implementations due to their computation- and memory-intensive processing patterns. This problem becomes even more significant with the proliferation of CNNs on embedded platforms. To overcome this problem, we offer NeuroPower, an automatic framework that designs a highly optimized and energy-efficient set of CNN architectures for embedded systems. NeuroPower explores and prunes the design space to find an improved set of neural architectures. Toward this aim, a multi-objective optimization strategy is integrated to solve the Neural Architecture Search (NAS) problem by near-optimal tuning of network hyperparameters. The main objectives of the optimization algorithm are network accuracy and the number of parameters in the network. The evaluation results show the effectiveness of NeuroPower on energy consumption, compacting rate and inference time compared to other cutting-edge approaches. In comparison with the best results on the CIFAR-10/CIFAR-100 datasets, a network generated by NeuroPower presents up to 2.1x/1.56x compression rate, 1.59x/3.46x speedup and 1.52x/1.82x power saving while losing 2.4%/-0.6% accuracy, respectively.

  • 50.
    Maabi, Somayeh
    et al.
    Shahid Beheshti University, Tehran, Iran.
    Safaei, Farshad R.Pour
    Shahid Beheshti University, Tehran, Iran.
    Rezaei, Amin
    University of Louisiana at Lafayette, Lafayette, United States.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems. Royal Institute of Technology (KTH), Stockholm, Sweden.
    Zhao, Dan
    Old Dominion University, Norfolk, United States.
    ERFAN: Efficient reconfigurable fault-tolerant deflection routing algorithm for 3-D Network-on-Chip. 2016. In: International System on Chip Conference, IEEE Computer Society, 2016, p. 306-311. Conference paper (Refereed)
    Abstract [en]

    With shrinking transistor dimensions and increasing circuit complexity, Three-Dimensional Network-on-Chip (3-D NoC) has been presented as a promising solution in the electronics industry. As the number of system components on a chip increases, the probability of failure also increases. Therefore, proposing fault-tolerance mechanisms is an important goal in emerging technologies. In this paper, two efficient fault-tolerant routing algorithms for 3-D NoC are presented. The presented algorithms significantly improve performance parameters in exchange for a small area overhead. Simulation results show that, even in the presence of faults, the network latency is decreased in comparison with state-of-the-art works. In addition, the network reliability is improved reasonably.
