mdh.sePublications
Change search
Refine search result
1 - 11 of 11
CiteExportLink to result list
Permanent link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Rows per page
  • 5
  • 10
  • 20
  • 50
  • 100
  • 250
Sort
  • Standard (Relevance)
  • Author A-Ö
  • Author Ö-A
  • Title A-Ö
  • Title Ö-A
  • Publication type A-Ö
  • Publication type Ö-A
  • Issued (Oldest first)
  • Issued (Newest first)
  • Created (Oldest first)
  • Created (Newest first)
  • Last updated (Oldest first)
  • Last updated (Newest first)
  • Disputation date (earliest first)
  • Disputation date (latest first)
  • Standard (Relevance)
  • Author A-Ö
  • Author Ö-A
  • Title A-Ö
  • Title Ö-A
  • Publication type A-Ö
  • Publication type Ö-A
  • Issued (Oldest first)
  • Issued (Newest first)
  • Created (Oldest first)
  • Created (Newest first)
  • Last updated (Oldest first)
  • Last updated (Newest first)
  • Disputation date (earliest first)
  • Disputation date (latest first)
Select
The maximal number of hits you can export is 250. When you want to export more records please use the Create feeds function.
  • 1.
    Akbari, N.
    et al.
    University of Tehran, Tehran, Iran.
    Modarressi, M.
    University of Tehran, Tehran, Iran.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems. Royal Institute of Technology (KTH), Sweden.
    Loni, Mohammad
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems. Royal Institute of Technology (KTH), Sweden.
    A Customized Processing-in-Memory Architecture for Biological Sequence Alignment2018In: Proceedings of the International Conference on Application-Specific Systems, Architectures and Processors, Institute of Electrical and Electronics Engineers Inc. , 2018, article id 8445124Conference paper (Refereed)
    Abstract [en]

    Sequence alignment is the most widely used operation in bioinformatics. With the exponential growth of the biological sequence databases, searching a database to find the optimal alignment for a query sequence (that can be at the order of hundreds of millions of characters long) would require excessive processing power and memory bandwidth. Sequence alignment algorithms can potentially benefit from the processing power of massive parallel processors due their simple arithmetic operations, coupled with the inherent fine-grained and coarse-grained parallelism that they exhibit. However, the limited memory bandwidth in conventional computing systems prevents exploiting the maximum achievable speedup. In this paper, we propose a processing-in-memory architecture as a viable solution for the excessive memory bandwidth demand of bioinformatics applications. The design is composed of a set of simple and lightweight processing elements, customized to the sequence alignment algorithm, integrated at the logic layer of an emerging 3D DRAM architecture. Experimental results show that the proposed architecture results in up to 2.4x speedup and 41% reduction in power consumption, compared to a processor-side parallel implementation. 

  • 2.
    Ghaderi, Adnan
    et al.
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Ashjaei, Seyed Mohammad Hossein
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Loni, Mohammad
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Mubeen, Saad
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Sjödin, Mikael
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Design challenges in hardware development of time-sensitive networking: A research plan2019In: CEUR Workshop Proceedings, Volume 2457, CEUR-WS , 2019, Vol. 2457Conference paper (Refereed)
    Abstract [en]

    Time-Sensitive Networking (TSN) is a set of ongoing projects within the IEEE standardization to guarantee timeliness and low-latency communication based on switched Ethernet for industrial applications. The huge demand is mainly coming from industries where intensive data transmission is required, such as in the modern vehicles where cameras, lidars and high-bandwidth modern sensors are connected. The TSN standards are evolving over time, hence the hardware needs to change depending upon the modifications. In addition, high performance hardware is required to obtain a full benefit from the standards. In this paper, we present a research plan for developing novel techniques to support a parameterized and modular hardware IP core of the multi-stage TSN switch fabric in VHSIC (Very High Speed Integrated Circuit) Hardware Description Language (VHDL), which can be deployed in any Field-Programmable-Gate-Array (FPGA) devices. We present the challenges on the way towards the mentioned goal. 

  • 3.
    Loni, Mohammad
    et al.
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Ahlberg, Carl
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Ekström, Mikael
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Sjödin, Mikael
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Embedded Acceleration of Image Classification Applications for Stereo Vision Systems2018In: Design, Automation & Test in Europe Conference & Exhibition DATE'18, 2018Conference paper (Other academic)
  • 4.
    Loni, Mohammad
    et al.
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Sjödin, Mikael
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    ADONN: Adaptive design of optimized deep neural networks for embedded systems2018In: Proceedings - 21st Euromicro Conference on Digital System Design, DSD 2018, Institute of Electrical and Electronics Engineers Inc. , 2018, p. 397-404Conference paper (Refereed)
    Abstract [en]

    Nowadays, many modern applications, e.g. autonomous system, and cloud data services need to capture and process a big amount of raw data at runtime that ultimately necessitates a high-performance computing model. Deep Neural Network (DNN) has already revealed its learning capabilities in runtime data processing for modern applications. However, DNNs are becoming more deep sophisticated models for gaining higher accuracy which require a remarkable computing capacity. Considering high-performance cloud infrastructure as a supplier of required computational throughput is often not feasible. Instead, we intend to find a near-sensor processing solution which will lower the need for network bandwidth and increase privacy and power efficiency, as well as guaranteeing worst-case response-times. Toward this goal, we introduce ADONN framework, which aims to automatically design a highly robust DNN architecture for embedded devices as the closest processing unit to the sensors. ADONN adroitly searches the design space to find improved neural architectures. Our proposed framework takes advantage of a multi-objective evolutionary approach, which exploits a pruned design space inspired by a dense architecture. Unlike recent works that mainly have tried to generate highly accurate networks, ADONN also considers the network size factor as the second objective to build a highly optimized network fitting with limited computational resource budgets while delivers comparable accuracy level. In comparison with the best result on CIFAR-10 dataset, a generated network by ADONN presents up to 26.4 compression rate while loses only 4% accuracy. In addition, ADONN maps the generated DNN on the commodity programmable devices including ARM Processor, High-Performance CPU, GPU, and FPGA.

  • 5.
    Loni, Mohammad
    et al.
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Hamouachy, Fadouao
    Casarrubios, Clémentine
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Nolin, Mikael
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    AutoRIO: An Indoor Testbed for Developing Autonomous Vehicles2019In: International Japan-Africa Conference on Electronics, Communications and Computations JAC-ECC, 2019, p. 69-72Conference paper (Refereed)
    Abstract [en]

    Autonomous vehicles have a great influence on our life. These vehicles are more convenient, more energy efficient providing higher safety level and cheaper driving solutions. In addition, decreasing the generation of CO 2 , and the risk vehicular accidents are other benefits of autonomous vehicles. However, leveraging a full autonomous system is challenging and the proposed solutions are newfound. Providing a testbed for evaluating new algorithms is beneficial for researchers and hardware developers to verify the real impact of their solutions. The existence of testing environment is a low-cost infrastructure leading to increase the time-to-market of novel ideas. In this paper, we propose Auto Rio, a cutting-edge indoor testbed for developing autonomous vehicles.

  • 6.
    Loni, Mohammad
    et al.
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Majd, Amin
    Åbo Akademi University, Turku, Finland.
    Loni, Abdolah
    KTH Royal Institute of Technology, Stockholm, Sweden.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Nolin, Mikael
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Troubitsyna, Elena
    KTH Royal Institute of Technology, Stockholm, Sweden.
    Designing Compact Convolutional Neural Network for Embedded Stereo Vision Systems2018In: IEEE 12th International Symposium on Embedded Multicore/Many-core Systems-on-Chip MCSoC-2018, 2018, p. 244-251, article id 8540240Conference paper (Refereed)
  • 7.
    Loni, Mohammad
    et al.
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Zoljodi, Ali
    Shiraz University of Technology, Shiraz, Iran.
    Seenan, Sima
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Nolin, Mikael
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    NeuroPower: Designing Energy Efficient Convolutional Neural Network Architecture for Embedded Systems2019In: Lecture Notes in Computer Science, Volume 11727, Munich, Germany: Springer , 2019, p. 208-222Conference paper (Refereed)
    Abstract [en]

    Convolutional Neural Networks (CNNs) suffer from energy-hungry implementation due to their computation and memory intensive processing patterns. This problem is even more significant by the proliferation of CNNs on embedded platforms. To overcome this problem, we offer NeuroPower as an automatic framework that designs a highly optimized and energy efficient set of CNN architectures for embedded systems. NeuroPower explores and prunes the design space to find improved set of neural architectures. Toward this aim, a multi-objective optimization strategy is integrated to solve Neural Architecture Search (NAS) problem by near-optimal tuning network hyperparameters. The main objectives of the optimization algorithm are network accuracy and number of parameters in the network. The evaluation results show the effectiveness of NeuroPower on energy consumption, compacting rate and inference time compared to other cutting-edge approaches. In comparison with the best results on CIFAR-10/CIFAR-100 datasets, a generated network by NeuroPower presents up to 2.1x/1.56x compression rate, 1.59x/3.46x speedup and 1.52x/1.82x power saving while loses 2.4%/-0.6% accuracy, respectively.

  • 8.
    Majd, Amin
    et al.
    Åbo Akademi, Finland.
    Loni, Mohammad
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Sahebi, Golnaz
    University of Turku, Finland.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Troubitsyna, Elena
    KTH, Sweden.
    A Cloud Based Super-Optimization Method to Parallelize the Sequential Code’s Nested Loops2019In: IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip MCSoC-2019, 2019Conference paper (Refereed)
    Abstract [en]

    Advances in hardware architecture regarding multi-core processors make parallel computing ubiquitous. To achieve the maximum utilization of multi-core processors, parallel programming techniques are required. However, there are several challenges standing in front of parallel programming. These problems are mainly divided into three major groups. First, although recent advancements in parallel programming languages (e.g. MPI, OpenCL, etc.) assist developers, still parallel programming is not desirable for most programmers. The second one belongs to the massive volume of old software and applications, which have been written in serial mode. However, converting millions of line of serial codes to parallel codes is highly time-consuming and requiring huge verification effort. Third, the production of software and applications in parallel mode is very expensive since it needs knowledge and expertise. Super-optimization provided by super compilers is the process of automatically determine the dependent and independent instructions to find any data dependency and loop-free sequence of instructions. Super compiler then runs these instructions on different processors in the parallel mode, if it is possible. Super-optimization is a feasible solution for helping the programmer to get relaxed from parallel programming workload. Since the most complexity of the sequential codes is in the nested loops, we try to parallelize the nested loops by using the idea of super-optimization. One of the underlying stages in the super-optimization is scheduling tiled space for iterating nested loops. Since the problem is NP-Hard, using the traditional optimization methods are not feasible. In this paper, we propose a cloud-based super-optimization method as Software-as-a-Service (SaaS) to reduce the cost of parallel programming. In addition, it increases the utilization of the processing capacity of the multi-core processor. As the result, an intermediate programmer can use the whole processing capacity of his/her system without knowing anything about writing parallel codes or super compiler functions by sending the serial code to a cloud server and receiving the parallel version of the code from the cloud server. In this paper, an evolutionary algorithm is leveraged to solve the scheduling problem of tiles. Our proposed super-optimization method will serve as software and provided as a hybrid (public and private) deployment model.

  • 9. Maleki, Neda
    et al.
    Loni, Mohammad
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Conti, Mauro
    University of Padua, Italy .
    Fotouhi, Hossein
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    SoFA: A Spark-oriented Fog Architecture2019In: IEEE 45th Annual Conference of the Industrial Electronics Society IECON'19, 2019Conference paper (Refereed)
    Abstract [en]

    Fog computing offers a wide range of service levels including low bandwidth usage, low response time, support of heterogeneous applications, and high energy efficiency. Therefore, real-time embedded applications could potentially benefit from Fog infrastructure. However, providing high system utilization is an important challenge of Fog computing especially for processing embedded applications. In addition, although Fog computing extends cloud computing by providing more energy efficiency, it still suffers from remarkable energy consumption, which is a limitation for embedded systems. To overcome the above limitations, in this paper, we propose SoFA, a Spark-oriented Fog architecture that leverages Spark functionalities to provide higher system utilization, energy efficiency, and scalability. Compared to the common Fog computing platforms where edge devices are only responsible for processing data received from their IoT nodes, SoFA leverages the remaining processing capacity of all other edge devices. To attain this purpose, SoFA provides a distributed processing paradigm by the help of Spark to utilize the whole processing capacity of all the available edge devices leading to increase energy efficiency and system utilization. In other words, SoFA proposes a near- sensor processing solution in which the edge devices act as the Fog nodes. In addition, SoFA provides scalability by taking advantage of Spark functionalities. According to the experimental results, SoFA is a power-efficient and scalable solution desirable for embedded platforms by providing up to 3.1x energy efficiency for the Word-Count benchmark compared to the common Fog processing platform.

  • 10.
    Nazari, Najmeh
    et al.
    University of Tehran, Tehran , Iran.
    Loni, Mohammad
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems. ES (Embedded Systems).
    E. Salehi, Mostafa
    University of Tehran, Tehran , Iran.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Sjödin, Mikael
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    TOT-Net: An Endeavor Toward Optimizing Ternary Neural Networks2019In: 22nd Euromicro Conference on Digital System Design DSD 2019, 2019, p. 305-312, article id 8875067Conference paper (Refereed)
    Abstract [en]

    High computation demands and big memory resources are the major implementation challenges of Convolutional Neural Networks (CNNs) especially for low-power and resource-limited embedded devices. Many binarized neural networks are recently proposed to address these issues. Although they have significantly decreased computation and memory footprint, they have suffered from accuracy loss especially for large datasets. In this paper, we propose TOT-Net, a ternarized neural network with [-1, 0, 1] values for both weights and activation functions that has simultaneously achieved a higher level of accuracy and less computational load. In fact, first, TOT-Net introduces a simple bitwise logic for convolution computations to reduce the cost of multiply operations. To improve the accuracy, selecting proper activation function and learning rate are influential, but also difficult. As the second contribution, we propose a novel piece-wise activation function, and optimized learning rate for different datasets. Our findings first reveal that 0.01 is a preferable learning rate for the studied datasets. Third, by using an evolutionary optimization approach, we found novel piece-wise activation functions customized for TOT-Net. According to the experimental results, TOT-Net achieves 2.15%, 8.77%, and 5.7/5.52% better accuracy compared to XNOR-Net on CIFAR-10, CIFAR-100, and ImageNet top-5/top-1 datasets, respectively.

  • 11.
    Salimi, M.
    et al.
    Tehran University, Tehran, Iran.
    Majd, A.
    Åbo Akademi University, Turku, Finland.
    Loni, Mohammad
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Seceleanu, Tiberiu
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Seceleanu, Cristina
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Sirjani, Marjan
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Troubitsyna, E.
    Royal Institute of Technology, Stockholm, Sweden.
    Multi-objective optimization of real-time task scheduling problem for distributed environments2019In: ACM International Conference Proceeding Series, Association for Computing Machinery , 2019, article id a13Conference paper (Refereed)
    Abstract [en]

    Real-world applications are composed of multiple tasks which usually have intricate data dependencies. To exploit distributed processing platforms, task allocation and scheduling, that is assigning tasks to processing units and ordering inter-processing unit data transfers, plays a vital role. However, optimally scheduling tasks on processing units and finding an optimized network topology is an NP-complete problem. The problem becomes more complicated when the tasks have real-time deadlines for termination. Exploring the whole search space in order to find the optimal solution is not feasible in a reasonable amount of time, therefore meta-heuristics are often used to find a near-optimal solution. We propose here a multi-population evolutionary approach for near-optimal scheduling optimization, that guarantees end-to-end deadlines of tasks in distributed processing environments. We analyze two different exploration scenarios including single and multi-objective exploration. The main goal of the single objective exploration algorithm is to achieve the minimal number of processing units for all the tasks, whereas a multi-objective optimization tries to optimize two conflicting objectives simultaneously considering the total number of processing units and end-to-end finishing time for all the jobs. The potential of the proposed approach is demonstrated by experiments based on a use case for mapping a number of jobs covering industrial automation systems, where each of the jobs consists of a number of tasks in a distributed environment.

1 - 11 of 11
CiteExportLink to result list
Permanent link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf