https://www.mdu.se/

mdu.se Publications
1 - 26 of 26
  • 1.
    Akbari, N.
    et al.
    University of Tehran, Tehran, Iran.
    Modarressi, M.
    University of Tehran, Tehran, Iran.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems. Royal Institute of Technology (KTH), Sweden.
    Loni, Mohammad
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems. Royal Institute of Technology (KTH), Sweden.
    A Customized Processing-in-Memory Architecture for Biological Sequence Alignment (2018). In: Proceedings of the International Conference on Application-Specific Systems, Architectures and Processors, Institute of Electrical and Electronics Engineers Inc., 2018, article id 8445124. Conference paper (Refereed)
    Abstract [en]

    Sequence alignment is the most widely used operation in bioinformatics. With the exponential growth of biological sequence databases, searching a database to find the optimal alignment for a query sequence (which can be on the order of hundreds of millions of characters long) would require excessive processing power and memory bandwidth. Sequence alignment algorithms can potentially benefit from the processing power of massively parallel processors due to their simple arithmetic operations, coupled with the inherent fine-grained and coarse-grained parallelism that they exhibit. However, the limited memory bandwidth in conventional computing systems prevents exploiting the maximum achievable speedup. In this paper, we propose a processing-in-memory architecture as a viable solution for the excessive memory bandwidth demand of bioinformatics applications. The design is composed of a set of simple and lightweight processing elements, customized to the sequence alignment algorithm, integrated at the logic layer of an emerging 3D DRAM architecture. Experimental results show that the proposed architecture delivers up to 2.4x speedup and a 41% reduction in power consumption compared to a processor-side parallel implementation.
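
    The alignment kernel is not spelled out in the record above, but the classic Smith-Waterman recurrence below (a minimal pure-Python sketch, not the authors' implementation) shows the kind of simple arithmetic such an accelerator targets; cells on the same anti-diagonal are mutually independent, which is the fine-grained parallelism a processing-in-memory fabric can exploit.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Minimal Smith-Waterman local alignment score (illustrative only)."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best

print(smith_waterman("GATTACA", "GCATGCT"))  # tiny toy query vs. reference
```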

  • 2.
    Asadi, M.
    et al.
    Department of Electrical Engineering, Tarbiat Modares University, Tehran, Iran.
    Poursalim, F.
    Shiraz University of Medical Science, Shiraz, Iran.
    Loni, Mohammad
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Sjödin, Mikael
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Gharehbaghi, A.
    Department of Biomedical Engineering, Linköping University, Linköping, Sweden.
    Accurate detection of paroxysmal atrial fibrillation with certified-GAN and neural architecture search (2023). In: Scientific Reports, E-ISSN 2045-2322, Vol. 13, no 1. Article in journal (Refereed)
    Abstract [en]

    This paper presents a novel machine learning framework for detecting PxAF, a pathological characteristic of electrocardiogram (ECG) that can lead to fatal conditions such as heart attack. To enhance the learning process, the framework involves a generative adversarial network (GAN) along with a neural architecture search (NAS) in the data preparation and classifier optimization phases. The GAN is innovatively invoked to overcome the class imbalance of the training data by producing the synthetic ECG for PxAF class in a certified manner. The effect of the certified GAN is statistically validated. Instead of using a general-purpose classifier, the NAS automatically designs a highly accurate convolutional neural network architecture customized for the PxAF classification task. Experimental results show that the accuracy of the proposed framework exhibits a high value of 99.0% which not only enhances state-of-the-art by up to 5.1%, but also improves the classification performance of the two widely-accepted baseline methods, ResNet-18, and Auto-Sklearn, by [Formula: see text] and [Formula: see text].

  • 3.
    Ebrahimi, Zahra
    et al.
    Shahrood University of Technology, Shahroud, Iran.
    Loni, Mohammad
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Ghareh Baghi, Arash
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    A Review on Deep Learning Methods for ECG Arrhythmia Classification (2020). In: Expert systems with applications, ISSN 0957-4174, E-ISSN 1873-6793, Vol. 7, article id 100033. Article in journal (Refereed)
    Abstract [en]

    Deep Learning (DL) has recently become a topic of study in different applications including healthcare, in which timely detection of anomalies on the Electrocardiogram (ECG) can play a vital role in patient monitoring. This paper presents a comprehensive review of recent DL methods applied to ECG signals for classification purposes. The study considers various types of DL methods such as Convolutional Neural Network (CNN), Deep Belief Network (DBN), Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU). Across the 75 studies reported between 2017 and 2018, CNN is dominantly observed as the suitable technique for feature extraction, appearing in 52% of the studies. DL methods showed high accuracy in correct classification of Atrial Fibrillation (AF) (100%), Supraventricular Ectopic Beats (SVEB) (99.8%), and Ventricular Ectopic Beats (VEB) (99.7%) using the GRU/LSTM, CNN, and LSTM, respectively.

  • 4.
    Ghaderi, Adnan
    et al.
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Ashjaei, Seyed Mohammad Hossein
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Loni, Mohammad
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Mubeen, Saad
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Sjödin, Mikael
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Design challenges in hardware development of time-sensitive networking: A research plan (2019). In: CEUR Workshop Proceedings, Volume 2457, CEUR-WS, 2019, Vol. 2457. Conference paper (Refereed)
    Abstract [en]

    Time-Sensitive Networking (TSN) is a set of ongoing projects within IEEE standardization to guarantee timely and low-latency communication based on switched Ethernet for industrial applications. The huge demand comes mainly from industries where intensive data transmission is required, such as modern vehicles where cameras, lidars and other high-bandwidth sensors are connected. The TSN standards are evolving over time, hence the hardware needs to change in response to the modifications. In addition, high-performance hardware is required to obtain the full benefit of the standards. In this paper, we present a research plan for developing novel techniques to support a parameterized and modular hardware IP core of a multi-stage TSN switch fabric in the VHSIC (Very High Speed Integrated Circuit) Hardware Description Language (VHDL), which can be deployed on any Field-Programmable Gate Array (FPGA) device. We also present the challenges on the way towards this goal.

  • 5.
    Loni, Mohammad
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems. MDH.
    DeepMaker: Customizing the Architecture of Convolutional Neural Networks for Resource-Constrained Platforms (2020). Licentiate thesis, comprehensive summary (Other academic)
    Abstract [en]

    Convolutional Neural Networks (CNNs) suffer from energy-hungry implementations because they require huge amounts of computation and significant memory. This problem is exacerbated by the proliferation of CNNs on resource-constrained platforms, e.g., in embedded systems. In this thesis, we focus on decreasing the computational cost of CNNs so that they become suitable for resource-constrained platforms. The thesis proposes two distinct methods to tackle the challenges: optimizing the CNN architecture while considering both network accuracy and network complexity, and proposing an optimized ternary neural network to compensate for the accuracy loss of network quantization methods. We evaluated the impact of our solutions on Commercial-Off-The-Shelf (COTS) platforms, where the results show considerable improvement in network accuracy and energy efficiency.

  • 6.
    Loni, Mohammad
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Efficient Design of Scalable Deep Neural Networks for Resource-Constrained Edge Devices (2022). Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    Deep Neural Networks (DNNs) are increasingly being processed on resource-constrained edge nodes (computer nodes used in, e.g., cyber-physical systems or at the edge of computational clouds) due to efficiency, connectivity, and privacy concerns. This thesis investigates and presents new techniques to design and deploy DNNs for resource-constrained edge nodes. We have identified two major bottlenecks that hinder the proliferation of DNNs on edge nodes: (i) the significant computational demand of designing DNNs that consume few resources in terms of energy, latency, and memory footprint; and (ii) the remarkable accuracy degradation caused by further conserving resources through quantizing the numerical calculations of a DNN.

    To address (i), we present novel methods for cost-efficient Neural Architecture Search (NAS) to automate the design of DNNs that must meet multifaceted goals such as accuracy and hardware performance. To address (ii), we extend our NAS approach to handle the quantization of numerical calculations using only the numbers -1, 0, and 1 (so-called ternary DNNs), while achieving higher accuracy. Our experimental evaluation shows that the proposed NAS approach can provide a 5.25x reduction in design time and up to a 44.4x reduction in network size compared to state-of-the-art methods. In addition, the proposed quantization approach delivers 2.64% higher accuracy and 2.8x memory saving compared to full-precision counterparts with the same bit-width resolution. These benefits are attained over a wide range of commercial-off-the-shelf edge nodes, showing that this thesis successfully enables seamless deployment of DNNs on resource-constrained edge nodes.

  • 7.
    Loni, Mohammad
    et al.
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Ahlberg, Carl
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Ekström, Mikael
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Sjödin, Mikael
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Embedded Acceleration of Image Classification Applications for Stereo Vision Systems (2018). In: Design, Automation & Test in Europe Conference & Exhibition DATE'18, 2018. Conference paper (Other academic)
  • 8.
    Loni, Mohammad
    et al.
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Sjödin, Mikael
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    ADONN: Adaptive design of optimized deep neural networks for embedded systems (2018). In: Proceedings - 21st Euromicro Conference on Digital System Design, DSD 2018, Institute of Electrical and Electronics Engineers Inc., 2018, p. 397-404. Conference paper (Refereed)
    Abstract [en]

    Nowadays, many modern applications, e.g., autonomous systems and cloud data services, need to capture and process large amounts of raw data at runtime, which ultimately necessitates a high-performance computing model. Deep Neural Networks (DNNs) have already revealed their learning capabilities in runtime data processing for modern applications. However, DNNs are becoming deeper and more sophisticated models to gain higher accuracy, which requires remarkable computing capacity. Considering high-performance cloud infrastructure as a supplier of the required computational throughput is often not feasible. Instead, we intend to find a near-sensor processing solution, which will lower the need for network bandwidth and increase privacy and power efficiency, as well as guarantee worst-case response times. Toward this goal, we introduce the ADONN framework, which aims to automatically design a highly robust DNN architecture for embedded devices as the closest processing unit to the sensors. ADONN adroitly searches the design space to find improved neural architectures. Our proposed framework takes advantage of a multi-objective evolutionary approach, which exploits a pruned design space inspired by a dense architecture. Unlike recent works that mainly have tried to generate highly accurate networks, ADONN also considers network size as a second objective to build a highly optimized network that fits limited computational resource budgets while delivering a comparable accuracy level. In comparison with the best result on the CIFAR-10 dataset, a network generated by ADONN presents up to a 26.4x compression rate while losing only 4% accuracy. In addition, ADONN maps the generated DNN on commodity programmable devices, including ARM processors, high-performance CPUs, GPUs, and FPGAs.
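
    To make the multi-objective evolutionary idea concrete, the toy sketch below evolves a list of per-block filter counts against two objectives, a stand-in error score and a parameter-count proxy, and keeps the Pareto-optimal candidates. Every name and the evaluate() heuristic are hypothetical; the actual framework trains and validates each candidate network on real data.

```python
import random

def evaluate(arch):
    """Toy objectives (error, size); the error term stands in for a real
    training-and-validation run, which is what the framework actually performs."""
    size = sum(arch)                              # proxy for parameter count
    error = 100.0 / (1 + size) + random.random()  # bigger nets "fit" better, plus noise
    return error, size

def dominates(a, b):
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def evolve(generations=20, pop_size=8, blocks=4):
    pop = [[random.choice([16, 32, 64]) for _ in range(blocks)] for _ in range(pop_size)]
    front = pop
    for _ in range(generations):
        children = []
        for parent in pop:
            child = parent[:]
            child[random.randrange(blocks)] = random.choice([16, 32, 64])  # mutate one block
            children.append(child)
        union = pop + children
        scored = [(evaluate(a), a) for a in union]
        # Survivors: the non-dominated (Pareto) architectures, topped up at random.
        front = [a for f, a in scored if not any(dominates(g, f) for g, _ in scored)]
        pop = (front + random.sample(union, pop_size))[:pop_size]
    return front

print(evolve())
```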

  • 9.
    Loni, Mohammad
    et al.
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Hamouachy, Fadouao
    Casarrubios, Clémentine
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Nolin, Mikael
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    AutoRIO: An Indoor Testbed for Developing Autonomous Vehicles (2018). In: International Japan-Africa Conference on Electronics, Communications and Computations JAC-ECC, 2018, p. 69-72. Conference paper (Refereed)
    Abstract [en]

    Autonomous vehicles have a great influence on our lives. These vehicles are more convenient and more energy efficient, providing a higher safety level and cheaper driving solutions. In addition, reducing CO2 emissions and the risk of vehicular accidents are other benefits of autonomous vehicles. However, building a fully autonomous system is challenging and the proposed solutions are still new. Providing a testbed for evaluating new algorithms is beneficial for researchers and hardware developers to verify the real impact of their solutions. Such a testing environment is a low-cost infrastructure that shortens the time-to-market of novel ideas. In this paper, we propose AutoRIO, a cutting-edge indoor testbed for developing autonomous vehicles.

  • 10.
    Loni, Mohammad
    et al.
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Majd, Amin
    Åbo Akademi University, Turku, Finland.
    Loni, Abdolah
    KTH Royal Institute of Technology, Stockholm, Sweden.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Nolin, Mikael
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Troubitsyna, Elena
    KTH Royal Institute of Technology, Stockholm, Sweden.
    Designing Compact Convolutional Neural Network for Embedded Stereo Vision Systems (2018). In: IEEE 12th International Symposium on Embedded Multicore/Many-core Systems-on-Chip MCSoC-2018, 2018, p. 244-251, article id 8540240. Conference paper (Refereed)
  • 11.
    Loni, Mohammad
    et al.
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Mohan, A.
    Institute of Artificial Intelligence, Leibniz University Hannover, Germany.
    Asadi, M.
    Department of Electrical Engineering, Tarbiat Modares University, Tehran, Iran.
    Lindauer, M.
    Institute of Artificial Intelligence, Leibniz University Hannover, Germany.
    Learning Activation Functions for Sparse Neural Networks (2023). In: Proc. Mach. Learn. Res., ML Research Press, 2023. Conference paper (Refereed)
    Abstract [en]

    Sparse Neural Networks (SNNs) can potentially demonstrate similar performance to their dense counterparts while saving significant energy and memory at inference. However, the accuracy drop incurred by SNNs, especially at high pruning ratios, can be an issue in critical deployment conditions. While recent works mitigate this issue through sophisticated pruning techniques, we shift our focus to an overlooked factor: hyperparameters and activation functions. Our analyses show that the accuracy drop can additionally be attributed to (i) uniformly using ReLU as the default choice of activation function, and (ii) fine-tuning SNNs with the same hyperparameters as their dense counterparts. Thus, we focus on learning a novel way to tune activation functions for sparse networks and combining this with a separate hyperparameter optimization (HPO) regime for sparse networks. By conducting experiments on popular DNN models (LeNet-5, VGG-16, ResNet-18, and EfficientNet-B0) trained on the MNIST, CIFAR-10, and ImageNet-16 datasets, we show that the novel combination of these two approaches, dubbed Sparse Activation Function Search (SAFS), results in up to 15.53%, 8.88%, and 6.33% absolute improvement in accuracy for LeNet-5, VGG-16, and ResNet-18 over the default training protocols, especially at high pruning ratios.
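
    As a minimal illustration of what "learning" an activation function can mean, the sketch below wraps a base non-linearity with trainable input/output scales so its shape can adapt during fine-tuning. It is a generic PyTorch example under assumed names (ParametricActivation, alpha, beta), not the SAFS search space itself.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ParametricActivation(nn.Module):
    """A unary activation with trainable scaling, f(x) = alpha * g(beta * x)."""
    def __init__(self, base=F.silu):
        super().__init__()
        self.base = base
        self.alpha = nn.Parameter(torch.ones(1))
        self.beta = nn.Parameter(torch.ones(1))

    def forward(self, x):
        return self.alpha * self.base(self.beta * x)

act = ParametricActivation()   # could replace ReLU in a pruned layer
print(act(torch.randn(3)))
```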

  • 12.
    Loni, Mohammad
    et al.
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Mousavi, Hamid
    Mälardalen University.
    Riazati, Mohammad
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Sjödin, Mikael
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    TAS: Ternarized Neural Architecture Search for Resource-Constrained Edge Devices (2022). Conference paper (Refereed)
    Abstract [en]

    Ternary Neural Networks (TNNs) compress network weights and activation functions into a 2-bit representation, resulting in remarkable network compression and energy efficiency. However, there remains a significant gap in accuracy between TNNs and their full-precision counterparts. Recent advances in Neural Architecture Search (NAS) promise opportunities in automated optimization for various deep learning tasks. Unfortunately, this area is unexplored for optimizing TNNs. This paper proposes TAS, a framework that drastically reduces the accuracy gap between TNNs and their full-precision counterparts by integrating quantization into the network design. We observed that directly applying NAS to the ternary domain leads to accuracy degradation because the search settings are customized for full-precision networks. To address this problem, we propose (i) a new cell template for ternary networks with maximum gradient propagation; and (ii) a novel learnable quantizer that adaptively relaxes the ternarization mechanism based on the distribution of the weights and activation functions. Experimental results reveal that TAS delivers 2.64% higher accuracy and 2.8x memory saving over competing methods with the same bit-width resolution on the CIFAR-10 dataset. These results suggest that TAS is an effective method that paves the way for the efficient design of the next generation of quantized neural networks.
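
    For readers unfamiliar with ternarization, the snippet below shows the common threshold-based mapping of full-precision weights to {-1, 0, +1}. The fixed 0.7 * mean|w| threshold is a standard heuristic used only for illustration; the learnable quantizer proposed in the paper instead adapts this relaxation to the weight and activation distributions.

```python
import torch

def ternarize(w, delta_scale=0.7):
    """Map full-precision weights to {-1, 0, +1} using a magnitude threshold."""
    delta = delta_scale * w.abs().mean()
    return torch.where(w > delta, torch.ones_like(w),
                       torch.where(w < -delta, -torch.ones_like(w), torch.zeros_like(w)))

w = torch.randn(4, 4)
print(ternarize(w))   # 2-bit representation: only three distinct values remain
```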

  • 13.
    Loni, Mohammad
    et al.
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Sinaei, Sima
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Zoljodi, A.
    Shiraz University of Technology, Shiraz, Iran.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Sjödin, Mikael
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    DeepMaker: A multi-objective optimization framework for deep neural networks in embedded systems (2020). In: Microprocessors and microsystems, ISSN 0141-9331, E-ISSN 1872-9436, Vol. 73, article id 102989. Article in journal (Refereed)
    Abstract [en]

    Deep Neural Networks (DNNs) are compute-intensive learning models with growing applicability in a wide range of domains. Due to their computational complexity, DNNs benefit from implementations that utilize custom hardware accelerators to meet performance and response-time as well as classification-accuracy constraints. In this paper, we propose the DeepMaker framework, which aims to automatically design a set of highly robust DNN architectures for embedded devices as the closest processing unit to the sensors. DeepMaker explores and prunes the design space to find improved neural architectures. Our proposed framework takes advantage of a multi-objective evolutionary approach that exploits a pruned design space inspired by a dense architecture. DeepMaker considers accuracy along with network size as two objectives to build a highly optimized network that fits limited computational resource budgets while delivering an acceptable accuracy level. In comparison with the best result on the CIFAR-10 dataset, a network generated by DeepMaker presents up to a 26.4x compression rate while losing only 4% accuracy. Besides, DeepMaker maps the generated CNN on programmable commodity devices, including ARM processors, high-performance CPUs, GPUs, and FPGAs.
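
    The two-objective trade-off described above ultimately comes down to picking a deployable point from a Pareto front. The short sketch below (with made-up numbers) extracts the non-dominated (error, size) pairs and selects the most accurate one that fits a parameter budget; it only illustrates that selection step, not the DeepMaker search itself.

```python
def pareto_front(points):
    """Return the points not dominated in (error, size); both are minimized."""
    def dominated(p):
        return any(q != p and q[0] <= p[0] and q[1] <= p[1] for q in points)
    return [p for p in points if not dominated(p)]

# (error %, parameters in millions) for a handful of hypothetical candidate networks.
candidates = [(6.1, 3.2), (7.4, 0.9), (5.9, 8.5), (9.0, 0.4), (7.0, 5.0)]
front = pareto_front(candidates)
budget_mparams = 1.0
deployable = min((p for p in front if p[1] <= budget_mparams), key=lambda p: p[0])
print(front, deployable)
```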

  • 14.
    Loni, Mohammad
    et al.
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Zoljodi, Ali
    Shiraz Univ Technol, Shiraz, Iran.
    Maier, Daniel
    Technische Universität Berlin, Germany.
    Majd, Amin
    Abo Akad Univ, Dept Informat Technol, Turku, Finland.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Sjödin, Mikael
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems. ES (Embedded Systems).
    Juurlink, Ben
    Technische Universität Berlin, Germany.
    Akbari, Reza
    Shiraz Univ Technol, Shiraz, Iran.
    DenseDisp: Resource-Aware Disparity Map Estimation by Compressing Siamese Neural Architecture (2020). In: IEEE World Congress on Computational Intelligence (WCCI) 2020, Glasgow, United Kingdom, 2020. Conference paper (Refereed)
    Abstract [en]

    Stereo vision cameras are flexible sensors because they provide heterogeneous information such as color, luminance, disparity map (depth), and the shape of objects. Today, Convolutional Neural Networks (CNNs) present the highest accuracy for disparity map estimation [1]. However, CNNs require considerable computing capacity to process billions of floating-point operations in a real-time fashion. Besides, commercial stereo cameras produce very large images (e.g., 10 megapixels [2]), which impose additional computational cost on the system. The problem is more pronounced if we target resource-limited hardware for the implementation. In this paper, we propose DenseDisp, an automatic framework that designs a Siamese neural architecture for disparity map estimation in a reasonable time. DenseDisp leverages a meta-heuristic multi-objective exploration to discover hardware-friendly architectures by considering accuracy and network FLOPS as the optimization objectives. We explore the design space with four different fitness functions to improve the accuracy-FLOPS trade-off and the convergence time of DenseDisp. According to the experimental results, DenseDisp provides up to a 39.1x compression rate while losing around 5% accuracy compared to state-of-the-art results.

  • 15.
    Loni, Mohammad
    et al.
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Zoljodi, Ali
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Majd, Amin
    Arcada Univ Appl Sci, Dept Econ & Business Anal, Helsinki 00560, Finland..
    Ahn, Byung Hoon
    Univ Calif San Diego, Dept Comp Sci & Engn, Alternat Comp Technol Lab, La Jolla, CA 92093 USA..
    Daneshtalab, Masoud
    Malardalen Univ, Sch Innovat Design & Engn, S-72218 Vasteras, Sweden.;TalTech Univ, Dept Comp Syst, EE-19086 Tallinn, Estonia..
    Sjödin, Mikael
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems. Malardalen Univ, Sch Innovat Design & Engn, S-72218 Vasteras, Sweden..
    Esmaeilzadeh, Hadi
    Univ Calif San Diego, Dept Comp Sci & Engn, Alternat Comp Technol Lab, La Jolla, CA 92093 USA..
    FastStereoNet: A Fast Neural Architecture Search for Improving the Inference of Disparity Estimation on Resource-Limited Platforms (2022). In: IEEE Transactions on Systems, Man & Cybernetics: Systems, ISSN 2168-2216, E-ISSN 2168-2232, Vol. 52, no 8, p. 5222-5234. Article in journal (Refereed)
    Abstract [en]

    Convolutional neural networks (CNNs) provide the best accuracy for disparity estimation. However, CNNs are computationally expensive, making them unfavorable for resource-limited devices with real-time constraints. Recent advances in neural architecture search (NAS) promise opportunities in automated optimization for disparity estimation. However, the main challenge of NAS methods is the significant amount of computing time needed to explore a vast search space (e.g., 1.6x10^29 candidates) and the costly training of candidates. To reduce the NAS computational demand, many proxy-based NAS methods have been proposed. Despite their success, most of them are designed for comparatively small-scale learning tasks. In this article, we propose a fast NAS method, called FastStereoNet, to enable resource-aware NAS within an intractably large search space. FastStereoNet automatically searches for hardware-friendly CNN architectures based on late acceptance hill climbing (LAHC), followed by simulated annealing (SA). FastStereoNet also employs fine-tuning with a transferred-weights mechanism to improve the convergence of the search process. The collection of these ideas provides competitive results in terms of search time and strikes a balance between accuracy and efficiency. Compared to the state of the art, FastStereoNet provides a 5.25x reduction in search time and a 44.4x reduction in model size. These benefits are attained while yielding comparable accuracy, which enables seamless deployment of disparity estimation on resource-limited devices. Finally, FastStereoNet significantly improves the perception quality of disparity estimation deployed on a field-programmable gate array and the Intel Neural Compute Stick 2 accelerator in a significantly less onerous manner.
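
    Late acceptance hill climbing, the first of the two search strategies named above, accepts a move if it is no worse than the cost recorded a fixed number of iterations earlier. The generic sketch below minimizes a noisy 1-D function; the cost, mutate, and init arguments are placeholders for the architecture evaluation and mutation actually used in the paper.

```python
import random

def lahc_minimize(cost, mutate, init, history_len=30, iters=2000):
    """Late Acceptance Hill Climbing over an arbitrary candidate type."""
    current, current_cost = init, cost(init)
    history = [current_cost] * history_len
    best, best_cost = current, current_cost
    for i in range(iters):
        candidate = mutate(current)
        c = cost(candidate)
        # Accept if not worse than now, or not worse than `history_len` steps ago.
        if c <= current_cost or c <= history[i % history_len]:
            current, current_cost = candidate, c
            if c < best_cost:
                best, best_cost = candidate, c
        history[i % history_len] = current_cost
    return best, best_cost

def bumpy(x):   # toy objective standing in for a network evaluation
    return (x - 3) ** 2 + 0.5 * random.random()

print(lahc_minimize(bumpy, lambda x: x + random.uniform(-0.5, 0.5), init=10.0))
```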

  • 16.
    Loni, Mohammad
    et al.
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Zoljodi, Ali
    Shiraz University of Technology, Shiraz, Iran.
    Sinaei, Sima
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Nolin, Mikael
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    NeuroPower: Designing Energy Efficient Convolutional Neural Network Architecture for Embedded Systems (2019). In: Lecture Notes in Computer Science, Volume 11727, Munich, Germany: Springer, 2019, p. 208-222. Conference paper (Refereed)
    Abstract [en]

    Convolutional Neural Networks (CNNs) suffer from energy-hungry implementations due to their computation- and memory-intensive processing patterns. This problem is made even more significant by the proliferation of CNNs on embedded platforms. To overcome this problem, we offer NeuroPower, an automatic framework that designs a highly optimized and energy-efficient set of CNN architectures for embedded systems. NeuroPower explores and prunes the design space to find an improved set of neural architectures. Toward this aim, a multi-objective optimization strategy is integrated to solve the Neural Architecture Search (NAS) problem by near-optimally tuning network hyperparameters. The main objectives of the optimization algorithm are network accuracy and the number of parameters in the network. The evaluation results show the effectiveness of NeuroPower on energy consumption, compression rate and inference time compared to other cutting-edge approaches. In comparison with the best results on the CIFAR-10/CIFAR-100 datasets, a network generated by NeuroPower presents up to a 2.1x/1.56x compression rate, 1.59x/3.46x speedup and 1.52x/1.82x power saving while losing only 2.4%/-0.6% accuracy, respectively.

  • 17.
    Majd, A.
    et al.
    Faculty of Science and Engineering, Åbo Akademi University, Domkyrkotorget 3, Turku, Finland.
    Loni, Mohammad
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Sahebi, G.
    Department of Future Technologies, University of Turku, Turku, Finland.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Improving motion safety and efficiency of intelligent autonomous swarm of drones (2020). In: Drones, E-ISSN 2504-446X, Vol. 4, no 3, p. 1-19, article id 48. Article in journal (Refereed)
    Abstract [en]

    Interest is growing in the use of autonomous swarms of drones in various mission-critical applications such as surveillance, intelligent monitoring, and rescue operations. Swarm systems should fulfill safety and efficiency constraints in order to guarantee dependable operation. To maximize motion safety, we should design the swarm system in such a way that drones do not collide with each other and/or with other objects in the operating environment. On the other hand, to ensure that the drones have sufficient resources to complete the required task reliably, we should also achieve efficiency while implementing the mission by minimizing the travelling distance of the drones. In this paper, we propose a novel integrated approach that maximizes motion safety and efficiency while planning and controlling the operation of the swarm of drones. To achieve this goal, we propose a novel parallel evolutionary-based swarm mission planning algorithm. Evolutionary computing allows us to plan and optimize the routes of the drones at run-time to maximize safety while minimizing travelling distance as the efficiency objective. In order to fulfill the defined constraints efficiently, our solution promotes a holistic approach that considers the whole design process, from the definition of formal requirements through to software development. The results of benchmarking demonstrate that our approach improves route efficiency by up to 10% without any crashes in controlling the swarms, compared to state-of-the-art solutions.
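
    A simplified picture of how safety and efficiency can be folded into one objective is sketched below: total travel distance plus a large penalty whenever two drones at the same waypoint index violate a safety radius. The radius and routes are invented for illustration and do not reflect the paper's actual planner.

```python
import math

SAFETY_RADIUS = 2.0   # metres; assumed value for illustration

def route_length(route):
    return sum(math.dist(a, b) for a, b in zip(route, route[1:]))

def fitness(routes):
    """Total distance plus a penalty for any pairwise safety-radius violation."""
    penalty = 0.0
    for i in range(len(routes)):
        for j in range(i + 1, len(routes)):
            for p, q in zip(routes[i], routes[j]):   # same waypoint index = same time step
                if math.dist(p, q) < SAFETY_RADIUS:
                    penalty += 1e6
    return sum(route_length(r) for r in routes) + penalty

drone_a = [(0, 0), (5, 0), (10, 0)]
drone_b = [(0, 4), (5, 1), (10, 4)]   # second waypoint comes too close to drone A's
print(fitness([drone_a, drone_b]))    # large value, so the planner would reject this plan
```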

  • 18.
    Majd, Amin
    et al.
    Åbo Akademi, Finland.
    Loni, Mohammad
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Sahebi, Golnaz
    University of Turku, Finland.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Troubitsyna, Elena
    KTH, Sweden.
    A Cloud Based Super-Optimization Method to Parallelize the Sequential Code’s Nested Loops (2019). In: IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip MCSoC-2019, 2019. Conference paper (Refereed)
    Abstract [en]

    Advances in hardware architecture regarding multi-core processors have made parallel computing ubiquitous. To achieve maximum utilization of multi-core processors, parallel programming techniques are required. However, several challenges stand in the way of parallel programming. These problems fall mainly into three major groups. First, although recent advancements in parallel programming languages (e.g., MPI, OpenCL, etc.) assist developers, parallel programming is still not desirable for most programmers. The second concerns the massive volume of old software and applications that have been written in serial mode; converting millions of lines of serial code to parallel code is highly time-consuming and requires a huge verification effort. Third, producing software and applications in parallel mode is very expensive since it needs knowledge and expertise. Super-optimization, provided by super compilers, is the process of automatically determining dependent and independent instructions in order to find data dependencies and loop-free sequences of instructions. The super compiler then runs these instructions on different processors in parallel, if possible. Super-optimization is a feasible solution for relieving the programmer of the parallel programming workload. Since most of the complexity of sequential code lies in nested loops, we try to parallelize the nested loops using the idea of super-optimization. One of the underlying stages in super-optimization is scheduling the tiled space for iterating nested loops. Since the problem is NP-hard, using traditional optimization methods is not feasible. In this paper, we propose a cloud-based super-optimization method offered as Software-as-a-Service (SaaS) to reduce the cost of parallel programming. In addition, it increases the utilization of the processing capacity of the multi-core processor. As a result, an intermediate programmer can use the whole processing capacity of his/her system without knowing anything about writing parallel code or super compiler functions, by sending serial code to a cloud server and receiving the parallel version of the code back from the server. In this paper, an evolutionary algorithm is leveraged to solve the tile scheduling problem. Our proposed super-optimization method will be offered as software with a hybrid (public and private) deployment model.
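
    The tiling idea behind the scheduled "tiled space" is easy to state in code: a doubly nested loop is split into independent tiles, and each tile can be dispatched to a different core. The toy example below tiles an element-wise matrix addition; it only illustrates tiling, not the evolutionary tile-scheduling algorithm itself.

```python
def tiles(n, t):
    """Split range(n) into contiguous [lo, hi) tiles of width t."""
    return [(i, min(i + t, n)) for i in range(0, n, t)]

def add_tile(a, b, out, rows, cols):
    for i in range(*rows):
        for j in range(*cols):
            out[i][j] = a[i][j] + b[i][j]

n, t = 8, 4
a = [[1] * n for _ in range(n)]
b = [[2] * n for _ in range(n)]
out = [[0] * n for _ in range(n)]
# Each (row-tile, col-tile) pair touches a disjoint block of `out`, so the pairs
# are independent work units that a scheduler could map onto different cores.
for rows in tiles(n, t):
    for cols in tiles(n, t):
        add_tile(a, b, out, rows, cols)
print(out[0][:4])   # [3, 3, 3, 3]
```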

  • 19.
    Maleki, Neda
    et al.
    Loni, Mohammad
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Conti, Mauro
    University of Padua, Italy .
    Fotouhi, Hossein
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    SoFA: A Spark-oriented Fog Architecture (2019). In: IEEE 45th Annual Conference of the Industrial Electronics Society IECON'19, 2019. Conference paper (Refereed)
    Abstract [en]

    Fog computing offers a wide range of service levels, including low bandwidth usage, low response time, support for heterogeneous applications, and high energy efficiency. Therefore, real-time embedded applications could potentially benefit from Fog infrastructure. However, providing high system utilization is an important challenge of Fog computing, especially for processing embedded applications. In addition, although Fog computing extends cloud computing by providing more energy efficiency, it still suffers from remarkable energy consumption, which is a limitation for embedded systems. To overcome the above limitations, in this paper we propose SoFA, a Spark-oriented Fog architecture that leverages Spark functionalities to provide higher system utilization, energy efficiency, and scalability. Compared to common Fog computing platforms, where edge devices are only responsible for processing data received from their IoT nodes, SoFA leverages the remaining processing capacity of all other edge devices. To attain this purpose, SoFA provides a distributed processing paradigm with the help of Spark to utilize the whole processing capacity of all the available edge devices, leading to increased energy efficiency and system utilization. In other words, SoFA proposes a near-sensor processing solution in which the edge devices act as the Fog nodes. In addition, SoFA provides scalability by taking advantage of Spark functionalities. According to the experimental results, SoFA is a power-efficient and scalable solution desirable for embedded platforms, providing up to 3.1x energy efficiency for the Word-Count benchmark compared to the common Fog processing platform.
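
    The Word-Count benchmark mentioned above is the canonical Spark example; the sketch below shows it in PySpark. Running the Spark master and workers on the edge devices themselves, rather than in a remote cloud cluster, is the gist of a fog deployment. The input path is a placeholder and a local Spark installation is assumed.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("fog-wordcount").getOrCreate()
counts = (spark.sparkContext.textFile("sensor_logs.txt")   # placeholder input file
          .flatMap(lambda line: line.split())              # split lines into words
          .map(lambda word: (word, 1))
          .reduceByKey(lambda a, b: a + b))                 # sum counts per word
print(counts.take(5))
spark.stop()
```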

  • 20.
    Mousavi, Hamid
    et al.
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Loni, Mohammad
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Alibeigi, M.
    Zenseact Ab, Göteborg, Sweden.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems. Computer Systems, Tallinn University of Technology, Tallinn, Estonia.
    DASS: Differentiable Architecture Search for Sparse Neural Networks (2023). In: ACM Transactions on Embedded Computing Systems, ISSN 1539-9087, E-ISSN 1558-3465, Vol. 22, no 5 s, article id 105. Article in journal (Refereed)
    Abstract [en]

    The deployment of Deep Neural Networks (DNNs) on edge devices is hindered by the substantial gap between performance requirements and available computational power. While recent research has made significant strides in developing pruning methods to build a sparse network for reducing the computing overhead of DNNs, there remains considerable accuracy loss, especially at high pruning ratios. We find that the architectures designed for dense networks by differentiable architecture search methods are ineffective when pruning mechanisms are applied to them. The main reason is that the current methods do not support sparse architectures in their search space and use a search objective that is made for dense networks and does not focus on sparsity. This paper proposes a new method to search for sparsity-friendly neural architectures. It does so by adding two new sparse operations to the search space and modifying the search objective. We propose two novel parametric SparseConv and SparseLinear operations in order to expand the search space to include sparse operations. In particular, these operations create a flexible search space by using sparse parametric versions of the linear and convolution operations. The proposed search objective lets us train the architecture based on the sparsity of the search-space operations. Quantitative analyses demonstrate that architectures found through DASS outperform those used in the state-of-the-art sparse networks on the CIFAR-10 and ImageNet datasets. In terms of performance and hardware effectiveness, DASS increases the accuracy of the sparse version of MobileNet-v2 from 73.44% to 81.35% (+7.91% improvement) with a 3.87× faster inference time.
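
    As background for what a sparsity-aware operation looks like, the sketch below implements a generic magnitude-pruned linear layer in PyTorch: the weights are multiplied by a fixed binary mask in the forward pass. This is not the parametric SparseLinear operation proposed in the paper, only a common baseline it generalizes.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedLinear(nn.Module):
    """Linear layer whose weights are pruned by a fixed magnitude-based mask."""
    def __init__(self, in_features, out_features, sparsity=0.9):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_features))
        # Keep only the (1 - sparsity) fraction of weights with the largest magnitude.
        k = max(1, int(self.weight.numel() * (1.0 - sparsity)))
        threshold = self.weight.abs().flatten().topk(k).values.min()
        self.register_buffer("mask", (self.weight.abs() >= threshold).float())

    def forward(self, x):
        return F.linear(x, self.weight * self.mask, self.bias)

layer = MaskedLinear(64, 10, sparsity=0.9)
print(layer(torch.randn(2, 64)).shape)   # torch.Size([2, 10])
```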

  • 21.
    Nazari, Najmeh
    et al.
    University of Tehran, Tehran , Iran.
    Loni, Mohammad
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems. ES (Embedded Systems).
    E. Salehi, Mostafa
    University of Tehran, Tehran , Iran.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Sjödin, Mikael
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    TOT-Net: An Endeavor Toward Optimizing Ternary Neural Networks (2019). In: 22nd Euromicro Conference on Digital System Design DSD 2019, 2019, p. 305-312, article id 8875067. Conference paper (Refereed)
    Abstract [en]

    High computation demands and large memory requirements are the major implementation challenges of Convolutional Neural Networks (CNNs), especially for low-power and resource-limited embedded devices. Many binarized neural networks have recently been proposed to address these issues. Although they have significantly decreased computation and memory footprint, they suffer from accuracy loss, especially on large datasets. In this paper, we propose TOT-Net, a ternarized neural network with [-1, 0, 1] values for both weights and activation functions that simultaneously achieves a higher level of accuracy and a lower computational load. First, TOT-Net introduces a simple bitwise logic for convolution computations to reduce the cost of multiply operations. To improve accuracy, selecting a proper activation function and learning rate is influential but also difficult. As the second contribution, we propose a novel piece-wise activation function and optimized learning rates for different datasets. Our findings reveal that 0.01 is a preferable learning rate for the studied datasets. Third, by using an evolutionary optimization approach, we found novel piece-wise activation functions customized for TOT-Net. According to the experimental results, TOT-Net achieves 2.15%, 8.77%, and 5.7%/5.52% better accuracy compared to XNOR-Net on CIFAR-10, CIFAR-100, and ImageNet (top-5/top-1), respectively.

  • 22.
    Salimi, M.
    et al.
    Tehran University, Tehran, Iran.
    Majd, A.
    Åbo Akademi University, Turku, Finland.
    Loni, Mohammad
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Seceleanu, Tiberiu
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Seceleanu, Cristina
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Sirjani, Marjan
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Troubitsyna, E.
    Royal Institute of Technology, Stockholm, Sweden.
    Multi-objective optimization of real-time task scheduling problem for distributed environments (2020). In: Proceedings of the 6th Conference on the Engineering of Computer Based Systems (ECBS 2019), Association for Computing Machinery, 2020, article id a13. Conference paper (Refereed)
    Abstract [en]

    Real-world applications are composed of multiple tasks which usually have intricate data dependencies. To exploit distributed processing platforms, task allocation and scheduling, that is assigning tasks to processing units and ordering inter-processing unit data transfers, plays a vital role. However, optimally scheduling tasks on processing units and finding an optimized network topology is an NP-complete problem. The problem becomes more complicated when the tasks have real-time deadlines for termination. Exploring the whole search space in order to find the optimal solution is not feasible in a reasonable amount of time, therefore meta-heuristics are often used to find a near-optimal solution. We propose here a multi-population evolutionary approach for near-optimal scheduling optimization, that guarantees end-to-end deadlines of tasks in distributed processing environments. We analyze two different exploration scenarios including single and multi-objective exploration. The main goal of the single objective exploration algorithm is to achieve the minimal number of processing units for all the tasks, whereas a multi-objective optimization tries to optimize two conflicting objectives simultaneously considering the total number of processing units and end-to-end finishing time for all the jobs. The potential of the proposed approach is demonstrated by experiments based on a use case for mapping a number of jobs covering industrial automation systems, where each of the jobs consists of a number of tasks in a distributed environment.

  • 23.
    Salimi, Maghsood
    et al.
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Loni, Mohammad
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Sirjani, Marjan
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Learning Activation Functions for Adversarial Attack Resilience in CNNs (2023). In: Lect. Notes Comput. Sci., Springer Science and Business Media Deutschland GmbH, 2023, p. 203-214. Conference paper (Refereed)
    Abstract [en]

    Adversarial attacks on convolutional neural networks (CNNs) have been a serious concern in recent years, as they can cause CNNs to produce inaccurate predictions. Through our analysis of training CNNs with adversarial examples, we discovered that this was primarily caused by naïvely selecting ReLU as the default choice for activation functions. In contrast to the focus of recent works on proposing adversarial training methods, we study the feasibility of an innovative alternative: learning novel activation functions to make CNNs more resilient to adversarial attacks. In this paper, we propose a search framework that combines simulated annealing and late acceptance hill-climbing to find activation functions that are more robust against adversarial attacks in CNN architectures. The proposed search method has superior search convergence compared to commonly used baselines. The proposed method improves the resilience to adversarial attacks by achieving up to 17.1%, 22.8%, and 16.6% higher accuracy against BIM, FGSM, and PGD attacks, respectively, over ResNet-18 trained on the CIFAR-10 dataset.

  • 24.
    Salimi, Maghsood
    et al.
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Loni, Mohammad
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Sirjani, Marjan
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Cicchetti, Antonio
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Abbaspour Asadollah, Sara
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    SARAF: Searching for Adversarial Robust Activation Functions (2023). In: ACM International Conference Proceeding Series, Association for Computing Machinery, 2023, p. 174-182. Conference paper (Refereed)
    Abstract [en]

    Convolutional Neural Networks (CNNs) have received great attention in the computer vision domain. However, CNNs are vulnerable to adversarial attacks, which are manipulations of input data that are imperceptible to humans but can fool the network. Several studies have tried to address this issue, and they can be divided into two categories: (i) training the network with adversarial examples, and (ii) optimizing the network architecture and/or hyperparameters. Although adversarial training is a sufficient defense mechanism, it requires a large volume of training samples to cover a wide perturbation bound. Tweaking network activation functions (AFs) has been shown to provide promising results where CNNs suffer from performance loss. However, optimizing network AFs to compensate for the negative impact of adversarial attacks has not been addressed in the literature. This paper proposes the idea of searching for AFs that are robust against adversarial attacks. To this aim, we leverage the Simulated Annealing (SA) algorithm, which has a fast convergence time. The proposed method is called SARAF. We demonstrate the consistent effectiveness of SARAF by achieving up to 16.92%, 18.3%, and 15.57% accuracy improvement against BIM, FGSM, and PGD adversarial attacks, respectively, over ResNet-18 with ReLU AFs (baseline) trained on CIFAR-10. Meanwhile, SARAF provides significant search efficiency compared to random search as the optimization baseline.
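
    The simulated-annealing loop at the core of such a search can be sketched in a few lines: per-layer activation functions are perturbed, and worse assignments are accepted with a probability that shrinks as the temperature cools. The candidate pool, scoring stub, and parameters below are all illustrative assumptions, not SARAF's actual configuration.

```python
import math
import random

CANDIDATES = ["relu", "elu", "silu", "tanh", "leaky_relu"]   # illustrative pool only

def robust_score(assignment):
    """Stand-in for training a CNN with this per-layer AF assignment and
    measuring accuracy under an adversarial attack; here it is a toy value."""
    return sum(len(name) for name in assignment) + random.random()

def anneal(layers=4, iters=200, t0=1.0, cooling=0.98):
    current = [random.choice(CANDIDATES) for _ in range(layers)]
    current_score = robust_score(current)
    best, best_score, t = current, current_score, t0
    for _ in range(iters):
        neighbour = current[:]
        neighbour[random.randrange(layers)] = random.choice(CANDIDATES)
        score = robust_score(neighbour)
        # Always accept improvements; accept worse moves with cooling probability.
        if score > current_score or random.random() < math.exp((score - current_score) / t):
            current, current_score = neighbour, score
        if current_score > best_score:
            best, best_score = current, current_score
        t *= cooling
    return best, best_score

print(anneal())
```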

  • 25.
    Vidimlic, Najda
    et al.
    Mälardalen University.
    Levin, Alexandra
    Mälardalen University.
    Loni, Mohammad
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Image synthesisation and data augmentation for safe object detection in aircraft auto-landing system (2021). In: VISIGRAPP 2021 - Proceedings of the 16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, SciTePress, 2021, Vol. 5, p. 123-135. Conference paper (Refereed)
    Abstract [en]

    The feasibility of deploying object detection to interpret the environment is questioned in several mission-critical applications, raising concerns about the ability of object detectors to provide reliable and safe predictions of the operational environment regardless of weather and light conditions. The lack of a comprehensive dataset, which causes class imbalance and detection difficulties for hard examples, is one of the main reasons for accuracy loss in attaining safe object detection. Data augmentation, as an implicit regularisation technique, has been shown to significantly improve object detection by increasing both the diversity and the size of the training dataset. Despite the success of data augmentation in various computer vision tasks, applying data augmentation techniques to improve safety has not been sufficiently addressed in the literature. In this paper, we leverage a set of data augmentation techniques to improve the safety of object detection. Aircraft in-flight image data is used to evaluate the feasibility of our proposed solution in real-world safety-required scenarios. To achieve our goal, we first generate a training dataset by synthesising the images collected from in-flight recordings. Next, we augment the generated dataset to cover real weather and lighting changes. The introduction of artificially produced distortions, also known as corruptions, has recently become an approach to enrich the dataset. The introduction of corruptions, as augmentations of weather and luminance in combination with the introduction of artificial artefacts, is done as an approach to achieve a comprehensive representation of an aircraft’s operational environment. Finally, we evaluate the impact of data augmentation on the studied dataset. Faster R-CNN with ResNet-50-FPN was used as the object detector for the experiments. An AP@[IoU=.5:.95] score of 50.327% was achieved with the initial setup, while exposure to altered weather and lighting conditions yielded an 18.1% decrease. Introducing these conditions into the training set led to a 15.6% increase in comparison to the score achieved under exposure to the conditions.
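
    A minimal example of the kind of photometric corruption used to mimic altered weather and lighting is given below: brightness, contrast, and a fog-like veil applied to a float image in [0, 1]. The function and parameters are generic NumPy stand-ins, not the exact corruption set used in the paper.

```python
import numpy as np

def corrupt(image, brightness=0.0, contrast=1.0, fog=0.0, rng=None):
    """Apply simple brightness/contrast/fog corruptions to a float image in [0, 1]."""
    rng = rng or np.random.default_rng()
    out = np.clip(image * contrast + brightness, 0.0, 1.0)
    if fog > 0.0:
        haze = rng.uniform(0.8, 1.0, size=image.shape)           # near-white veil
        out = np.clip((1.0 - fog) * out + fog * haze, 0.0, 1.0)
    return out

frame = np.random.default_rng(0).uniform(size=(4, 4, 3))   # stand-in for an in-flight frame
dusk = corrupt(frame, brightness=-0.3, contrast=0.8)        # darker, lower-contrast scene
foggy = corrupt(frame, fog=0.5)                             # half-strength fog veil
print(dusk.shape, foggy.shape)
```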

  • 26.
    Zoljodi, Ali
    et al.
    Mälardalen University, School of Innovation, Design and Engineering, Innovation and Product Realisation.
    Loni, Mohammad
    Mälardalen University, School of Innovation, Design and Engineering, Innovation and Product Realisation.
    Abadijou, Sadegh
    Mälardalen University, School of Innovation, Design and Engineering, Innovation and Product Realisation.
    Alibeigi, Mina
    Zenseact AB, Gothenburg, Sweden.
    Daneshtalab, Masoud
    Tallinn University of Technology, Estonia.
    3DLaneNAS: Neural Architecture Search for Accurate and Light-Weight 3D Lane Detection (2022). In: Artificial Neural Networks and Machine Learning – ICANN 2022: 31st International Conference on Artificial Neural Networks, Bristol, UK, September 6–9, 2022, Proceedings, Part I / [ed] Elias Pimenidis; Plamen Angelov; Chrisina Jayne; Antonios Papaleonidas; Mehmet Aydin, Springer Science and Business Media Deutschland GmbH, 2022, p. 404-415. Conference paper (Refereed)
    Abstract [en]

    Lane detection is one of the most fundamental tasks for autonomous driving. It plays a crucial role in the lateral control and the precise localization of autonomous vehicles. Monocular 3D lane detection methods provide state-of-the-art results for estimating the position of lanes in 3D world coordinates using only the information obtained from the front-view camera. Recent advances in Neural Architecture Search (NAS) facilitate automated optimization of various computer vision tasks. NAS can automatically optimize monocular 3D lane detection methods to enhance the extraction and combination of visual features, consequently reducing computation loads and increasing accuracy. This paper proposes 3DLaneNAS, a multi-objective method that enhances the accuracy of monocular 3D lane detection for both short- and long-distance scenarios while at the same time providing a fair amount of hardware acceleration. 3DLaneNAS utilizes a new multi-objective energy function to optimize the architecture of feature extraction and feature fusion modules simultaneously. Moreover, a transfer learning mechanism is used to improve the convergence of the search process. Experimental results reveal that 3DLaneNAS yields a minimum of 5.2% higher accuracy and ≈ 1.33 × lower latency over competing methods on the synthetic-3D-lanes dataset. Code is at https://github.com/alizoljodi/3DLaneNAS
