mdh.se Publications
1 - 12 of 12
  • 1.
    Akbari, N.
    et al.
    University of Tehran, Tehran, Iran.
    Modarressi, M.
    University of Tehran, Tehran, Iran.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems. Royal Institute of Technology (KTH), Sweden.
    Loni, Mohammad
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems. Royal Institute of Technology (KTH), Sweden.
    A Customized Processing-in-Memory Architecture for Biological Sequence Alignment
    2018. In: Proceedings of the International Conference on Application-Specific Systems, Architectures and Processors, Institute of Electrical and Electronics Engineers Inc., 2018, article ID 8445124. Conference paper (Refereed)
    Abstract [en]

    Sequence alignment is the most widely used operation in bioinformatics. With the exponential growth of biological sequence databases, searching a database to find the optimal alignment for a query sequence (which can be on the order of hundreds of millions of characters long) would require excessive processing power and memory bandwidth. Sequence alignment algorithms can potentially benefit from the processing power of massively parallel processors due to their simple arithmetic operations, coupled with the inherent fine-grained and coarse-grained parallelism that they exhibit. However, the limited memory bandwidth in conventional computing systems prevents exploiting the maximum achievable speedup. In this paper, we propose a processing-in-memory architecture as a viable solution for the excessive memory bandwidth demand of bioinformatics applications. The design is composed of a set of simple and lightweight processing elements, customized to the sequence alignment algorithm, integrated at the logic layer of an emerging 3D DRAM architecture. Experimental results show that the proposed architecture results in up to 2.4x speedup and 41% reduction in power consumption, compared to a processor-side parallel implementation.

  • 2.
    Ghaderi, Adnan
    et al.
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Ashjaei, Seyed Mohammad Hossein
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Loni, Mohammad
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Mubeen, Saad
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Sjödin, Mikael
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Design challenges in hardware development of time-sensitive networking: A research plan
    2019. In: CEUR Workshop Proceedings, Volume 2457, CEUR-WS, 2019, Vol. 2457. Conference paper (Refereed)
    Abstract [en]

    Time-Sensitive Networking (TSN) is a set of ongoing projects within the IEEE standardization to guarantee timeliness and low-latency communication based on switched Ethernet for industrial applications. The huge demand comes mainly from industries where intensive data transmission is required, such as modern vehicles in which cameras, lidars and high-bandwidth modern sensors are connected. The TSN standards are evolving over time, so the hardware needs to change in step with the modifications. In addition, high-performance hardware is required to obtain the full benefit of the standards. In this paper, we present a research plan for developing novel techniques to support a parameterized and modular hardware IP core of the multi-stage TSN switch fabric in VHSIC (Very High Speed Integrated Circuit) Hardware Description Language (VHDL), which can be deployed on any Field-Programmable Gate Array (FPGA) device. We present the challenges on the way towards this goal.

  • 3.
    Loni, Mohammad
    et al.
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Ahlberg, Carl
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Ekström, Mikael
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Sjödin, Mikael
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Embedded Acceleration of Image Classification Applications for Stereo Vision Systems
    2018. In: Design, Automation & Test in Europe Conference & Exhibition DATE'18, 2018. Conference paper (Other academic)
  • 4.
    Loni, Mohammad
    et al.
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Sjödin, Mikael
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    ADONN: Adaptive design of optimized deep neural networks for embedded systems
    2018. In: Proceedings - 21st Euromicro Conference on Digital System Design, DSD 2018, Institute of Electrical and Electronics Engineers Inc., 2018, pp. 397-404. Conference paper (Refereed)
    Abstract [en]

    Nowadays, many modern applications, e.g., autonomous systems and cloud data services, need to capture and process a large amount of raw data at runtime, which ultimately necessitates a high-performance computing model. Deep Neural Networks (DNNs) have already revealed their learning capabilities in runtime data processing for modern applications. However, DNNs are becoming deeper and more sophisticated models in order to gain higher accuracy, which requires remarkable computing capacity. Relying on high-performance cloud infrastructure to supply the required computational throughput is often not feasible. Instead, we intend to find a near-sensor processing solution that lowers the need for network bandwidth and increases privacy and power efficiency, as well as guaranteeing worst-case response times. Toward this goal, we introduce the ADONN framework, which aims to automatically design a highly robust DNN architecture for embedded devices as the closest processing unit to the sensors. ADONN adroitly searches the design space to find improved neural architectures. Our proposed framework takes advantage of a multi-objective evolutionary approach, which exploits a pruned design space inspired by a dense architecture. Unlike recent works that have mainly tried to generate highly accurate networks, ADONN also considers the network size as a second objective, to build a highly optimized network that fits limited computational resource budgets while delivering a comparable accuracy level. In comparison with the best result on the CIFAR-10 dataset, a network generated by ADONN presents up to a 26.4x compression rate while losing only 4% accuracy. In addition, ADONN maps the generated DNN onto commodity programmable devices including ARM Processor, High-Performance CPU, GPU, and FPGA.

  • 5.
    Loni, Mohammad
    et al.
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Hamouachy, Fadouao
    Casarrubios, Clémentine
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Nolin, Mikael
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    AutoRIO: An Indoor Testbed for Developing Autonomous Vehicles
    2019. In: International Japan-Africa Conference on Electronics, Communications and Computations JAC-ECC, 2019, pp. 69-72. Conference paper (Refereed)
    Abstract [en]

    Autonomous vehicles have a great influence on our lives. These vehicles are more convenient and more energy efficient, and they provide a higher safety level and cheaper driving solutions. In addition, reducing CO2 emissions and the risk of vehicular accidents are other benefits of autonomous vehicles. However, deploying a fully autonomous system is challenging, and the proposed solutions are still new. Providing a testbed for evaluating new algorithms helps researchers and hardware developers verify the real impact of their solutions. A testing environment is a low-cost infrastructure leading to an improved time-to-market for novel ideas. In this paper, we propose AutoRIO, a cutting-edge indoor testbed for developing autonomous vehicles.

  • 6.
    Loni, Mohammad
    et al.
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Majd, Amin
    Åbo Akademi University, Turku, Finland.
    Loni, Abdolah
    KTH Royal Institute of Technology, Stockholm, Sweden.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Nolin, Mikael
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Troubitsyna, Elena
    KTH Royal Institute of Technology, Stockholm, Sweden.
    Designing Compact Convolutional Neural Network for Embedded Stereo Vision Systems
    2018. In: IEEE 12th International Symposium on Embedded Multicore/Many-core Systems-on-Chip MCSoC-2018, 2018, pp. 244-251, article ID 8540240. Conference paper (Refereed)
  • 7.
    Loni, Mohammad
    et al.
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Sinaei, Sima
    Mälardalen University.
    Zoljodi, A.
    Shiraz University of Technology, Shiraz, Iran.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Sjödin, Mikael
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    DeepMaker: A multi-objective optimization framework for deep neural networks in embedded systems
    2020. In: Microprocessors and microsystems, ISSN 0141-9331, E-ISSN 1872-9436, Vol. 73, article ID 102989. Journal article (Refereed)
    Abstract [en]

    Deep Neural Networks (DNNs) are compute-intensive learning models with growing applicability in a wide range of domains. Due to their computational complexity, DNNs benefit from implementations that utilize custom hardware accelerators to meet performance and response time as well as classification accuracy constraints. In this paper, we propose the DeepMaker framework, which aims to automatically design a set of highly robust DNN architectures for embedded devices as the closest processing unit to the sensors. DeepMaker explores and prunes the design space to find improved neural architectures. Our proposed framework takes advantage of a multi-objective evolutionary approach that exploits a pruned design space inspired by a dense architecture. DeepMaker considers accuracy along with network size as two objectives to build a highly optimized network that fits limited computational resource budgets while delivering an acceptable accuracy level. In comparison with the best result on the CIFAR-10 dataset, a network generated by DeepMaker presents up to a 26.4x compression rate while losing only 4% accuracy. Besides, DeepMaker maps the generated CNN onto programmable commodity devices, including ARM Processor, High-Performance CPU, GPU, and FPGA.

  • 8.
    Loni, Mohammad
    et al.
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Zoljodi, Ali
    Shiraz University of Technology, Shiraz, Iran.
    Seenan, Sima
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Nolin, Mikael
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    NeuroPower: Designing Energy Efficient Convolutional Neural Network Architecture for Embedded Systems
    2019. In: Lecture Notes in Computer Science, Volume 11727, Munich, Germany: Springer, 2019, pp. 208-222. Conference paper (Refereed)
    Abstract [en]

    Convolutional Neural Networks (CNNs) suffer from energy-hungry implementations due to their computation- and memory-intensive processing patterns. This problem is made even more significant by the proliferation of CNNs on embedded platforms. To overcome this problem, we offer NeuroPower, an automatic framework that designs a highly optimized and energy-efficient set of CNN architectures for embedded systems. NeuroPower explores and prunes the design space to find an improved set of neural architectures. Toward this aim, a multi-objective optimization strategy is integrated to solve the Neural Architecture Search (NAS) problem by near-optimally tuning network hyperparameters. The main objectives of the optimization algorithm are network accuracy and the number of parameters in the network. The evaluation results show the effectiveness of NeuroPower in terms of energy consumption, compacting rate and inference time compared to other cutting-edge approaches. In comparison with the best results on the CIFAR-10/CIFAR-100 datasets, a network generated by NeuroPower presents up to 2.1x/1.56x compression rate, 1.59x/3.46x speedup and 1.52x/1.82x power saving while losing 2.4%/-0.6% accuracy, respectively.

  • 9.
    Majd, Amin
    et al.
    Åbo Akademi, Finland.
    Loni, Mohammad
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Sahebi, Golnaz
    University of Turku, Finland.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Troubitsyna, Elena
    KTH, Sweden.
    A Cloud Based Super-Optimization Method to Parallelize the Sequential Code’s Nested Loops
    2019. In: IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip MCSoC-2019, 2019. Conference paper (Refereed)
    Abstract [en]

    Advances in hardware architecture regarding multi-core processors make parallel computing ubiquitous. To achieve the maximum utilization of multi-core processors, parallel programming techniques are required. However, parallel programming faces several challenges, which fall into three major groups. First, although recent advancements in parallel programming languages (e.g., MPI, OpenCL, etc.) assist developers, parallel programming is still not desirable for most programmers. The second concerns the massive volume of old software and applications that have been written in serial mode: converting millions of lines of serial code to parallel code is highly time-consuming and requires a huge verification effort. Third, producing software and applications in parallel mode is very expensive since it needs knowledge and expertise. Super-optimization provided by super compilers is the process of automatically determining the dependent and independent instructions to find any data dependency and loop-free sequence of instructions. The super compiler then runs these instructions on different processors in parallel mode, if possible. Super-optimization is a feasible solution for relieving the programmer of the parallel programming workload. Since most of the complexity of sequential code lies in the nested loops, we try to parallelize the nested loops by using the idea of super-optimization. One of the underlying stages in super-optimization is scheduling the tiled space for iterating nested loops. Since the problem is NP-hard, using traditional optimization methods is not feasible. In this paper, we propose a cloud-based super-optimization method as Software-as-a-Service (SaaS) to reduce the cost of parallel programming. In addition, it increases the utilization of the processing capacity of the multi-core processor.
    As a result, an intermediate programmer can use the whole processing capacity of his/her system without knowing anything about writing parallel code or super compiler functions, by sending the serial code to a cloud server and receiving the parallel version of the code from the cloud server. In this paper, an evolutionary algorithm is leveraged to solve the scheduling problem of tiles. Our proposed super-optimization method will be served as software and provided through a hybrid (public and private) deployment model.

  • 10.
    Maleki, Neda
    et al.
    Loni, Mohammad
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Conti, Mauro
    University of Padua, Italy.
    Fotouhi, Hossein
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    SoFA: A Spark-oriented Fog Architecture
    2019. In: IEEE 45th Annual Conference of the Industrial Electronics Society IECON'19, 2019. Conference paper (Refereed)
    Abstract [en]

    Fog computing offers a wide range of service levels including low bandwidth usage, low response time, support of heterogeneous applications, and high energy efficiency. Therefore, real-time embedded applications could potentially benefit from Fog infrastructure. However, providing high system utilization is an important challenge of Fog computing, especially for processing embedded applications. In addition, although Fog computing extends cloud computing by providing more energy efficiency, it still suffers from remarkable energy consumption, which is a limitation for embedded systems. To overcome the above limitations, in this paper, we propose SoFA, a Spark-oriented Fog architecture that leverages Spark functionalities to provide higher system utilization, energy efficiency, and scalability. Compared to the common Fog computing platforms, where edge devices are only responsible for processing data received from their IoT nodes, SoFA leverages the remaining processing capacity of all other edge devices. To this end, SoFA provides a distributed processing paradigm with the help of Spark to utilize the whole processing capacity of all the available edge devices, leading to increased energy efficiency and system utilization. In other words, SoFA proposes a near-sensor processing solution in which the edge devices act as the Fog nodes. In addition, SoFA provides scalability by taking advantage of Spark functionalities. According to the experimental results, SoFA is a power-efficient and scalable solution desirable for embedded platforms, providing up to 3.1x energy efficiency for the Word-Count benchmark compared to the common Fog processing platform.

  • 11.
    Nazari, Najmeh
    et al.
    University of Tehran, Tehran, Iran.
    Loni, Mohammad
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems. ES (Embedded Systems).
    E. Salehi, Mostafa
    University of Tehran, Tehran, Iran.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Sjödin, Mikael
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    TOT-Net: An Endeavor Toward Optimizing Ternary Neural Networks
    2019. In: 22nd Euromicro Conference on Digital System Design DSD 2019, 2019, pp. 305-312, article ID 8875067. Conference paper (Refereed)
    Abstract [en]

    High computation demands and large memory requirements are the major implementation challenges of Convolutional Neural Networks (CNNs), especially for low-power and resource-limited embedded devices. Many binarized neural networks have recently been proposed to address these issues. Although they have significantly decreased computation and memory footprint, they have suffered from accuracy loss, especially for large datasets. In this paper, we propose TOT-Net, a ternarized neural network with [-1, 0, 1] values for both weights and activation functions that simultaneously achieves a higher level of accuracy and a lower computational load. First, TOT-Net introduces a simple bitwise logic for convolution computations to reduce the cost of multiply operations. To improve accuracy, selecting a proper activation function and learning rate is influential but also difficult. As the second contribution, we propose a novel piece-wise activation function and an optimized learning rate for different datasets. Our findings first reveal that 0.01 is a preferable learning rate for the studied datasets. Third, by using an evolutionary optimization approach, we found novel piece-wise activation functions customized for TOT-Net. According to the experimental results, TOT-Net achieves 2.15%, 8.77%, and 5.7/5.52% better accuracy compared to XNOR-Net on CIFAR-10, CIFAR-100, and ImageNet top-5/top-1 datasets, respectively.

  • 12.
    Salimi, M.
    et al.
    Tehran University, Tehran, Iran.
    Majd, A.
    Åbo Akademi University, Turku, Finland.
    Loni, Mohammad
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Seceleanu, Tiberiu
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Seceleanu, Cristina
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Sirjani, Marjan
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Daneshtalab, Masoud
    Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
    Troubitsyna, E.
    Royal Institute of Technology, Stockholm, Sweden.
    Multi-objective optimization of real-time task scheduling problem for distributed environments
    2019. In: ACM International Conference Proceeding Series, Association for Computing Machinery, 2019, article ID a13. Conference paper (Refereed)
    Abstract [en]

    Real-world applications are composed of multiple tasks, which usually have intricate data dependencies. To exploit distributed processing platforms, task allocation and scheduling, that is, assigning tasks to processing units and ordering inter-processing-unit data transfers, plays a vital role. However, optimally scheduling tasks on processing units and finding an optimized network topology is an NP-complete problem. The problem becomes more complicated when the tasks have real-time deadlines for termination. Exploring the whole search space to find the optimal solution is not feasible in a reasonable amount of time; therefore, meta-heuristics are often used to find a near-optimal solution. We propose here a multi-population evolutionary approach for near-optimal scheduling optimization that guarantees end-to-end deadlines of tasks in distributed processing environments. We analyze two different exploration scenarios: single-objective and multi-objective exploration. The main goal of the single-objective exploration algorithm is to achieve the minimal number of processing units for all the tasks, whereas the multi-objective optimization tries to optimize two conflicting objectives simultaneously, considering the total number of processing units and the end-to-end finishing time for all the jobs. The potential of the proposed approach is demonstrated by experiments based on a use case for mapping a number of jobs covering industrial automation systems, where each of the jobs consists of a number of tasks in a distributed environment.
