Publications (10 of 16)
Danielsson, J., Jägemar, M., Seceleanu, T., Behnam, M. & Sjödin, M. (2019). Run-time Cache-Partition Controller for Multi-core Systems. Paper presented at the 45th Annual Conference of the IEEE Industrial Electronics Society (IECON), 2019.
2019 (English). Conference paper, Published paper (Refereed)
National Category
Computer Systems
Identifiers
urn:nbn:se:mdh:diva-45949 (URN)
Conference
45th Annual Conference of the IEEE Industrial Electronics Society (IECON), 2019
Available from: 2019-11-11. Created: 2019-11-11. Last updated: 2019-11-11. Bibliographically approved
Danielsson, J., Seceleanu, T., Jägemar, M., Behnam, M. & Sjödin, M. (2019). Testing Performance-Isolation in Multi-Core Systems. Paper presented at the 43rd IEEE Annual Computer Software and Applications Conference, COMPSAC 2019; Milwaukee; United States; 15 July 2019 through 19 July 2019 (pp. 604-609). Article ID 8754208.
2019 (English). Conference paper, Published paper (Refereed)
Abstract [en]

In this paper we present a methodology for quantifying the level of performance isolation in a multi-core system. We have devised a test that can detect breaches of isolation in the different computing resources that may be shared between cores. We use this test to determine the level of isolation gained by using the Jailhouse hypervisor compared to a regular Linux system in terms of CPU isolation, cache isolation and memory-bus isolation. Our measurements show that the Jailhouse hypervisor provides performance isolation of local computing resources such as the CPU. We have also evaluated whether any isolation could be gained for shared computing resources such as the system-wide cache and the memory-bus controller. Our tests show no measurable difference in partitioning between a regular Linux system and a Jailhouse-partitioned system for shared resources. The Jailhouse hypervisor adds only a small overhead when executing multiple shared-resource-intensive tasks on multiple cores, which implies that running Jailhouse in a memory-saturated system will not be harmful. However, contention still exists in the memory bus and in the system-wide cache.
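The paper's test itself is not reproduced in this record, but the underlying idea, comparing a workload's solo runtime against its runtime under contention from a co-runner on another core, can be sketched as follows. The function name and the normalization are illustrative assumptions, not the paper's definitions:

```python
def isolation_level(t_solo: float, t_contended: float) -> float:
    """Crude isolation score for one shared resource.
    1.0 means the co-runner caused no slowdown (perfect isolation);
    0.0 means the runtime at least doubled under contention.
    Illustrative metric only, not the test defined in the paper."""
    slowdown = t_contended / t_solo          # >= 1.0 when contention hurts
    return max(0.0, min(1.0, 2.0 - slowdown))

# Example: a task taking 1.0 s alone and 1.4 s while a memory-intensive
# co-runner executes on another core scores 0.6.
score = isolation_level(1.0, 1.4)
```

Running the same workload pair once on plain Linux and once inside Jailhouse cells, and comparing the two scores per resource (CPU, cache, memory bus), is the shape of the comparison the abstract describes.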

National Category
Computer Systems
Identifiers
urn:nbn:se:mdh:diva-45947 (URN)
10.1109/COMPSAC.2019.00092 (DOI)
2-s2.0-85072706762 (Scopus ID)
978-1-7281-2607-4 (ISBN)
Conference
43rd IEEE Annual Computer Software and Applications Conference, COMPSAC 2019; Milwaukee; United States; 15 July 2019 through 19 July 2019
Available from: 2019-11-11. Created: 2019-11-11. Last updated: 2019-12-17. Bibliographically approved
Jägemar, M. (2018). Mallocpool: Improving Memory Performance Through Contiguously TLB Mapped Memory. In: International Conference on Emerging Technologies and Factory Automation ETFA'18. Paper presented at the International Conference on Emerging Technologies and Factory Automation ETFA'18, 04 Sep 2018, Torino, Italy.
2018 (English). In: International Conference on Emerging Technologies and Factory Automation ETFA'18, 2018. Conference paper, Published paper (Refereed)
Abstract [en]

Many computer systems allocate and free a large number of memory chunks over the application lifespan. One problem with allocating many chunks is that they may not be contiguously allocated, causing a massive strain on caches, translation lookaside buffers (TLBs), and the memory subsystem. We have devised a method that preallocates a large memory fragment, maps it with a variable-size TLB entry, and then allocates subsequently requested chunks from that fragment. Our method has two advantages. The first is that all chunks allocated by malloc() are allocated contiguously, thus allowing better cache locality. The second is that we can map the whole memory region with one variable-size TLB entry, removing much of the strain caused by 4 kB page entries. These two advantages drastically improve memory access performance. We have implemented our method in a Linux library which we can either dynamically or statically link to an existing application. The library is API-compatible with the GNU C library (glibc) and can act as a drop-in replacement, removing any need for legacy-application changes.
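The library itself is not part of this record, but the core idea, serving all requests contiguously from one preallocated region so that a single large TLB mapping can cover it, can be sketched as a bump-pointer pool. This is a toy model with illustrative names; the real Mallocpool is a C library replacing malloc():

```python
class BumpPool:
    """Toy model of a contiguously mapped allocation pool.
    All chunks come from one preallocated buffer, so successive
    allocations are adjacent in memory (good cache locality), and the
    whole region could be covered by one variable-size TLB entry."""
    def __init__(self, size: int):
        self.buf = bytearray(size)   # stands in for the hugepage-backed region
        self.top = 0                 # bump pointer: next free offset
    def alloc(self, n: int, align: int = 16) -> int:
        start = (self.top + align - 1) & ~(align - 1)  # round up to alignment
        if start + n > len(self.buf):
            raise MemoryError("pool exhausted")
        self.top = start + n
        return start                 # offset of the new chunk in the region

pool = BumpPool(1 << 20)
a = pool.alloc(100)   # offset 0
b = pool.alloc(100)   # offset 112: directly after a, rounded up to 16 bytes
```

A real drop-in replacement additionally has to track frees and back the region with a huge page; the sketch only shows why successive allocations land next to each other.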

National Category
Computer Systems
Identifiers
urn:nbn:se:mdh:diva-40900 (URN)
Conference
International Conference on Emerging Technologies and Factory Automation ETFA'18, 04 Sep 2018, Torino, Italy
Projects
ITS-EASY Post Graduate School for Embedded Software and Systems
DPAC - Dependable Platforms for Autonomous systems and Control
Available from: 2018-09-18. Created: 2018-09-18. Last updated: 2018-09-18. Bibliographically approved
Danielsson, J., Jägemar, M., Behnam, M., Sjödin, M. & Seceleanu, T. (2018). Measurement-based evaluation of data-parallelism for OpenCV feature-detection algorithms. In: Staying Smarter in a Smartening World COMPSAC'18. Paper presented at the 42nd IEEE Computer Software and Applications Conference, COMPSAC 2018; Tokyo; Japan; 23 July 2018 through 27 July 2018 (pp. 701-710).
2018 (English). In: Staying Smarter in a Smartening World COMPSAC'18, 2018, p. 701-710. Conference paper, Published paper (Refereed)
Abstract [en]

We investigate the effects on execution time, shared-cache usage and speed-up gains when using data-partitioned parallelism for the feature-detection algorithms available in the OpenCV library. We use a data set of three different images which are scaled to six different sizes to exercise the different cache memories of our test architectures. Our measurements reveal that the algorithms, using the default settings of OpenCV, behave very differently under data-partitioned parallelism. Our investigation shows that the execution of the algorithms SURF, Dense and MSER correlates with L3-cache usage, and they are therefore not suitable for data-partitioned parallelism on multi-core CPUs. The other algorithms (BRISK, FAST, ORB, HARRIS, GFTT, SimpleBlob and SIFT) do not correlate with the L3 cache to the same extent, and they are therefore more suitable for data-partitioned parallelism. Furthermore, the SIFT algorithm provides the most stable speed-up, executing between 3 and 3.5 times faster than the original execution time for all image sizes. We have also evaluated the hardware-resource usage by measuring the algorithm execution time simultaneously with the L3-cache usage. We have used our measurements to conclude which algorithms are suitable for parallelization on hardware with shared resources.
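Data-partitioned parallelism here means splitting one image into strips and running the same detector on each strip concurrently. A minimal sketch of that structure, with a stand-in detector instead of an OpenCV call and a thread pool instead of the paper's measurement setup:

```python
from concurrent.futures import ThreadPoolExecutor

def detect(strip):
    # Stand-in for a feature detector run on one image strip.
    return sum(strip)

def detect_partitioned(pixels, n_parts):
    """Split the data into n_parts contiguous strips and run the
    detector on each strip in parallel, then collect per-strip results."""
    size = (len(pixels) + n_parts - 1) // n_parts
    strips = [pixels[i:i + size] for i in range(0, len(pixels), size)]
    with ThreadPoolExecutor(max_workers=n_parts) as ex:
        return list(ex.map(detect, strips))

pixels = list(range(1000))
parts = detect_partitioned(pixels, 4)
assert sum(parts) == detect(pixels)   # partitioning preserves the total here
```

A real detector would also need overlapping strip borders so that features spanning a partition boundary are not lost; the cache effects the paper measures come from each core streaming its own strip through the shared L3.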

Keywords
Multi-core, OpenCV, Cache
National Category
Computer Systems
Identifiers
urn:nbn:se:mdh:diva-40855 (URN)
10.1109/COMPSAC.2018.00105 (DOI)
2-s2.0-85055434865 (Scopus ID)
9781538626665 (ISBN)
Conference
42nd IEEE Computer Software and Applications Conference, COMPSAC 2018; Tokyo; Japan; 23 July 2018 through 27 July 2018
Projects
DPAC - Dependable Platforms for Autonomous systems and Control
Available from: 2018-09-20. Created: 2018-09-20. Last updated: 2019-11-11. Bibliographically approved
Jägemar, M. (2018). Utilizing Hardware Monitoring to Improve the Quality of Service and Performance of Industrial Systems. (Doctoral dissertation). Västerås: Mälardalen University
2018 (English). Doctoral thesis, monograph (Other academic)
Abstract [en]

The drastically increased use of information and communications technology has resulted in a growing demand for telecommunication network capacity. The demand for radically increased network capacity coincides with industrial cost-reductions due to an increasingly competitive telecommunication market. We have addressed the capacity and cost-reduction problems in three ways.

Our first contribution is a method to support shorter development cycles for new functionality and more powerful hardware. We reduce the development time by replicating the hardware usage of production systems in our test environment. Having a realistic test environment allows us to run performance tests at early design phases and therefore reducing the overall system development time.

Our second contribution is a method to improve the communication performance through selective and automatic message compression. The message compression functionality monitors transmissions continuously and selects the most efficient compression algorithm. The message compression functionality evaluates several parameters such as network congestion level, CPU usage, and message content. Our implementation extends the communication capacity of a legacy communication API running on Linux where it emulates a legacy real-time operating system.

In our third and final contribution, we implement a process-allocation and scheduling framework to allow higher system performance and quality of service. The framework continuously monitors selected processes and correlates their performance to hardware usage such as caches, the floating-point unit and similar resources. The framework uses the performance-hardware correlation to minimize shared-hardware-resource congestion by efficiently allocating processes on multi-core CPUs. We have also designed a shared-hardware-resource-aware process scheduler that makes it possible for multiple processes to co-exist on a CPU without affecting each other's performance through hardware-resource congestion. The allocation and scheduling techniques can be used to consolidate several functions on shared hardware, thus reducing the system cost. We have implemented and evaluated our process scheduler as a new scheduling class in Linux.

We have conducted several case studies in an industrial environment and verified all contributions in the scope of a large telecommunication system manufactured by Ericsson. We have deployed all techniques in a complicated industrial legacy system with minimal impact. We show that we can provide a cost-effective solution, which is an essential requirement for industrial systems.

Place, publisher, year, edition, pages
Västerås: Mälardalen University, 2018
Series
Mälardalen University Press Dissertations, ISSN 1651-4238 ; 270
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:mdh:diva-40179 (URN)
978-91-7485-395-7 (ISBN)
Public defence
2018-10-18, Beta, Mälardalens högskola, Västerås, 13:15 (English)
Projects
ITS-EASY
Dependable Platforms for Autonomous systems and Control
Available from: 2018-07-04. Created: 2018-07-04. Last updated: 2018-09-27. Bibliographically approved
Jägemar, M., Ermedahl, A., Eldh, S. & Behnam, M. (2017). A Scheduling Architecture for Enforcing Quality of Service in Multi-Process Systems. In: International Conference on Emerging Technologies And Factory Automation ETFA'17. Paper presented at the International Conference on Emerging Technologies And Factory Automation ETFA'17, 12 Aug 2017, Limassol, Cyprus (pp. 1-8).
2017 (English). In: International Conference on Emerging Technologies And Factory Automation ETFA'17, 2017, p. 1-8. Conference paper, Published paper (Refereed)
Abstract [en]

The massive deployment of multi-core CPUs has created a significant drive to consolidate multiple services while still achieving high performance on these off-the-shelf CPUs. Earlier, each function had its own execution environment, which guaranteed a certain Quality of Service (QoS). Consolidating multiple services can give rise to shared-resource congestion, resulting in lower and non-deterministic QoS. We describe a method to increase overall system performance by assisting the operating-system process scheduler to utilize shared resources more efficiently. Our method utilizes hardware- and system-level performance counters to profile the shared-resource usage of each process. We also use a big-data approach to analyze statistics from many nodes. The outcome of the analysis is a decision-support model that is utilized by the process scheduler when allocating and scheduling processes. Our scheduler can distribute processes more efficiently than traditional CPU-load-based process schedulers by considering the hardware capacity and previous scheduling and allocation decisions.
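The decision-support idea, placing processes on cores by their measured shared-resource pressure rather than CPU load alone, can be illustrated with a greedy allocator. In the real system the pressure values would come from hardware performance counters; everything below is a simplified assumption, not the paper's model:

```python
def allocate(procs, n_cores):
    """Greedy placement sketch: take processes in order of decreasing
    shared-resource pressure (e.g. cache demand measured via performance
    counters) and put each on the currently least-pressured core, so
    resource-hungry processes are spread apart instead of stacked."""
    load = [0.0] * n_cores          # accumulated pressure per core
    placement = {}
    for name, pressure in sorted(procs, key=lambda p: -p[1]):
        core = min(range(n_cores), key=load.__getitem__)
        load[core] += pressure
        placement[name] = core
    return placement

procs = [("db", 0.9), ("log", 0.1), ("codec", 0.8), ("ui", 0.2)]
plan = allocate(procs, 2)   # the two heavy processes land on different cores
```

A CPU-load-only scheduler could legally co-locate "db" and "codec" on one core; weighting by shared-resource pressure is what keeps them apart.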

National Category
Computer Systems
Identifiers
urn:nbn:se:mdh:diva-37025 (URN)
10.1109/ETFA.2017.8247613 (DOI)
000427812000048 ()
2-s2.0-85044479964 (Scopus ID)
9781509065059 (ISBN)
Conference
International Conference on Emerging Technologies And Factory Automation ETFA'17, 12 Aug 2017, Limassol, Cyprus
Projects
ITS-EASY Post Graduate School for Embedded Software and Systems
DPAC - Dependable Platforms for Autonomous systems and Control
Available from: 2017-11-20. Created: 2017-11-20. Last updated: 2019-06-25. Bibliographically approved
Jägemar, M., Lisper, B., Eldh, S., Ermedahl, A. & Andai, G. (2016). Automatic Benchmarking for Early-Stage Performance Verification of Industrial Systems.
2016 (English). Manuscript (preprint) (Other academic)
National Category
Computer Engineering
Research subject
Computer Science
Identifiers
urn:nbn:se:mdh:diva-31499 (URN)
Available from: 2016-05-03. Created: 2016-05-03. Last updated: 2018-01-10. Bibliographically approved
Jägemar, M., Eldh, S., Ermedahl, A. & Lisper, B. (2016). Automatic Message Compression with Overload Protection. Journal of Systems and Software, 121(1 nov), 209-222
2016 (English). In: Journal of Systems and Software, ISSN 0164-1212, E-ISSN 1873-1228, Vol. 121, no 1 nov, p. 209-222. Article in journal (Refereed). Published
Abstract [en]

In this paper, we show that it is possible to increase the message throughput of a large-scale industrial system by selectively compressing messages. The demand for new high-performance message-processing systems conflicts with the cost effectiveness of legacy systems. The result is often a mixed environment with several concurrent system generations. Such a mixed environment does not allow a complete replacement of the communication backbone to provide the increased messaging performance; thus, performance-enhancing software solutions are highly attractive. Our contribution is 1) an online compression mechanism that automatically selects the most appropriate compression algorithm to minimize the message round-trip time; 2) a compression overload mechanism that ensures ample resources for other processes sharing the same CPU. We have integrated 11 well-known compression algorithms/configurations and tested them with production-node traffic. In our target system, automatic message compression results in a 9.6% reduction of the message round-trip time. The selection procedure is fully automatic and does not require any manual intervention. This automatic behavior makes it particularly suitable for large systems where it is difficult to predict future system behavior.
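The selection mechanism can be sketched by scoring each candidate codec on a message sample: estimated cost = CPU time to compress plus time to push the compressed payload over the link, then picking the minimum. The codec set and the link model below are illustrative stand-ins; the described system evaluated 11 algorithms/configurations online against live traffic:

```python
import time, zlib, bz2, lzma

# Candidate codecs; "none" models sending the message uncompressed.
CODECS = {
    "none": lambda b: b,
    "zlib": zlib.compress,
    "bz2":  bz2.compress,
    "lzma": lzma.compress,
}

def pick_codec(sample: bytes, link_bytes_per_s: float) -> str:
    """Choose the codec minimizing estimated messaging cost:
    compression CPU time + transfer time of the compressed payload."""
    best, best_cost = "none", float("inf")
    for name, compress in CODECS.items():
        t0 = time.perf_counter()
        payload = compress(sample)
        cpu = time.perf_counter() - t0
        cost = cpu + len(payload) / link_bytes_per_s
        if cost < best_cost:
            best, best_cost = name, cost
    return best

# On a slow link, highly compressible traffic favours compression.
choice = pick_codec(b"ping " * 10_000, link_bytes_per_s=1e5)
```

Re-running the evaluation when message content or link load changes gives the adaptive behavior the abstract describes; the overload mechanism (not sketched) would additionally cap CPU spent compressing.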

Keywords
Automatic compression, Message compression, Feedback control, Performance prediction, Network performance, Mobile systems
National Category
Computer Engineering
Identifiers
urn:nbn:se:mdh:diva-31500 (URN)
10.1016/j.jss.2016.04.010 (DOI)
000384864500015 ()
2-s2.0-84966728710 (Scopus ID)
Available from: 2016-05-03. Created: 2016-05-03. Last updated: 2018-01-10. Bibliographically approved
Jägemar, M. (2016). Utilizing Hardware Monitoring to Improve the Performance of Industrial Systems. (Licentiate dissertation). Västerås: Mälardalen University
2016 (English). Licentiate thesis, comprehensive summary (Other academic)
Abstract [en]

The drastically increasing use of Information and Communications Technology has resulted in a growing demand for network capacity. In this Licentiate thesis, we show how to monitor, model and finally improve network performance for large industrial systems. We also show how to use modeling techniques to move performance testing to an earlier design phase, with the aim of reducing the total development time of large systems. Our first contribution is a low-intrusive method for long-term hardware-characteristic measurements of production nodes located at customer sites. Our second contribution is a technique to mimic the hardware usage of a production environment by creating a characteristics model. The cloned environment makes function-test suites more realistic. The goal when creating the model is to reduce the system development time by moving late-stage performance testing to early design phases, thereby improving the quality of the test environment. The third and final contribution is a network performance improvement where we dynamically trade computational capacity for a reduction in message round-trip time when there are CPU cycles to spare. We have implemented an automatic, feedback-controlled mechanism for transparent message compression, resulting in improved messaging performance between interconnected network nodes. Our mechanism continuously evaluates eleven compression algorithms against message-stream content and network congestion level. The message subsystem uses the compression algorithm that provides the lowest messaging time; if the message content or network load changes, a new evaluation is performed. We have conducted several case studies in an industrial environment and verified all contributions on a large telecommunication system manufactured by Ericsson. System engineers frequently use the monitoring and modeling functionality for debugging purposes in production environments. We have deployed all techniques in a complicated industrial legacy system with minimal impact. We show that we can provide not only a solution but a cost-effective solution, which is an important requirement for industrial systems.

Place, publisher, year, edition, pages
Västerås: Mälardalen University, 2016
Series
Mälardalen University Press Licentiate Theses, ISSN 1651-9256 ; 200
National Category
Computer Engineering
Research subject
Computer Science
Identifiers
urn:nbn:se:mdh:diva-31501 (URN)
978-91-7485-203-5 (ISBN)
Presentation
2016-06-20, Kappa, Mälardalens högskola, Västerås, 13:00 (English)
Projects
ITS-EASY
Available from: 2016-05-04. Created: 2016-05-03. Last updated: 2018-01-10. Bibliographically approved
Jägemar, M. & Dodig-Crnkovic, G. (2015). Cognitively Sustainable ICT with Ubiquitous Mobile Services - Challenges and Opportunities. In: The 37th International Conference on Software Engineering ICSE. Paper presented at the 37th IEEE/ACM International Conference on Software Engineering, ICSE 2015; Florence; Italy; 16 May 2015 through 24 May 2015 (pp. 531-540).
2015 (English). In: The 37th International Conference on Software Engineering ICSE, 2015, no 37, p. 531-540. Conference paper, Published paper (Refereed)
Abstract [en]

Information and Communication Technology (ICT) has led to an unprecedented development in almost all areas of human life. It forms the basis for what is called "the cognitive revolution": a fundamental change in the way we communicate, feel, think and learn, based on an extension of individual information-processing capacities by communication with other people through technology. This so-called "extended cognition" shapes human relations in a radically new way. It is accompanied by a decrease in shared attention and affective presence within closely related groups. This weakens the deepest and most important bonds that used to shape human identity. Sustainability, both environmental and social (economic, technological, political and cultural), is one of the most important issues of our time. In connection with "extended cognition" we have identified a new, basic type of social sustainability that everyone takes for granted, and which we claim is in danger due to our changed ways of communication. We base our conclusion on a detailed analysis of the current state of the practice and observed trends. The contribution of our article consists of identifying cognitive sustainability and explaining its central role for all other aspects of sustainability, showing how it relates to the cognitive revolution and its opportunities and challenges. Complex social structures with different degrees of proximity have always functioned as mechanisms behind belongingness and identity. To create long-term cognitive sustainability, we need to rethink and design new communication technologies that support differentiated and complex social relationships.

Keywords
Cognitive sustainability, Social sustainability, Sustainable ICT, Cognitive revolution, Privacy, Shared attention, Social cognition, Software engineering for social good.
National Category
Software Engineering Computer Systems
Identifiers
urn:nbn:se:mdh:diva-28131 (URN)
10.1109/ICSE.2015.189 (DOI)
000380572400065 ()
2-s2.0-84951830525 (Scopus ID)
9781479919345 (ISBN)
Conference
37th IEEE/ACM International Conference on Software Engineering, ICSE 2015; Florence; Italy; 16 May 2015 through 24 May 2015
Available from: 2015-06-09. Created: 2015-06-08. Last updated: 2018-01-11. Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0003-2612-4135