Fault Management Framework and Multi-layer Recovery Methodology for Resilient System
Ericsson Ab, Technology Management, Stockholm, Sweden. ORCID iD: 0000-0003-2598-6796
Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems. ORCID iD: 0000-0002-5032-2310
Ericsson Ab, Sys Compute Dimensioning, Stockholm, Sweden.
Ericsson Ab, Sys Architecture, Stockholm, Sweden.
2022 (English). In: 2022 6th International Conference on System Reliability and Safety, ICSRS 2022, Institute of Electrical and Electronics Engineers Inc., 2022, p. 32-39. Conference paper, Published paper (Refereed)
Abstract [en]

Fault management is a key function for guaranteeing quality of service. Prior research has substantially improved fault supervision, and work on fault prediction is ongoing, thanks to the potential of artificial intelligence and machine learning. In this study, we propose a fault management framework that emphasizes fault recovery: a framework built on multi-layer functions and a fault recovery methodology distributed over several technological layers. The basic premise of our proposal is that a system's complexity exposes it to a higher probability of temporary errors. Renewed attention to the fault recovery phase is the key to keeping service quality high and to saving maintenance costs by decreasing the return rate.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers Inc., 2022. p. 32-39
Keywords [en]
Fault Management, Recovery methodology, Resilient system, Artificial intelligence, Artificial intelligence learning, Fault prediction, Fault recovery, Machine-learning, Management frameworks, Management IS, Multi-layers, Resilient systems, Failure analysis
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:mdh:diva-62284
DOI: 10.1109/ICSRS56243.2022.10067849
ISI: 000981836500005
Scopus ID: 2-s2.0-85151674429
ISBN: 9781665470926 (print)
OAI: oai:DiVA.org:mdh-62284
DiVA, id: diva2:1751708
Conference
6th International Conference on System Reliability and Safety, ICSRS 2022, Venice 23 November 2022 through 25 November 2022
Available from: 2023-04-19 Created: 2023-04-19 Last updated: 2024-03-11 Bibliographically approved
In thesis
1. The role of fault management in the embedded system design
2024 (English). Licentiate thesis, comprehensive summary (Other academic)
Abstract [en]

In the last decade, the world of telecommunications has seen the value of services definitively affirmed and the loss of the connectivity value. This change of pace in the use of the network (and available hardware resources) has led to continuous, unlimited growth in data traffic, increased incomes for service providers, and a constant erosion of operators’ incomes for voice and Short Message Service (SMS) traffic.

The change in mobile service consumption is evident to operators. The market today is in the hands of over-the-top (OTT) media content delivery companies (Google, Meta, Netflix, Amazon, etc.), and the fifth generation of mobile networks (5G), the latest generation of mobile architecture, is nothing other than how operators can invest in system infrastructure to participate in the prosperous service business.

With the advent of 5G, the worlds of cloud and telecommunications have found their meeting point, paving the way for new infrastructures and services, such as smart cities, Industry 4.0, Industry 5.0, and Augmented Reality (AR)/Virtual Reality (VR). People, infrastructures, and devices are connected to provide services that we even struggle to imagine today, but a highly interconnected system requires high levels of reliability and resilience.

Hardware reliability has increased since the 1990s. However, it is equally correct to mention that the introduction of new technologies in the nanometer domain and the growing complexity of on-chip systems have made fault management critical to guarantee the quality of the service offered to the customer and the sustainability of the network infrastructure.

In this thesis, our first contribution is a review of the fault management implementation framework for the radio access network domain. Our approach introduces a holistic vision in fault management where there is increasingly more significant attention to the recovery action, the crucial target of the proposed framework. A new contribution underlines the attention toward the recovery target: we revisited the taxonomy of faults in mobile systems to enhance the result of the recovery action, which, in our opinion, must be propagated between the different layers of an embedded system (hardware, firmware, middleware, and software). The practical adoption of the new framework and the new taxonomy allowed us to make a unique contribution to the thesis: the proposal of a new algorithm for managing system memory errors, both temporary (soft) and permanent (hard).

The holistic vision of error management we introduced in this thesis involves hardware that proactively manages faults. An efficient implementation of fault management is only possible if the hardware design considers error-handling techniques and methodologies. Another contribution of this thesis is the definition of the fault management requirements for the RAN embedded system hardware design.

Another primary function of the proposed fault management framework is fault prediction. Recognizing error patterns means allowing the system to react in time, even before the error condition occurs, or identifying the topology of the error to implement more targeted and, therefore, more efficient recovery actions. The operating temperature is always a critical characteristic of embedded radio access network systems. Base stations must be able to work in very different temperature conditions. However, the working temperature also directly affects the probability of error for the system. In this thesis, we have also contributed a machine-learning algorithm for predicting the working temperature of base stations in radio access networks, a first step towards a more sophisticated implementation of error prevention and prediction.

Place, publisher, year, edition, pages
Västerås: Mälardalens universitet, 2024
Series
Mälardalen University Press Licentiate Theses, ISSN 1651-9256 ; 357
Keywords
Fault Management, Resilient system, Recovery methodology.
National Category
Telecommunications
Research subject
Computer Science
Identifiers
urn:nbn:se:mdh:diva-66227 (URN)
978-91-7485-639-2 (ISBN)
Presentation
2024-04-18, Milos, Mälardalens universitet, Västerås, 13:15 (English)
Available from: 2024-03-11 Created: 2024-03-11 Last updated: 2024-03-28 Bibliographically approved

Authority records

Vitucci, Carlo; Sundmark, Daniel; Nolte, Thomas