Most of the critical challenges seen in the past decades have impacted citizens globally. Given shrinking resources, educationists find preparing students for the global marketplace a formidable challenge. Hence, exposing students to multilateral educational initiatives is critical to their growth, understanding, and future contributions. This paper focuses on the European Union's Erasmus Mundus programs, involving academic cooperation amongst international universities in engineering programs. A phased undergraduate engineering program with multiple specializations is analyzed within this context. Based on their performance at the end of the first phase, selected students were provided scholarship opportunities to complete their degree requirements at various European universities. This paper elaborates the impact on students in diverse engineering disciplines of differing pedagogical interventions as well as language and cultural differences amongst these countries. The data presented is based on the feedback analysis from Erasmus Mundus students (N = 121) who underwent the mobility programs. The findings have given important insights into the structure of the initiative and implications for academia and education policy makers for internationalizing engineering education. These included considering digital interventions such as MOOCs (Massive Open Online Courses) and Virtual Laboratory (VL) initiatives for systemic reorganization of engineering education.
Hazard analysis for a single system focuses on identifying and evaluating potential hazards associated with the individual system, its components, and their interactions. There are well-established hazard analysis techniques that are widely used to identify hazards for single systems. However, unlike for single systems, hazard analysis in a System of Systems (SoS) must focus on analyzing the potential hazards (including emergent ones) that can arise from the interactions between multiple individual systems. This type of analysis considers the complex interactions between systems and the interdependence between their components and the environment in which they operate. Therefore, it is necessary to understand the application scenarios of the SoS and to employ a systematic approach to identify all potential hazards. This paper applies the Composite Hazard Analysis Technique (CompHAT) to an industrial case study from the mining equipment domain. The results show that CompHAT is useful in identifying interaction faults and their propagation routes, both between components of a constituent system and between constituent systems in an SoS. We also report that, owing to its tool support, CompHAT can help safety engineers trace faults in the network of an SoS in a more efficient and effective manner.
Context: Collaborative systems enable multiple independent systems to work together towards a common goal. These systems can include both human-system and system-system interactions and can be found in a variety of settings, including smart manufacturing, smart transportation, and healthcare. Safety is an important consideration for collaborative systems because one system's failure can significantly impact the overall system performance and adversely affect other systems, humans, or the environment. Goal: Fail-safe mechanisms for safety-critical systems are designed to bring the system to a safe state in case of a failure in the sensors or actuators. However, a collaborative safety-critical system must do better and be safe-operational; for example, a failure of one of the members in a platoon of vehicles in the middle of a highway is not acceptable. Thus, failures must be compensated, and compliance with safety constraints must be ensured even under faults or failures of constituent systems. Method: In this paper, we model and analyze safety for collaborative safety-critical systems using hierarchical Coloured Petri nets (CPN). We used an automated Human Rescue Robot System (HRRS) as a case study, modeled it using hierarchical CPN, and injected specified failures to check and confirm safe behavior in unexpected scenarios. Results: The system behavior was observed after injecting three types of failures into constituent systems, and safety mechanisms were then applied to mitigate the effect of these failures. After applying the safety mechanisms, the overall behavior of the HRRS system was observed again, in terms of both verification and validation, and the simulation results show that all the identified failures were mitigated and HRRS completed its mission.
Conclusion: It was found that the approach based on formal methods (CPN modeling) can be used for the safety analysis, modeling, validation, and verification of collaborative safety-critical systems like HRRS. Hierarchical CPN provides a rigorous modeling approach for implementing complex collaborative systems.
Critical systems often use N-modular redundancy to tolerate faults in subsystems. Traditional approaches to N-modular redundancy in distributed, loosely-synchronised, real-time systems handle time and value errors separately: a voter detects value errors, while watchdog-based health monitoring detects timing errors. In prior work, we proposed the integrated Voting on Time and Value (VTV) strategy, which allows both timing and value errors to be detected simultaneously. In this paper, we show how VTV can be harnessed as part of an overall fault tolerance strategy and evaluate its performance using a well-known control application, the Inverted Pendulum. Through extensive simulations, we compare the performance of Inverted Pendulum systems that employ VTV and alternative voting strategies to demonstrate that VTV better tolerates well-recognised faults in this realistically complex control problem.
In dependable embedded real-time systems, typically built of computing nodes exchanging messages over reliability-constrained networks, the provision of schedulability guarantees for task and message sets under realistic fault and error assumptions is an essential requirement, though complex and tricky to achieve. An important factor to be considered in this context is the random nature of occurrences of faults and errors, which, if addressed in the traditional schedulability analysis by assuming a rigid worst-case occurrence scenario, may lead to inaccurate results. In this work we propose a framework for end-to-end probabilistic schedulability analysis for real-time tasks exchanging messages over the Controller Area Network (CAN) under stochastic errors.
In this paper, we present a general framework which allows the designer to specify a wide range of criteria for allocation. The major factors considered in our framework are mixed criticalities of tasks, schedulability, power consumption, fault tolerance, and dependability requirements, in addition to typical functional aspects such as memory constraints. Since this is a global optimization problem, we resort to meta-heuristic algorithms, and we were able to represent these requirements in a very intuitive manner through energy functions in simulated annealing. We envision the proposed methodology as a simple, scalable, and computationally effective solution covering a wide range of system architectures and solution spaces.
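The energy-function idea described above can be illustrated with a small sketch. This is not the paper's actual formulation: the tasks, nodes, weights, and penalty terms below are all invented for illustration, and only two of the listed criteria (schedulability via utilization, and memory) are modeled.

```python
import math
import random

# Hypothetical task set and node platform, invented for this sketch.
TASKS = [{"util": u, "mem": m} for u, m in
         [(0.3, 2), (0.4, 1), (0.2, 3), (0.5, 2), (0.1, 1)]]
NODES = [{"cap": 1.0, "mem": 5}, {"cap": 1.0, "mem": 5}]

def energy(alloc, w_sched=10.0, w_mem=5.0):
    """Total energy = weighted sum of constraint violations per node."""
    e = 0.0
    for n, node in enumerate(NODES):
        util = sum(t["util"] for i, t in enumerate(TASKS) if alloc[i] == n)
        mem = sum(t["mem"] for i, t in enumerate(TASKS) if alloc[i] == n)
        e += w_sched * max(0.0, util - node["cap"])  # overload penalty
        e += w_mem * max(0, mem - node["mem"])       # memory penalty
    return e

def anneal(steps=5000, t0=1.0, cooling=0.999, seed=1):
    """Standard simulated annealing over task-to-node allocations."""
    random.seed(seed)
    alloc = [random.randrange(len(NODES)) for _ in TASKS]
    best, temp = alloc[:], t0
    for _ in range(steps):
        cand = alloc[:]  # neighbor: move one random task to a random node
        cand[random.randrange(len(TASKS))] = random.randrange(len(NODES))
        delta = energy(cand) - energy(alloc)
        if delta <= 0 or random.random() < math.exp(-delta / temp):
            alloc = cand
            if energy(alloc) < energy(best):
                best = alloc[:]
        temp *= cooling
    return best, energy(best)
```

Further criteria (power, replica separation for fault tolerance) would be added as extra weighted penalty terms in `energy`, which is what makes this encoding intuitive to extend.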
Dependable communication is becoming a critical factor due to the pervasive usage of networked embedded systems that increasingly interact with human lives in one way or the other in many real-time applications. Though many smaller systems provide dependable services by employing uniprocessor solutions, stringent fault containment strategies, etc., these practices are fast becoming inadequate due to the prominence of COTS in hardware and component-based development (CBD) in software, as well as the increased focus on building 'systems of systems'. Hence the repertoire of design paradigms, methods, and tools available to the developers of distributed real-time systems needs to be enhanced in multiple directions and dimensions. In future scenarios, a network potentially needs to cater to messages of multiple criticality levels (and hence varied redundancy requirements), and scheduling them in a fault-tolerant manner becomes an important research issue. We address this problem in the context of the Controller Area Network (CAN), which is widely used in the automotive and automation domains, and describe a methodology which enables the provision of appropriate scheduling guarantees. The proposed approach involves the definition of fault-tolerant windows of execution for critical messages and the derivation of message priorities based on earliest deadline first (EDF).
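One ingredient of such an approach can be illustrated with a toy sketch: tighten each message's deadline to reserve room for a fixed number of retransmissions, then order priorities by the resulting fault-tolerant deadlines (EDF-style: earlier deadline, higher priority). This is a simplification sketched from the abstract alone; the actual window construction and priority derivation are more involved, and all names and parameters here are hypothetical.

```python
def ft_priorities(messages, k_retx):
    """messages: dict name -> (deadline, tx_time); k_retx: number of
    retransmissions to budget for each critical message.
    Returns message names ordered from highest to lowest priority,
    based on deadlines tightened by the retransmission budget."""
    ft_deadline = {name: d - k_retx * c for name, (d, c) in messages.items()}
    return sorted(ft_deadline, key=ft_deadline.get)

# Illustrative message set: (deadline, transmission time) per message.
msgs = {"a": (10, 1), "b": (8, 2), "c": (12, 1)}
order = ft_priorities(msgs, k_retx=2)  # b gets the tightest FT deadline
```

Note how reserving retransmission slack can reorder priorities relative to plain deadline-monotonic assignment when transmission times differ.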
Real-time systems typically have to satisfy complex requirements, mapped to the timing attributes of the tasks, that are eventually guaranteed by the underlying scheduler. These systems consist of a mix of hard and soft tasks with varying criticalities as well as associated fault tolerance (FT) requirements. Time redundancy techniques are often preferred in embedded applications, and hence it is extremely important to devise appropriate methodologies for scheduling real-time tasks under fault assumptions. Additionally, the relative criticality of tasks could undergo changes during the evolution of the system. Hence scheduling decisions under fault assumptions have to reflect all these important factors in addition to the resource constraints.
In this paper we propose a framework for 'FT-feasibility', i.e., to provide a priori guarantees that all critical tasks in the system will meet their deadlines even in case of faults. Our main objective here is to ensure FT-feasibility of all critical tasks in the system, and to do so with minimal costs and without any fundamental changes in the scheduling paradigm. We demonstrate its applicability in scenarios where the FT strategy employed is re-execution of the affected tasks or an alternate action upon occurrence of transient faults or software design faults. We analyse a feasible set of tasks and propose methods to adapt it to varying FT requirements without modifications to the underlying scheduler. We do so by reassigning task attributes to achieve FT-feasibility while keeping the costs minimised.
In dependable real-time systems, provision of schedulability guarantees for task sets under realistic fault and error assumptions is an essential requirement, though complex and tricky to achieve. An important factor to be considered in this context is the random nature of occurrences of faults and errors, which, if addressed in the traditional schedulability analysis by assuming a rigid worst-case occurrence scenario, may lead to inaccurate results. In this paper we first propose a stochastic fault and error model which has the capability of modeling error bursts in lieu of the commonly used simplistic error assumptions in processor scheduling. We then present a novel schedulability analysis that accounts for a range of worst-case scenarios generated by stochastic error burst occurrences on the response times of tasks scheduled under the fixed priority scheduling (FPS) policy. Finally, we describe a methodology for the calculation of probabilistic schedulability guarantees as a weighted sum of the conditional probabilities of schedulability under specified error burst characteristics.
Dependable real-time systems typically consist of tasks of mixed-criticality levels with associated fault tolerance (FT) requirements, and scheduling them in a fault-tolerant manner to efficiently satisfy these requirements is a challenging problem. From the designers' perspective, the most natural way to specify the task criticalities is by expressing the reliability requirements at task level, without having to deal with low-level decisions, such as deciding on which FT method to use, where in the system to implement the FT, and the amount of resources to be dedicated to the FT mechanism. Hence, it is extremely important to devise methods for translating the high-level requirement specifications for each task into the low-level scheduling decisions needed for the FT mechanism to function efficiently and correctly. In this paper, we focus on achieving FT by redundancy in the temporal domain, as it is the commonly preferred method in embedded applications to recover from transient and intermittent errors, mainly due to its relatively low cost and ease of implementation. We propose a method which allows the system designer to specify task-level reliability requirements and provides a priori probabilistic scheduling guarantees for real-time tasks with mixed-criticality levels in the context of preemptive fixed-priority scheduling. We illustrate the method on a running example.
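The weighted-sum idea recurring in these abstracts can be sketched generically: the overall guarantee is the sum, over error scenarios, of the scenario probability times an indicator of schedulability under that scenario. The Poisson error model and the `schedulable_given_k` oracle interface below are illustrative assumptions, not the papers' exact formulation.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k error arrivals, Poisson with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def probabilistic_guarantee(schedulable_given_k, lam, k_max=20):
    """Weighted sum of conditional schedulability over error scenarios.
    schedulable_given_k(k) -> bool: an offline schedulability test assuming
    k errors within the analysis interval (hypothetical interface).
    Returns P(all deadlines met), truncated at k_max errors."""
    return sum(poisson_pmf(k, lam)
               for k in range(k_max + 1) if schedulable_given_k(k))

# Example: suppose the analysis shows deadlines hold for up to 2 errors,
# and errors arrive with mean rate 1 per analysis interval.
guarantee = probabilistic_guarantee(lambda k: k <= 2, lam=1.0)
```

The designer-facing reliability requirement then becomes a simple threshold check, e.g. `guarantee >= 1 - 1e-5` for a given task's criticality level.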
Hard real-time applications typically have to satisfy high dependability requirements in terms of fault tolerance in both the value and the time domains. Loosely synchronized real-time systems, which represent many of the systems that are developed, make any form of voting difficult, as each replica may provide different outputs independent of whether there has been an error or not. This can also lead to false positives and false negatives, which makes achieving fault tolerance, and hence dependability, difficult. We have earlier proposed a majority voting technique, "Voting on Time and Value" (VTV), that explicitly considers combinations of value and timing errors, targeting loosely synchronized systems. In this paper, we extend VTV to enable voter parameter tuning to obtain the desired user-specified trade-offs between the false positive and false negative rates in the voter outputs. We evaluate the performance of VTV against Compare Majority Voting (CMV), which is a known voting approach applicable in similar contexts, through extensive simulation studies. The results clearly demonstrate that VTV outperforms CMV in all scenarios with lower false negative rates.
The fundamental requirement for the design of effective and efficient fault-tolerance mechanisms in dependable real-time systems is a realistic and applicable model of potential faults, their manifestations and consequences. Fault and error models also need to be evolved based on the characteristics of the operational environments or even based on technological advances. In this paper we propose a probabilistic burst error model in lieu of the commonly used simplistic fault assumptions in the context of processor scheduling. We present a novel schedulability analysis that accounts for the worst case interference caused by error bursts on the response times of tasks scheduled under the fixed priority scheduling (FPS) policy. Further, we describe a methodology for the calculation of probabilistic schedulability guarantees as a weighted sum of the conditional probabilities of schedulability under specified error burst characteristics. Finally, we identify potential sources of pessimism in the worst case response time calculations and discuss potential means for circumventing these issues.
Networked embedded systems used in many real-time (RT) applications rely on dependable communication. Controller Area Network (CAN) has gained wide acceptance as a standard in a large number of applications, mostly due to its cost effectiveness, predictable performance, and fault-tolerance capability. Research so far has focused on rather simplistic error models which assume only singleton errors separated by a minimum inter-arrival time. However, these systems are often subject to faults that manifest as error bursts of various lengths, which have an adverse effect on the message response times and need to be accounted for. Furthermore, an important factor to be considered in this context is the random nature of occurrences of faults and errors, which, if addressed in the traditional schedulability analysis by assuming a rigid worst-case occurrence scenario, may lead to inaccurate results. In this paper we first present a stochastic fault and error model which has the capability of modeling error bursts in lieu of the commonly used simplistic error assumptions. We then present a methodology which enables the provision of appropriate probabilistic RT guarantees in distributed RT systems for the particular case of message scheduling on CAN under the assumed error model.
The fundamental requirement for the design of effective and efficient fault-tolerance mechanisms in dependable real-time systems is a realistic and applicable model of potential faults, their manifestations and consequences. Fault and error models also need to be evolved based on the changes in the environments of usage or even based on technological advances. In this paper we propose a novel probabilistic burst error model in lieu of the commonly used simplistic fault assumptions. We introduce an approach to reason about real-time systems schedulability under the proposed error model in a probabilistic manner. We first present a sufficient analysis that accounts for the worst case interference caused by error bursts on the response times of tasks scheduled under the fixed priority scheduling (FPS) policy. Further, we identify potential sources of pessimism in the calculations and propose an algorithm that refines the results.
Dependable real-time systems typically consist of tasks of multiple criticality levels, and scheduling them in a fault-tolerant manner is a challenging problem. Redundancy in the physical and temporal domains for achieving fault tolerance has often been dealt with independently, based on the types of errors one needs to tolerate. To our knowledge, there has been no work which tries to integrate fault-tolerant scheduling and multiple redundancy mechanisms. In this paper we propose a novel cascading redundancy approach within a generic fault-tolerant scheduling framework. The proposed approach is capable of tolerating errors with a wider coverage (with respect to error frequency and error types) than time and space redundancy in isolation, allows tasks with mixed criticality levels, is independent of the scheduling technique and, above all, ensures that every critical task instance can be feasibly replicated in both time and space.
Component-Based Development (CBD) of software, with its successes in enterprise computing, has the promise of being a good development model due to its cost effectiveness and potential for achieving high quality of components by virtue of reuse. However, for systems with dependability concerns, such as real-time systems, a major challenge in using CBD consists of predicting dependability attributes, or providing dependability assertions, based on the individual component properties and architectural aspects. In this paper, we propose a framework which aims to address this challenge. Specifically, we present a revised error classification together with error propagation aspects, and briefly sketch how to compose error models within the context of Component-Based Systems (CBS). The ultimate goal is to perform the analysis on a given CBS in order to find bottlenecks in achieving dependability requirements and to provide guidelines to the designer on the usage of appropriate error detection and fault tolerance mechanisms.
Real-time applications typically have to satisfy high dependability requirements and require fault tolerance in both the value and time domains. A widely used approach to ensure fault tolerance in dependable systems is N-modular redundancy (NMR), which typically uses a majority voting mechanism. However, NMR primarily focuses on producing the correct value, without taking into account the time dimension. In this paper, we propose a new approach, Voting on Time and Value (VTV), applicable to real-time systems, which extends the modular redundancy approach by explicitly considering both value and timing failures, such that the correct value is produced at the correct time, under specified assumptions. We illustrate our voting approach by instantiating it in the context of the well-known triple modular redundancy (TMR) approach. Further, we present a generalized version targeting NMR that enables a high degree of customization from the user perspective.
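A minimal sketch of the voting idea (not the exact VTV algorithm from these papers): each replica emits a (value, timestamp) pair, and an output is accepted only when a majority of replicas agree within both a value tolerance and a time window. The thresholds and the representative-selection rule below are invented for the illustration.

```python
def vtv_vote(outputs, eps_value, eps_time):
    """outputs: list of (value, timestamp) pairs, one per replica.
    Returns a (value, timestamp) pair supported by a strict majority that
    agrees in both value and time, or None if no such majority exists."""
    n = len(outputs)
    for v, t in outputs:
        # replicas agreeing with (v, t) in BOTH dimensions
        support = [(v2, t2) for v2, t2 in outputs
                   if abs(v2 - v) <= eps_value and abs(t2 - t) <= eps_time]
        if len(support) * 2 > n:
            support.sort()
            return support[len(support) // 2]  # median-like representative
    return None

# Value error in one replica: the two agreeing replicas still win the vote.
ok = vtv_vote([(10.0, 1.0), (10.1, 1.05), (25.0, 1.0)], 0.5, 0.2)
# Timing error: a value-correct but late replica breaks the majority.
late = vtv_vote([(10.0, 1.0), (10.05, 9.0), (30.0, 1.0)], 0.5, 0.2)
```

The second case is the one a value-only NMR voter gets wrong: two replicas agree on the value, but only a voter that also checks timestamps notices that no majority was produced at a correct time.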
Dependable communication is becoming a critical factor due to the pervasive usage of networked embedded systems that increasingly interact with human lives in many real-time applications. Controller Area Network (CAN) has gained wider acceptance as a standard in a large number of industrial applications, mostly due to its efficient bandwidth utilization, ability to provide real-time guarantees, as well as its fault-tolerant capability. However, the native CAN fault-tolerant mechanism assumes that all messages transmitted on the bus are equally critical, which has an adverse impact on the message latencies, results in the inability to meet user defined reliability requirements, and, in some cases, even leads to violation of timing requirements. As the network potentially needs to cater to messages of multiple criticality levels (and hence varied redundancy requirements), scheduling them in an efficient fault-tolerant manner becomes an important research issue. We propose a methodology which enables the provision of appropriate guarantees in CAN scheduling of messages with mixed criticalities. The proposed approach involves definition of fault-tolerant feasibility windows of execution for critical messages, and off-line derivation of optimal message priorities that fulfill the user specified level of fault-tolerance.
Increased levels of automation, together with the increased complexity of automation systems, place greater responsibility on system developers in terms of quality demands, both from the legal perspective and with respect to company reputation. Component-based development of software systems provides a viable and cost-effective alternative in this context, provided one can address the quality and safety certification demands in an efficient manner. In this paper we present our vision, the challenges, and a brief outline of the various research themes in which our team is currently engaged within two major projects.
The product line engineering approach is a promising concept to identify and manage reuse in a structured and efficient way, and is even applied for the development of safety-critical embedded systems. Managing the complexity of variability and addressing functional safety at the same time is challenging and not yet solved. Variability management is an enabler both to establish traceability and to make necessary information visible to safety engineers. We identify a set of requirements for such a method and evaluate existing variability management methods. We apply the most promising method to an industrial case and study its suitability for developing safety-critical product family members. This study provides positive feedback on the potential of the model-based method PLUS in supporting the development of functional-safety-critical embedded systems in product lines. As a result of our analysis, we suggest potential improvements for it.
The increasing use of embedded systems to provide new functionality and customer experience requires developing the embedded systems carefully. As a new challenge, autonomous systems are developed to work in a fleet to provide production workflows. Developing such a system-of-systems requires utilizing various software tools to manage the complexity. One task in developing safety-critical products, in general, is to analyze whether the applied tools can introduce failures into the final product. Today's functional safety standards consider only single software tools for analysis. In our industrial work, we observe a trend towards supporting product lines. A common configurable platform is developed to support a range of different products. To develop such a platform and support variability, a toolchain is created in which software tools are glued together using scripts to support product lines and automatically generate compiled code. The current functional safety standards do not directly support this. This paper discusses how software tools need to support functional safety and shows limitations through an industrial case. We provide a model-based approach to describe a toolchain and show its application to an industrial case. To analyze potential failures in the toolchain, we utilize the HAZOP method and show its application.
Autonomous vehicles are growing in importance in many domains; depending on the domain and user needs, they can be designed as stand-alone solutions, as in the automotive domain, or as part of a fleet with a specific purpose, as in the earth moving machinery domain. Contemporary hazard analysis methods primarily focus on analyzing hazards for single systems. Such an analysis requires knowledge about the typical usage of a product, and it is evaluated, among other things, whether an operator is able to handle a critical situation. Each hazard analysis method requires specific information as input in order to conduct the method. However, for systems-of-systems it is not yet clear how to analyze hazards and provide the required information. In this paper we describe a use case from the earth moving machinery domain where autonomous machines collaborate as a system-of-systems to achieve the mission. We propose a hierarchical process to document a system-of-systems and propose the use of model-based development methods. In this work we discuss how to utilize the provided details in a hazard analysis. Our approach helps to design a complex system-of-systems and supports hazard analysis in a more effective and efficient manner.
Automation of earth moving machinery enables improving existing production workflows in various applications like surface mines, material handling operations, or material transporting. Such connected and collaborating autonomous machines can be seen as a system-of-systems (SoS). It is not yet clear how to consider safety during the development of such systems-of-systems. One potentially useful approach to analyze the safety of complex systems is the System Theoretic Process Analysis (STPA). However, STPA is essentially suited to static monolithic systems and lacks the ability to deal with emergent and dysfunctional behaviors in the case of an SoS. These behaviors, if not identified, could potentially lead to hazards, and it is important to provide mechanisms for SoS developers/integrators to capture such critical situations. In this paper, we present an approach for enriching STPA with the ability to check whether the distributed constituent systems of an SoS have a consistent perspective of the global state, which is necessary to ensure safety. In other words, these checks must at least be capable of identifying and highlighting inconsistencies that can lead to critical situations. We describe the above approach by taking a specific case of state-change-related issues that could potentially be missed by STPA, looking at an industrial case. By applying Petri nets, we show that possible critical situations related to state changes are not identified by STPA. In this context we also propose a model-based extension to STPA and show how our new process could function in tandem with STPA.
Automation is gaining importance in many domains, such as vehicle platoons, smart manufacturing, smart cities, and defense applications. However, an automated system must guarantee safe operation in any critical situation without humans in the loop as a fall-back solution. Additionally, autonomy can cause new types of hazards that need to be identified and analyzed. This paper studies cases from the transportation domain where autonomous vehicles are integrated into workflows in an open-surface mine for efficient material transportation. In this application, many individual systems collaborate to form a system-of-systems (SoS) to achieve the mission goals. The complexity of such an SoS and the dependencies between the constituent systems complicate the safety analysis. In an SoS there are several causes leading to new emergent hazards, and failing to identify them could lead to catastrophes.
In this paper, we describe an SoS-centric process called 'SafeSoS', capable of identifying emergent hazards by structuring the complex characteristics of an SoS on three hierarchical levels to enable better comprehension and analysis. We describe the process in detail and apply it to an industrial transportation system from the earth-moving machinery domain. As part of the SafeSoS process, we utilize model-based formalisms to describe the characteristics of the application and the constituent systems, which form the input for analyzing the safety of the resulting SoS. We apply the safety analysis methods HiSoS, SMM, FTA, FMEA, and HAZOP to the industrial SoS with the purpose of identifying emergent hazards. As a result of our work, we show how to identify and analyze emergent hazards with the help of our SafeSoS approach.
Automation is becoming prevalent in more and more industrial domains due to the potential benefits in cost reduction as well as the new approaches/solutions it enables. When machines are automated and utilized in systems-of-systems, a thorough analysis of potential critical scenarios is necessary to derive appropriate design solutions that are also safe. Hazard analysis methods like PHA, FTA, or FMEA help to identify and follow up potential risks for the machine operators or bystanders and are well established in the development process for safety-critical machinery. However, safety-certified individual machines can in no way guarantee safety in the context of a system-of-systems, since their integration and interactions could bring forth new hazards. Hence it is paramount to understand the application scenarios of the system-of-systems and to apply a structured method to identify all potential hazards. In this paper, we 1) provide an overview of proposed hazard analysis methods for systems-of-systems, 2) describe a case from the construction equipment domain, and 3) apply the well-known System-Theoretic Process Analysis (STPA) to our case. Our experiences during the case study and the analysis of the results clearly point out certain inadequacies of STPA in the context of systems-of-systems and underline the need for the development of improved techniques for safety analysis of systems-of-systems.
Automating a quarry site, as developed within the Electric Site research project at Volvo Construction Equipment, is an example of a directed system-of-systems (SoS). In our case, automated machines and connected smart systems are utilized to improve the workflow at the site. We are currently working on conducting hazard and safety analyses on the SoS level. Performing a hazard analysis on an SoS has been a challenge in terms of complexity and work effort. We elaborate on the suitability of existing methods, discuss requirements on a feasible method, and propose a tailoring of the STPA method to manage the complexity.
Today's industrial product lines in the automotive and construction equipment domains face the challenge of showing functional safety standard compliance and arguing for the absence of failures for all derived product variants. Existing product line approaches are not sufficient to support practitioners in tracing safety-related characteristics through development. We aim to provide aid in creating a safety case for a certain configuration in a product line such that overall less effort is necessary for each configuration. In this paper we 1) discuss the impact of functional safety on product line development, 2) propose a model-based approach to capture safety-related characteristics during the concept phase for product lines and 3) analyze the usefulness of our proposal.
Developing safety-critical products like cars, trains, or airplanes requires rigor in following development processes, and evidence for product safety must be collected. Safety needs to be considered during each development step and traced through the development life cycle. The current standards and approaches focus on single human-operated products. The technical evolution enables integrating existing products and new autonomous products into systems-of-systems to automate workflows and production streams. Developing safety-critical systems-of-systems requires similar processes and a mapping to safety-related activities. However, it is unclear how to consider safety during the different development steps of a safety-critical system-of-systems. The existing hazard analysis methods are not explicitly mapped to developing a system-of-systems and are vague about the required information on the intended behavior. This paper focuses on the concept phase of developing a system-of-systems, where different technical concepts for a specific product feature are evaluated. Specifically, we concentrate on evaluating the safety properties of each concept. We present a process to support the concept phase and apply a model-driven approach to capture the system-of-systems' relevant information. We then show how this knowledge is used for conducting an FMEA and a HAZOP analysis. Lastly, the results from the analysis are mapped back into the sequence diagrams. This information is made available during the next development stages. We apply the method during the concept phase of designing an industrial system-of-systems. Our approach helps in designing complex systems-of-systems and supports concept evaluation with respect to the criticality of the concept under consideration.
Developing safety-critical products demands a clear safety argumentation for each product, regardless of whether it has been derived from a product line or not. The functional safety standards do not explain how to develop safety-critical products in product lines, and the product line concept lacks specific approaches for developing safety-critical products. Nonetheless, product lines are well-established concepts even in companies developing safety-critical products. In this paper we present the results of an exploratory study interviewing 15 practitioners from 6 different companies. We identify typical challenges and approaches from industry and discuss their suitability. The challenges and approaches brought out by this study help us to identify and enhance applicable methods from the product line engineering domain that can meet the challenges of the safety-critical domain as well.
Electronic systems in the automotive domain implement safety-critical functionality in vehicles, and the safety certification process according to a functional safety standard is time-consuming and accounts for a large part of the expenses of a development project. We describe the functional safety certification of electronic automotive systems by presenting a use case from the construction equipment industry. In this context, we highlight some of the major challenges we foresee while using a product-line approach to achieve efficient functional safety certification of vehicle variants. We further elaborate on the impact of functional safety certification when applying the component-based approach to developing safety-critical product variants and discuss the implications through cost modeling and analysis.
Methods for analyzing hazards related to individual systems are well studied and established in industry today. When systems-of-systems are set up to achieve new emergent behavior, hazards specifically caused by malfunctioning behavior of the complex interactions between the involved systems may not be revealed by analyzing single-system hazards alone. A structured process is required to reduce the complexity and enable the identification of hazards when designing systems-of-systems. In this paper we first present how hazards are identified and analyzed in industry using the hazard and risk assessment (HARA) methodology in the context of single systems. We then describe systems-of-systems and provide a quarry site automation example from the construction equipment domain. We propose a new structured process for identifying potential hazards in systems-of-systems (HISoS), exemplified in the context of the provided example. Our approach helps to streamline the hazard analysis process in an efficient manner, thus enabling faster certification of systems-of-systems.
The technical evolution enables the development and application of autonomous systems in various domains. In the on-road and off-road vehicle domains, autonomous vehicles are applied in different contexts. Autonomous cars are designed as single system solutions, while in other scenarios, multiple autonomous or semi-autonomous vehicles are integrated into a system-of-systems. We utilize a case from the earth-moving machinery domain, where a fleet of autonomous vehicles is used for transporting material in off-road environments. The traditional industrial development processes in the earth-moving machinery domain focus on single human-operated systems and lack clear support for autonomous system-of-systems. From our studies of industrial development of system-of-systems, we recognize the demand for guidance on how to document a system-of-systems. The goal of this work is to provide a framework using different model-based formalisms. As a structural background, we utilize the SafeSoS process, where each step specifies details about the targeted system-of-systems. Specifically, we apply model-based systems engineering to describe the structure and behavior of each SoS level. We utilize an industrial case to exemplify how model-based concepts can be applied to capture relevant information needed for designing the system-of-systems. This work provides guidelines for practitioners in developing safe system-of-systems.
Gearbox bearing maintenance is one of the major overhaul cost items for railway electric propulsion systems. Gearbox bearings are continuously exposed to challenging working conditions, which compromise their performance and reliability. Various maintenance strategies have been introduced over time to improve the operational efficiency of such components, while lowering the cost of their maintenance. One of these is predictive maintenance, which makes use of previous historical data to estimate a component's remaining useful life (RUL). This paper introduces a machine learning-based method for calculating the RUL of railway gearbox bearings. The method uses unlabeled mechanical vibration signals from gearbox bearings to detect patterns of increased bearing wear and predict the component's residual life span. We combined a data smoothing method, a change point algorithm to set thresholds, and regression models for prediction. The proposed method has been validated using real-world gearbox data provided by our industrial partner, Alstom Transport AB in Sweden. The results are promising, particularly with respect to the predicted failure time. Our model predicted the failure to occur on day 330, while the gearbox bearing's actual lifespan was 337 days. The deviation of just 7 days is a significant result, since an earlier RUL prediction value is usually preferable to avoid unexpected failure during operations. Additionally, we plan to further enhance the prediction model by including more data representing failing bearing patterns.
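The pipeline described in this abstract (smoothing, change-point thresholding, regression-based extrapolation) can be sketched roughly as follows. This is a minimal illustration, not the paper's actual model: the function names, the moving-average smoother, the fixed threshold standing in for a change-point algorithm, and the failure level are all hypothetical assumptions.

```python
# Hypothetical sketch of an RUL pipeline: smooth vibration amplitudes,
# detect wear onset via a threshold (a stand-in for a change-point
# algorithm), then extrapolate a fitted trend line to a failure level.

def moving_average(signal, window=5):
    """Data-smoothing step: trailing moving average over `window` samples."""
    return [sum(signal[max(0, i - window + 1):i + 1]) /
            (i - max(0, i - window + 1) + 1) for i in range(len(signal))]

def wear_onset(smoothed, threshold):
    """Change-point stand-in: first day the smoothed level crosses a threshold."""
    for day, value in enumerate(smoothed):
        if value >= threshold:
            return day
    return None  # no degradation trend detected yet

def predict_failure_day(signal, threshold, failure_level, window=5):
    """Fit a least-squares line to the post-onset trend, extrapolate to failure."""
    smoothed = moving_average(signal, window)
    onset = wear_onset(smoothed, threshold)
    if onset is None:
        return None
    xs = list(range(onset, len(smoothed)))
    ys = smoothed[onset:]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
            sum((x - mean_x) ** 2 for x in xs)
    intercept = mean_y - slope * mean_x
    # Day at which the fitted degradation trend reaches the failure level.
    return (failure_level - intercept) / slope
```

On synthetic data with a flat healthy phase followed by a linear wear ramp, the extrapolated failure day tracks the ramp; real vibration data would of course need a proper change-point algorithm and model selection.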
Autonomous and Semi-Autonomous Machines (ASAM) can benefit mining operations. However, demonstrating acceptable levels of safety for ASAMs through exhaustive testing is not an easy task. A promising approach is scenario-based testing, which requires the definition of the Operational Design Domain (ODD), i.e., environmental, time-of-day, and traffic characteristics. Currently, an ODD specification exists for Automated Driving Systems (ADS), but, as it is, this specification is not adequate for describing the nuances of mines. This paper presents a context-specific ODD taxonomy called ODD-UM, which is suitable for underground mining operational conditions. For this, we consider the taxonomy provided by the British Publicly Available Specification PAS 1883:2020. Then, we identify attributes included in the standard ISO 17757:2019 for ASAM safety and use them to adapt the original ODD to the needs of underground mining. Finally, the adapted taxonomy is presented as a checklist, and items are selected according to the data provided by the underground mining sector. Our proposed ODD-UM provides a baseline that facilitates considering the actual needs for autonomy in mines by leading to focused questions.
Assemblies of machinery commonly require control systems whose functionality is based on application software. In Europe, such software requires high safety integrity levels in accordance with the Machinery Directive (MD). However, identifying the essential regulatory requirements for safety approval is not an easy task. To facilitate this job, this paper presents a process for Safety Argumentation for Machinery Assembly Control Software (SAMACS). We are inspired by patterns provided in the Goal Structuring Notation (GSN) and the use of contracts in safety argumentation. The contribution of SAMACS is to align those methods with the MD by adopting EN ISO 13849. In particular, we define safety goals based on the expected software contribution to control system safety and the standard's guidance. Software safety goals are detailed into software safety requirements and expressed further as contracts, which shall be verified with prescribed techniques. We apply SAMACS to a case study from a European mining company and discuss the findings. This work aims to help practitioners compose the safety case argumentation necessary to support machinery integration approval in Europe.
Practitioners report improved productivity as one of the main benefits of using autonomous dump trucks in underground mining. However, manned vehicles are still needed to transport materials and personnel in the tunnels, which requires practices that may diminish autonomy benefits. Thus, both fleets shall be efficiently mixed to maximize the autonomy potential. In addition, sufficient safety shall be demonstrated for operations approval. This paper proposes a strategy to populate a GSN (Goal Structuring Notation) structure to argue for the sufficient safety of mixed traffic operations in underground mining. Our strategy considers SoS (System of Systems) concepts to describe the operations baseline and the initial argumentation line, i.e., risk reduction mitigation strategies for existing SoS components. Such a strategy is further detailed with risk reduction mitigation arguments for control systems. Mitigation strategies at both levels are derived from safety analysis supported by STPA (System-Theoretic Process Analysis), a safety analysis technique that aligns well with the SoS perspective. We also incorporate regulatory frameworks addressing machinery to align the arguments with mandatory statements of the machinery directive. Our strategy combines SoS concepts with analysis techniques and regulatory frameworks to facilitate safety case argumentation for operations approval in the European mining context.
Test-driven development is an essential part of the eXtreme Programming approach and is also preferred in other agile methods. For several years, researchers have been performing empirical investigations to evaluate quality improvements in the resulting code when test-driven development is used. However, very little has been reported on the quality of the testing performed in conjunction with test-driven development. In this paper we present results from an experiment specifically designed to evaluate the quality of test cases created by developers who used the test-first and the traditional test-last approaches. On average, the quality of testing in test-driven development was almost the same as the quality of testing using the test-last approach. However, detailed analysis of the test cases created by the test-driven development group revealed that 29% of test cases were "negative" test cases (based on non-specified requirements) while contributing as much as 65% to the overall test quality score of test-first developers. We are currently investigating the possibility of extending test-driven development to facilitate non-specified requirements to a higher extent and thus minimise the impact of a potentially inherent positive test bias.
Test-driven development (TDD) appears not to be immune to positive test bias effects, as we observed in several empirical studies. In these studies, developers created a significantly larger set of positive tests, yet the number of defects detected by negative tests was significantly higher than the number detected by positive ones. In this paper we propose the concept of TDDHQ, which aims to achieve higher quality of testing in TDD by augmenting standard TDD with suitable test design techniques. To exemplify this concept, we present the combination of the equivalence partitioning test design technique with TDD for the purpose of improving the design of test cases. An initial evaluation of this approach showed a noticeable improvement in the quality of test cases created by developers utilising the TDDHQ approach.
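As a rough illustration of the idea (the function and its partitions are invented for this sketch, not taken from the study), equivalence partitioning makes a developer enumerate the invalid input classes alongside the valid one, so negative tests are produced systematically rather than left to chance:

```python
# Hypothetical example: equivalence partitioning applied during TDD.
# The unit under test accepts percentage strings "0".."100"; everything
# else is invalid. Each partition yields at least one test case.

def parse_percentage(text):
    """Parse a percentage in [0, 100]; raise ValueError otherwise."""
    try:
        value = int(text)
    except (TypeError, ValueError):
        raise ValueError("not an integer: %r" % (text,))
    if not 0 <= value <= 100:
        raise ValueError("out of range: %d" % value)
    return value

def run_partition_tests():
    # Positive partition: valid values, including the boundaries.
    for text, expected in [("0", 0), ("42", 42), ("100", 100)]:
        assert parse_percentage(text) == expected
    # Negative partitions: below range, above range, non-integer text,
    # wrong type. Plain TDD tends to under-produce exactly these tests.
    for bad in ["-1", "101", "4.2", "abc", None]:
        try:
            parse_percentage(bad)
        except ValueError:
            continue
        raise AssertionError("partition %r not rejected" % (bad,))
    return True
```

The point is the enumeration discipline: each invalid partition gets a representative test, countering the positive test bias described above.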
Many challenges confront companies when they change their current software development process to an agile development methodology. These challenges can be rather difficult, but one that requires considerable attention is the integration of testing with development. This is because in heavyweight processes, such as the traditional waterfall approach, testing is a phase often conducted by testers as part of a quality assurance team towards the end of the development cycle, whereas in agile methodologies testing is part of a continuous development activity with no specific tester role defined. In this paper we consider several options for testers when an organisation transitions to an agile methodology, and propose a new project mentor role for them. This role aims to utilize the knowledge that testers already have of both the business domain and the development technology, together with their expertise in quality practices. It will enhance the stature of testers as well as enable the company to deploy them effectively in the new environment. Motivations and benefits for this role are presented in this paper together with our plan for evaluating this proposal.
Conducting empirical studies in industry always presents a major challenge for many researchers. Being a graduate student does not make things any easier. Often, due to a lack of experience, credibility or simply very limited networking, graduate students do not receive many opportunities to collaborate directly with industry and test their theoretical models in a realistic environment. On the other hand, empirical research conducted in academic settings is often criticised for using students as subjects and working with small sample sizes, creating a major validity threat to the published results. In this paper we present an experience report from an industrial empirical study conducted at Infosys Ltd., India, with the support of InStep, their global internship program for graduate students. The focus of the paper is to present several challenges that arose before, during, and after the study and required immediate attention for the experiment to be completed successfully. We also discuss and elaborate on the data analysis results and their implications for our current research activities.
In our recent academic experiments, the existence of positive test bias, that is, a lack of negative test cases, was identified when a test-driven development approach was used. At the same time, when the defect-detecting ability of individual test cases was calculated, it was noted that the probability of a negative test case detecting a defect was substantially higher than that of a positive test case. The goal of this study is to investigate the existence of positive test bias in test-driven development within an industrial context, and to measure the defect-detecting ability of both positive and negative test cases. An industrial experiment was conducted at Infosys Ltd., India, whose employees voluntarily signed up to participate in the study and were randomly assigned to groups utilizing test-driven development, test-driven development with negative testing, and test-last development. Source code and test cases created by each participant during the study were collected and analysed. The collected data indicate a statistically significant difference between the number of positive and negative test cases created by industrial participants, confirming the existence of positive test bias. The difference in defect-detecting ability of positive and negative test cases is also statistically significant. Similarly to our previous academic study, 29% of all test cases were negative, yet they revealed as much as 71% of all the defects found by all test cases. With this industrial experiment, we confirmed the existence of positive test bias in an industrial context, as well as the significantly higher defect-detecting ability of negative test cases.
Software testing is a major source of expense in software projects and a proper testing process is a critical ingredient in the cost-efficient development of high-quality software. Contemporary aspects, such as the introduction of a more lightweight process, trends towards distributed development, and the rapid increase of software in embedded and safety-critical systems, challenge the testing process in unexpected manners. To our knowledge, there are very few studies focusing on these aspects in relation to testing as perceived by different contributors in the software development process. This paper qualitatively and quantitatively analyses data from an industrial questionnaire survey, with a focus on current practices and preferences on contemporary aspects of software testing. Specifically, the analysis focuses on perceptions of the software testing process in different categories of respondents. Categorization of respondents is based on safety-criticality, agility, distribution of development, and application domain. While confirming some of the commonly acknowledged facts, our findings also reveal notable discrepancies between preferred and actual testing practices. We believe continued research efforts are essential to provide guidelines in the adaptation of the testing process to take care of these discrepancies, thus improving the quality and efficiency of the software development.
Test-driven development (TDD) is one of the basic practices of agile software development, and both academia and practitioners claim that TDD, to a certain extent, improves the quality of the code produced by developers. However, recent results suggest that this practice is not followed to the extent preferred by industry. In order to pinpoint specific obstacles limiting its industrial adoption, we have conducted a systematic literature review of empirical studies explicitly focusing on TDD as well as those indirectly addressing it. Our review identified seven limiting factors: increased development time, insufficient TDD experience/knowledge, lack of upfront design, domain- and tool-specific issues, lack of developer skill in writing test cases, insufficient adherence to the TDD protocol, and legacy code. The results of this study are of special importance to the testing community, since they outline the direction for further detailed scientific investigations as well as highlight the need for guidelines to overcome these limiting factors for successful industrial adoption of TDD.
Agile development approaches are increasingly being followed and favored by the industry. Test-Driven Development (TDD) is a key agile practice, and recent research results suggest that the successful adoption of TDD depends on different limiting factors, one of them being insufficient developer testing skills. The goal of this paper is to investigate whether developers who are educated on general testing knowledge will be able to utilize TDD more effectively. We conducted a controlled experiment with master's students during a course on Software Verification & Validation (V&V), where the source code and test cases created by each participant during the labs, as well as their answers to a survey questionnaire, were collected and analyzed. Descriptive statistics indicate improvements in statement coverage. However, no statistically significant differences could be established between the pre- and post-course groups of students. Through qualitative analysis of the students' tests, we noticed a lack of test cases for non-stated requirements ("negative" tests), resulting in the non-detection of bugs. Students did show a preference towards TDD in the surveys. Although further research is required to fully establish this, we believe that identifying specific testing knowledge that is complementary to the testing skills of a new TDD developer would enable developers to perform their tasks in a more efficient manner.
Background: Test-driven development, as a side effect of developing software, produces a set of accompanying test cases which can protect implemented features during code refactoring. However, recent research results point out that successful adoption of test-driven development might be limited by the testing skills of the developers using it. Aim: The main goal of this paper is to investigate whether there is a difference between the quality of test cases created using test-first and test-last approaches. An additional goal is to measure the quality of the code produced using the test-first and test-last approaches. Method: A pilot study was conducted during the master-level course on Software Verification & Validation at Mälardalen University. Students worked individually on the problem implementation and were randomly assigned to a test-first or a test-last (control) group. Source code and test cases created by each participant during the study, as well as their answers to a survey questionnaire after the study, were collected and analysed. The quality of the test cases is analysed from three perspectives: (i) code coverage, (ii) mutation score and (iii) the total number of failing assertions. Results: The total number of test cases with failing assertions (test cases revealing an error in the code) was nearly the same for both the test-first and test-last groups. This can be interpreted as "test cases created by test-first developers were as good as (or as bad as) test cases created by test-last developers". In contrast, solutions created by test-first developers had, on average, 27% fewer failing assertions than solutions created by the test-last group. Conclusions: Though the study provided some interesting observations, it needs to be conducted as a fully controlled experiment with a higher number of participants in order to validate the statistical significance of the presented results.
The increasing use of computer-based systems for safety-critical operations in applications such as nuclear, space, and automotive systems demands a systematic way of estimating software reliability. The high reliability requirements of safety-critical software systems make this task imperative as well. Due to the specifics of software systems and the lack of any universally accepted model, it is very difficult to predict the true reliability value of a system. Unfortunately, none of the existing software reliability models acknowledges or addresses this fact. Multiple uncertainty factors influence the reliability estimation of safety-critical software systems. In this paper, we first define the scope of the important factors in the reliability models and describe a new approach to obtain a realistic estimate of system reliability. For this purpose, we consider different kinds of reliability models, also taking into account the system architecture. The influence of uncertainty factors in the models is analyzed to obtain uncertainty bounds, which define an interval within which the true reliability should lie. In this way, system architects may use a so-called worst-case reliability estimation, given by the lower interval bound, for system analysis. We also demonstrate our proposed approach with real data taken from safety-critical applications.
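For intuition only, and not as the paper's models: if each component's reliability is known only as an interval, the bounds for a simple series architecture multiply through, and the lower bound plays the role of the worst-case estimate mentioned above. The function names and numbers in this sketch are illustrative assumptions:

```python
# Illustrative toy model: propagating per-component reliability intervals
# through a series architecture. In a series system every component must
# work, so point reliabilities multiply; with interval-valued inputs the
# lower and upper bounds multiply separately.

def series_reliability_bounds(component_bounds):
    """component_bounds: iterable of (lower, upper) reliability pairs."""
    lower, upper = 1.0, 1.0
    for lo, hi in component_bounds:
        assert 0.0 <= lo <= hi <= 1.0, "invalid reliability interval"
        lower *= lo
        upper *= hi
    return lower, upper

def worst_case_reliability(component_bounds):
    """Worst-case estimate a system architect might use for analysis."""
    return series_reliability_bounds(component_bounds)[0]
```

Other architectures (parallel redundancy, k-out-of-n) would propagate the interval endpoints differently, which is why the system architecture matters for the resulting bounds.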