Industrial control systems have been targeted by numerous cyber attacks over the past few decades, causing problems related to data privacy, financial losses, and operational failures. One potential approach to detecting these attacks is to analyze network data using machine learning and network anomaly detection techniques. However, these systems are often geographically dispersed across multiple zones, which makes it challenging to apply local machine learning methods for anomaly detection. Additionally, there are instances where sharing complete operational data between different zones is restricted due to security concerns. A federated model for anomaly detection is therefore a promising solution for these systems. In this study, we investigate the application of machine learning techniques for anomaly detection in network data, considering centralized, local, and federated approaches. We implemented the local and centralized methods using several simple machine learning techniques and observed that Random Forest and Artificial Neural Networks outperformed the other methods. We therefore extended our analysis to develop federated versions of Random Forest and Artificial Neural Network. Our findings reveal that the federated model surpasses the performance of the local models and achieves comparable or even superior results compared to the centralized model, while preserving data privacy and the confidentiality of sensitive information.
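A federated neural network needs some rule for combining the models trained in each zone. The following is a minimal sketch, assuming the widely used federated averaging rule, in which flattened weight vectors are averaged in proportion to each client's sample count. The abstract does not state which aggregation rule the study used; the function name `federated_average` and the toy weights are hypothetical.

```python
from typing import List, Sequence


def federated_average(client_weights: Sequence[Sequence[float]],
                      client_sizes: Sequence[int]) -> List[float]:
    """Average flattened weight vectors, weighted by local sample counts."""
    if len(client_weights) != len(client_sizes) or not client_weights:
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(client_weights[0])
    if any(len(w) != dim for w in client_weights):
        raise ValueError("all clients must share the same model shape")

    global_weights = [0.0] * dim
    for weights, size in zip(client_weights, client_sizes):
        share = size / total  # this client's fraction of all training samples
        for i, value in enumerate(weights):
            global_weights[i] += share * value
    return global_weights


# Hypothetical example: three zones with different amounts of local data.
zone_weights = [[0.0, 1.0], [1.0, 1.0], [4.0, 3.0]]
zone_sizes = [100, 100, 200]
global_model = federated_average(zone_weights, zone_sizes)
```

Only the weight vectors and sample counts leave each zone in this sketch; the raw network data stays local.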
When developing products or performing experimental research studies, the simulation of physical or logical systems is of great importance for evaluation and verification purposes. For research- and development-related distributed control systems, there is a need to simulate common physical environments with separate interconnected modules that are independently controlled and orchestrated using standardized network communication protocols. The simulation environment presented in this paper is a bespoke solution for precisely these conditions, based on the Modular Automation design strategy. It allows easy configuration and combination of simple modules into complex production processes, with support for individual low-level control of modules, as well as recipe orchestration for high-level coordination. The use of the environment is exemplified in a configuration of a modular ice-cream factory, used for cybersecurity-related research.
Modular automation provides a challenge for traditional physics simulators, especially if they are used as a simulator in the loop of a development or research project looking at behavior from a systems level. In this paper, we present extensions of a previously developed simulation environment that is tailored to provide these characteristics. The extensions include simulation engine level improvements, such as including better modeling of the material flow, and sensor anomaly injections to model sensor faults or tampering, as well as system-level enhancements and functionality including certificate handling and anomaly detection methods using machine learning. This simulation environment has proven useful for education as well as research and engineering work, and with the provided extensions several new directions of use can be envisioned. The system is demonstrated in the use case of a modular ice-cream factory, including all the new and enhanced functionalities.
With the increasing use of the internet and reliance on computer-based systems for our daily lives, any vulnerability in those systems is one of the most important issues for the community. For this reason, the need for intelligent models that detect malicious intrusions is important to keep our personal information safe. In this paper, we investigate several supervised (Artificial Neural Network, Support Vector Machine, Random Forest, Linear Discriminant Analysis, and K-Nearest Neighbors) and unsupervised (K-means, Mean-shift, and DBSCAN) machine learning algorithms, in the context of anomaly-based Intrusion Detection Systems. We are using four different IDS benchmark datasets (KDD99, NSL-KDD, UNSW-NB15, and CIC-IDS-2017) to evaluate the performance of the selected machine learning algorithms for both intrusion detection and attack classification. The results have shown that Random Forest is the most suitable algorithm regarding model accuracy and execution time.
With the increasing use of computer networks and distributed systems, network security and data privacy are becoming major concerns for our society. In this paper, we present an approach based on an autoencoder trained with differential evolution for feature encoding of network data, with the goal of improving security and reducing data transfers. One of the novel elements in this use of differential evolution for intrusion detection is the enhancement of the fitness function with the performance of a machine learning algorithm. We conducted an extensive evaluation of six machine learning algorithms for network intrusion detection using encoded data from the well-known, publicly available network dataset UNSW-NB15. The experiments clearly showed the superiority of random forest, support vector machine, and K-nearest neighbors in terms of accuracy, and this was not affected to a high degree by reducing the number of features. Furthermore, the machine learning algorithm that was used during training (Linear Discriminant Analysis classifier) gained 14 percentage points in accuracy. Our results also showed clear improvements in execution times, in addition to the obvious security benefits of encoded data. Additionally, the proposed method outperformed one of the most commonly used feature reduction methods, Principal Component Analysis.
The number of network attacks is growing, and with the increased activity on the Internet, data on the network is more exposed than ever. Applying Machine Learning (ML) techniques for cyber-security is a popular and effective approach to address this problem. However, the data used by ML algorithms has to be protected. In this paper, we present a framework that combines an autoencoder, multi-objective optimization algorithms, and different ML algorithms to encode the network data, reduce its size, and detect and classify network attacks. The novel element of this framework, with respect to earlier research, is the application of multi-objective optimization algorithms, such as Multi-Objective Differential Evolution or Non-dominated Sorting Genetic Algorithm-II, to handle the different objectives in the fitness function of the autoencoder (autoencoder decoding error and accuracy of the ML algorithm). We evaluated six different ML algorithms for attack detection and classification on the network dataset UNSW-NB15. The performance of the proposed framework is compared with single-objective Differential Evolution. The results showed that Multi-Objective Differential Evolution outperforms its counterparts for attack detection, while all the evaluated algorithms showed similar performance for attack classification.
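The two objectives named above pull in different directions: decoding error should be low and classifier accuracy should be high. Multi-objective algorithms such as NSGA-II rank candidates by Pareto dominance rather than by a single score. The sketch below illustrates only that comparison; the names `dominates` and `pareto_front` and the candidate scores are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# Each candidate autoencoder is scored as (decoding_error, accuracy).
Score = Tuple[float, float]


def dominates(a: Score, b: Score) -> bool:
    """True if a is no worse than b on both objectives and better on one.

    Decoding error is minimized; accuracy is maximized.
    """
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    strictly_better = a[0] < b[0] or a[1] > b[1]
    return no_worse and strictly_better


def pareto_front(scores: List[Score]) -> List[Score]:
    """Return the candidates that no other candidate dominates."""
    return [s for s in scores
            if not any(dominates(other, s) for other in scores)]


# Made-up scores for four candidate encodings.
candidates = [(0.10, 0.90), (0.20, 0.95), (0.30, 0.85), (0.10, 0.80)]
front = pareto_front(candidates)
```

In this toy example the first two candidates form the front: neither is better than the other on both objectives, while each of the remaining two is dominated by the first.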
The rapid growth of the Internet has led to the evolution of sophisticated security threats that exploit vulnerabilities within networks. Defence mechanisms must quickly adapt to these new threats to ensure that networks stay secure. One possible mechanism is to use Machine Learning (ML) algorithms to detect malicious activities. The edge devices that control and manage the network, such as routers, already have access to the data flowing through the network and may use their own computational resources to host ML algorithms for intrusion detection. This paper presents a system for network intrusion detection which is deployed to an edge device and evaluated for live binary classification of network traffic. Different ML algorithms (Decision Tree, Random Forest, and Artificial Neural Network) are evaluated on existing datasets (Westermo and CIC-IDS-2017). Flow-based data pre-processing is performed, and different labeling strategies and flow durations are used and compared. The most effective version of each algorithm is implemented and deployed on the Westermo Lynx-3510 routing-capable network switch, and system performance is assessed across various scenarios with simulated network attacks. The experiments showed that Random Forest is the best option, closely followed by Decision Tree.
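Flow-based pre-processing turns individual packets into aggregated records that a classifier can consume. The sketch below assumes a flow is identified by the usual 5-tuple and cut into fixed-length time windows whose length plays the role of the flow duration. The abstract does not give the paper's actual flow definition or features, so the `Packet` fields, the `packets_to_flows` helper, and the two counters are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Packet(NamedTuple):
    timestamp: float  # seconds since capture start
    src: str
    dst: str
    sport: int
    dport: int
    proto: str
    size: int  # bytes


def packets_to_flows(packets: List[Packet],
                     flow_duration: float) -> Dict[Tuple, Dict[str, int]]:
    """Aggregate packets into per-window flow records.

    A flow key is the 5-tuple plus the index of the time window
    (timestamp // flow_duration) in which the packet was seen.
    """
    if flow_duration <= 0:
        raise ValueError("flow_duration must be positive")
    flows: Dict[Tuple, Dict[str, int]] = defaultdict(
        lambda: {"packets": 0, "bytes": 0})
    for p in packets:
        window = int(p.timestamp // flow_duration)
        key = (p.src, p.dst, p.sport, p.dport, p.proto, window)
        flows[key]["packets"] += 1
        flows[key]["bytes"] += p.size
    return dict(flows)


# Made-up trace: one SSH conversation spanning two windows, one HTTP packet.
trace = [
    Packet(0.2, "10.0.0.1", "10.0.0.2", 5000, 22, "tcp", 60),
    Packet(0.9, "10.0.0.1", "10.0.0.2", 5000, 22, "tcp", 120),
    Packet(1.4, "10.0.0.1", "10.0.0.2", 5000, 22, "tcp", 80),
    Packet(0.5, "10.0.0.3", "10.0.0.2", 6000, 80, "tcp", 40),
]
flows = packets_to_flows(trace, flow_duration=1.0)
```

A longer `flow_duration` yields fewer, richer records per conversation; a shorter one yields earlier decisions, which is the kind of trade-off a comparison of flow durations would explore.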
Modern manufacturing systems collect a huge amount of data which gives an opportunity to apply various Machine Learning (ML) techniques. The focus of this paper is on the detection of anomalous behavior in industrial manufacturing systems by considering the temporal nature of the manufacturing process. Long Short-Term Memory (LSTM) networks are applied on a publicly available dataset called Modular Ice-cream factory Dataset on Anomalies in Sensors (MIDAS), which is created using a simulation of a modular manufacturing system for ice cream production. Two different problems are addressed: anomaly detection and anomaly classification. LSTM performance is analysed in terms of accuracy, execution time, and memory consumption and compared with non-time-series ML algorithms including Logistic Regression, Decision Tree, Random Forest, and Multi-Layer Perceptron. The experiments demonstrate the importance of considering the temporal nature of the manufacturing process in detecting anomalous behavior and the superiority in accuracy of LSTM over non-time-series ML algorithms. Additionally, runtime adaptation of the predictions produced by LSTM is proposed to enhance its applicability in a real system.
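The difference between time-series and non-time-series models starts with how the input is shaped: a sequence model sees a window of consecutive sensor readings, whereas the other algorithms see one reading at a time. A minimal sketch of that windowing step follows, assuming each window is labeled with the label of its last time step. The window length, the labeling convention, and the name `make_windows` are assumptions, not details from the paper.

```python
from typing import List, Sequence, Tuple


def make_windows(readings: Sequence[Sequence[float]],
                 labels: Sequence[int],
                 window: int) -> Tuple[List[List[Sequence[float]]], List[int]]:
    """Cut a multivariate series into overlapping fixed-length windows.

    Each window holds `window` consecutive readings and takes the label
    of its final time step.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if len(readings) != len(labels):
        raise ValueError("readings and labels must have the same length")
    xs: List[List[Sequence[float]]] = []
    ys: List[int] = []
    for end in range(window, len(readings) + 1):
        xs.append(list(readings[end - window:end]))
        ys.append(labels[end - 1])
    return xs, ys


# Made-up single-sensor temperature trace with one anomalous reading.
readings = [[20.0], [20.5], [21.0], [35.0], [21.5]]
labels = [0, 0, 0, 1, 0]
X, y = make_windows(readings, labels, window=3)
```

Here the second window ends on the anomalous reading and carries the label 1, so the model sees the jump in the context of the two normal readings before it.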
The Gaussian Conditional Random Fields (GCRF) algorithm and its extensions are used for machine learning regression problems in which the attributes of objects and the correlation between objects should be considered when making predictions. These algorithms can be applied in different domains where problems can be seen as graphs, but their implementation requires complex calculations and good programming skills. This paper presents an open source software package that includes a tool with graphical user interface (GCRFs tool) and Java library (GCRFs library). GCRFs tool is software that integrates various GCRF-based algorithms and supports training and testing of those algorithms on real-world datasets. The main goal of GCRFs tool is to provide a straightforward and user-friendly graphical user interface that will simplify the use of GCRF-based algorithms. GCRFs Java library contains basic classes for GCRF concepts and can be used by researchers who have experience in Java programming. Also, this paper presents the results of a pilot usability evaluation of the GCRFs tool, where the software was evaluated with expert and non-expert users. This evaluation gave us detailed insight into the experiences and opinions of the users and helped us outline priorities for future development.
The vulnerability of important data is increasing every day with the constant evolution and growth of sophisticated cyber security threats that can seriously affect business processes. Hence, it is important for organizations to define and implement appropriate mechanisms, such as intrusion detection systems, to protect their valuable data. In recent years, various machine learning approaches have been proposed for intrusion detection, where Random Forest (RF) is recognized as one of the most suitable algorithms. Machine learning algorithms are data-oriented, and storing training data on a centralized server can increase the vulnerability of the whole system. In this paper, we use a federated learning approach that trains models independently on data subsets held by multiple clients and sends only the resulting models to a server for aggregation. This considerably reduces the need to send all data to a centralized server. Different RF-based federated learning versions were evaluated on four intrusion detection benchmark datasets (KDD, NSL-KDD, UNSW-NB15, and CIC-IDS-2017). In our experiments, the global RF on the server achieved higher accuracy than the maximum achieved with individual RFs on the clients for two of the four datasets, and it was very close to the maximum for the third dataset. Even in the fourth case, the global RF performed better than the average accuracy, although it fell behind the maximum.
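The client/server split described above can be sketched as follows. This is one simple aggregation strategy, chosen here for illustration: every client trains a small ensemble on its own data, and the server concatenates the received trees into a global ensemble that predicts by majority vote. The abstract mentions several RF-based versions without detailing them, so this should not be read as the paper's method. To stay dependency-free, one-split decision stumps stand in for full decision trees, and all names and the toy data are hypothetical.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]   # (features, label)
Stump = Tuple[int, float, int, int]    # (feature, threshold, label_le, label_gt)


def train_stump(data: List[Sample], rng: random.Random) -> Stump:
    """Fit a one-split tree on a bootstrap sample (stand-in for a full tree)."""
    boot = [rng.choice(data) for _ in data]
    feature = rng.randrange(len(boot[0][0]))  # random feature, as in RF
    best_errors, best_stump = None, None
    for x, _ in boot:
        thr = x[feature]
        le = [y for v, y in boot if v[feature] <= thr]
        gt = [y for v, y in boot if v[feature] > thr]
        label_le = Counter(le).most_common(1)[0][0]
        label_gt = Counter(gt).most_common(1)[0][0] if gt else label_le
        errors = (sum(y != label_le for y in le)
                  + sum(y != label_gt for y in gt))
        if best_errors is None or errors < best_errors:
            best_errors = errors
            best_stump = (feature, thr, label_le, label_gt)
    return best_stump


def train_local_forest(data: List[Sample], n_trees: int,
                       seed: int) -> List[Stump]:
    """Client side: train an ensemble using only local data."""
    rng = random.Random(seed)
    return [train_stump(data, rng) for _ in range(n_trees)]


def aggregate(client_forests: List[List[Stump]]) -> List[Stump]:
    """Server side: merge the received models; no raw data is needed."""
    return [tree for forest in client_forests for tree in forest]


def predict(forest: List[Stump], x: Sequence[float]) -> int:
    """Majority vote over all trees in the ensemble."""
    votes = Counter(le if x[f] <= thr else gt for f, thr, le, gt in forest)
    return votes.most_common(1)[0][0]


def make_client(lo: int, hi: int) -> List[Sample]:
    """Toy data: one feature, label 1 ("attack") when the value is >= 10."""
    return [([float(i)], int(i >= 10)) for i in range(lo, hi)]


clients = [make_client(0, 20), make_client(2, 18), make_client(5, 15)]
local_forests = [train_local_forest(d, n_trees=5, seed=k)
                 for k, d in enumerate(clients)]
global_forest = aggregate(local_forests)
```

Only the stump parameters travel from the clients to the server in this sketch; the samples themselves never leave the client that holds them.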
Communication networks are crucial components of the underlying digital infrastructure in any smart city setup. The increasing usage of computer networks brings additional cyber security concerns, and every organization has to implement preventive measures to protect valuable data and business processes. Due to the inherently distributed nature of city infrastructures, as well as the critical nature of their resources and data, attack detection calls for distributed, efficient, and privacy-preserving solutions. In this paper, we extend the evaluation of our federated learning framework for network attack detection and classification based on random forest. Previously, the framework was evaluated only for attack detection using four well-known intrusion detection datasets (KDD, NSL-KDD, UNSW-NB15, and CIC-IDS-2017). Here, we extend the evaluation to attack classification. We also evaluate how adding differential privacy to random forest, as an additional protective mechanism, affects the framework's performance. The results show that the framework outperforms the average performance of independent random forests on clients for both attack detection and classification. Adding differential privacy penalizes the performance of random forest, as expected, but the use of the proposed framework still brings benefits in comparison to the use of independent local models. The code used in this paper is publicly available, to enable transparency and facilitate reproducibility within the research community.
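One common way to add differential privacy to a tree-based model is to perturb the class counts in each leaf with Laplace noise before choosing the leaf's label. The sketch below illustrates that idea only. The abstract does not say which mechanism the framework uses, so the approach, the helper names, and the example counts are assumptions. It also assumes neighbouring datasets differ by adding or removing one record, which gives a count sensitivity of 1.

```python
import random
from typing import Dict


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def noisy_leaf_label(counts: Dict[int, int], epsilon: float,
                     rng: random.Random) -> int:
    """Choose a leaf label from Laplace-perturbed class counts.

    With sensitivity 1, the noise scale is 1 / epsilon: a smaller
    epsilon means stronger privacy and noisier labels.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    scale = 1.0 / epsilon
    noisy = {label: count + laplace_noise(scale, rng)
             for label, count in counts.items()}
    return max(noisy, key=noisy.get)


# Made-up leaf: 40 benign samples (label 0) and 60 attack samples (label 1).
leaf_counts = {0: 40, 1: 60}
almost_exact = noisy_leaf_label(leaf_counts, epsilon=1e6,
                                rng=random.Random(0))
private = noisy_leaf_label(leaf_counts, epsilon=0.05,
                           rng=random.Random(0))
```

With a very large epsilon the noise is negligible and the true majority label is returned; with a small epsilon the label can flip, which is consistent with the accuracy penalty the abstract reports as expected.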
A small deviation in manufacturing systems can cause huge economic losses, and all components and sensors in the system must be continuously monitored to provide an immediate response. The usual industrial practice is rather simplistic, based on brute-force checking of a limited set of parameters, often with pessimistic pre-defined bounds. The usage of appropriate machine learning techniques can be very valuable in this context to narrow down the set of parameters to monitor, define more refined bounds, and forecast impending issues. One of the factors hampering progress in this field is the lack of datasets that can realistically mimic the behaviours of manufacturing systems. In this paper, we propose a new dataset called MIDAS (Modular Ice cream factory Dataset on Anomalies in Sensors) to support machine learning research in analog sensor data. MIDAS is created using a modular manufacturing simulation environment that simulates the ice cream-making process. Using MIDAS, we evaluated four different supervised machine learning algorithms (Logistic Regression, Decision Tree, Random Forest, and Multilayer Perceptron) for two different problems: anomaly detection and anomaly classification. The results showed that Multilayer Perceptron is the most suitable algorithm with respect to model accuracy and execution time. We have made the dataset and the code for the experiments publicly available, to enable interested researchers to enhance the state of the art by conducting further studies.
Recent advances in digitalization, improved connectivity, and cloud-based services are revolutionizing the manufacturing domain. Despite the huge potential benefits in productivity, these trends also bring safety and security concerns to traditionally closed industrial operation scenarios. This paper presents a high-level view of some of the research results and technological contributions of the InSecTT Project for meeting safety and security goals. These technology contributions are expected to support both the design and operational phases in the production life cycle. Specifically, our contributions span (a) enforcing stricter but flexible access control, (b) evaluation of machine learning techniques for intrusion detection, (c) generation of realistic process control and network-oriented datasets with injected anomalies, and (d) performing safety and security analysis on automated guided vehicle platoons.
Cybersecurity is of increasing importance in industrial automation systems. The use of fine-grained and intelligent access control is paramount in emerging manufacturing systems as implicit trust is no longer a viable assumption for interactions within industrial systems. An authorization service is a central component of an access control enforcement architecture, to which resource servers may outsource parts of the policy decision functionality. This paper investigates how to create and integrate an authorization service in an industrial manufacturing system, which uses workflow descriptions combined with operational system states for policy decisions. The implementation is demonstrated in the use case of recipe orchestration in a modular automation system, and a few key quality metrics of the authorization service are evaluated.
There is a growing body of knowledge on network intrusion detection, and several open data sets with network traffic and cyber-security threats have been released in the past decades. However, many data sets have aged, were not collected in a contemporary industrial communication system, or do not easily support research focusing on distributed anomaly detection. This paper presents the Westermo network traffic data set, 1.8 million network packets recorded in over 90 minutes in a network built up of twelve hardware devices. In addition to the raw data in PCAP format, the data set also contains pre-processed data in the form of network flows in CSV files. This data set can support the research community for topics such as intrusion detection, anomaly detection, misconfiguration detection, distributed or federated artificial intelligence, and attack classification. In particular, we aim to use the data set to continue work on resource-constrained distributed artificial intelligence in edge devices. The data set contains six types of events: harmless SSH, bad SSH, misconfigured IP address, duplicated IP address, port scan, and man in the middle attack.