NLP-based Failure log Clustering to Enable Batch Log Processing in Industrial DevOps Setting
Mälardalen University, School of Innovation, Design and Engineering.
2022 (English). Independent thesis, Advanced level (degree of Master (One Year)), 20 credits / 30 HE credits. Student thesis.
Abstract [en]

The rapid development, updating, and maintenance of industrial software systems have increased the need for software artifact testing. The proliferation of test data is forcing some medium and large companies to automate the test analysis process. The examination of test results can be automated by grouping them into subsets of comparable test outcomes and analyzing each subset as a batch. The first step is to identify a precise and reliable categorization mechanism based on structural similarities and error categories. In addition, since the errors and the number of subgroups are not known in advance, the method should not require prior knowledge of the target subsets. Clustering is an appropriate method for separating test results under these constraints. This work presents an approach for grouping test results and accelerating the test analysis process by implementing multiple clustering algorithms (K-means, Agglomerative, DBSCAN, Fuzzy-c-means, and Spectral) on test results from industrial contexts and comparing their execution time and the quality of their output. The unstructured, textual nature of the test results is one of the primary obstacles in this study, necessitating the implementation of feature selection methods.

Consequently, this study employs three distinct approaches to feature selection (TF-IDF, FastText, and BERT). The research was conducted as a series of experiments in a controlled and isolated environment, using test process results from Westermo Technologies AB, as part of the AIDOaRT Project, in order to establish an acceptable way of clustering industrial test results. The thesis concludes that K-means and Agglomerative yield the highest performance and evaluation scores; however, K-means is superior in terms of execution time. In addition, a Focus Group meeting was organized to examine the results qualitatively; from the perspective of the participating engineers and experts, clustering the results increases the speed of test analysis and decreases the review workload.
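
The pipeline outlined in the abstract (turn textual test results into feature vectors, then cluster them) can be illustrated with a small sketch. This is not the thesis implementation: it pairs a hand-written TF-IDF weighting with a spherical (cosine-based) variant of K-means and a deterministic farthest-first initialisation, using only the Python standard library. The function names and the sample log lines are hypothetical and chosen for illustration only.

```python
import math
from collections import Counter, defaultdict


def normalise(vec):
    """Scale a sparse vector (dict of term -> weight) to unit length."""
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {t: w / norm for t, w in vec.items()}


def tfidf_vectors(docs):
    """Build unit-length TF-IDF vectors from whitespace-tokenised documents."""
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    doc_freq = Counter(t for tokens in tokenised for t in set(tokens))
    vectors = []
    for tokens in tokenised:
        counts = Counter(tokens)
        # Smoothed IDF: terms found in every document keep a nonzero weight.
        vec = {t: c * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0)
               for t, c in counts.items()}
        vectors.append(normalise(vec))
    return vectors


def dot(a, b):
    """Dot product of two sparse vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def spherical_kmeans(vectors, k, max_iter=20):
    """Cluster unit vectors by cosine similarity; returns one label per vector."""
    # Farthest-first initialisation (an assumption made for determinism):
    # start from the first vector, then repeatedly add the vector that is
    # least similar to the centroids chosen so far.
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(
            min(vectors, key=lambda v: max(dot(v, c) for c in centroids)))
    labels = None
    for _ in range(max_iter):
        new_labels = [max(range(k), key=lambda j: dot(v, centroids[j]))
                      for v in vectors]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        for j in range(k):
            total = defaultdict(float)
            for v, lab in zip(vectors, labels):
                if lab == j:
                    for t, w in v.items():
                        total[t] += w
            if total:  # keep the old centroid if a cluster became empty
                centroids[j] = normalise(total)
    return labels


# Hypothetical failure-log lines, not taken from the thesis data.
logs = [
    "ERROR timeout waiting for link up on port 1",
    "ERROR timeout waiting for link up on port 7",
    "FAIL assertion mismatch expected vlan 10 got vlan 20",
    "FAIL assertion mismatch expected vlan 30 got vlan 40",
    "ERROR timeout waiting for link up on port 3",
]
labels = spherical_kmeans(tfidf_vectors(logs), k=2)
print(labels)
```

In this toy input the timeout lines and the assertion lines share no tokens, so they fall into separate groups. Note that K-means needs the number of clusters `k` up front, whereas the abstract points out that the number of subgroups is not known in advance; in practice `k` would have to be estimated or a density-based method such as DBSCAN considered.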

Place, publisher, year, edition, pages
2022. p. 37
Keywords [en]
natural language processing, failure clustering, nightly testing, industrial devops
National Category
Computer Engineering
Identifiers
URN: urn:nbn:se:mdh:diva-59203
OAI: oai:DiVA.org:mdh-59203
DiVA, id: diva2:1673361
Subject / course
Computer Science
Presentation
2022-06-02, Högskoleplan 1, 722 20 Västerås, Västerås, 21:01 (English)
Supervisors
Examiners
Available from: 2022-06-28 Created: 2022-06-20 Last updated: 2022-06-28 Bibliographically approved

Open Access in DiVA

NLP-based Failure log Clustering (1337 kB), 306 downloads
File information
File name: FULLTEXT01.pdf
File size: 1337 kB
Checksum (SHA-512): 162dc3ba5ea582db72a86801d140580e09a83e7113c093b06307fa700b66976bcf18d9311d3f5048f3b46c04d7a48fa1c578111e367a9f34b685472a6497b621
Type: fulltext
Mimetype: application/pdf

By organisation
School of Innovation, Design and Engineering
Computer Engineering

Total: 306 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 2168 hits