https://www.mdu.se/

Synthetic Data in Data-driven Systems
Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems. Saab, Sweden. (HERO)
2025 (English). Licentiate thesis, comprehensive summary (Other academic)
Abstract [en]

Dataset generation is cumbersome yet of great importance for successful training of machine learning models. Collecting real-world data is expensive and sometimes prohibited, considering e.g. safety aspects or legal restrictions. By generating the bulk of training data by synthetic means it is possible to impose arbitrary and extensive scene randomization for increased data diversity.

Methods that quantify the statistical similarity between datasets are important tools for detecting Out-of-Distribution (OOD) data and for assessing domain alignment. We have studied how such similarity measures correlate with the drop in model prediction accuracy when a model is exposed to OOD data.

Domain adaptation can be applied as an additional step to synthetic data to narrow the gap to real-world datasets; however, it can introduce inadvertent label-flipping, a kind of semantic inconsistency between the synthetic source and the domain-adapted output. We therefore pursue another way of reducing the domain gap: generating high-fidelity digital representations of real-world scenes and objects. We do this through the use of Neural Radiance Fields and Gaussian Splats. These methods allow us to render objects of interest for a detection problem with the perfect annotation of synthetically produced data and a high degree of realism, which we show improves detection accuracy compared to traditionally generated visual content.

Abstract [sv]

Generating data for AI models is cumbersome yet of great importance for the successful training of machine learning models. Collecting real sensor data is expensive and sometimes not possible, considering for example safety aspects or legal restrictions. By generating the bulk of the training data synthetically, it is possible to introduce extensive scene randomization, which leads to increased data diversity. Methods for quantifying similarity between datasets at a statistical level are important tools for identifying when data lies outside the intended distribution. We have studied how such methods can be used to correlate how a model's precision drops when it is exposed to unseen data. Domain adaptation can be applied as an additional step to synthetic data to reduce the gap to real sensor data, but this can introduce inadvertent annotation errors, a kind of semantic inconsistency between the synthetic source data and the domain-adapted output. We therefore take a different route to reducing the domain gap, by generating high-quality digital representations of real scenes and objects. We do this using Neural Radiance Fields (NeRF) and Gaussian Splats. These methods make it possible for us to create objects of interest for a detection problem, with automatic annotation based on synthetically produced data, and a high degree of realism that we show improves detection accuracy compared with traditionally generated visual content.

Place, publisher, year, edition, pages
Västerås: Mälardalens Universitet, 2025, p. 186
Series
Mälardalen University Press Licentiate Theses, ISSN 1651-9256 ; 370
Keywords [en]
datasets, neural networks, synthetic data generation, automatic annotation, dataset generation
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:mdh:diva-69154; ISBN: 978-91-7485-689-7 (print); OAI: oai:DiVA.org:mdh-69154; DiVA, id: diva2:1913968
Presentation
2025-01-30, Delta, Mälardalens universitet, Västerås, 13:00 (English)
Opponent
Supervisors
Available from: 2024-11-18 Created: 2024-11-18 Last updated: 2024-12-09 Bibliographically approved
List of papers
1. Evaluating the Robustness of ML Models to Out-of-Distribution Data Through Similarity Analysis
2023 (English). In: Commun. Comput. Info. Sci., Springer Science and Business Media Deutschland GmbH, 2023, p. 348-359. Conference paper, Published paper (Refereed)
Abstract [en]

In Machine Learning systems, several factors impact the performance of a trained model. The most important ones include model architecture, the amount of training time, the dataset size and diversity. We present a method for analyzing datasets from a use-case scenario perspective, detecting and quantifying out-of-distribution (OOD) data on dataset level. Our main contribution is the novel use of similarity metrics for the evaluation of the robustness of a model by introducing relative Fréchet Inception Distance (FID) and relative Kernel Inception Distance (KID) measures. These relative measures are relative to a baseline in-distribution dataset and are used to estimate how the model will perform on OOD data (i.e. estimate the model accuracy drop). We find a correlation between our proposed relative FID/relative KID measure and the drop in Average Precision (AP) accuracy on unseen data.
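
The abstract does not state the formula behind the relative measures. The following is a minimal sketch, assuming that a relative measure is the ratio between the distance to a candidate (possibly OOD) dataset and the distance to a baseline in-distribution dataset; that ratio is an assumption, not the paper's confirmed definition. It uses the standard unbiased KID estimator with a cubic polynomial kernel and takes feature vectors as given, whereas in practice they would come from an Inception network. The names `poly_kernel`, `kid`, and `relative_kid` are hypothetical.

```python
from typing import Sequence

Vector = Sequence[float]


def poly_kernel(x: Vector, y: Vector) -> float:
    """Cubic polynomial kernel (x.y / d + 1)^3, the usual choice for KID."""
    d = len(x)
    dot = sum(a * b for a, b in zip(x, y))
    return (dot / d + 1.0) ** 3


def kid(xs: Sequence[Vector], ys: Sequence[Vector]) -> float:
    """Unbiased squared-MMD estimate between two sets of feature vectors."""
    m, n = len(xs), len(ys)
    if m < 2 or n < 2:
        raise ValueError("each set needs at least two feature vectors")
    # Within-set terms skip the diagonal (i == j) to keep the estimate unbiased.
    k_xx = sum(
        poly_kernel(xs[i], xs[j]) for i in range(m) for j in range(m) if i != j
    ) / (m * (m - 1))
    k_yy = sum(
        poly_kernel(ys[i], ys[j]) for i in range(n) for j in range(n) if i != j
    ) / (n * (n - 1))
    # The cross term uses every pair.
    k_xy = sum(poly_kernel(x, y) for x in xs for y in ys) / (m * n)
    return k_xx + k_yy - 2.0 * k_xy


def relative_kid(
    reference: Sequence[Vector],
    baseline: Sequence[Vector],
    candidate: Sequence[Vector],
) -> float:
    """ASSUMED definition: KID(reference, candidate) / KID(reference, baseline)."""
    base = kid(reference, baseline)
    if base <= 0.0:
        # The unbiased estimate can be zero or negative for near-identical sets.
        raise ValueError("baseline KID must be positive to form a ratio")
    return kid(reference, candidate) / base
```

Under this assumed definition, a value well above 1 would indicate that the candidate set lies further from the reference data than the in-distribution baseline does.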

Place, publisher, year, edition, pages
Springer Science and Business Media Deutschland GmbH, 2023
Keywords
accuracy estimation, datasets, neural networks, similarity metrics, Learning systems, Dataset, Distance measure, Frechet, Machine learning systems, Modeling architecture, Neural-networks, Performance, Similarity analysis, Drops
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:mdh:diva-64446 (URN); 10.1007/978-3-031-42941-5_30 (DOI); 2-s2.0-85171979824 (Scopus ID); 9783031429408 (ISBN)
Conference
Communications in Computer and Information Science
Available from: 2023-10-05 Created: 2023-10-05 Last updated: 2024-11-18 Bibliographically approved
2. Enhancing Drone Surveillance with NeRF: Real-World Applications and Simulated Environments
2024 (English). In: 2024 AIAA DATC/IEEE 43rd Digital Avionics Systems Conference (DASC), 2024. Conference paper, Published paper (Refereed)
Abstract [en]

Machine Learning (ML) systems require representative and diverse datasets to accurately learn the objective task. In supervised learning, data needs to be accurately annotated, which is an expensive and error-prone process. We present a method for generating synthetic data tailored to the use case, achieving excellent performance in a real-world use case. We provide a method for producing automatically annotated synthetic visual data of multirotor unmanned aerial vehicles (UAVs) and other airborne objects in a simulated environment with a high degree of scene diversity, from collection of 3D models to generation of annotated synthetic datasets (synthsets). In our data generation framework SynRender, we introduce a novel method of using Neural Radiance Field (NeRF) methods to capture photo-realistic, high-fidelity 3D models of multirotor UAVs in order to automate data generation for an object detection task in diverse environments. By producing data tailored to the real-world setting, our NeRF-derived results show an advantage over generic 3D asset collection-based methods, where the domain gap between the simulated and the real world is unacceptably large. In the spirit of keeping research open and accessible to the research community, we release our dataset VISER DroneDiversity used in this project, where visual images, annotated boxes, instance segmentation and depth maps are all generated for each image sample.
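
The abstract states that boxes and instance segmentation are generated for every image, but it does not describe how SynRender derives them. As a minimal sketch of one common approach, assuming the renderer emits an instance-ID mask per image (an assumption), axis-aligned bounding boxes can be read directly from that mask. The function name `boxes_from_instance_mask` and the mask layout are hypothetical.

```python
from typing import Dict, List, Tuple

# (x_min, y_min, x_max, y_max) in inclusive pixel coordinates.
Box = Tuple[int, int, int, int]


def boxes_from_instance_mask(
    mask: List[List[int]], background: int = 0
) -> Dict[int, Box]:
    """Return one tight axis-aligned box per instance ID found in the mask.

    ASSUMPTION: `mask[y][x]` holds an integer instance ID and `background`
    marks pixels that belong to no object.
    """
    boxes: Dict[int, Box] = {}
    for y, row in enumerate(mask):
        for x, instance_id in enumerate(row):
            if instance_id == background:
                continue
            if instance_id not in boxes:
                # First pixel seen for this instance: start a one-pixel box.
                boxes[instance_id] = (x, y, x, y)
            else:
                # Grow the existing box to cover the new pixel.
                x0, y0, x1, y1 = boxes[instance_id]
                boxes[instance_id] = (min(x0, x), min(y0, y), max(x1, x), max(y1, y))
    return boxes
```

Because the mask comes from the renderer rather than a human annotator, boxes obtained this way are consistent with the segmentation by construction.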

Keywords
datasets, neural networks, synthetic data generation, automatic annotation, dataset generation
National Category
Computer Vision and Robotics (Autonomous Systems)
Research subject
Computer Science
Identifiers
urn:nbn:se:mdh:diva-69153 (URN); 10.1109/DASC62030.2024.10749011 (DOI)
Conference
2024 AIAA DATC/IEEE 43rd Digital Avionics Systems Conference (DASC), San Diego, CA, USA, 2024
Available from: 2024-11-18 Created: 2024-11-18 Last updated: 2024-11-19 Bibliographically approved
3. Curating Datasets for Visual Runway Detection
2021 (English). In: 2021 IEEE/AIAA 40TH DIGITAL AVIONICS SYSTEMS CONFERENCE (DASC), IEEE, 2021. Conference paper, Published paper (Refereed)
Abstract [en]

In Machine Learning systems, several factors impact the performance of a trained model. The most important ones include model architecture, the amount of training time, and the dataset size and diversity. In the realm of safety-critical machine learning, the datasets used need to reflect the environment in which the system is intended to operate, in order to minimize the generalization gap between trained and real-world inputs. Datasets should be thoroughly prepared, and requirements on the properties and characteristics of the collected data need to be specified. In our work we present a case study in which a synthetic dataset is generated based on real-world flight data from the ADS-B system, containing thousands of approaches to several airports, to identify real-world statistical distributions of the relevant variables to vary within our dataset sampling space. We also investigate the effects of training a model on synthetic data to different extents, including training on translated image sets (using domain adaptation). Our results indicate airport location to be the most critical parameter to vary. We also conclude that all experiments benefited in performance from pre-training on synthetic data rather than using only real data; however, this did not hold true in general for domain adaptation-translated images.
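
The abstract does not specify how the identified distributions drive the dataset sampling. A minimal sketch, assuming a single scalar scene variable whose observed values are binned into a histogram that is then sampled when generating synthetic scenes, could look as follows. The function names are hypothetical and any input values are placeholders rather than ADS-B data.

```python
import random
from typing import List, Sequence, Tuple


def build_histogram(
    samples: Sequence[float], num_bins: int
) -> Tuple[List[float], List[int]]:
    """Bin observed values into equal-width bins; return (edges, counts)."""
    if not samples or num_bins < 1:
        raise ValueError("need at least one sample and one bin")
    lo, hi = min(samples), max(samples)
    if lo == hi:
        raise ValueError("need at least two distinct values")
    width = (hi - lo) / num_bins
    edges = [lo + i * width for i in range(num_bins + 1)]
    counts = [0] * num_bins
    for value in samples:
        # The maximum value falls on the last edge, so clamp it into the last bin.
        index = min(int((value - lo) / width), num_bins - 1)
        counts[index] += 1
    return edges, counts


def sample_from_histogram(
    edges: Sequence[float], counts: Sequence[int], rng: random.Random, n: int
) -> List[float]:
    """Draw n values: pick a bin by its observed frequency, then a point in it."""
    bins = rng.choices(list(range(len(counts))), weights=list(counts), k=n)
    return [rng.uniform(edges[b], edges[b + 1]) for b in bins]
```

Sampling by observed frequency keeps the synthetic scenes concentrated where the real-world observations are dense, while still covering the full observed range.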

Place, publisher, year, edition, pages
IEEE, 2021
Series
IEEE-AIAA Digital Avionics Systems Conference, ISSN 2155-7195
Keywords
avionics, safety-critical, machine learning, deep neural networks, dataset, synthetic data, domain adaptation
National Category
Computer Vision and Robotics (Autonomous Systems); Computer Sciences
Identifiers
urn:nbn:se:mdh:diva-57095 (URN); 10.1109/DASC52595.2021.9594400 (DOI); 000739652600102 (); 2-s2.0-85122788862 (Scopus ID); 978-1-6654-3420-1 (ISBN)
Conference
IEEE/AIAA 40th Digital Avionics Systems Conference (DASC), OCT 03-07, 2021, ELECTR NETWORK
Available from: 2022-01-26 Created: 2022-01-26 Last updated: 2024-11-18 Bibliographically approved
4. Challenges in using neural networks in safety-critical applications
2020 (English). In: AIAA/IEEE Digital Avionics Systems Conference - Proceedings, Institute of Electrical and Electronics Engineers Inc., 2020. Conference paper, Published paper (Refereed)
Abstract [en]

In this paper, we discuss challenges when using neural networks (NNs) in safety-critical applications. We address the challenges one by one, with aviation safety in mind. We then introduce a possible implementation to overcome the challenges. Only a small portion of the solution has been implemented physically, and much is left as future work. Our current understanding is that a real implementation in a safety-critical system would be extremely difficult: firstly, in designing the intended function of the NN, and secondly, in designing the monitors needed to achieve deterministic and fail-safe behavior of the system. We conclude that only the most valuable implementations of NNs should be considered meaningful to implement in safety-critical systems.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers Inc., 2020
Keywords
Avionics, Deep neural networks, Machine learning, Safety-critical, Digital avionics, Safety engineering, Security systems, Aviation safety, Fail safes, Neural networks (NNS), Safety critical applications, Safety critical systems, Neural networks
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:mdh:diva-52970 (URN); 10.1109/DASC50938.2020.9256519 (DOI); 000646035600048 (); 2-s2.0-85097976487 (Scopus ID); 9781728198255 (ISBN)
Conference
39th AIAA/IEEE Digital Avionics Systems Conference, DASC 2020, 11 October 2020 through 16 October 2020
Available from: 2021-01-07 Created: 2021-01-07 Last updated: 2024-11-18 Bibliographically approved

Open Access in DiVA

The full text will be freely available from 2025-01-09 08:00

Authority records

Lindén, Joakim
