https://www.mdu.se/

Curating Datasets for Visual Runway Detection
Saab Aeronaut, Järfälla, Sweden.
Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
Saab Aeronaut, Järfälla, Sweden.
Saab Aeronaut, Järfälla, Sweden.
2021 (English). In: 2021 IEEE/AIAA 40th Digital Avionics Systems Conference (DASC), IEEE, 2021. Conference paper, Published paper (Refereed)
Abstract [en]

In machine learning systems, several factors affect the performance of a trained model. The most important include the model architecture, the amount of training time, and the size and diversity of the dataset. In safety-critical machine learning, the datasets used must reflect the environment in which the system is intended to operate, in order to minimize the generalization gap between training inputs and real-world inputs. Datasets should be thoroughly prepared, and requirements on the properties and characteristics of the collected data need to be specified. We present a case study in which a synthetic dataset is generated from real-world flight data from the ADS-B system. The data contains thousands of approaches to several airports and is used to identify real-world statistical distributions of the relevant variables to vary within our dataset sampling space. We also investigate the effects of training a model on synthetic data to different extents, including training on translated image sets (using domain adaptation). Our results indicate that airport location is the most critical parameter to vary. We also conclude that all experiments benefited in performance from pre-training on synthetic data rather than using only real data; however, this did not hold in general for images translated by domain adaptation.

Place, publisher, year, edition, pages
IEEE, 2021.
Series
IEEE-AIAA Digital Avionics Systems Conference, ISSN 2155-7195
Keywords [en]
avionics, safety-critical, machine learning, deep neural networks, dataset, synthetic data, domain adaptation
National Category
Computer Vision and Robotics (Autonomous Systems); Computer Sciences
Identifiers
URN: urn:nbn:se:mdh:diva-57095
DOI: 10.1109/DASC52595.2021.9594400
ISI: 000739652600102
Scopus ID: 2-s2.0-85122788862
ISBN: 978-1-6654-3420-1 (print)
OAI: oai:DiVA.org:mdh-57095
DiVA, id: diva2:1632167
Conference
IEEE/AIAA 40th Digital Avionics Systems Conference (DASC), October 3-7, 2021, Electronic Network
Available from: 2022-01-26. Created: 2022-01-26. Last updated: 2024-11-18. Bibliographically approved.
In thesis
1. Synthetic Data in Data-driven Systems
2025 (English). Licentiate thesis, comprehensive summary (Other academic)
Abstract [en]

Dataset generation is cumbersome yet of great importance for the successful training of machine learning models. Collecting real-world data is expensive and sometimes prohibited, for example because of safety aspects or legal restrictions. By generating the bulk of the training data synthetically, it is possible to impose arbitrary and extensive scene randomization for increased data diversity.

Methods that quantify the statistical similarity between datasets are important tools for detecting Out-of-Distribution (OOD) data and assessing domain alignment. We have studied how such methods can be used to correlate dataset dissimilarity with the drop in model prediction accuracy when a model is exposed to OOD data.

Domain adaptation can be applied as an additional step to synthetic data to decrease the gap to real-world datasets. However, it can introduce inadvertent label-flipping, a kind of semantic inconsistency between the synthetic source and the domain-adapted output. We therefore pursue another way of reducing the domain gap: generating high-fidelity digital representations of real-world scenes and objects. We do this using Neural Radiance Fields and Gaussian Splats. These methods allow us to render objects of interest for a detection problem with the perfect annotation of synthetically produced data and a high degree of realism, which we show improves detection accuracy compared to traditionally generated visual content.

Abstract [sv] (English translation)

Generating data for AI models is cumbersome but of great importance for the effective training of machine learning models. Collecting real sensor data is expensive and sometimes not possible, for example because of safety aspects or legal restrictions. By generating the bulk of the training data synthetically, it is possible to introduce extensive scene randomization, which leads to increased data diversity. Methods for quantifying similarities between datasets at a statistical level are important tools for identifying when data lies outside the intended distribution. We have studied how such methods can be used to correlate how a model's precision drops when it is exposed to unseen data. Domain adaptation can be applied as an additional step to synthetic data to reduce the gap to real sensor data, but this can introduce inadvertent annotation errors, a kind of semantic inconsistency between the synthetic source data and the domain-adapted output. We therefore take another route to reducing the domain gap, by generating high-quality digital representations of real scenes and objects. We do this using Neural Radiance Fields (NeRF) and Gaussian Splats. These methods allow us to create objects of interest for a detection problem, with automatic annotation based on synthetically produced data, and a high degree of realism, which we show improves detection accuracy compared to traditionally generated visual content.

Place, publisher, year, edition, pages
Västerås: Mälardalens Universitet, 2025. p. 186
Series
Mälardalen University Press Licentiate Theses, ISSN 1651-9256 ; 370
Keywords
datasets, neural networks, synthetic data generation, automatic annotation, dataset generation
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:mdh:diva-69154 (URN)
978-91-7485-689-7 (ISBN)
Presentation
2025-01-30, Delta, Mälardalens universitet, Västerås, 13:00 (English)
Opponent
Supervisors
Available from: 2024-11-18. Created: 2024-11-18. Last updated: 2024-11-18. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Authority records

Forsberg, Håkan; Daneshtalab, Masoud
