Second-Order Learning with Grounding Alignment: A Multimodal Reasoning Approach to Handle Unlabelled Data
Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems. ORCID iD: 0000-0003-3802-4721
Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems. ORCID iD: 0000-0002-7305-7169
Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems. ORCID iD: 0000-0002-1212-7637
2024 (English). In: International Conference on Agents and Artificial Intelligence, Science and Technology Publications, Lda, 2024, Vol. 2, p. 561-572. Conference paper, Published paper (Refereed)
Abstract [en]

Multimodal machine learning is critical to the development and advancement of AI systems. However, working with multimodal data poses significant challenges; one of the major issues is handling unlabelled multimodal data, which can hinder effective analysis. To address this challenge, this paper proposes a multimodal reasoning approach that adopts second-order learning, incorporating grounding alignment and semi-supervised learning methods. The proposed approach is illustrated using unlabelled vehicular telemetry data. Features were extracted from the unlabelled telemetry data using an autoencoder, then clustered and aligned with the true labels of neurophysiological data to create labelled and unlabelled datasets. In the semi-supervised step, the Random Forest (RF) and eXtreme Gradient Boosting (XGBoost) algorithms were applied to the labelled dataset, achieving a test accuracy of over 97%. These algorithms were then used to predict labels for the unlabelled dataset, which was added to the labelled dataset to retrain the models. With this additional labelled data, both algorithms achieved a 99% test accuracy. Confidence in the predictions for unlabelled data was validated in two ways: by counting samples based on the prediction score, and by Bayesian probability. RF and XGBoost scored 91.26% and 97.87%, respectively, in sample counting and 98.67% and 99.77%, respectively, in Bayesian probability.
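The semi-supervised step described in the abstract (train on the labelled set, predict labels for the unlabelled set, add them back, retrain) can be sketched roughly as follows. This is a minimal illustration on synthetic data using scikit-learn's Random Forest, not the authors' implementation: the function name `pseudo_label_retrain`, the 0.9 score threshold, and the reading of "counting samples based on the prediction score" as the share of samples whose top class probability reaches that threshold are all assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def pseudo_label_retrain(X_lab, y_lab, X_unlab, threshold=0.9, seed=0):
    """Pseudo-label X_unlab with a model trained on (X_lab, y_lab), then retrain.

    Returns the base model, the retrained model, the pseudo-labels, and the
    share of unlabelled samples whose top class probability >= threshold.
    The threshold is a hypothetical choice, not a value from the paper.
    """
    # Step 1: fit on the labelled data only.
    base = RandomForestClassifier(n_estimators=100, random_state=seed)
    base.fit(X_lab, y_lab)

    # Step 2: predict labels and prediction scores for the unlabelled data.
    proba = base.predict_proba(X_unlab)
    pseudo = base.classes_[proba.argmax(axis=1)]

    # Step 3: count how many predictions are confident (assumed interpretation).
    confident_share = float((proba.max(axis=1) >= threshold).mean())

    # Step 4: add the pseudo-labelled samples to the labelled set and retrain.
    X_all = np.vstack([X_lab, X_unlab])
    y_all = np.concatenate([y_lab, pseudo])
    final = RandomForestClassifier(n_estimators=100, random_state=seed)
    final.fit(X_all, y_all)
    return base, final, pseudo, confident_share


if __name__ == "__main__":
    # Synthetic stand-in for the labelled / unlabelled split.
    X, y = make_classification(n_samples=600, random_state=0)
    X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
    X_lab, X_unlab, y_lab, _ = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

    base, final, pseudo, share = pseudo_label_retrain(X_lab, y_lab, X_unlab)
    print("base test accuracy:     ", base.score(X_test, y_test))
    print("retrained test accuracy:", final.score(X_test, y_test))
    print("confident share:        ", share)
```

The sketch adds every pseudo-labelled sample back, as the abstract describes; a stricter variant would add only the samples above the threshold, at the cost of a smaller retraining set.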

Place, publisher, year, edition, pages
Science and Technology Publications, Lda, 2024. Vol. 2, p. 561-572
Keywords [en]
Autoencoder, Multimodal Reasoning, Semi-Supervised, Supervised Alignment
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:mdh:diva-66579
DOI: 10.5220/0012466500003636
Scopus ID: 2-s2.0-85190658759
OAI: oai:DiVA.org:mdh-66579
DiVA, id: diva2:1856904
Conference
16th International Conference on Agents and Artificial Intelligence, ICAART 2024. Rome, 24 February 2024 through 26 February 2024
Available from: 2024-05-08 Created: 2024-05-08 Last updated: 2024-12-19 Bibliographically approved
In thesis
2. Enhancing Multimodal Reasoning with Data Alignment and Fusion
2024 (English). Licentiate thesis, comprehensive summary (Other academic)
Abstract [en]

Multimodal machine learning (MML) has significantly transformed the development of artificial intelligence (AI) systems. Instead of working with single-source data, it integrates and analyses information from multiple modalities, such as images, audio, text, and sensor data. The volume of labelled and unlabelled multimodal data has increased rapidly, but using it effectively, especially managing unlabelled multimodal data, poses significant challenges. Existing approaches usually depend on supervised learning and struggle to handle the heterogeneity and complexity of such data. These limitations hinder the creation of good, scalable, and generalised MML systems that can use the full potential of this diverse data.

This thesis addresses this challenge through multimodal reasoning. To this end, it introduces a scheme that advances multimodal reasoning by making effective use of unlabelled multimodal data. The scheme is built on inferential steps that exploit the latent knowledge and patterns hidden within these vast unlabelled datasets. These steps mitigate the main limitation of supervised methods, which depend solely on large amounts of labelled data that are difficult to obtain in real-world scenarios. Each inferential step was selected for its specific strengths in addressing the challenges of unlabelled multimodal data. The scheme starts with an unsupervised approach to extract features, which are then used as input to a clustering approach that groups similar data points based on their hidden characteristics. The clustering sets the stage for a semi-supervised approach that assigns labels to the clustered data, converting unlabelled data into a useful and structured resource.
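The three inferential steps named above (unsupervised feature extraction, clustering, label assignment) can be illustrated with a small cluster-then-label sketch. It is only one plausible reading of the scheme under stated assumptions: PCA is swapped in for the thesis's unspecified unsupervised feature extractor, k-means stands in for the clustering approach, and labels are propagated by majority vote from a handful of samples whose labels are assumed known. The function name `cluster_then_label` and all parameter values are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA


def cluster_then_label(X, known_idx, known_labels, n_clusters=3, n_features=2, seed=0):
    """Extract features, cluster them, and label each cluster by majority vote.

    known_idx / known_labels describe the few samples with a reference label.
    Returns (cluster_ids, labels); labels is -1 for samples in clusters that
    contain no reference sample, i.e. samples that stay unlabelled.
    """
    # Step 1: unsupervised feature extraction (PCA as a simple stand-in).
    feats = PCA(n_components=n_features, random_state=seed).fit_transform(X)

    # Step 2: group similar points in the extracted feature space.
    cluster_ids = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(feats)

    # Step 3: give every cluster the most common reference label found in it.
    labels = np.full(len(X), -1)
    for c in range(n_clusters):
        votes = known_labels[cluster_ids[known_idx] == c]
        if votes.size:
            values, counts = np.unique(votes, return_counts=True)
            labels[cluster_ids == c] = values[counts.argmax()]
    return cluster_ids, labels


if __name__ == "__main__":
    # Synthetic data; only the first 15 samples are treated as labelled.
    X, y = make_blobs(n_samples=300, centers=3, n_features=6, cluster_std=0.5, random_state=0)
    known_idx = np.arange(15)
    cluster_ids, labels = cluster_then_label(X, known_idx, y[known_idx])
    print("samples labelled:", int((labels != -1).sum()), "of", len(X))
    print("agreement with hidden labels:", float((labels == y).mean()))
```

Samples that end up with a label form a labelled dataset that a supervised model can be trained on, while samples left at -1 remain unlabelled and can be handled by a later semi-supervised step.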

The proposed approach was evaluated on unlabelled vehicular datasets collected in real time, and achieved more than 90% accuracy using the newly labelled dataset. Furthermore, this research explored transfer learning and its potential to enhance multimodal reasoning by using knowledge gained from one dataset to improve performance on another. A novel model based on the transformer architecture was designed specifically to handle the continuous features available in multimodal data. The model's results were satisfactory and showed that this state-of-the-art approach performed better than traditional machine learning (ML) algorithms.

This thesis makes several contributions to research on MML. It provides an extensive analysis of MML and its challenges, including existing approaches to alignment and fusion, focusing on their limitations and identifying gaps in current research. Moreover, it introduces an effective approach for labelling unlabelled datasets through a series of carefully designed inferential steps, which points towards more efficient and scalable multimodal learning. Finally, it demonstrates the potential of transfer learning, particularly with a transformer-based model, to advance multimodal reasoning. The insights, techniques, and results presented in this thesis have the potential to open new directions in MML research and to support the development of more useful, scalable, and data-efficient models for real-world challenges across a wide range of applications.

Place, publisher, year, edition, pages
Västerås: Mälardalen University, 2024. p. 336
Series
Mälardalen University Press Licentiate Theses, ISSN 1651-9256 ; 367
Keywords
Multimodal Machine Learning, Transfer Learning, Multimodal Reasoning, Data Alignment, Data Fusion
National Category
Computer and Information Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:mdh:diva-69156 (URN)
978-91-7485-691-0 (ISBN)
Presentation
2025-01-13, Kappa and online via Zoom, Mälardalen University, Västerås, 09:00 (English)
Projects
FitDrive
Funder
EU, Horizon 2020, 953432
Available from: 2024-11-18 Created: 2024-11-18 Last updated: 2024-12-23 Bibliographically approved

Authority records

Barua, Arnab; Ahmed, Mobyen Uddin; Barua, Shaibal; Begum, Shahina
