Enhancing Multimodal Reasoning with Data Alignment and Fusion
Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems. (Artificial Intelligence and Intelligent Systems)
2024 (English) Licentiate thesis, comprehensive summary (Other academic)
Abstract [en]

Multimodal machine learning (MML) has significantly transformed the development of artificial intelligence (AI) systems. Instead of working with single-source data, it integrates and analyses information from multiple modalities, such as images, audio, text, and sensor signals. The volume of labelled and unlabelled multimodal data has increased rapidly, but using it effectively, especially the unlabelled portion, poses significant challenges. Existing approaches usually depend on supervised learning and struggle to handle the heterogeneity and complexity of such data. These limitations hinder the creation of effective, scalable, and generalised MML systems that can use the full potential of this diverse data.

This thesis addresses this challenge through multimodal reasoning. To this end, it introduces a scheme for advancing multimodal reasoning by making effective use of unlabelled multimodal data. The scheme is built on a sequence of inferential steps that exploit the latent knowledge and patterns hidden within these vast unlabelled datasets. These steps mitigate the main limitation of supervised methods, which depend on large amounts of labelled data that are difficult to obtain in real-world scenarios. Each inferential step was selected for its specific strengths in addressing the challenges of unlabelled multimodal data. The scheme starts with an unsupervised approach to extract features, which are then used as input to a clustering approach that groups similar data points based on their hidden characteristics. The clustering sets the stage for a semi-supervised approach that assigns labels to the clustered data, converting unlabelled data into a useful, structured resource.
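The three inferential steps described above (unsupervised feature extraction, clustering, label assignment) can be sketched in a few lines. This is only an illustration under stated assumptions, not the thesis's implementation: column-wise standardisation stands in for the unsupervised feature extractor, plain k-means stands in for the clustering approach, and a majority vote over a handful of seed labels stands in for the semi-supervised labelling. The toy data and the labels `idle` and `turn` are hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev


def standardise(rows):
    """Stand-in for the unsupervised feature extractor: z-score each column."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # avoid division by zero
    return [[(v - m) / s for v, m, s in zip(r, mus, sds)] for r in rows]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20):
    """Plain k-means with deterministic initialisation (the first k points)."""
    centres = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: sq_dist(p, centres[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centres[c] = [mean(col) for col in zip(*members)]
    return assign


def label_clusters(assign, seed_labels):
    """Give every point the majority seed label found in its cluster."""
    votes = {}
    for idx, lab in seed_labels.items():
        votes.setdefault(assign[idx], Counter())[lab] += 1
    return [votes[a].most_common(1)[0][0] if a in votes else None
            for a in assign]


# Toy data: two well-separated groups; only samples 0 and 1 carry a label.
data = [[0.0, 0.1], [5.0, 5.1], [0.2, 0.0], [0.1, 0.2], [5.2, 4.9], [4.9, 5.0]]
seeds = {0: "idle", 1: "turn"}  # hypothetical labels

features = standardise(data)
clusters = kmeans(features, k=2)
labels = label_clusters(clusters, seeds)
print(labels)
```

In this toy run every sample inherits the label of the single seed in its cluster. On real data the seed labels would come from an aligned, labelled source, and clusters without any seed stay unlabelled (`None`).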

The proposed approach is evaluated on unlabelled vehicular datasets collected in real time. It achieved more than 90% accuracy when using the newly labelled dataset. Furthermore, this research investigates transfer learning and its potential to enhance multimodal reasoning by using knowledge gained from one dataset to improve performance on another. A novel model based on the transformer architecture is designed specifically to handle the continuous features available in multimodal data. The results of the model were satisfactory and showed that this state-of-the-art approach performed better than traditional machine learning (ML) algorithms.

This thesis makes several contributions to research on MML. It provides an extensive analysis of MML and its challenges, including existing approaches to alignment and fusion, focusing on their limitations and identifying gaps in current research. Moreover, it introduces an effective approach for labelling unlabelled datasets through a series of carefully designed inferential steps, which shows a path towards more efficient and scalable multimodal learning. Finally, it demonstrates the potential of transfer learning, particularly with a transformer-based model, to advance multimodal reasoning. The insights, techniques, and results presented in this thesis have the potential to open a new direction in MML research and provide an opportunity to develop more useful, scalable, and data-efficient models to tackle real-world challenges across a wide range of applications.

Place, publisher, year, edition, pages
Västerås: Mälardalen University, 2024, p. 336
Series
Mälardalen University Press Licentiate Theses, ISSN 1651-9256 ; 367
Keywords [en]
Multimodal Machine Learning, Transfer Learning, Multimodal Reasoning, Data Alignment, Data Fusion
National Category
Computer and Information Sciences
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:mdh:diva-69156
ISBN: 978-91-7485-691-0 (print)
OAI: oai:DiVA.org:mdh-69156
DiVA, id: diva2:1914093
Presentation
2025-01-13, Kappa och digitalt via Zoom, Mälardalens Universitet, Västerås, 09:00 (English)
Opponent
Supervisors
Projects
FitDrive
Funder
EU, Horizon 2020, 953432
Available from: 2024-11-18 Created: 2024-11-18 Last updated: 2024-12-02 Bibliographically approved
List of papers
1. A Systematic Literature Review on Multimodal Machine Learning: Applications, Challenges, Gaps and Future Directions
2023 (English) In: IEEE Access, E-ISSN 2169-3536, Vol. 11, p. 14804-14831. Article, review/survey (Refereed) Published
Abstract [en]

Multimodal machine learning (MML) is an attractive multidisciplinary research area in which heterogeneous data from multiple modalities and machine learning (ML) are combined to solve critical problems. Usually, research works use data from a single modality, such as images, audio, text, or signals. However, real-world problems have become more critical, and handling them with multiple modalities of data instead of a single modality can significantly affect the search for solutions. ML algorithms play an essential role in tuning parameters when developing MML models. This paper reviews recent advancements in the challenges of MML, namely representation, translation, alignment, fusion, and co-learning, and presents the gaps and challenges. A systematic literature review (SLR) was applied to define the progress and trends on those challenges in the MML domain. In total, 1032 articles were examined in this review to extract features such as source, domain, application, and modality. This article will help researchers understand the current state of MML and navigate the selection of future research directions.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers Inc., 2023
Keywords
alignment, co-learning, fusion, Multimodal machine learning, representation, systematic literature review, translation, Machine learning applications, Machine-learning, Multi-disciplinary research, Multi-modal, Multiple modalities, Machine learning
National Category
Mechanical Engineering
Identifiers
urn:nbn:se:mdh:diva-62038 (URN)
10.1109/ACCESS.2023.3243854 (DOI)
000936312800001 ()
2-s2.0-85149020788 (Scopus ID)
Available from: 2023-03-08 Created: 2023-03-08 Last updated: 2024-11-18 Bibliographically approved
2. Multi-scale Data Fusion and Machine Learning for Vehicle Manoeuvre Classification
2023 (English) In: ICSET 2023 - 2023 IEEE 13th International Conference on System Engineering and Technology, Proceeding, Institute of Electrical and Electronics Engineers Inc., 2023, p. 296-301. Conference paper, Published paper (Refereed)
Abstract [en]

Vehicle manoeuvre analysis is vital for road safety as it helps understand driver behaviour, traffic flow, and road conditions. However, classifying data from in-vehicle acquisition systems or simulators for manoeuvre recognition is complex, requiring data fusion and machine learning (ML) algorithms. This paper proposes a hybrid approach that combines multivariate multiscale entropy (MMSE) and one-dimensional convolutional neural networks (1D-CNNs). MMSE is utilised for early feature extraction and data fusion, and the extracted features are classified using 1D-CNNs, achieving an impressive 87% test accuracy in multiclass classification. This paper provides insights into improving vehicle manoeuvre classification using advanced ML techniques and data fusion methods to handle complex data sets effectively. Ultimately, this approach can enhance the understanding of driver behaviour, inform policy decisions, and develop more effective strategies to enhance road safety. 
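The entropy-based feature extraction mentioned above can be illustrated with a univariate simplification. The paper uses multivariate multiscale entropy (MMSE); the sketch below instead computes ordinary multiscale sample entropy on a single channel, which is a swapped-in, simpler technique chosen only to show the two ingredients: coarse-graining a signal at several scales and computing an entropy value per scale. The tolerance `r` is treated here as an absolute value, and the test signal is hypothetical. In the paper's setting, a feature vector of this kind would then be passed to a 1D-CNN classifier.

```python
import math


def coarse_grain(x, scale):
    """Average consecutive, non-overlapping windows of length `scale`."""
    n = len(x) // scale
    return [sum(x[i * scale:(i + 1) * scale]) / scale for i in range(n)]


def sample_entropy(x, m=2, r=0.2):
    """Sample entropy: -ln(A / B), where B counts template pairs of length m
    within tolerance r (Chebyshev distance) and A does the same for m + 1."""
    n = len(x)

    def count_matches(length):
        # Use the same n - m starting points for both template lengths.
        templates = [x[i:i + length] for i in range(n - m)]
        matches = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                dist = max(abs(a - b)
                           for a, b in zip(templates[i], templates[j]))
                if dist <= r:
                    matches += 1
        return matches

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return float("inf")  # undefined when no matches are found
    return -math.log(a / b)


def multiscale_entropy(x, scales=(1, 2, 3), m=2, r=0.2):
    """One sample-entropy value per scale; usable as a small feature vector."""
    return [sample_entropy(coarse_grain(x, s), m, r) for s in scales]


# Hypothetical single-channel signal standing in for one vehicle sensor.
signal = [math.sin(0.3 * i) for i in range(200)]
mse_features = multiscale_entropy(signal)
print(mse_features)
```

A perfectly regular signal, such as a constant series, yields an entropy of zero at every scale, while less predictable signals give larger values.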

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers Inc., 2023
Keywords
Data Extraction, Data Fusion, Multivariate Multiscale Entropy (MMSE), Vehicle Manoeuvre, Accident prevention, Classification (of information), Complex networks, Data mining, Entropy, Extraction, Learning algorithms, Machine learning, Motor transportation, Roads and streets, Driver's behavior, Flow condition, Machine-learning, Multi-scale datum, Multivariate multiscale entropies, Multivariate multiscale entropy, Road safety, Traffic flow, Vehicle maneuver, Vehicles
National Category
Vehicle Engineering
Identifiers
urn:nbn:se:mdh:diva-65013 (URN)
10.1109/ICSET59111.2023.10295109 (DOI)
2-s2.0-85178031651 (Scopus ID)
9798350340891 (ISBN)
Conference
13th IEEE International Conference on System Engineering and Technology, ICSET 2023, Shah Alam, 2 October 2023
Available from: 2023-12-13 Created: 2023-12-13 Last updated: 2024-11-18 Bibliographically approved
3. Second-Order Learning with Grounding Alignment: A Multimodal Reasoning Approach to Handle Unlabelled Data
2024 (English) In: International Conference on Agents and Artificial Intelligence, Science and Technology Publications, Lda, 2024, Vol. 2, p. 561-572. Conference paper, Published paper (Refereed)
Abstract [en]

Multimodal machine learning is a critical aspect of the development and advancement of AI systems. However, it encounters significant challenges when working with multimodal data, one of the major issues being unlabelled multimodal data, which can hinder effective analysis. To address this challenge, this paper proposes a multimodal reasoning approach that adopts second-order learning, incorporating grounding alignment and semi-supervised learning methods. The proposed approach is illustrated using unlabelled vehicular telemetry data. Features were extracted from the unlabelled telemetry data using an autoencoder and then clustered and aligned with the true labels of neurophysiological data to create labelled and unlabelled datasets. In the semi-supervised approach, the Random Forest (RF) and eXtreme Gradient Boosting (XGBoost) algorithms were applied to the labelled dataset, achieving a test accuracy of over 97%. These algorithms were then used to predict labels for the unlabelled dataset, which was later added to the labelled dataset to retrain the models. With the additional labelled data, both algorithms achieved a 99% test accuracy. Confidence in the predictions for unlabelled data was validated by counting samples based on the prediction score and by Bayesian probability. RF and XGBoost scored 91.26% and 97.87% in counting samples and 98.67% and 99.77% in Bayesian probability, respectively.
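The semi-supervised loop described above (train on labelled data, predict labels for unlabelled data, add the predictions to the training set, retrain) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's method: a nearest-centroid classifier replaces RF and XGBoost so that the example needs only the standard library, the prediction score is a softmax over negative distances, and the threshold of 0.9 is hypothetical. The returned share of confident predictions loosely mirrors the idea of counting samples by prediction score.

```python
import math


def fit_centroids(X, y):
    """Train a nearest-centroid classifier: one mean vector per label."""
    groups = {}
    for row, lab in zip(X, y):
        groups.setdefault(lab, []).append(row)
    return {lab: [sum(col) / len(col) for col in zip(*rows)]
            for lab, rows in groups.items()}


def predict_with_score(centroids, row):
    """Return (label, score); the score is a softmax over negative distances."""
    weights = {lab: math.exp(-math.dist(row, c))
               for lab, c in centroids.items()}
    best = max(weights, key=weights.get)
    return best, weights[best] / sum(weights.values())


def self_train(X_lab, y_lab, X_unlab, threshold=0.9):
    """One round of self-training with a confidence threshold."""
    model = fit_centroids(X_lab, y_lab)
    preds = [predict_with_score(model, row) for row in X_unlab]
    # Keep only pseudo-labels whose score reaches the threshold.
    confident = [(row, lab) for row, (lab, score) in zip(X_unlab, preds)
                 if score >= threshold]
    X_new = X_lab + [row for row, _ in confident]
    y_new = y_lab + [lab for _, lab in confident]
    share = len(confident) / len(X_unlab)
    # Retrain on the original labels plus the confident pseudo-labels.
    return fit_centroids(X_new, y_new), share


# Hypothetical toy data: two labelled groups and three unlabelled samples.
X_lab = [[0, 0], [0, 1], [5, 5], [5, 6]]
y_lab = ["a", "a", "b", "b"]
X_unlab = [[0.2, 0.4], [5.1, 5.4], [2.5, 3.0]]

model, share = self_train(X_lab, y_lab, X_unlab)
print(f"confident share: {share:.2f}")
```

The third unlabelled sample lies exactly halfway between the two centroids, so its score is 0.5 and it is left out of the retraining set. A threshold of this kind trades coverage of the unlabelled data against the risk of reinforcing wrong pseudo-labels.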

Place, publisher, year, edition, pages
Science and Technology Publications, Lda, 2024
Keywords
Autoencoder, Multimodal Reasoning, Semi-Supervised, Supervised Alignment
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:mdh:diva-66579 (URN)
10.5220/0012466500003636 (DOI)
2-s2.0-85190658759 (Scopus ID)
Conference
16th International Conference on Agents and Artificial Intelligence, ICAART 2024. Rome 24 February 2024 through 26 February 2024
Available from: 2024-05-08 Created: 2024-05-08 Last updated: 2024-11-18 Bibliographically approved
4. Advanced Hybrid Reasoning and Transfer Learning on Multimodal Data with Transformers
(English) In: Springer Nature Computer Science. Article in journal (Refereed) Submitted
National Category
Computer and Information Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:mdh:diva-69147 (URN)
Funder
EU, Horizon 2020, 953432
Available from: 2024-11-15 Created: 2024-11-15 Last updated: 2024-12-04 Bibliographically approved

Open Access in DiVA

The full text will be freely available from 2024-12-23 08:00

Authority records

Barua, Arnab
