https://www.mdu.se/

mdu.se Publications
Enhancing Speech Emotion Recognition Using Deep Convolutional Neural Networks
Intelligent System Research Centre, Ulster University, UK. ORCID iD: 0000-0002-1823-1304
Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems. ORCID iD: 0000-0002-7136-6339
American International University-Bangladesh, Bangladesh. ORCID iD: 0009-0003-1481-593X
American International University-Bangladesh, Bangladesh. ORCID iD: 0009-0007-1515-8731
2024 (English). In: ICMLT '24: Proceedings of the 2024 9th International Conference on Machine Learning Technologies, ISSN 979-8-4007-1637-9, p. 95-100. Article in journal (Other academic). Published
Abstract [en]

Speech emotion recognition (SER) is considered a pivotal area of research that holds significant importance in a variety of real-time applications, such as assessing human behavior and analyzing the emotional states of speakers in emergency situations. This paper assesses the capabilities of deep convolutional neural networks (CNNs) in this context. Both CNNs and Long Short-Term Memory (LSTM) based deep neural networks are evaluated for voice emotion identification. In our empirical evaluation, we utilize the Toronto Emotional Speech Set (TESS) database, which comprises speech samples from both young and old individuals, encompassing seven distinct emotions: anger, happiness, sadness, fear, surprise, disgust, and neutrality. To augment the dataset, variations in voice are introduced along with the addition of white noise. The empirical findings indicate that the CNN model outperforms existing studies on SER using the TESS corpus, yielding a noteworthy 21% improvement in average recognition accuracy. This work underscores SER’s significance and highlights the transformative potential of deep CNNs for enhancing its effectiveness in real-time applications, particularly in high-stakes emergency situations.
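
The abstract mentions augmenting the dataset by adding white noise. The record does not say how the noise was generated or at what level, so the following is only a minimal sketch of one common approach: Gaussian white noise scaled to a target signal-to-noise ratio. The function name `add_white_noise`, the `snr_db` parameter, and the synthetic test tone are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def add_white_noise(signal, snr_db, rng=None):
    """Return a copy of `signal` with Gaussian white noise at the given SNR (dB).

    Hypothetical helper for illustration; the paper's actual noise
    procedure and noise level are not stated in this record.
    """
    rng = np.random.default_rng() if rng is None else rng
    signal = np.asarray(signal, dtype=np.float64)
    # Average power of the clean signal.
    signal_power = np.mean(signal ** 2)
    # Noise power that yields the requested signal-to-noise ratio.
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise


# Synthetic one-second 220 Hz tone standing in for a speech sample.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
clean = 0.5 * np.sin(2 * np.pi * 220.0 * t)

# Seeded generator so the augmentation is reproducible.
noisy = add_white_noise(clean, snr_db=20.0, rng=np.random.default_rng(0))
```

Specifying the noise level as an SNR rather than a fixed amplitude keeps the relative noise level comparable across loud and quiet recordings, which is one reason this form is often used for augmentation.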

Place, publisher, year, edition, pages
2024. p. 95-100
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:mdh:diva-68446
DOI: 10.1145/3674029.3674045
ISI: 001342512100016
Scopus ID: 2-s2.0-85204683049
OAI: oai:DiVA.org:mdh-68446
DiVA, id: diva2:1897260
Conference
International Conference on Machine Learning Technologies (ICMLT)
Available from: 2024-09-12. Created: 2024-09-12. Last updated: 2024-12-04. Bibliographically approved

Open Access in DiVA

fulltext (1759 kB), 290 downloads
File information
File name: FULLTEXT01.pdf
File size: 1759 kB
Checksum (SHA-512): 7f559cda51b241077faf36f0b09379d86db869791c25fac5bf1de75f7232587622482583a1eb7082e8a3d2c78c7269d0a387c61977f7d35263ee1c90724c7646
Type: fulltext
Mimetype: application/pdf


Authority records

Kabir, Md Alamgir; Abdelakram, Hafid; Abdullah, Saad

Search in DiVA

By author/editor
Islam, M M Manjurul; Kabir, Md Alamgir; Sheikh, Alamin; Saiduzzaman, Muhammad; Abdelakram, Hafid; Abdullah, Saad
Total: 291 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 148 hits