Optimizing text-independent speaker recognition using an LSTM neural network
Mälardalen University, School of Innovation, Design and Engineering.
2014 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis.
Abstract [en]

In this paper, a novel speaker recognition system is introduced. With advances in computer science, automated speaker recognition has become increasingly popular as an aid in crime investigations and authorization processes. Here, a recurrent neural network is trained to identify ten speakers within a set of 21 audio books. Audio signals are processed via spectral analysis into Mel Frequency Cepstral Coefficients (MFCCs), which serve as speaker-specific features and are input to the neural network. The Long Short-Term Memory (LSTM) algorithm is examined for the first time within this area, with interesting results. Experiments are conducted to find the optimal network model for the problem. They show that the network learns to identify the speakers well, text-independently, when the recording situation is the same. However, the system has difficulty recognizing speakers across different recordings, probably because the speech processing algorithm in use is sensitive to noise.

Place, publisher, year, edition, pages
2014. p. 52
Keywords [en]
speaker recognition, speaker identification, text-independent, long short-term memory, lstm, mel frequency cepstral coefficients, mfcc, recurrent neural network, speech processing, spectral analysis, rnnlib, htktoolkit
National Category
Other Engineering and Technologies not elsewhere specified
Identifiers
URN: urn:nbn:se:mdh:diva-26312
OAI: oai:DiVA.org:mdh-26312
DiVA, id: diva2:759404
External cooperation
Ss. Cyril and Methodius University in Skopje, Macedonia
Presentation
2014-09-05, Skopje, Macedonia, 17:25 (English)
Available from: 2014-10-30. Created: 2014-10-29. Last updated: 2014-10-30. Bibliographically approved.

Open Access in DiVA

Optimizing text-independent speaker recognition using an LSTM neural network (1092 kB), 789 downloads
File information
File name: FULLTEXT01.pdf
File size: 1092 kB
Checksum (SHA-512): 8cb1ff78fe9f9ae607ba0202ca0839465e56a2f7768d4a0f9a3ab43d05413292ddf97f9a7b2aeff4e6520856b84843b7403d86b17a71ca9c9009e35e27e656f2
Type: fulltext
Mimetype: application/pdf

Author
Larsson, Joel
