Enhancing Speaker Identification System Based on MFCC Feature Extraction and Gated Recurrent Unit Network

AuthorsSeyyed Mohammad Razavi,Mehran Taghipour
JournalJournal of Information Systems and Telecommunication
Page number254-263
Serial number12
Volume number4
Paper TypeFull Paper
Published At2025
Journal GradeScientific - research
Journal TypeElectronic
Journal CountryIran, Islamic Republic Of
Journal Indexisc،Scopus

Abstract

One of the biometric detection methods is to identify people based on speech signals. The implementation of a speaker identification (SI) system can be done in many different ways, and recently, many researchers have been focusing on using deep neural networks. One of the types of deep neural networks is recurrent neural networks, where memory and recurrent parts are handled by layers such as LSTM or Gated Recurrent Unit (GRU). In this paper, we propose a new structure as a classifier in the speaker identification system, which significantly improves the recognition rate by combining a convolutional neural network with two layers of GRU (CNN+ GRU). MFCC coefficients that have been extracted as cell arrays from each period of Pt speech will be used as sequence vectors for the input of proposed classifier. The performance of the SI system has improved in comparison to basic methods according to experiments conducted on two databases, LibriSpeech and VoxCeleb1. When Pt is longer, the system performs better, so that on the LibriSpeech database with 251 speakers, recognition accuracy is equal to 92.94% for Pt=1s, and it rises to 99.92% for Pt=9s. The proposed CNN+GRU classifier has a low sensitivity to specific genders, which can be said to be almost zero.

Paper URL

tags: speaker identification, gated recurrent unit network (GRU), convolutional neural network (CNN), MFCC,