Enhancing Speaker Identification System Based on MFCC Feature Extraction and Gated Recurrent Unit Network

Seyyed Mohammad Razavi, Mehran Taghipour

Authors	Seyyed Mohammad Razavi, Mehran Taghipour
Journal	Journal of Information Systems and Telecommunication
Page number	254-263
Serial number	12
Volume number	4
Paper Type	Full Paper
Published At	2025
Journal Grade	Scientific - research
Journal Type	Electronic
Journal Country	Iran, Islamic Republic Of
Journal Index	ISC, Scopus

Abstract

Identifying people from their speech signals is one form of biometric recognition. A speaker identification (SI) system can be implemented in many ways, and many recent studies rely on deep neural networks. Recurrent neural networks are one such family, in which memory and recurrence are handled by layers such as long short-term memory (LSTM) or the Gated Recurrent Unit (GRU). In this paper, we propose a new classifier structure for the SI system that combines a convolutional neural network with two GRU layers (CNN+GRU) and significantly improves the recognition rate. Mel-frequency cepstral coefficients (MFCCs), extracted as cell arrays from each speech period of length Pt, serve as the sequence vectors fed to the proposed classifier. Experiments on two databases, LibriSpeech and VoxCeleb1, show that the SI system outperforms baseline methods. Performance improves as Pt increases: on the LibriSpeech database with 251 speakers, recognition accuracy is 92.94% for Pt = 1 s and rises to 99.92% for Pt = 9 s. The proposed CNN+GRU classifier also shows almost no sensitivity to speaker gender.
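The data flow described in the abstract (speech cut into periods of Pt seconds, each period turned into a sequence of feature vectors, and the sequence passed through a recurrent layer) can be illustrated with a minimal standard-library sketch. It is not the authors' implementation. The 16 kHz sampling rate, the 25 ms window with 10 ms hop, the 13-dimensional feature vectors, and the hidden size of 8 are all assumptions chosen for illustration. The MFCC computation, the CNN front end, and the second GRU layer are omitted, and random vectors stand in for real MFCCs. The GRU update follows one common convention; some libraries swap the roles of z and 1 - z.

```python
import math
import random

SAMPLE_RATE = 16000  # assumed sampling rate in Hz (not stated in the abstract)


def split_into_periods(signal, pt_seconds, sample_rate=SAMPLE_RATE):
    """Cut a waveform into non-overlapping periods of Pt seconds; a short tail is dropped."""
    period_len = int(round(pt_seconds * sample_rate))
    count = len(signal) // period_len
    return [signal[i * period_len:(i + 1) * period_len] for i in range(count)]


def frame_starts(period_len, sample_rate=SAMPLE_RATE, win_s=0.025, hop_s=0.010):
    """Start indices of the analysis frames in one period (assumed 25 ms window, 10 ms hop).

    One feature vector would be computed per frame, so the number of
    frames is the length of the sequence fed to the classifier.
    """
    win = int(round(win_s * sample_rate))
    hop = int(round(hop_s * sample_rate))
    return list(range(0, period_len - win + 1, hop))


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def init_gru(input_size, hidden_size, rng):
    """Small random weights for the update (z), reset (r) and candidate (h) paths."""
    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    params = {}
    for gate in "zrh":
        params["W" + gate] = mat(hidden_size, input_size)
        params["U" + gate] = mat(hidden_size, hidden_size)
        params["b" + gate] = [0.0] * hidden_size
    return params


def gru_step(x, h, p):
    """One GRU update: h_new = (1 - z) * h + z * candidate."""
    z = [sigmoid(a + b + c)
         for a, b, c in zip(matvec(p["Wz"], x), matvec(p["Uz"], h), p["bz"])]
    r = [sigmoid(a + b + c)
         for a, b, c in zip(matvec(p["Wr"], x), matvec(p["Ur"], h), p["br"])]
    gated = [ri * hi for ri, hi in zip(r, h)]  # reset gate scales the old state
    cand = [math.tanh(a + b + c)
            for a, b, c in zip(matvec(p["Wh"], x), matvec(p["Uh"], gated), p["bh"])]
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def run_gru(sequence, p, hidden_size):
    """Feed a whole sequence through the GRU and return the final hidden state."""
    h = [0.0] * hidden_size
    for x in sequence:
        h = gru_step(x, h, p)
    return h


# Longer periods give longer sequences under the assumed framing.
for pt in (1, 9):
    print("Pt =", pt, "s ->", len(frame_starts(pt * SAMPLE_RATE)), "frames")

# Placeholder features: random 13-dimensional vectors stand in for real MFCCs.
rng = random.Random(0)
params = init_gru(input_size=13, hidden_size=8, rng=rng)
sequence = [[rng.gauss(0.0, 1.0) for _ in range(13)]
            for _ in frame_starts(1 * SAMPLE_RATE)]
summary = run_gru(sequence, params, hidden_size=8)
print("final hidden state size:", len(summary))
```

Under these assumed framing parameters, a 1 s period yields 98 frames and a 9 s period yields 898, so a longer Pt gives the recurrent layers a longer sequence per decision. In a full system the final hidden state would feed a classification layer with one output per speaker.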

Assistant Professor Mehran Tghipour-Gorjikolaie