Authors | Seyyed Mohammad Razavi, Mehran Taghipour |
---|---|
Journal | Journal of Information Systems and Telecommunication |
Page number | 254-263 |
Serial number | 12 |
Volume number | 4 |
Paper Type | Full Paper |
Published At | 2025 |
Journal Grade | Scientific - research |
Journal Type | Electronic |
Journal Country | Iran, Islamic Republic Of |
Journal Index | ISC, Scopus |
Abstract
Identifying people from their speech signals is one form of biometric recognition. A speaker identification (SI) system can be implemented in many ways, and many researchers have recently focused on deep neural networks. Recurrent neural networks are one class of deep networks, in which memory and recurrence are handled by layers such as Long Short-Term Memory (LSTM) or Gated Recurrent Unit (GRU). In this paper, we propose a new classifier structure for speaker identification that combines a convolutional neural network with two GRU layers (CNN+GRU) and significantly improves the recognition rate. Mel-frequency cepstral coefficients (MFCCs), extracted as cell arrays from each speech period of length Pt, are used as the input sequence vectors of the proposed classifier. Experiments on two databases, LibriSpeech and VoxCeleb1, show that the SI system outperforms baseline methods. The system performs better as Pt grows: on the LibriSpeech database with 251 speakers, recognition accuracy is 92.94% for Pt = 1 s and rises to 99.92% for Pt = 9 s. The proposed CNN+GRU classifier also shows almost no sensitivity to speaker gender.
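To make the role of Pt concrete, the following is a minimal sketch of how an utterance might be cut into Pt-second periods and how many MFCC frames each period would then contribute to the input sequence. It is not the authors' implementation: the 16 kHz sample rate, the 25 ms frame length, the 10 ms hop, the non-overlapping segmentation, and the function names `split_into_periods` and `frames_per_period` are all assumptions made for illustration.

```python
def split_into_periods(num_samples, sample_rate, pt_seconds):
    """Return (start, end) sample indices of consecutive Pt-second periods.

    Assumption: periods do not overlap, and a trailing remainder
    shorter than Pt is dropped.
    """
    period_len = int(round(pt_seconds * sample_rate))
    if period_len <= 0:
        raise ValueError("pt_seconds must be positive")
    last_start = num_samples - period_len
    return [(s, s + period_len) for s in range(0, last_start + 1, period_len)]


def frames_per_period(sample_rate, pt_seconds, frame_ms=25.0, hop_ms=10.0):
    """Number of full analysis frames (one MFCC vector each) in one period.

    Assumption: 25 ms frames with a 10 ms hop and no padding;
    the paper's abstract does not state its framing parameters.
    """
    period_len = int(round(pt_seconds * sample_rate))
    frame_len = int(round(frame_ms * sample_rate / 1000))
    hop_len = int(round(hop_ms * sample_rate / 1000))
    if period_len < frame_len:
        return 0
    return 1 + (period_len - frame_len) // hop_len


if __name__ == "__main__":
    sr = 16000  # assumed sample rate in Hz
    # A hypothetical 20-second utterance split into 9-second periods.
    print(split_into_periods(20 * sr, sr, 9))
    for pt in (1, 9):
        print(pt, frames_per_period(sr, pt))
```

Under these assumed settings, a 1 s period yields 98 frames and a 9 s period yields 898, so a longer Pt gives the classifier a much longer feature sequence per decision, which is consistent with the reported accuracy gain at larger Pt.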
tags: speaker identification, gated recurrent unit network (GRU), convolutional neural network (CNN), MFCC