Résumé


Sajad Mohamadzadeh

Associate Professor

Full-time faculty member

Faculty: Electrical and Computer Engineering

Department: Electronics

Degree: PhD


Improving the Performance of Speaker Recognition System Using Optimized VGG Convolutional Neural Network and Data Augmentation

Authors: Sajad Mohamadzadeh, Seyyed Mohammad Razavi
Journal: International Journal of Engineering
Pages: 2414-2425
Serial number: 38
Volume number: 10
Paper type: Full Paper
Publication date: 2025
Journal rank: Scientific-Research
Journal type: Electronic
Country of publication: Iran
Journal indexing: JCR, ISC, Scopus

Abstract

One approach that has gained attention in recent years is to extract Mel-spectrogram images from speech signals and use them in speaker recognition systems, which makes it possible to apply existing image recognition methods to the task. In this paper, three-second segments of speech are chosen at random and the Mel-spectrogram image of each segment is computed. These images are fed into a proposed convolutional neural network designed and optimized on the basis of VGG-13. Compared with classifiers used for similar tasks, this optimized classifier has fewer parameters, trains faster, and achieves higher accuracy. On the VoxCeleb1 dataset with 1251 speakers, it achieves a top-1 accuracy of 84.25% and a top-5 accuracy of 94.33%. In addition, several data augmentation methods are applied to these images in a way that keeps the nature of the speech intact, and in most cases they improve the system's performance. Data augmentation techniques such as horizontal flipping and time shifting of the images, or the ES technique, raise top-1 accuracy to 91.17% and top-5 accuracy to 97.32%. Moreover, using the output of the Dropout layer of the proposed network as a feature vector when training the GMM-UBM model lowers the EER of the speaker verification system: these features reduce the EER from 9% with MFCC features to 3.5%.
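
The segment selection and two of the image augmentations named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the list-of-rows image layout (one row per Mel band, one column per time frame), and the wrap-around behaviour of the time shift are all assumptions made for illustration, and the Mel-spectrogram computation itself is left out.

```python
import random


def random_segment(samples, sample_rate, seconds=3.0, rng=random):
    """Return a randomly positioned window of `seconds` length from `samples`.

    Hypothetical helper: signals shorter than the window are returned whole.
    """
    length = int(seconds * sample_rate)
    if len(samples) <= length:
        return list(samples)
    start = rng.randrange(len(samples) - length + 1)
    return list(samples[start:start + length])


def flip_horizontal(image):
    """Mirror a spectrogram image along the time axis (reverse each row)."""
    return [list(row)[::-1] for row in image]


def time_shift(image, shift):
    """Shift every row by `shift` time frames.

    Assumption: frames pushed past one edge wrap around to the other edge.
    The abstract does not say how the shifted region is filled.
    """
    if not image or not image[0]:
        return [list(row) for row in image]
    k = shift % len(image[0])
    return [list(row)[-k:] + list(row)[:-k] for row in image]
```

A training loop could apply `random_segment` to each utterance, convert the result to a Mel-spectrogram image with a signal-processing library of choice, and then pass the image through `flip_horizontal` or `time_shift` with some probability before feeding it to the classifier.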
