Convolution neural networks with hybrid feature extraction methods for classification of voice sound signals

Pratibha Rashmi * and Manu Pratap Singh

Department of Computer Science, Dr. Bhimrao Ambedkar University, Khandari Campus, Agra, India.
 
Review
World Journal of Advanced Engineering Technology and Sciences, 2023, 08(02), 110–125.
Article DOI: 10.30574/wjaets.2023.8.2.0083
Publication history: 
Received on 07 February 2023; revised on 16 March 2023; accepted on 19 March 2023
 
Abstract: 
Convolutional neural networks (CNNs) lead the field of sound recognition because of their flexibility and their many adjustable parameters. The recognition of letters of the English alphabet spoken by different people, using deep learning techniques, has attracted the research community. In this paper, we explore the use of a convolutional neural network (CNN), a deep learner that can automatically learn features directly from the dataset during training, for the classification of sound signals of spoken English letters. We consider two CNN architectures. In the first architecture, we propose MFCC-based features for a pretrained CNN with two convolutional layers. In the second architecture, we propose a hybrid feature extraction method to train a block-based CNN. The proposed systems consist of two components, namely hybrid feature extraction and a CNN classifier. Five auditory features are extracted: log-Mel spectrogram (LM), MFCC, and the chroma, spectral contrast, and Tonnetz features (CST). LM and MFCC are combined into one feature set, and LM, MFCC, and CST are aggregated into another, for training the two proposed CNNs, respectively. The sound samples of the English letters were collected from people of different age groups. The feature sets produced by the hybrid feature extraction methods are presented to both proposed CNNs, and the experimental results are recorded. The results indicate that the classification accuracy of the proposed architectures can surpass that of existing CNN methods that rely on a single feature extraction method. The second proposed architecture performs more effectively than the first.
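The two hybrid feature sets described above can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not confirm: the librosa library is used for extraction, its default frame settings and feature sizes apply, and the features are combined by stacking them along the feature axis. The names `extract_features`, `stack_features`, `LM_MFCC`, and `LM_MFCC_CST` are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical names for the two hybrid feature sets in the abstract.
LM_MFCC = ("LM", "MFCC")
LM_MFCC_CST = ("LM", "MFCC", "chroma", "contrast", "tonnetz")


def extract_features(y, sr):
    """Compute the five per-frame feature matrices for one audio signal.

    Assumption: librosa with its default n_fft/hop_length, so every
    matrix has the same number of frames (columns).
    """
    import librosa  # imported here so stack_features works without librosa

    mel = librosa.feature.melspectrogram(y=y, sr=sr)
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)
    return {
        "LM": librosa.power_to_db(mel),                  # log-Mel spectrogram
        "MFCC": librosa.feature.mfcc(y=y, sr=sr),
        "chroma": chroma,
        "contrast": librosa.feature.spectral_contrast(y=y, sr=sr),
        # Reuse the STFT chroma so the Tonnetz frames line up with the rest.
        "tonnetz": librosa.feature.tonnetz(chroma=chroma, sr=sr),
    }


def stack_features(features, names):
    """Stack the selected feature matrices along the feature (row) axis."""
    return np.vstack([features[name] for name in names])
```

Under these assumptions, a recording loaded with `librosa.load` would be passed to `extract_features`, and `stack_features(features, LM_MFCC)` or `stack_features(features, LM_MFCC_CST)` would give the two-dimensional input for the corresponding CNN. Whether the paper stacks, averages, or otherwise aggregates the features is not stated in the abstract.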
 
Keywords: 
Deep Neural Network; Convolutional Neural Networks; Sound Recognition; MFCC; Classification
 