A Summary of Work on Emotional Voice Recognition with Machine Learning

Authors

  • Manroop Kaur

Abstract

This chapter surveys speech emotion recognition (SER) systems. It covers a theoretical definition of emotion, the categorization of affective states, and several ways in which emotions are expressed in speech. The study builds an SER system that combines several classifiers and feature extraction methods. Mel-Frequency Cepstral Coefficients (MFCC) and Modulation Spectral (MS) features are extracted from the speech signals and used to train multiple classifiers, and Feature Selection (FS) is applied to find the most relevant feature subset. Several machine learning models are compared on the emotion classification task: seven emotions are first classified with a Recurrent Neural Network (RNN) classifier, and the results are then compared with those of other emotion recognition methods for spoken audio signals, namely Multivariate Linear Regression (MLR) and the Support Vector Machine (SVM). Experimental data are drawn from the Berlin and Spanish emotional speech databases. With Speaker Normalization (SN) and feature selection, all classifiers reach 83% accuracy on the Berlin database. On the Spanish database, the RNN classifier without SN and with FS achieves the highest accuracy (94%).
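
As a rough illustration of the pipeline the abstract describes, the sketch below extracts utterance-level MFCC statistics, applies per-feature standardization and univariate feature selection, and trains an SVM classifier. It is a minimal sketch under stated assumptions: the use of librosa and scikit-learn, the function names, and standardization as a stand-in for speaker normalization are illustrative choices, not the paper's implementation; the Modulation Spectral features and the RNN/MLR classifiers compared in the study are omitted for brevity.

    import numpy as np
    import librosa
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC
    from sklearn.metrics import accuracy_score


    def extract_mfcc_features(path, n_mfcc=13, sr=16000):
        """Load one utterance and return utterance-level MFCC statistics."""
        signal, sr = librosa.load(path, sr=sr)
        mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
        # Summarize frame-level coefficients with their mean and standard deviation.
        return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])


    def run_ser_baseline(wav_paths, labels, k_best=20):
        """Train and score one SER classifier on a labelled corpus (hypothetical inputs)."""
        X = np.vstack([extract_mfcc_features(p) for p in wav_paths])
        y = np.asarray(labels)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y)

        # Normalization: standardize each feature dimension (a simple stand-in
        # for the speaker normalization step mentioned in the abstract).
        scaler = StandardScaler().fit(X_tr)
        X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)

        # Feature selection (FS): keep the k most discriminative dimensions.
        selector = SelectKBest(f_classif, k=min(k_best, X_tr.shape[1])).fit(X_tr, y_tr)
        X_tr, X_te = selector.transform(X_tr), selector.transform(X_te)

        # One of the compared classifiers (SVM); an RNN or multivariate linear
        # model could be substituted at this step in the same way.
        clf = SVC(kernel="rbf").fit(X_tr, y_tr)
        return accuracy_score(y_te, clf.predict(X_te))

Given a list of audio file paths and matching emotion labels, run_ser_baseline would return a held-out accuracy comparable in spirit to the per-classifier figures reported in the abstract.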

How to cite this article:
Kaur M. A Summary of Work on Emotional Voice Recognition with Machine Learning. J Adv Res Comp Graph Multim Tech. 2023; 5(1): 1-13.

References

Iliev AI, Scordilis MS, Papa JP, Falcao AX. Spoken emotion recognition through optimum-path forest classification using glottal features. Computer Speech and Language. 2010; 24: 445-460.

Ingale AB, Chaudhari DS. Speech emotion recognition using Hidden Markov Model and Support Vector Machine. International Journal of Advanced Engineering Research and Studies. 2012; 1.

Albornoz EM, Milone DH, Rufiner HL. Spoken emotion recognition using hierarchical classifiers. Computer Speech and Language. 2011; 25: 556-570.

Kwon OW, Chan K, Hao J, Lee TW. Emotion recognition by speech signals. EUROSPEECH, Geneva. 2003: 125-128.

Bänziger T, Scherer KR. The role of intonation in emotional expression. Speech Communication. 2005; 46: 252-267.

Busso C, Lee S, Narayanan S. Analysis of emotionally salient aspects of fundamental frequency for emotion detection. IEEE Transactions on Audio, Speech and Language Processing. 2009; 17(4): 582-596.

Hyun KH, Kim EH, Kwak YK. Improvement of emotion recognition by Bayesian classifier using nonzero-pitch concept. IEEE. 2005.

Kim EH, Hyun KH. Robust emotion recognition feature, frequency range of meaningful signal. IEEE International Workshop on Robots and Human Interactive Communication. 2005.

Zhou G, et al. Nonlinear feature based classification of speech under stress. IEEE Transactions on Speech and Audio Processing. 2001; 9: 201-216.

Polzin TS, Waibel A. Emotion-sensitive human-computer interfaces. ISCA Workshop on Speech and Emotion. 2000.

https://www.analyticsinsight.net/speech-emotion-recognition-ser-through-machine-learning/

https://www.intechopen.com/chapters/65993

Peipei S, Zhou C, Xiong C. Automatic speech emotion recognition using support vector machine. IEEE. 2011; 2: 621-625.

Sathit P. Improvement of speech emotion recognition with neural network classifier by using speech spectrogram. International Conference on Systems, Signals and Image Processing (IWSSIP). 2015: 73-76.

Alex G, Navdeep J. Towards end-to-end speech recognition with recurrent neural networks. In: International Conference on Machine Learning. Vol. 32. 2014.

Chen S, Jin Q. Multi-modal dimensional emotion recognition using recurrent neural networks. Brisbane, Australia; 2015.

Lim W, Jang D, Lee T. Speech emotion recognition using convolutional and recurrent neural networks. Asia-Pacific. 2017: 1-4.

Sara M, Saeed S, Rabiee A. Speech emotion recognition based on a modified brain emotional learning model. Biologically Inspired Cognitive Architectures. Elsevier; 2017; 19: 32-38.

Yu G, Eric P, Hai-Xiang L, van den HJ. Speech emotion recognition using voiced segment selection algorithm. ECAI. 2016; 285: 1682-1683.

Published

2023-05-09