Pengenalan Emosi Pembicara Menggunakan Convolutional Neural Networks

Speaker Emotion Recognition Using Convolutional Neural Networks

Keywords: Convolutional Neural Networks, Deep Learning, Keras, Speech Emotion Recognition, TensorFlow


Recognizing a speaker's emotions is an important but challenging component of Human-Computer Interaction (HCI). Demand for speaker emotion recognition is also growing as companies digitize their operational processes in line with Industry 4.0. Deep Learning methods are increasingly used, especially for processing unstructured data such as voice signals. This study applies a Deep Learning method to classify speaker emotions using SAVEE, an open dataset containing seven classes of vocal emotion in English. A CNN model is trained on this dataset. The final accuracy of the model is 88% on the training data and 52% on the test data, which indicates that the model is overfitting. This is attributed to the imbalance of emotion classes in the dataset, which makes the model tend to predict the classes with more labels. In addition, the dataset lacks heterogeneity; a more heterogeneous dataset would make the characteristics of each emotion class more distinct from the others, which could reduce bias in the model and help prevent overfitting. Future work could include over-sampling the existing dataset by adding other data sources, applying data augmentation to better capture the characteristics of each emotion class, and tuning hyperparameter values to obtain better accuracy.
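
As an illustration of the class-imbalance issue described above, the following is a minimal sketch of random over-sampling, which duplicates minority-class samples until every class matches the largest one. This is not the authors' procedure: the abstract proposes over-sampling by adding other data sources, whereas this sketch only re-draws existing samples. The function name `oversample`, the toy label counts, and the file names are all hypothetical.

```python
import random
from collections import Counter


def oversample(samples, labels, seed=0):
    """Random over-sampling: duplicate minority-class samples until
    every class has as many samples as the largest class."""
    rng = random.Random(seed)

    # Group samples by their class label.
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)

    # Every class is grown to the size of the largest class.
    target = max(len(xs) for xs in by_class.values())

    out_x, out_y = [], []
    for y, xs in by_class.items():
        # Draw the missing samples (with replacement) from the same class.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


# Toy, made-up label counts (assumption: not the study's actual split).
labels = ["neutral"] * 6 + ["angry"] * 3 + ["sad"] * 2
samples = [f"clip_{i}.wav" for i in range(len(labels))]  # hypothetical names

bal_x, bal_y = oversample(samples, labels)
counts = Counter(bal_y)
print(counts)
```

In practice, over-sampling of this kind is normally applied to the training split only, so that duplicated clips do not leak into the test data and inflate the reported test accuracy.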




How to Cite
Rendi Nurcahyo, & Mohammad Iqbal. (2022). Pengenalan Emosi Pembicara Menggunakan Convolutional Neural Networks. Jurnal RESTI (Rekayasa Sistem Dan Teknologi Informasi), 6(1), 115-122.
Information Technology Articles