RED-LSTM: Real time emotion detection using LSTM

Abstract

The growth of the Internet of Things and voice-based multimedia applications has made it possible to capture and relate many aspects of human behavior through big data, in the form of trends and patterns. Human speech carries a latent representation of many of these aspects in the emotion it expresses, so extracting sentiment from speech by mining audio data has become a priority. The capacity to recognize and categorize human emotion will be crucial for developing the next generation of AI, because it allows machines to begin to connect with human desires. So far, audio-based approaches such as voice emotion recognition have not matched the accuracy of text-based emotion recognition. For acoustic modal data, this study presents a combined strategy of feature extraction and data encoding with one-hot vector embedding. An LSTM, a recurrent neural network (RNN) based model, is then used to predict the emotion conveyed by the tone of the human voice when real-time data is available. In predicting categorical emotion, the model was evaluated and shown to outperform the other models by about 10%. It was tested on two benchmark datasets, RAVDESS and TESS, which contain voice actors’ renditions of eight different emotions. The model outperformed other state-of-the-art models, achieving approximately 80% accuracy for weighted data and approximately 85% accuracy for unweighted data.

Description

Cataloged from PDF version of thesis.
Includes bibliographical references (pages 27-29).
This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2023.

Type

Thesis