AUDIO-BASED EMOTION RECOGNITION IN SPEECH USING DEEP LEARNING AND FEATURE ENGINEERING TECHNIQUES

Authors

  • Ramakrishna Gandi, Dr. A. Geetha, Dr. B. Ramasubba Reddy

Keywords:

Speech emotion recognition, Deep learning, Feature engineering, Affective computing, MLP, Real-time emotion detection, Sentiment Analysis

Abstract

Speech Emotion Recognition (SER) is increasingly vital for practical applications such as virtual assistants, customer service, and healthcare monitoring. Despite substantial progress, SER systems still struggle with environmental noise, speaker variability, and cross-lingual adaptation, all of which limit their accuracy and generalization. This work introduces ExpressNet, an optimized Multi-Layer Perceptron (MLP)-based SER model designed to address these issues by leveraging a broad set of prosodic and spectral features, including Mel-Frequency Cepstral Coefficients (MFCCs), spectral contrast, and pitch variation. Using ReLU activations and a softmax output layer, the model classifies six emotional states: anger, disgust, fear, happiness, neutral, and sadness. Evaluated on the CREMA-D dataset, ExpressNet achieves a test accuracy of 92.97%, surpassing previous state-of-the-art approaches. The method combines high classification accuracy with computational efficiency, making it particularly well suited to real-time applications. Our results underscore the value of pairing deep learning with careful feature engineering to improve SER performance. We further present a thorough evaluation across multiple benchmark datasets to demonstrate the model's robustness and applicability in diverse settings. Beyond its accuracy advantage, ExpressNet exhibits little overfitting, making it a dependable choice for real-world deployment. This work advances affective computing by providing a solid, scalable foundation for SER and enabling further research in emotion recognition systems. Because it handles complex emotional patterns in voice signals well, the technique has potential applications in mental health monitoring and human-computer interaction. Further research into self-supervised learning, multimodal data integration, and cross-lingual adaptation will extend the model's potential across application scenarios.
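The architecture described in the abstract (an MLP with ReLU hidden layers and a six-way softmax output over emotion classes) can be sketched as follows. This is a minimal illustrative forward pass only; the feature dimension, hidden-layer sizes, and initialization are assumptions for demonstration, not the paper's actual ExpressNet configuration or trained weights:

```python
import numpy as np

# Six target emotions named in the abstract
EMOTIONS = ["anger", "disgust", "fear", "happiness", "neutral", "sad"]

def relu(x):
    return np.maximum(0.0, x)

def softmax(x):
    e = np.exp(x - np.max(x))  # shift for numerical stability
    return e / e.sum()

class ExpressNetSketch:
    """Illustrative MLP: feature vector -> ReLU hidden layers -> softmax over 6 emotions.

    The input is assumed to be a pooled acoustic feature vector
    (e.g., MFCC, spectral contrast, and pitch statistics); sizes are hypothetical.
    """

    def __init__(self, n_features=193, hidden=(256, 128), n_classes=6, seed=0):
        rng = np.random.default_rng(seed)
        sizes = [n_features, *hidden, n_classes]
        # He-style initialization, appropriate for ReLU layers
        self.weights = [rng.normal(0.0, np.sqrt(2.0 / a), (a, b))
                        for a, b in zip(sizes, sizes[1:])]
        self.biases = [np.zeros(b) for b in sizes[1:]]

    def predict_proba(self, features):
        h = features
        for W, b in zip(self.weights[:-1], self.biases[:-1]):
            h = relu(h @ W + b)
        return softmax(h @ self.weights[-1] + self.biases[-1])

# Usage on a dummy 193-dim feature vector standing in for extracted audio features
model = ExpressNetSketch()
probs = model.predict_proba(np.random.default_rng(1).normal(size=193))
print(EMOTIONS[int(np.argmax(probs))], float(probs.sum()))
```

With untrained random weights the predicted class is meaningless; the sketch only shows the data flow from a fixed-length feature vector to a probability distribution over the six emotions.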

 

Published

2026-03-09

Section

Articles

How to Cite

AUDIO-BASED EMOTION RECOGNITION IN SPEECH USING DEEP LEARNING AND FEATURE ENGINEERING TECHNIQUES. (2026). Machine Intelligence Research, 20(1), 149-172. https://machineintelligenceresearchs.com/index.php/mir/article/view/314