TY - JOUR
T1 - Att-Net
T2 - Enhanced emotion recognition system using lightweight self-attention module
AU - Mustaqeem,
AU - Kwon, Soonil
N1 - Funding Information:
This work was supported by the National Research Foundation of Korea funded by the Korean Government through the Ministry of Science and ICT under Grant NRF-2020R1F1A1060659. All authors have read and agreed to the published version of the manuscript.
Publisher Copyright:
© 2021 Elsevier B.V.
PY - 2021/4
Y1 - 2021/4
AB - Speech emotion recognition (SER) is an active research field in digital signal processing and plays a crucial role in numerous applications of human–computer interaction (HCI). Current baseline state-of-the-art systems have relatively low accuracy and high computational cost, which must be improved to make them suitable for real-time industrial uses such as detecting content from speech data. The main reasons for the low recognition rate and high computational cost are the scarcity of datasets, model configuration, and pattern recognition, which remain the most challenging aspects of building a robust SER system. In this study, we address these problems and propose a simple, lightweight deep learning-based self-attention module (SAM) for an SER system. The intermediate feature map is given to SAM, which efficiently produces channel and spatial attention maps with negligible overhead. We use a multi-layer perceptron (MLP) in the channel attention branch to extract global cues and a special dilated convolutional neural network (CNN) in the spatial attention branch to extract spatial information from the input tensor. Moreover, we merge the spatial and channel attention maps to produce combined attention weights as a self-attention module. We place SAM between the convolutional and fully connected layers and train the model end-to-end. An ablation study and comprehensive experiments are conducted on the IEMOCAP, RAVDESS, and EMO-DB speech emotion datasets. The proposed SER system shows consistent improvements across all experiments and achieves 78.01%, 80.00%, and 93.00% average recall on these datasets, respectively.
KW - Affective computing
KW - Artificial intelligence
KW - Attention mechanism
KW - Emotion recognition
KW - Lightweight CNN
KW - Self-attention module
KW - Spectrograms
UR - http://www.scopus.com/inward/record.url?scp=85100093498&partnerID=8YFLogxK
U2 - 10.1016/j.asoc.2021.107101
DO - 10.1016/j.asoc.2021.107101
M3 - Article
AN - SCOPUS:85100093498
VL - 102
JO - Applied Soft Computing
JF - Applied Soft Computing
SN - 1568-4946
M1 - 107101
ER -
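
The abstract describes a channel-attention branch (an MLP over pooled features) and a spatial-attention branch (a dilated convolution) whose maps are combined and applied to an intermediate CNN feature map. The sketch below is only an illustrative PyTorch approximation of that idea; the class name, pooling choices, reduction ratio, dilation rate, and the way the two attention maps are merged are assumptions, not details taken from the paper.

    import torch
    import torch.nn as nn


    class SelfAttentionModule(nn.Module):
        """Illustrative sketch: channel attention (MLP on globally pooled
        features) followed by spatial attention (a dilated convolution),
        applied to an intermediate CNN feature map. Hyperparameters are
        assumptions, not the paper's values."""

        def __init__(self, channels: int, reduction: int = 8, dilation: int = 2):
            super().__init__()
            # Channel attention: shared MLP applied to pooled channel descriptors.
            self.channel_mlp = nn.Sequential(
                nn.Linear(channels, channels // reduction),
                nn.ReLU(inplace=True),
                nn.Linear(channels // reduction, channels),
            )
            # Spatial attention: dilated 3x3 convolution over channel-pooled maps.
            self.spatial_conv = nn.Conv2d(
                2, 1, kernel_size=3, padding=dilation, dilation=dilation
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            b, c, _, _ = x.shape
            # Channel attention weights from average- and max-pooled descriptors.
            avg_pool = x.mean(dim=(2, 3))          # (B, C)
            max_pool = x.amax(dim=(2, 3))          # (B, C)
            ch_att = torch.sigmoid(
                self.channel_mlp(avg_pool) + self.channel_mlp(max_pool)
            ).view(b, c, 1, 1)
            x = x * ch_att
            # Spatial attention map from channel-wise average and max projections.
            avg_map = x.mean(dim=1, keepdim=True)  # (B, 1, H, W)
            max_map = x.amax(dim=1, keepdim=True)  # (B, 1, H, W)
            sp_att = torch.sigmoid(
                self.spatial_conv(torch.cat([avg_map, max_map], dim=1))
            )
            return x * sp_att

For example, SelfAttentionModule(128) applied to a tensor of shape (4, 128, 16, 16) returns a reweighted feature map of the same shape, so the module can sit between the convolutional backbone and the fully connected classifier and be trained end-to-end, as the abstract describes.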