The Hierarchical Cross-Modal Attention Network (HCAN) is a multimodal emotion recognition framework that integrates electroencephalography (EEG) signals with speech-based emotional cues through graph neural networks, transformer-based temporal modeling, and bidirectional cross-modal attention. The architecture is evaluated on the SEED-IV and RAVDESS datasets under subject-independent settings. The experiments show competitive performance in physiological-acoustic emotion recognition, along with improved multimodal feature fusion and cross-subject generalization.
Keywords: EEG Emotion Recognition, Speech Emotion Recognition, Multimodal Learning, Graph Neural Networks, Transformers, Cross-Modal Attention, Affective Computing.
Publication Date: 2026-06-14
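The bidirectional cross-modal attention named in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the HCAN implementation: it assumes both modalities have already been encoded into sequences of equal-width feature vectors, omits the learned query/key/value projections and multiple heads a real model would use, and fuses by mean pooling and concatenation. The function names (`attend`, `mean_pool`, `bidirectional_cross_attention`) and the sequence sizes are hypothetical.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys_values):
    """Scaled dot-product attention.

    Assumption: keys and values are the same vectors (no learned projections),
    so each output row is a weighted average of the rows in keys_values.
    """
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys_values]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, keys_values)) for j in range(d)])
    return out


def mean_pool(rows):
    """Average a sequence of vectors over the time axis."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def bidirectional_cross_attention(eeg, speech):
    """Fuse two modality sequences by attending in both directions."""
    eeg_ctx = attend(eeg, speech)      # EEG steps query the speech frames
    speech_ctx = attend(speech, eeg)   # speech frames query the EEG steps
    # Assumed fusion rule: pool each direction, then concatenate.
    return mean_pool(eeg_ctx) + mean_pool(speech_ctx)


if __name__ == "__main__":
    rng = random.Random(0)
    eeg = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(4)]     # 4 EEG steps, 8 features
    speech = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(6)]  # 6 speech frames, 8 features
    fused = bidirectional_cross_attention(eeg, speech)
    print(len(fused))  # 16: 8 EEG-side features plus 8 speech-side features
```

Attending in both directions lets each modality be re-expressed in terms of the other before fusion, so the two sequences may differ in length as long as their feature widths match.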