Virtual reality (VR) experiences and 360-degree videos are transforming viewers from passive observers into active ...
Abstract: Spiking neural networks (SNNs), which efficiently encode temporal sequences, have shown great potential in extracting joint audio-visual feature representations. However, coupling SNNs ...
Abstract: Audio-visual generalized zero-shot learning (AV-GZSL) for video classification is a task where the model learns to identify unseen video classes from multimodal audio-visual inputs. This is ...