Title: Deep Learning Algorithms for Multimodal Interaction Using Speech and Motion Data in Virtual Reality Systems

PatternIQ Mining
© 2024 by piqm - Sahara Digital Publications
ISSN: 3006-8894
Volume 01, Issue 04
Year of Publication : 2024
Page: [52 - 64]


Authors :

Ahmed Zubair and Fatima Al Rashed

Address :

Faculty of Computer Science, American University of Sharjah, UAE

Abstract :

Multimodal interaction (MMI) based on speech and motion data (SMD) has enormous potential in virtual reality (VR) systems. However, real-time synchronization, context-sensitive interpretation, and effective fusion of heterogeneous data modalities remain open challenges. This study presents a deep learning-based framework that fuses speech and motion data to improve interaction performance. It proposes a novel method, MMI-CNNRNN, which combines a Convolutional Neural Network (CNN) for feature extraction from speech with a Recurrent Neural Network (RNN) for temporal motion analysis, integrated into a Transformer-based architecture to enhance the contextual understanding and responsiveness of the system. The performance of the proposed framework is evaluated on benchmark multimodal datasets such as the IEMOCAP dataset. The results show a 20% increase in interaction accuracy and a 15% reduction in latency compared to unimodal and early-fusion methods. The fusion of CNN and RNN mechanisms yields more natural and intuitive interactions, making both assistive devices and the VR environment more adaptive and user-friendly. The findings indicate that efficient multimodal system development supports better accessibility and engagement among users with diverse needs.
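As a rough illustration of the fusion idea outlined in the abstract, the following PyTorch sketch pairs a CNN branch for speech features with an RNN branch for motion sequences and fuses them with a Transformer encoder. All layer sizes, input dimensions, the GRU choice for the RNN, and the concatenation-based fusion are illustrative assumptions, not the authors' published MMI-CNNRNN configuration.

```python
# Hypothetical sketch of a CNN + RNN + Transformer fusion model for
# speech and motion data; dimensions and architecture details are assumptions.
import torch
import torch.nn as nn

class MMICNNRNN(nn.Module):
    def __init__(self, speech_dim=40, motion_dim=63, d_model=128, num_classes=4):
        super().__init__()
        # CNN branch: extracts local acoustic patterns from speech frames
        self.speech_cnn = nn.Sequential(
            nn.Conv1d(speech_dim, d_model, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(d_model, d_model, kernel_size=5, padding=2),
            nn.ReLU(),
        )
        # RNN branch: models temporal dynamics of motion (e.g., joint trajectories)
        self.motion_rnn = nn.GRU(motion_dim, d_model, batch_first=True)
        # Transformer encoder: fuses the concatenated speech/motion sequence
        encoder_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
        self.fusion = nn.TransformerEncoder(encoder_layer, num_layers=2)
        self.classifier = nn.Linear(d_model, num_classes)

    def forward(self, speech, motion):
        # speech: (batch, time, speech_dim); motion: (batch, time, motion_dim)
        s = self.speech_cnn(speech.transpose(1, 2)).transpose(1, 2)  # (batch, time, d_model)
        m, _ = self.motion_rnn(motion)                               # (batch, time, d_model)
        fused = self.fusion(torch.cat([s, m], dim=1))                # fuse along the time axis
        return self.classifier(fused.mean(dim=1))                    # sequence-level prediction

# Example: 2 clips, 100 time steps of 40-dim speech and 63-dim motion features
model = MMICNNRNN()
logits = model(torch.randn(2, 100, 40), torch.randn(2, 100, 63))
print(logits.shape)  # torch.Size([2, 4])
```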

Keywords :

Multimodal Interaction, Deep Learning, Virtual Reality, Speech-Motion Fusion, Convolutional Neural Network (CNN), Recurrent Neural Network (RNN).

DOI :

https://doi.org/10.70023/piqm24301