This study focuses on improving automatic word acquisition in multimodal conversational systems by aligning speech and gaze signals in real time. Synchronizing verbal and visual cues allows the system to infer user intent more accurately and to communicate more efficiently. The proposed methodology exploits the temporal relationship between spoken words and the gaze shifts that accompany them, enabling more precise interpretation of user input. By integrating speech and gaze data, the research aims to advance interactive systems that understand and respond to human language across a variety of contexts.
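As a rough illustration of the kind of speech-gaze alignment described above, the sketch below pairs time-stamped spoken words with gaze fixations whose onsets fall within a fixed temporal window, a common heuristic since gaze often precedes naming. All names, data structures, and the window size are hypothetical and are not taken from the study itself.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class SpokenWord:
    text: str
    start: float   # word onset time in seconds
    end: float     # word offset time in seconds

@dataclass
class GazeFixation:
    target: str    # label of the fixated object (hypothetical scene annotation)
    start: float
    end: float

def align_word_to_fixation(word: SpokenWord,
                           fixations: List[GazeFixation],
                           max_lag: float = 1.0) -> Optional[GazeFixation]:
    """Return the fixation whose onset is closest to the word onset,
    restricted to a +/- max_lag window around the word onset."""
    candidates = [f for f in fixations
                  if abs(f.start - word.start) <= max_lag]
    if not candidates:
        return None
    return min(candidates, key=lambda f: abs(f.start - word.start))

# Toy example: the word "cup" is grounded to the object the user was fixating.
words = [SpokenWord("cup", 2.10, 2.45)]
fixations = [GazeFixation("mug_01", 1.70, 2.30),
             GazeFixation("plate_02", 3.10, 3.60)]

for w in words:
    fix = align_word_to_fixation(w, fixations)
    if fix:
        print(f"'{w.text}' grounded to {fix.target}")   # -> 'cup' grounded to mug_01
```

A real system would replace the fixed window with a learned or empirically tuned speech-gaze lag model, but the pairing step shown here captures the basic idea of using temporal proximity to ground newly heard words to visual referents.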