Abstract: In this paper, we present our work on Visual Speech Recognition (VSR) in the Mandarin Audio-Visual Speech Recognition (MAVSR) Challenge 2025, with a particular focus on improving lipreading ...
Abstract: Speech signals contain rich information, such as textual content, emotion, and speaker identity. To extract these features more efficiently, researchers are investigating joint training ...