Abstract: Audio-visual target speaker extraction (AV-TSE) aims to extract a specific person's speech from an audio mixture, given auxiliary visual cues. Previous methods usually search for the ...
Abstract: This paper introduces MAVAD, the first audio-visual dataset for traffic anomaly detection, captured from real-world scenes under a diverse range of illumination conditions. In addition, a ...