1 code implementation • 28 Mar 2024 • Anna Kukleva, Fadime Sener, Edoardo Remelli, Bugra Tekin, Eric Sauser, Bernt Schiele, Shugao Ma
Lately, there has been growing interest in adapting vision-language models (VLMs) to image and third-person video classification due to their success in zero-shot recognition.
no code implementations • 26 Mar 2024 • Sammy Christen, Shreyas Hampali, Fadime Sener, Edoardo Remelli, Tomas Hodan, Eric Sauser, Shugao Ma, Bugra Tekin
In the grasping stage, the model only generates hand motions, whereas in the interaction phase both hand and object poses are synthesized.