no code implementations • 2 Nov 2023 • Zalan Fabian, Zhongqi Miao, Chunyuan Li, Yuanhan Zhang, Ziwei Liu, Andrés Hernández, Andrés Montes-Rojas, Rafael Escucha, Laura Siabatto, Andrés Link, Pablo Arbeláez, Rahul Dodhia, Juan Lavista Ferres
In particular, we instruction-tune vision-language models to generate detailed visual descriptions of camera trap images, using terminology similar to that of experts.