1 code implementation • 28 May 2023 • Noam Rotstein, David Bensaid, Shaked Brody, Roy Ganz, Ron Kimmel
Our proposed method, FuseCap, fuses the outputs of vision experts with the original captions using a large language model (LLM), yielding comprehensive image descriptions.
Ranked #1 on Image Captioning on COCO Captions (CLIPScore metric)
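
The fusion step described in the abstract can be pictured with a minimal sketch. This is not the authors' implementation: the listing does not say which vision experts are used, what the prompt looks like, or which LLM performs the fusion. The `ExpertOutputs` container, its fields, the prompt wording, and the `llm` callable are all hypothetical names introduced here for illustration only.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ExpertOutputs:
    """Hypothetical container for what vision experts return for one image."""
    objects: List[str] = field(default_factory=list)   # assumed: detector labels
    ocr_text: List[str] = field(default_factory=list)  # assumed: text read from the image


def build_fusion_prompt(caption: str, experts: ExpertOutputs) -> str:
    """Assemble a plain-text prompt asking an LLM to merge caption and expert outputs."""
    lines = [
        "Rewrite the caption so that it also mentions the visual details below.",
        f"Original caption: {caption}",
        "Detected objects: " + (", ".join(experts.objects) or "none"),
        "Text in image: " + (", ".join(experts.ocr_text) or "none"),
        "Enriched caption:",
    ]
    return "\n".join(lines)


def fuse_caption(caption: str, experts: ExpertOutputs, llm: Callable[[str], str]) -> str:
    """Send the prompt to any text-in, text-out callable and return its stripped reply."""
    return llm(build_fusion_prompt(caption, experts)).strip()


if __name__ == "__main__":
    # Stand-in for a real LLM call; it only echoes the length of the prompt it received.
    def stub_llm(prompt: str) -> str:
        return f"(enriched caption generated from a {len(prompt)}-character prompt)"

    experts = ExpertOutputs(objects=["brown dog", "red ball"], ocr_text=["BEACH CLOSED"])
    print(fuse_caption("A dog on a beach", experts, stub_llm))
```

Passing the LLM in as a plain callable keeps the sketch independent of any particular model or library; a real setup would replace `stub_llm` with a call to whichever model is chosen.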