no code implementations • 4 Oct 2023 • Nokyung Park, Daewon Chae, Jeongyong Shim, Sangpil Kim, Eun-Sol Kim, Jinkyu Kim
However, they use pivot embedding in global manner (i. e., aligning an image embedding with sentence-level text embedding), not fully utilizing the semantic cues of given text description.