no code implementations • 20 Apr 2024 • Yuang Liu, Zhiheng Qiu, Xiaokai Qin
The Vision Transformer (ViT) divides an image into several local patches, known as "visual sentences".
Image Classification • Sentence
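
The patch-splitting step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' code (the listing reports no implementation); the function name `image_to_patches`, the `patch_size` parameter, and the choice of non-overlapping square patches are assumptions made for illustration.

```python
import numpy as np


def image_to_patches(image: np.ndarray, patch_size: int) -> np.ndarray:
    """Split an (H, W, C) image into non-overlapping, flattened square patches.

    Hypothetical helper for illustration; returns an array of shape
    (num_patches, patch_size * patch_size * C).
    """
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    rows, cols = h // patch_size, w // patch_size
    # (rows, P, cols, P, C) -> (rows, cols, P, P, C): gather each patch's pixels together
    patches = image.reshape(rows, patch_size, cols, patch_size, c).transpose(0, 2, 1, 3, 4)
    # One row per patch, ordered row-major over the patch grid
    return patches.reshape(rows * cols, patch_size * patch_size * c)


# A 224x224 RGB image with 16x16 patches yields 14 * 14 = 196 patches of 768 values each
image = np.zeros((224, 224, 3), dtype=np.float32)
print(image_to_patches(image, 16).shape)  # (196, 768)
```

In a ViT-style model, each flattened patch would then typically be passed through a learned linear projection to form one token of the input sequence.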