no code implementations • 13 Jun 2023 • Peijian Ding, Davit Soselia, Thomas Armstrong, Jiahao Su, Furong Huang
While the self-attention operator in vision transformers (ViT) is permutation-equivariant and thus shift-equivariant, patch embedding, positional encoding, and subsampled attention in ViT variants can disrupt this property, resulting in inconsistent predictions even under small shift perturbations.
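The patch-embedding part of this claim can be illustrated with a minimal numpy sketch (my own illustration, not the authors' code): non-overlapping patch extraction stands in for ViT's strided-conv patch embedding, and a one-pixel shift, smaller than the patch stride, produces patch vectors that are not a permutation of the originals, so even a permutation-equivariant attention stack cannot recover shift-equivariance.

```python
import numpy as np

def patch_embed(x, p=4):
    """Non-overlapping patch embedding: split an HxW image into p x p
    patches and flatten each (a stand-in for ViT's strided-conv embedding)."""
    H, W = x.shape
    return x.reshape(H // p, p, W // p, p).transpose(0, 2, 1, 3).reshape(-1, p * p)

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 16))

# Circularly shift the image by 1 pixel -- smaller than the patch stride of 4.
x_shift = np.roll(x, shift=1, axis=1)

e = patch_embed(x)
e_shift = patch_embed(x_shift)

# Self-attention is permutation-equivariant, so a shift-equivariant embedding
# would make the shifted image's patch vectors a permutation of the originals.
# They are not: each shifted patch straddles old patch boundaries.
rows_match = all(any(np.allclose(r, q) for q in e) for r in e_shift)
print(rows_match)  # False: a 1-pixel shift yields entirely new patch vectors
```

Shifts that are exact multiples of the patch stride would merely permute the patches; it is the sub-stride shifts, as in the sketch above, that break the property.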
no code implementations • 2 Aug 2021 • Peijian Ding
While Computerized Tomography (CT) images can help detect diseases such as COVID-19, regular CT machines are large and expensive.