2021
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
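Introduces the Vision Transformer (ViT), a pure transformer architecture applied directly to sequences of image patches, which matches or exceeds state-of-the-art convolutional networks on image recognition when pre-trained on large datasets.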