2021
Learning Transferable Visual Models From Natural Language Supervision
Introducing CLIP, a neural network trained on a massive dataset of image-text pairs that learns to connect images with their textual descriptions, enabling zero-shot image classification and other powerful capabilities.