2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Computer Vision, Natural Language Processing, Deep Learning, Multimodal Learning, BLIP-2, Vision-Language Models
Introducing BLIP-2, a vision-language model that connects a frozen image encoder to a frozen large language model, improving both training efficiency and performance across a range of multimodal tasks.