BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
(2023)
Introducing BLIP-2, a vision-language model that bridges a frozen pre-trained image encoder and a frozen large language model with a lightweight Querying Transformer (Q-Former), achieving strong performance on multimodal tasks such as image captioning and visual question answering while training far fewer parameters than prior end-to-end approaches.