Exploring Plain Vision Transformer Backbones for Object Detection
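Investigating how effective plain, non-hierarchical Vision Transformers are as backbones for object detection, and proposing minimal adaptations (a simple feature pyramid built from a single-scale feature map, and window attention with a few cross-window propagation blocks) to improve their performance.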
Explore machine learning papers and reviews related to object detection. Find insights, analysis, and implementation details.
Introducing YOLO, a unified, real-time object detection system that frames object detection as a single regression problem.
Introducing Faster R-CNN, a significant improvement over R-CNN and Fast R-CNN that uses a Region Proposal Network (RPN) to generate object proposals, leading to faster and more accurate object detection.
Introducing DETR, a novel end-to-end object detection framework that leverages Transformers to directly predict a set of object bounding boxes.
Introducing Swin Transformer, a hierarchical Vision Transformer that uses shifted windows to achieve improved efficiency and performance in various vision tasks.