28 Aug 2023, 17:03
Check out RO-ViT, a simple method to pre-train vision transformers in a region-aware manner (using a novel technique called “cropped positional embeddings”) to improve open-vocabulary detection. Learn more and grab the code at