YOLOS Base

hustvl vision

image

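A Vision Transformer (ViT)-based object detection model that treats detection as a sequence-to-sequence problem. Like DETR, it predicts a set of objects and is trained with a bipartite matching loss, but it uses a plain transformer encoder with learnable detection tokens rather than a CNN backbone followed by an encoder-decoder transformer.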
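To make the "detection as a sequence" idea concrete, here is a minimal sketch of how the input sequence could be laid out: image patches become tokens, and a fixed number of detection tokens is appended to the same sequence. The names `SequenceLayout` and `yolos_style_layout` are hypothetical, and the defaults (16-pixel patches, 100 detection tokens) and the token ordering are assumptions for illustration, not values taken from this card. The sketch ignores any extra class token and does not run the model.

```python
from dataclasses import dataclass


@dataclass
class SequenceLayout:
    """Token counts for one image (hypothetical helper, not a library type)."""
    num_patch_tokens: int
    num_det_tokens: int

    @property
    def length(self) -> int:
        # Total tokens the encoder would attend over in this sketch.
        return self.num_patch_tokens + self.num_det_tokens

    @property
    def det_slice(self) -> slice:
        # Assumed ordering: detection tokens come after the patch tokens.
        return slice(self.num_patch_tokens, self.length)


def yolos_style_layout(height: int, width: int,
                       patch_size: int = 16,
                       num_det_tokens: int = 100) -> SequenceLayout:
    """Compute the sequence layout for an image split into square patches.

    patch_size and num_det_tokens are assumed defaults for illustration.
    """
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be a multiple of patch_size")
    num_patches = (height // patch_size) * (width // patch_size)
    return SequenceLayout(num_patches, num_det_tokens)


# Example: a 512 x 864 image gives 32 * 54 patch tokens plus the detection tokens.
layout = yolos_style_layout(512, 864)
print(layout.num_patch_tokens, layout.length, layout.det_slice)
```

In a design like this, only the encoder outputs at `det_slice` would be passed to the class and box prediction heads, one candidate object per detection token.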