CV Tasks

Taxonomy [1]

Image-level

Image recognition
Image-text retrieval (Retrieval)
Image captioning (Caption)
Visual question answering (VQA)

Region-level

Object detection
- DETR -> DINO -> Grounding DINO
Dense captioning
Phrase grounding

Pixel-level

Segmentation
- generic segmentation
- referring segmentation
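The image / region / pixel split above is essentially a split by output granularity: one prediction per image, a variable number of predictions per image (one per region), or one prediction per pixel. The Python sketch below illustrates that difference under stated assumptions. The class names (`ImageLevelOutput`, `RegionLevelOutput`, `PixelLevelOutput`, `Box`) and the `num_predictions` helper are hypothetical and chosen only for illustration. Real models such as DETR or Grounding DINO return their own output formats.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical output containers, for illustration only (not any library's API).


@dataclass
class ImageLevelOutput:
    """A single prediction for the whole image (e.g. recognition, captioning, VQA)."""
    label: str


@dataclass
class Box:
    """Assumed box format: (x_min, y_min, x_max, y_max) in pixels, plus a label or phrase."""
    xyxy: Tuple[float, float, float, float]
    label: str


@dataclass
class RegionLevelOutput:
    """A variable number of regions per image (e.g. detection, dense captioning, grounding)."""
    boxes: List[Box]


@dataclass
class PixelLevelOutput:
    """One class id per pixel (e.g. segmentation); mask[y][x] is the id at row y, column x."""
    mask: List[List[int]]


def num_predictions(output) -> int:
    """Count predicted units: 1 per image, 1 per box, or 1 per pixel."""
    if isinstance(output, ImageLevelOutput):
        return 1
    if isinstance(output, RegionLevelOutput):
        return len(output.boxes)
    if isinstance(output, PixelLevelOutput):
        return sum(len(row) for row in output.mask)
    raise TypeError(f"unknown output type: {type(output).__name__}")


# Toy example on a 4 x 6 image (values are made up).
height, width = 4, 6
image_out = ImageLevelOutput(label="cat")
region_out = RegionLevelOutput(boxes=[
    Box((1.0, 0.0, 3.0, 4.0), "cat"),
    Box((4.0, 1.0, 6.0, 3.0), "dog"),
])
# Left half labelled class 1, right half class 0 (background).
pixel_out = PixelLevelOutput(
    mask=[[1 if x < width // 2 else 0 for x in range(width)] for _ in range(height)]
)

for out in (image_out, region_out, pixel_out):
    print(type(out).__name__, num_predictions(out))
```

Text-conditioned tasks such as phrase grounding and referring segmentation would produce the same output shapes. The difference is that they also take a text query as input.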

Other

Comparison
- [CNN: deeper networks]
- [Transformer: not subject to this limitation]
CV tasks
- Classification
- Detection
- Segmentation
- Tracking
- Action recognition

References

[1] [CVPR Tutorial Talk] Towards General Vision Understanding Interface (PDF)