Categories [1] #
image-level #
- image recognition
- retrieval (image-text retrieval)
- caption (image captioning)
- VQA (visual question answering)
region-level #
- object detection
  - DETR -> DINO -> Grounding DINO
- dense captioning
- phrase grounding
pixel-level #
- Segmentation
  - generic segmentation
  - referring segmentation
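The three levels above differ mainly in how fine-grained the prediction is: one output per image, one per region, or one per pixel. A minimal sketch of that difference follows. The class names and the `box_to_mask` helper are hypothetical, introduced only for illustration, and are not taken from [1] or from any library.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), max is exclusive


@dataclass
class ImageLevelOutput:
    """Hypothetical: one prediction for the whole image (class name, caption, answer)."""
    text: str


@dataclass
class RegionLevelOutput:
    """Hypothetical: one box plus one label or phrase per detected region."""
    boxes: List[Box]
    labels: List[str]


@dataclass
class PixelLevelOutput:
    """Hypothetical: one class id per pixel, same height and width as the image."""
    mask: List[List[int]]


def box_to_mask(box: Box, height: int, width: int) -> List[List[int]]:
    """Rasterize a box into a binary mask (1 inside the box, 0 outside).

    This shows that a region-level box is a coarse, rectangular
    approximation of what a pixel-level mask can express.
    """
    x_min, y_min, x_max, y_max = box
    return [
        [1 if (x_min <= x < x_max and y_min <= y < y_max) else 0 for x in range(width)]
        for y in range(height)
    ]


if __name__ == "__main__":
    region = RegionLevelOutput(boxes=[(1, 0, 3, 2)], labels=["cat"])
    pixel = PixelLevelOutput(mask=box_to_mask(region.boxes[0], height=3, width=4))
    for row in pixel.mask:
        print(row)
```

A real segmentation mask is usually not rectangular; the box-to-mask conversion only illustrates the change in granularity between the region level and the pixel level.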
Other #
- Comparison
  - [CNN: deeper networks]
  - [Transformer: no limitations]
- CV tasks
  - Classification
  - Detection
  - Segmentation
  - Tracking
  - Action Recognition