DETR family

Backbone

Visual encoder

Depth encoder

Depth predictor

Depth-aware decoder

The decoder updates a fixed number of object queries with information from the depth and appearance (visual) features. This process is repeated 3 times, and each repetition consists of:

  1. Depth cross-attention: each query attends to the depth features.
  2. Self-attention among the queries.
  3. Visual cross-attention: each query attends to the visual (appearance) features.
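The three-step update above can be sketched in plain Python to show the data flow. This is a minimal sketch under assumptions not stated in these notes: single-head dense scaled dot-product attention with identity query/key/value projections, residual additions, and no layer normalization or feed-forward network. The function names (`attention`, `decoder_layer`, `depth_aware_decoder`) are hypothetical. Real implementations use learned multi-head projections and may use other attention variants (e.g. deformable attention) for the visual features.

```python
import math
from typing import List

Matrix = List[List[float]]  # each row is one query or feature-token vector


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Single-head scaled dot-product attention (identity projections assumed)."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors, one output row per query
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def add(a: Matrix, b: Matrix) -> Matrix:
    """Elementwise residual connection."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def decoder_layer(queries: Matrix, depth_feats: Matrix, visual_feats: Matrix) -> Matrix:
    # 1. Depth cross-attention: queries read from the depth features
    queries = add(queries, attention(queries, depth_feats, depth_feats))
    # 2. Self-attention among the queries
    queries = add(queries, attention(queries, queries, queries))
    # 3. Visual cross-attention: queries read from the appearance features
    queries = add(queries, attention(queries, visual_feats, visual_feats))
    return queries


def depth_aware_decoder(queries: Matrix, depth_feats: Matrix,
                        visual_feats: Matrix, num_layers: int = 3) -> Matrix:
    # Repeat the three-step update num_layers times (3 in these notes)
    for _ in range(num_layers):
        queries = decoder_layer(queries, depth_feats, visual_feats)
    return queries


if __name__ == "__main__":
    queries = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 4 queries, dim 2
    depth_feats = [[0.5, 0.1], [0.2, 0.9], [0.7, 0.3]]  # 3 flattened depth tokens
    visual_feats = [[0.1, 0.4], [0.8, 0.2], [0.3, 0.3], [0.6, 0.6], [0.9, 0.1]]
    out = depth_aware_decoder(queries, depth_feats, visual_feats)
    print(len(out), len(out[0]))  # prints "4 2"
```

Because every step returns one vector per query, the number of queries stays fixed across all 3 passes; only the query contents are updated.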