Computer Vision Toolbox Model for Vision Transformer Network

Implementation of several variants of the vision transformer (ViT) model.

The Vision Transformer (ViT) model is a pretrained transformer model for image classification. It can also serve as a backbone for other computer vision tasks, such as object detection. The support package provides three variants of the ViT model:
  • Base-16 model
  • Small-16 model
  • Tiny-16 model
Here, “base”, “small”, and “tiny” indicate the model architecture and size, and 16 is the patch size hyperparameter. Each variant has been pretrained on the ImageNet data set with an input resolution of 384 and is stored as a .MAT file.
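
To make the relationship between patch size and input resolution concrete, the following is a minimal Python sketch of the patch arithmetic. It is not part of the support package. It assumes a square 384-by-384 input split into non-overlapping 16-by-16 patches, and the function name patch_grid is hypothetical, chosen only for illustration.

```python
def patch_grid(image_size: int, patch_size: int) -> tuple[int, int]:
    """Return (patches per side, total patches) for a square image.

    Assumes non-overlapping square patches, so image_size must be
    an exact multiple of patch_size.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size
    return per_side, per_side * per_side


# Input resolution 384 and patch size 16, as stated for all three variants.
per_side, total = patch_grid(384, 16)
print(per_side, total)  # 24 576
```

Under these assumptions, each image is split into a 24-by-24 grid, that is, 576 patches. The patch size is the same for the base, small, and tiny variants, so this count does not change between them.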

MATLAB Release Compatibility

  • Compatible with R2023b through R2026a

Platform Compatibility

  • Windows
  • macOS (Apple Silicon)
  • macOS (Intel)
  • Linux