Computer Vision Toolbox Model for Vision Transformer Network

by MathWorks Computer Vision Toolbox Team

Implementation of several variants of the vision transformer (ViT) model.


Rating: 5.0 (3)

1.4K downloads

Updated 25 Nov 2025


The Vision Transformer (ViT) model is a pretrained transformer model for image classification. It is also used as a backbone for other computer vision tasks such as object detection. The support package consists of three variants of the ViT model:

Base-16 model
Small-16 model
Tiny-16 model

Here, “base”, “small”, and “tiny” denote the model architecture and size, and 16 is the patch size hyperparameter. Each variant has been pretrained on the ImageNet data set with an input resolution of 384 and is stored as a .MAT file.
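To make the relationship between patch size and input resolution concrete, the short Python sketch below computes how many patches a ViT sees per image. It assumes that "input resolution of 384" means a square 384-by-384 pixel image and that patches are non-overlapping 16-by-16 tiles, as in the standard ViT design; the function name patch_grid is hypothetical and is not part of the support package.

```python
def patch_grid(image_size, patch_size):
    """Return (patches per side, total patches) for a square image.

    Assumes non-overlapping square patches that tile the image exactly.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size
    return per_side, per_side * per_side


# Assumed setup from the description: 384-pixel input, patch size 16.
per_side, total = patch_grid(384, 16)
print(per_side, total)  # prints "24 576"
```

Under these assumptions, all three variants would split each image into a 24-by-24 grid of 576 patches, since they share the same patch size and input resolution and differ only in model size.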

MATLAB Release Compatibility

Created with R2023b

Compatible with R2023b to R2026a

Platform Compatibility

Windows, macOS (Apple Silicon), macOS (Intel), Linux
