Data Layout Considerations in Deep Learning
When you build an application that uses the generated CUDA® C++ code, you must provide a CUDA C++ main function that calls the generated code. By default, when you use the codegen command to generate source code, static libraries, dynamic libraries, or executables, GPU Coder™ also generates example CUDA C++ main files (the main.cu source file and the main.h header file) in the examples subfolder of the build folder. This example main file is a template that helps you incorporate the generated CUDA code into your application. The example main function declares and initializes data, including dynamically allocated data. It calls the entry-point functions but does not use the values that the entry-point functions return.
When generating code for deep convolutional neural networks (CNN), the code generator uses NVIDIA® cuDNN or TensorRT for NVIDIA GPUs, or the ARM® Compute Library for ARM Mali GPUs. These libraries have specific data layout requirements for the input tensor that holds images, video, and other data. When you author a custom main function for your application, you must create input buffers that pass data to the generated entry-point functions in the format that these libraries expect.
Data Layout Format for CNN
For deep convolutional neural networks (CNN), a 4-D tensor descriptor is used to define the format for batches of 2-D images with the following letters:
- N – the batch size
- C – the number of feature maps (number of channels)
- H – the height
- W – the width
The most commonly used 4-D tensor formats are listed here. In each name, the letters are sorted in decreasing order of the strides, so the rightmost letter is the dimension that varies fastest in memory.
- NCHW
- NHWC
- CHWN
Of these, GPU Coder uses the NCHW format (column-major layout by default). To use row-major layout, pass the -rowmajor option to the codegen command. Alternatively, configure your code for row-major layout by setting the cfg.RowMajor parameter of the code generation configuration object.
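To make the "decreasing order of the strides" convention concrete, the following sketch computes the element strides of a densely packed tensor from a format name. The names Dims4, Strides4, and stridesFor are illustrative only; they are not part of GPU Coder or of the generated code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Sizes of the four dimensions of a 4-D tensor.
struct Dims4 {
    std::size_t N, C, H, W;
};

// Stride (in elements) of each dimension in a densely packed tensor.
struct Strides4 {
    std::size_t N, C, H, W;
};

// Hypothetical helper: compute strides for a format name such as "NCHW",
// "NHWC", or "CHWN". The name is read right to left: the last letter gets
// stride 1, and each letter to its left gets the stride of its right
// neighbor multiplied by that neighbor's size.
inline Strides4 stridesFor(const std::string& format, const Dims4& dims) {
    assert(format.size() == 4);
    Strides4 s{0, 0, 0, 0};
    std::size_t running = 1;
    for (std::size_t i = format.size(); i-- > 0;) {
        switch (format[i]) {
            case 'N': s.N = running; running *= dims.N; break;
            case 'C': s.C = running; running *= dims.C; break;
            case 'H': s.H = running; running *= dims.H; break;
            case 'W': s.W = running; running *= dims.W; break;
            default: assert(false && "unknown dimension letter");
        }
    }
    return s;
}
```

For example, with N=2, C=3, H=5, and W=4, the NCHW strides are N=60, C=20, H=4, W=1, whereas the NHWC strides are N=60, H=12, W=3, C=1.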
For example, consider a batch of images with the following dimensions: N=1, C=3, H=5, W=4. If the image pixel elements are represented by a sequence of integers, the input images can be pictorially represented as follows.

When you create the input buffer in the main function, the 4-D image is laid out in memory in the NCHW format as follows:
- Beginning with the first channel (C=0), the elements are arranged contiguously in row-major order.
- Continue with the second and subsequent channels until the elements of all the channels are laid out.
- Proceed to the next batch (if N > 1).
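The layout steps above can be sketched in host code as follows. This is a minimal illustration, assuming a densely packed buffer of float elements; the helper names nchwOffset and makeNchwBuffer are hypothetical, and the element type and placeholder values in a real application depend on the network and on the generated entry-point signature.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Linear offset of element (n, c, h, w) in a densely packed NCHW buffer:
// rows of one channel are contiguous, channels follow one another, and
// batches follow one another last.
inline std::size_t nchwOffset(std::size_t n, std::size_t c,
                              std::size_t h, std::size_t w,
                              std::size_t C, std::size_t H, std::size_t W) {
    assert(c < C && h < H && w < W);
    return ((n * C + c) * H + h) * W + w;
}

// Hypothetical helper: build an N x C x H x W input buffer whose elements
// are the integers 0, 1, 2, ... in the order in which they are visited
// (batch, then channel, then row, then column). Placeholder data only.
inline std::vector<float> makeNchwBuffer(std::size_t N, std::size_t C,
                                         std::size_t H, std::size_t W) {
    std::vector<float> buffer(N * C * H * W);
    float value = 0.0f;
    for (std::size_t n = 0; n < N; ++n)
        for (std::size_t c = 0; c < C; ++c)
            for (std::size_t h = 0; h < H; ++h)
                for (std::size_t w = 0; w < W; ++w) {
                    buffer[nchwOffset(n, c, h, w, C, H, W)] = value;
                    value += 1.0f;
                }
    return buffer;
}
```

For the N=1, C=3, H=5, W=4 example, the buffer holds 60 elements, the second channel starts at offset 20, and the third channel starts at offset 40.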
Data Layout Format for LSTM
A long short-term memory (LSTM) network is a type of recurrent neural network (RNN) that can learn long-term dependencies between time steps of sequence data. For LSTM, the data layout format can be described with the following letters:
- N – the batch size
- S – the sequence length (number of time steps)
- d – the number of units in one input sequence

For LSTM, GPU Coder uses the SNd format by default.
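As a rough sketch of what an SNd buffer could look like in a custom main function, the code below packs N separate sequences into one contiguous buffer. It assumes that the letters follow the same convention as the CNN formats (decreasing stride order, so d varies fastest and S slowest) and that each input sequence is stored time step by time step. The helper names sndOffset and packSnd are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Linear offset of feature i of batch entry n at time step s in a densely
// packed SNd buffer (assumed: S has the largest stride, d the smallest).
inline std::size_t sndOffset(std::size_t s, std::size_t n, std::size_t i,
                             std::size_t N, std::size_t d) {
    assert(n < N && i < d);
    return (s * N + n) * d + i;
}

// Hypothetical helper: pack N sequences into one SNd buffer. Each
// sequences[n] is assumed to hold S * d values, stored one time step after
// another with the d features of a time step contiguous.
inline std::vector<float> packSnd(
        const std::vector<std::vector<float>>& sequences,
        std::size_t S, std::size_t d) {
    const std::size_t N = sequences.size();
    std::vector<float> buffer(S * N * d);
    for (std::size_t s = 0; s < S; ++s)
        for (std::size_t n = 0; n < N; ++n) {
            assert(sequences[n].size() == S * d);
            for (std::size_t i = 0; i < d; ++i)
                buffer[sndOffset(s, n, i, N, d)] = sequences[n][s * d + i];
        }
    return buffer;
}
```

With this layout, all batch entries for time step 0 come first, followed by all batch entries for time step 1, and so on.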
See Also
Objects
- coder.gpuConfig
- coder.CodeConfig
- coder.EmbeddedCodeConfig
- coder.gpuEnvConfig
- coder.CuDNNConfig
- coder.TensorRTConfig
Topics
- Supported Networks, Layers, and Classes
- Load Pretrained Networks for Code Generation
- Code Generation for Deep Learning Networks by Using cuDNN
- Code Generation for Deep Learning Networks by Using TensorRT
- Code Generation for Deep Learning Networks Targeting ARM Mali GPUs
- Lane Detection Optimized with GPU Coder
- Deployment and Classification of Webcam Images on NVIDIA Jetson TX2 Platform