Get Started with Image Preprocessing and Augmentation for Deep Learning

Data preprocessing consists of a series of deterministic operations that normalize or enhance desired data features. For example, you can normalize data to a fixed range or resize data to the size required by the network input layer. Preprocessing is used for training, validation, and test data.
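The two operations named above can be sketched in a few lines. This is a minimal illustration in plain Python, not toolbox code: the image is assumed to be a grayscale list of rows, and the helper names `rescale_to_unit_range` and `resize_nearest` are hypothetical. Nearest-neighbor sampling is chosen here only because it is the simplest resizing method to show.

```python
def rescale_to_unit_range(image):
    """Linearly map pixel values so the minimum becomes 0.0 and the maximum 1.0."""
    flat = [p for row in image for p in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1  # avoid division by zero for a constant image
    return [[(p - lo) / span for p in row] for row in image]


def resize_nearest(image, out_h, out_w):
    """Resize to out_h-by-out_w by nearest-neighbor sampling."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


# Hypothetical 2-by-4 grayscale image with 8-bit style values.
image = [[0, 50, 100, 150],
         [200, 250, 200, 150]]

# Deterministic: the same input always produces the same output.
preprocessed = resize_nearest(rescale_to_unit_range(image), 2, 2)
```

Because neither function uses randomness, the same pipeline can be applied unchanged to training, validation, and test data.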

Preprocessing can occur at two stages in the deep learning workflow.

  • Commonly, preprocessing occurs as a separate step that you complete before preparing the data to be fed to the network. You load your original data, apply the preprocessing operations, then save the result to disk. The advantage of this approach is that you incur the preprocessing overhead only once, and the preprocessed images are then readily available as a starting point for all future training trials.

  • If you load your data into a datastore, then you can also apply preprocessing during training by using the transform and combine functions. For more information, see Datastores for Deep Learning (Deep Learning Toolbox). The transformed images are not stored in memory. This approach avoids writing a second copy of the training data to disk, and it is convenient when your preprocessing operations are not computationally expensive and do not noticeably slow network training.
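The difference between the two stages can be sketched as follows. This is an illustration in plain Python under stated assumptions, not the datastore API: `preprocess`, `preprocess_to_disk`, and `transformed` are hypothetical names, images are lists of rows, and JSON files stand in for saved image files.

```python
import json
import os
import tempfile


def preprocess(image):
    """Hypothetical deterministic step: scale 8-bit pixel values to [0, 1]."""
    return [[p / 255 for p in row] for row in image]


def preprocess_to_disk(images, folder):
    """Separate step: preprocess once and save each result for later reuse."""
    paths = []
    for i, image in enumerate(images):
        path = os.path.join(folder, f"image_{i}.json")
        with open(path, "w") as f:
            json.dump(preprocess(image), f)
        paths.append(path)
    return paths


def transformed(images, fn):
    """During training: apply fn lazily, one image at a time, as data is read."""
    for image in images:
        yield fn(image)


raw_images = [[[0, 255]], [[51, 102]]]

# Approach 1: pay the preprocessing cost once, then read the saved copies.
with tempfile.TemporaryDirectory() as folder:
    saved_paths = preprocess_to_disk(raw_images, folder)
    with open(saved_paths[1]) as f:
        from_disk = json.load(f)

# Approach 2: no second copy on disk; the work is repeated on every read.
on_the_fly = list(transformed(raw_images, preprocess))
```

Both approaches yield the same preprocessed images. The trade-off is disk space and a one-time cost in the first case against repeated computation in the second.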

Common image preprocessing operations include noise removal, edge-preserving smoothing, color space conversion, contrast enhancement, and morphology. For an example that shows how to create and apply these transformations, see Augment Images for Deep Learning Workflows Using Image Processing Toolbox.

Data augmentation consists of randomized operations that are applied to the training data while the network is training.

Augmented image data can simulate variations in the image acquisition. Common types of image augmentation operations are randomized geometric transformations such as rotation and translation, which simulate variations in the camera orientation with respect to the scene. Random cropping simulates variations in the scene composition. Artificial noise simulates distortions introduced during image acquisition or upstream data processing operations. Augmentation increases the effective amount of training data and helps to make the network invariant to common variations and distortions in the data.
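A minimal sketch of three such randomized operations follows, written in plain Python and not with toolbox functions. The function names, the 3-by-3 crop size, and the 10% noise density are illustrative assumptions, and the image is assumed to be a grayscale list of rows.

```python
import random


def random_reflection(image, rng):
    """Flip the image left-to-right with probability 0.5."""
    return [row[::-1] for row in image] if rng.random() < 0.5 else image


def random_crop(image, out_h, out_w, rng):
    """Crop an out_h-by-out_w window at a random position."""
    top = rng.randint(0, len(image) - out_h)
    left = rng.randint(0, len(image[0]) - out_w)
    return [row[left:left + out_w] for row in image[top:top + out_h]]


def salt_and_pepper(image, density, rng, low=0, high=255):
    """Set roughly a fraction `density` of pixels to low (pepper) or high (salt)."""
    return [
        [rng.choice((low, high)) if rng.random() < density else p for p in row]
        for row in image
    ]


def augment(image, rng):
    """Chain the randomized operations; each call draws new random parameters."""
    image = random_reflection(image, rng)
    image = random_crop(image, 3, 3, rng)
    return salt_and_pepper(image, 0.1, rng)


rng = random.Random(0)  # seeded only so that the sketch is repeatable
image = [[10 * r + c for c in range(4)] for r in range(4)]
augmented = augment(image, rng)
```

Unlike the deterministic preprocessing steps, calling `augment` repeatedly on the same image generally returns different results.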

To augment training data, start by loading your data into a datastore. For more information, see Datastores for Deep Learning (Deep Learning Toolbox). Some built-in datastores apply a specific and limited set of augmentations to data for specific applications. You can also apply your own set of augmentation operations to data in the datastore by using the transform and combine functions. During training, the datastore randomly perturbs the training data for each epoch, so that each epoch uses a slightly different data set.
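The per-epoch behavior can be illustrated with a small stand-in for a transformed datastore. This is a sketch in plain Python, not the datastore interface: the class `AugmentedSource` and the function `random_translate` are hypothetical, and the images are single-row grayscale lists chosen to keep the output easy to inspect.

```python
import random


class AugmentedSource:
    """Hypothetical stand-in for a transformed datastore.

    The augmentation function is re-applied every time the data is read,
    so no augmented copy is kept between passes.
    """

    def __init__(self, images, fn, seed=None):
        self.images = images
        self.fn = fn
        self.rng = random.Random(seed)

    def __iter__(self):
        for image in self.images:
            yield self.fn(image, self.rng)


def random_translate(image, rng, max_shift=2):
    """Shift the image horizontally by a random offset, padding with zeros."""
    shift = rng.randint(-max_shift, max_shift)
    width = len(image[0])
    start = max_shift - shift  # window position inside the padded row
    out = []
    for row in image:
        padded = [0] * max_shift + list(row) + [0] * max_shift
        out.append(padded[start:start + width])
    return out


images = [[[1, 2, 3, 4, 5]] for _ in range(8)]
source = AugmentedSource(images, random_translate, seed=0)

# Each pass over the source plays the role of one training epoch.
epochs = [list(source) for _ in range(3)]
```

Because the random offsets are drawn again on every pass, each epoch sees a differently perturbed version of the same eight underlying images, while the originals stay unchanged.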

The table lists common types of preprocessing and augmentation operations applied to image data for deep learning applications.

Processing Type | Description | Sample Functions | Sample Output
Resize images | Resize images by a fixed scaling factor or to a target size | | The original image is on the left. The resized image is on the right.
Warp images | Apply random reflection, rotation, scale, shear, and translation to images | | From left to right, the figure shows the original image, the reflected image, the rotated image, and the scaled image.
Crop images | Crop an image to a target size from the center or a random position | | The image cropped from the center is on the left. The image cropped from a random position is on the right.
Jitter color | Randomly adjust image hue, saturation, brightness, or contrast | | From left to right, the figure shows the original image with random adjustments to the image hue, saturation, brightness, and contrast.
Simulate noise | Add random Gaussian, Poisson, salt and pepper, or multiplicative noise | | The image with randomly added salt and pepper noise is on the left. The image with randomly added Gaussian noise is on the right.
Simulate blur | Add Gaussian or directional motion blur | | The image with a Gaussian blur is on the left. The image with a directional motion blur is on the right.
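As an illustration of the "Jitter color" entry in the table, the following sketch randomly adjusts hue, saturation, and brightness by converting each pixel to HSV with Python's standard `colorsys` module. It is not the toolbox implementation: the function name `jitter_color`, the jitter ranges, and the representation of an image as rows of (r, g, b) tuples with components in [0, 1] are all assumptions made for this example. Contrast jitter is omitted for brevity.

```python
import colorsys
import random


def jitter_color(image, rng, max_hue=0.05, max_sat=0.2, max_val=0.2):
    """Apply one random hue shift and saturation/brightness scale to all pixels."""
    # Draw the random parameters once per image so all pixels change together.
    hue_shift = rng.uniform(-max_hue, max_hue)
    sat_scale = 1 + rng.uniform(-max_sat, max_sat)
    val_scale = 1 + rng.uniform(-max_val, max_val)

    def clamp(x):
        return min(1.0, max(0.0, x))

    out = []
    for row in image:
        new_row = []
        for r, g, b in row:
            h, s, v = colorsys.rgb_to_hsv(r, g, b)
            h = (h + hue_shift) % 1.0  # hue wraps around the color wheel
            new_row.append(
                colorsys.hsv_to_rgb(h, clamp(s * sat_scale), clamp(v * val_scale))
            )
        out.append(new_row)
    return out


rng = random.Random(0)  # seeded only so that the sketch is repeatable
image = [[(1.0, 0.0, 0.0), (0.5, 0.5, 0.5)],
         [(0.2, 0.6, 0.4), (0.0, 0.0, 1.0)]]
jittered = jitter_color(image, rng)
```

Working in HSV keeps the three adjustments independent: a gray pixel has zero saturation, so it stays gray and only its brightness changes.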
