Detect Image Anomalies Using Explainable One-Class Classification Neural Network

This example shows how to detect and localize anomalies such as cracks in concrete using explainable one-class classification.

In one-class approaches to anomaly detection, training is semi-supervised, meaning that the network trains on data consisting only of examples of images without anomalies [1]. Despite training on samples only of normal scenes, the model learns how to distinguish between normal and anomalous scenes. One-class learning offers many advantages for anomaly detection problems:

  1. Representations of anomalies can be scarce.

  2. Anomalies can represent expensive or catastrophic outcomes.

  3. There can be many kinds of anomalies, and the kinds of anomalies can change over the lifetime of the model. Describing what "good" looks like is often more feasible than providing data that represents all possible anomalies in real-world settings.

A crucial part of anomaly detection is for a human observer to be able to understand why a trained network classifies images as anomalies. Explainable classification supplements the class prediction with information that justifies how the neural network reached its classification decision.

This example explores how one-class deep learning can be used to create accurate anomaly detection classifiers. The example also implements explainable classification using a network that returns a heatmap with the probability that each pixel is anomalous.

Download Concrete Crack Images for Classification Data Set

This example works with the Concrete Crack Images for Classification data set. The data set contains images of two classes: Negative images of concrete without cracks and Positive images of concrete with cracks. The data set provides 20,000 images of each class. The size of the data set is 235 MB.

Set dataDir as the desired location of the data set.

dataDir = fullfile(tempdir,"ConcreteCracks");
if ~exist(dataDir,"dir")
    mkdir(dataDir);
end

To download the data set, go to this link: https://md-datasets-cache-zipfiles-prod.s3.eu-west-1.amazonaws.com/5y9wdsg2zt-2.zip. Extract the ZIP file to obtain a RAR file, then extract the contents of the RAR file into the directory specified by the dataDir variable. After extraction, dataDir contains two subdirectories: Negative and Positive.
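If you prefer to script the download, a minimal sketch using websave and unzip follows. The file name zipFile is illustrative, not part of the original example; because MATLAB cannot extract RAR files, the inner archive still requires an external tool such as 7-Zip.

zipFile = fullfile(dataDir,"concreteCrackImages.zip");   % illustrative file name
if ~exist(zipFile,"file")
    websave(zipFile,"https://md-datasets-cache-zipfiles-prod.s3.eu-west-1.amazonaws.com/5y9wdsg2zt-2.zip");
    unzip(zipFile,dataDir);   % yields a RAR file; extract it with an external tool
end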

Load and Preprocess Data

Create an imageDatastore that reads and manages the image data. Label each image as Positive or Negative according to the name of its directory.

imdsP = imageDatastore(fullfile(dataDir,"Positive"),LabelSource="foldernames");
imdsN = imageDatastore(fullfile(dataDir,"Negative"),LabelSource="foldernames");

Display an example of each class. Display a negative, or good, image without crack anomalies on the left. In the good image, imperfections and deviations in texture are small. Display a positive, or anomalous, image on the right. The anomalous image shows a large black crack oriented vertically.

samplePositive = preview(imdsP);
sampleNegative = preview(imdsN);
montage({sampleNegative,samplePositive})
title("Road Images Without (Left) and With (Right) Crack Anomalies")

Partition Data into Training, Validation, and Test Sets

To simulate a more typical semi-supervised workflow, create a training set of only 250 good images. Include a small collection of anomalous training images to provide better classification results. Create a balanced test set using 1000 images from each class. Create a balanced validation set using 100 images of each class.

numTrainNormal = 250;
numTrainAnomaly = 10;
numTest = 1000;
numVal = 100;

[dsTrainPos,dsTestPos,dsValPos,~] = splitEachLabel(imdsP,numTrainAnomaly,numTest,numVal);
[dsTrainNeg,dsTestNeg,dsValNeg,~] = splitEachLabel(imdsN,numTrainNormal,numTest,numVal);

dsTrain = imageDatastore(cat(1,dsTrainPos.Files,dsTrainNeg.Files),LabelSource="foldernames");
dsTest = imageDatastore(cat(1,dsTestPos.Files,dsTestNeg.Files),LabelSource="foldernames");
dsVal = imageDatastore(cat(1,dsValPos.Files,dsValNeg.Files),LabelSource="foldernames");
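To confirm the partition sizes, you can count the labels in each datastore (a quick check, not part of the original example).

countEachLabel(dsTrain)
countEachLabel(dsVal)
countEachLabel(dsTest)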

Define an anonymous function, addLabelFcn, that creates a one-hot encoded representation of label information from an input image. Then, transform the datastores using the transform function such that each datastore returns a cell array containing the image data and the corresponding one-hot encoded label. The transform function applies the operations specified by the anonymous addLabelFcn function.

addLabelFcn = @(x,info) deal({x,onehotencode(info.Label,1)},info);
dsTrain = transform(dsTrain,addLabelFcn,IncludeInfo=true);
dsVal = transform(dsVal,addLabelFcn,IncludeInfo=true);
dsTest = transform(dsTest,addLabelFcn,IncludeInfo=true);
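To confirm the transformed format, preview one observation (a quick check, not part of the original example). The result is a cell array containing the image and its 2-by-1 one-hot encoded label.

sample = preview(dsTrain);
size(sample{1})   % the image
sample{2}         % the 2-by-1 one-hot encoded label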

Augment Training Data

Augment the training data by using the transform function with custom preprocessing operations specified by the helper function augmentDataForConcreteClassification. The helper function is attached to the example as a supporting file.

The augmentDataForConcreteClassification function randomly applies 90 degree rotation and horizontal and vertical reflection to each input image.

dsTrain = transform(dsTrain,@augmentDataForConcreteClassification);
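The augmentation logic lives in the supporting file. A minimal sketch of what augmentDataForConcreteClassification might look like follows, assuming it receives and returns the {image, label} cell array produced by the earlier transform; the shipped file may differ in details.

function dataOut = augmentDataForConcreteClassification(dataIn)
    img = dataIn{1};
    img = rot90(img,randi(4)-1);   % rotate by a random multiple of 90 degrees
    if rand > 0.5
        img = fliplr(img);         % random horizontal reflection
    end
    if rand > 0.5
        img = flipud(img);         % random vertical reflection
    end
    dataOut = {img,dataIn{2}};
end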

Batch Training Data

Create a minibatchqueue (Deep Learning Toolbox) object that manages the mini-batching of observations in a custom training loop. The minibatchqueue object also casts data to a dlarray (Deep Learning Toolbox) object that enables automatic differentiation in deep learning applications.

Specify the mini-batch data extraction format as "SSCB" (spatial, spatial, channel, batch). Set the DispatchInBackground name-value argument to the logical value returned by canUseGPU. If a supported GPU is available for computation, then the minibatchqueue object preprocesses mini-batches in the background in a parallel pool during training.

mbSize = 128;
mbqTrain = minibatchqueue(dsTrain,PartialMiniBatch="discard", ...
    MiniBatchFormat=["SSCB","CB"],MiniBatchSize=mbSize, ...
    DispatchInBackground=canUseGPU);

mbqVal = minibatchqueue(dsVal,MiniBatchFormat=["SSCB","CB"], ...
    MiniBatchSize=mbSize,DispatchInBackground=canUseGPU);

Calculate Training Set Statistics for Input Normalization

Calculate the per-channel mean of the training images for use in zero-center normalization.

queue = copy(mbqTrain);
queue.PartialMiniBatch = "return";

X = next(queue);
sumImg = sum(X,4);

while hasdata(queue)
    X = next(queue);
    sumImg = sumImg + sum(X,4);
end
trainSetMean = sumImg ./ dsTrain.numpartitions;
trainSetMean = mean(trainSetMean,[1 2]);
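trainSetMean is a 1-by-1-by-3 array containing one mean per color channel. You can inspect the values before using them for normalization (a quick check, not part of the original example).

squeeze(extractdata(gather(trainSetMean)))   % three per-channel means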

Create FCDD Model

This example uses a fully convolutional data description (FCDD) model [1]. The basic idea of FCDD is to train a network to produce a heatmap from an input image.
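Concretely, for a network $\phi$ whose output passes through the pseudo-Huber function $A(X) = \sqrt{\phi(X)^2 + 1} - 1$ (applied in the final layer of this example), the FCDD objective from [1] keeps the heatmap small for normal samples and penalizes small heatmaps for anomalous samples. This is the form implemented by the fcddLoss supporting function at the end of this example:

$$\mathcal{L} = \frac{1}{n}\sum_{i=1}^{n}\left[(1-y_i)\,\bar{A}(X_i) \;-\; y_i \log\!\left(1-e^{-\bar{A}(X_i)}\right)\right], \qquad \bar{A}(X_i) = \frac{1}{uv}\sum_{h,w} A(X_i)_{h,w},$$

where $y_i = 1$ for anomalous samples and the heatmap has spatial size $u$-by-$v$.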

This example uses a VGG-16 network [3] trained on ImageNet [4] as the base fully convolutional network architecture. The example freezes the majority of the model and randomly initializes and trains the final convolutional stages. This approach enables quick training with small amounts of input training data.

The vgg16 (Deep Learning Toolbox) function returns a pretrained VGG-16 network. This function requires the Deep Learning Toolbox™ Model for VGG-16 Network support package. If this support package is not installed, then the function provides a download link.

pretrainedVGG = vgg16;

Freeze the first 24 layers of the network by setting the learning rate factors of the weights and biases to 0.

numLayersToFreeze = 24;
net = pretrainedVGG.Layers(1:numLayersToFreeze);
for idx = 1:numLayersToFreeze
    if isprop(net(idx),"Weights")
        net(idx) = setLearnRateFactor(net(idx),Weights=0);
        net(idx) = setLearnRateFactor(net(idx),Bias=0);
    end
end

Add a final convolutional stage. This stage is similar to the next convolutional stage of VGG-16, but with randomly initialized and trainable convolutional layers and with batch normalization. A final 1-by-1 convolution compresses the network output into a one-channel heatmap. The final layer applies the pseudo-Huber function, which stabilizes training with the FCDD loss [1] [2].

finalLayers = [
    convolution2dLayer(3,512,Padding="same")
    batchNormalizationLayer
    reluLayer
    convolution2dLayer(3,512,Padding="same")
    batchNormalizationLayer
    reluLayer
    convolution2dLayer(1,1)
    functionLayer(@(x)sqrt(x.^2+1)-1,Name="pseudoHuber")];

Assemble the complete network as a layer graph.

lgraph = layerGraph([net;finalLayers]);

Replace the image input layer in the encoder with a new input layer that performs zero-center normalization using the computed mean. Set the input size of the network equal to the size of the images in the data set. Then convert the layer graph to the dlnetwork object used for training.

inputSize = size(sampleNegative);
newInputLayer = imageInputLayer(inputSize,Normalization="zerocenter", ...
    Mean=gather(extractdata(trainSetMean)),Name="inputLayer");
lgraph = replaceLayer(lgraph,lgraph.Layers(1).Name,newInputLayer);
net = dlnetwork(lgraph);
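As an optional sanity check (not part of the original example), you can confirm that the assembled network maps an input image to a one-channel heatmap of reduced spatial size.

Y = predict(net,dlarray(single(sampleNegative),"SSCB"));
size(Y)   % two spatial dimensions, smaller than the input image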

Specify Training Options

Specify the training options for Adam optimization. Train the network for 70 epochs.

learnRate = 1e-4;
averageGrad = [];
averageSqGrad = [];
numEpochs = 70;

Train Network or Download Pretrained Network

By default, this example downloads a pretrained version of the anomaly detection network using the helper function downloadTrainedConcreteClassificationNet. The pretrained network enables you to run the entire example without waiting for training to complete.

To train the network, set the doTraining variable in the following code to true. Train the model in a custom training loop. For each iteration:

  • Read the data for the current mini-batch using the next (Deep Learning Toolbox) function.

  • Evaluate the model gradients using the dlfeval (Deep Learning Toolbox) function and the modelGradients helper function. This function is defined at the end of this example.

  • Update the network parameters using the adamupdate (Deep Learning Toolbox) function.

  • Update a plot of the loss.

Train on one or more GPUs, if available. Using a GPU requires Parallel Computing Toolbox™ and a CUDA®-enabled NVIDIA® GPU. For more information, see GPU Support by Release (Parallel Computing Toolbox). Training takes about 7 minutes on an NVIDIA Titan X.

doTraining = false;
if doTraining
   
    [hFig,batchLine] = initializeTrainingPlotConcreteClassification;
    start = tic;
    
    iteration = 0;
    for epoch = 1:numEpochs
        reset(mbqTrain);
        shuffle(mbqTrain);
    
        while hasdata(mbqTrain)
            [X,T] = next(mbqTrain);
            [grad,loss,state] = dlfeval(@modelGradients,net,X,T);
    
            iteration = iteration + 1;
    
            [net,averageGrad,averageSqGrad] = adamupdate(net, ...
                grad,averageGrad,averageSqGrad,iteration,learnRate);
    
            net.State = state;
    
            % Update the plot
            updateTrainingPlotConcreteClassification(batchLine,iteration,loss,start,epoch);
        end
    end
else
    trainedConcreteClassificationNet_url = "https://www.mathworks.com/supportfiles/vision/data/trainedAnomalyDetectionNet.zip";
    downloadTrainedConcreteClassificationNet(trainedConcreteClassificationNet_url,dataDir);
    load(fullfile(dataDir,"trainedAnomalyDetectionNet.mat"));
end

Create Classification Model

Classify images as good or anomalous using the anomaly heatmap predicted by the network in two steps. First, create a classifier that returns the probability that an image contains an anomaly based on the heatmap predicted by the trained network. Then, assign a class label to test images based on the probability returned by the classifier.

Calculate the mean anomaly score and the known ground truth label (Positive or Negative) for each image in the validation set. This example chooses the anomaly score threshold using the validation set to avoid leaking information from the test set into the design of the classifier.

reset(mbqVal);

scores = zeros(dsVal.numpartitions,1);
labels = zeros(dsVal.numpartitions,1);
idx = 1;
while hasdata(mbqVal)
    [X,T] = next(mbqVal);
    batchSize = size(X,4);
    Y = predict(net,X);
    Y = mean(Y,[1 2]);
    Y = gather(extractdata(Y));
    T = onehotdecode(T,[0 1],1,"double");
    scores(idx:idx+batchSize-1) = Y;
    labels(idx:idx+batchSize-1) = T;
    idx = idx+batchSize;
end

Fit a sigmoid to the anomaly scores and associated class labels. The resulting anonymous function, anomalyProbability, takes an input anomaly score predicted by the network and returns a probability that the score describes an anomaly. Outputs from anomalyProbability greater than 0.5 are considered anomalies.

coeff = glmfit(scores,labels,"binomial","link","logit");
sigmoid = @(x) 1./(1+exp(-x));
anomalyProbability = @(x) sigmoid(coeff(2).*x + coeff(1));
fplot(anomalyProbability)
xlabel("Anomaly Score")
ylabel("Probability of Anomaly")

A standard sigmoid has a value of 0.5 at a score of 0. Calculate the anomaly score at which the fitted sigmoid defined by the anomalyProbability function has a value of 0.5. This score serves as the classification threshold.

threshold = -coeff(1)/coeff(2)
threshold = 0.1567
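As a quick consistency check (not part of the original example), the fitted sigmoid evaluates to 0.5 at this score.

anomalyProbability(threshold)   % 0.5 by construction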

Evaluate Classification Model

Predict the anomaly heatmap and calculate the mean anomaly score for each image in the test set. Also get the ground truth labels of each test image.

q = minibatchqueue(dsTest,MiniBatchSize=64,MiniBatchFormat=["SSCB","CB"]);
scores = zeros(dsTest.numpartitions,1);
labels = zeros(dsTest.numpartitions,1);
idx = 1;
while hasdata(q)
    [X,T] = next(q);
    batchSize = size(X,4);
    Y = predict(net,X);
    Y = mean(Y,[1 2]);
    Y = gather(extractdata(Y));
    T = onehotdecode(T,[0 1],1,"double");
    scores(idx:idx+batchSize-1) = Y;
    labels(idx:idx+batchSize-1) = T;
    idx = idx+batchSize;
end

Predict class labels for the test set using the classification model defined by the anomalyProbability function.

predictedLabels = anomalyProbability(scores) > 0.5;
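Because the fitted sigmoid is monotonically increasing (assuming a positive fitted slope coeff(2)), thresholding the probability at 0.5 is equivalent to thresholding the raw scores at threshold. You can verify the equivalence directly (a quick check, not part of the original example).

isequal(predictedLabels,scores > threshold)   % expected: logical 1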

Calculate the confusion matrix and the classification accuracy for the test set. The classification model in this example is very accurate and predicts a small percentage of false positives and false negatives.

targetLabels = logical(labels);
M = confusionmat(targetLabels,predictedLabels);
confusionchart(M,["Negative","Positive"])
acc = sum(diag(M)) / sum(M,"all");
title(sprintf("Accuracy: %g",acc));
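Accuracy alone can mask class-specific errors. If you also want precision, recall, and F1 score for the Positive class, they follow directly from the confusion matrix (an optional addition, not part of the original example; row and column 2 of M correspond to the Positive class).

precision = M(2,2)/sum(M(:,2));   % TP/(TP+FP)
recall = M(2,2)/sum(M(2,:));      % TP/(TP+FN)
f1 = 2*precision*recall/(precision+recall)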

Explain Classification Decisions

View Heatmap of Anomaly

Select and display an image of a correctly classified anomaly. This result is a true positive classification.

idxTruePositive = find(targetLabels & predictedLabels,1);
dsExample = subset(dsTest,idxTruePositive);
dataTP = preview(dsExample);
imgTP = dataTP{1};
imshow(imgTP)
title("True Positive Test Image")

Obtain a heatmap of an anomaly image by calling the predict (Deep Learning Toolbox) function on the trained network. The network returns a heatmap with a smaller size than the input image, so resize the heatmap to the size of the input image.

hmapTP = predict(net,gpuArray(dlarray(single(imgTP),"SSCB")));
hmapTP = gather(extractdata(hmapTP));
hmapTP = anomalyProbability(hmapTP);
hmapTP = imresize(hmapTP,OutputSize=size(imgTP,[1 2]));

Display an overlay of a colored heatmap on a grayscale representation of the color image using the heatmapOverlay helper function. This function is defined at the end of the example. Adjust the opacity of the overlaid heatmap by setting alpha to a number in the range [0, 1]. For a fully opaque heatmap, specify alpha as 1. For a fully transparent heatmap, specify alpha as 0.

alpha = 0.4;
imshow(heatmapOverlay(imgTP,hmapTP,alpha))
title("Heatmap Overlay of True Positive Result")

To quantitatively confirm the result, display the mean heatmap anomaly score of the true positive test image as predicted by the network.

disp("Mean heatmap anomaly score of test image: "+scores(idxTruePositive(1)))
Mean heatmap anomaly score of test image: 1.2185

View Heatmap of Normal Image

Select and display an image of a correctly classified normal image. This result is a true negative classification.

idxTrueNegative = find(~targetLabels & ~predictedLabels);
dsTestTN = subset(dsTest,idxTrueNegative);
dataTN = read(dsTestTN);
imgTN = dataTN{1};
imshow(imgTN)
title("True Negative Test Image")

Obtain a heatmap of the correctly predicted negative image by calling the predict (Deep Learning Toolbox) function on the trained network. Resize the heatmap to the size of the input image.

hmapTN = predict(net,gpuArray(dlarray(single(imgTN),"SSCB")));
hmapTN = gather(extractdata(hmapTN));
hmapTN = anomalyProbability(hmapTN);
hmapTN = imresize(hmapTN,OutputSize=size(imgTN,[1 2]));

Display an overlay of a colored heatmap on a grayscale representation of the color image using the heatmapOverlay helper function. This function is defined at the end of the example. Many true negative test images, such as this test image, either have small anomaly scores across the entire image or large anomaly scores in a localized portion of the image.

imshow(heatmapOverlay(imgTN,hmapTN,alpha))
title("Heatmap Overlay of True Negative Result")

Display the mean heatmap anomaly score of the true negative test image as predicted by the network.

disp("Mean heatmap anomaly score of test image: "+scores(idxTrueNegative(1)))
Mean heatmap anomaly score of test image: 0.0022878

Visualize False Positives

False positives are images without crack anomalies that the network classifies as anomalous. Display any false positive images from the test set in a montage.

falsePositiveIdx = find(predictedLabels & ~targetLabels);
dataFP = readall(subset(dsTest,falsePositiveIdx));
numFP = length(falsePositiveIdx);
if numFP>0
    montage(dataFP(:,1),Size=[1,numFP]);
    title("False Positives in Test Set")
end

Use the explanation from the network to gain insights into the misclassifications.

Calculate the heatmap overlays.

hmapOverlay = cell(1,numFP);
for idx = 1:numFP
    img = dataFP{idx,1};
    map = predict(net,gpuArray(dlarray(single(img),"SSCB")));
    map = gather(extractdata(map));
    mapProb = anomalyProbability(map);
    hmap = imresize(mapProb,OutputSize=size(img,[1 2]));
    hmapOverlay{idx} = heatmapOverlay(img,hmap,alpha);
end

The false positive images show features such as rocks that have similar visual characteristics to cracks. In all cases, these are hard classification problems in which the explainable nature of the network architecture is useful for gaining insights.

if numFP>0
    montage(hmapOverlay,Size=[1,numFP])
    title("Heatmap Overlays of False Positive Results")
end

Display the mean anomaly heatmap scores of the false positive test images as predicted by the network. The mean scores are larger than the threshold used by the classification model, resulting in misclassifications.

disp("Mean heatmap anomaly scores:"); scores(falsePositiveIdx)
Mean heatmap anomaly scores:
ans = 4×1

    0.3218
    0.1700
    0.2516
    0.1862

Visualize False Negatives

False negatives are images with anomalies that the network classifies as good. Display any false negative images from the test set in a montage. The false negative images show relatively thin visible cracks.

falseNegativeIdx = find(~predictedLabels & targetLabels);
dataFN = readall(subset(dsTest,falseNegativeIdx));
numFN = length(falseNegativeIdx);
if numFN>0
    montage(dataFN(:,1),Size=[1,numFN])
    title("False Negatives in Test Set")
end

Calculate the heatmap overlays.

hmapOverlay = cell(1,numFN);
for idx = 1:numFN
    img = dataFN{idx,1};
    map = predict(net,gpuArray(dlarray(single(img),"SSCB")));
    map = gather(extractdata(map));
    mapProb = anomalyProbability(map);
    hmap = imresize(mapProb,OutputSize=size(img,[1 2]));
    hmapOverlay{idx} = heatmapOverlay(img,hmap,alpha);
end

Display the heatmap overlays. The network assigns a large anomaly score along cracks, as expected.

if numFN>0
    montage(hmapOverlay,Size=[1,numFN])
    title("Heatmap Overlays of False Negative Results")
end

Display the mean heatmap anomaly scores of the false negative test images as predicted by the network. The mean scores are smaller than the threshold used by the sigmoid classification model, resulting in misclassifications.

disp("Mean heatmap anomaly scores:"); scores(falseNegativeIdx)
Mean heatmap anomaly scores:
ans = 4×1

    0.1477
    0.1186
    0.1241
    0.0920

Supporting Functions

The modelGradients helper function calculates the gradients and loss for the network.

function [grad,loss,state] = modelGradients(net,X,T)
    [Y,state] = forward(net,X);
    loss = fcddLoss(Y,T);
    grad = dlgradient(loss,net.Learnables);
end

The fcddLoss helper function calculates the FCDD loss.

function loss = fcddLoss(Y,T)
    % Mean pseudo-Huber heatmap score for each observation
    normalTerm = mean(Y,[1 2 3]);
    % Anomaly term of the FCDD objective [1]
    anomalyTerm = log(1-exp(-normalTerm));

    % The first row of the one-hot labels corresponds to the Negative class
    isNegative = T(1,:) == 1;
    loss = mean(single(isNegative).*normalTerm - single(~isNegative).*anomalyTerm);
end

The heatmapOverlay helper function overlays a colored heatmap hmap on a grayscale representation of the image img. Adjust the transparency of the heatmap overlay by specifying the alpha argument as a number in the range [0, 1].

function out = heatmapOverlay(img,hmap,alpha)

    % Normalize to the range [0, 1]
    img = mat2gray(img);
    hmap = mat2gray(hmap);
    
    % Convert image to a 3-channel grayscale representation
    imgGray = im2gray(img);
    imgGray = repmat(imgGray,[1 1 3]);
    
    % Convert heatmap to an RGB image using a colormap
    cmap = parula(256);
    hmapRGB = ind2rgb(gray2ind(hmap,size(cmap,1)),cmap);
    
    % Blend results
    hmapWeight = alpha;
    grayWeight = 1-hmapWeight;
    out = im2uint8(grayWeight.*imgGray + hmapWeight.*hmapRGB);
end

References

[1] Liznerski, Philipp, Lukas Ruff, Robert A. Vandermeulen, Billy Joe Franks, Marius Kloft, and Klaus-Robert Müller. "Explainable Deep One-Class Classification." Preprint, submitted March 18, 2021. https://arxiv.org/abs/2007.01760.

[2] Ruff, Lukas, Robert A. Vandermeulen, Billy Joe Franks, Klaus-Robert Müller, and Marius Kloft. "Rethinking Assumptions in Deep Anomaly Detection." Preprint, submitted May 30, 2020. https://arxiv.org/abs/2006.00339.

[3] Simonyan, Karen, and Andrew Zisserman. "Very Deep Convolutional Networks for Large-Scale Image Recognition." Preprint, submitted April 10, 2015. https://arxiv.org/abs/1409.1556.

[4] ImageNet. https://www.image-net.org.
