Error executing the example code for training a custom Mask R-CNN using the COCO 2014 dataset

I followed the instructions in "Instance Segmentation Using Mask R-CNN Deep Learning" (ref[1]).
All the code worked perfectly until the last section "Train network" (ref[2]).
iteration = 1;
start = tic;

% Create subplots for the learning rate and mini-batch loss
fig = figure;
[lossPlotter] = helper.configureTrainingProgressPlotter(fig);

% Initialize verbose output
helper.initializeVerboseOutput([]);

% Custom training loop
for epoch = 1:numEpochs
    reset(mbqTrain)
    shuffle(mbqTrain)
    while hasdata(mbqTrain)
        % Get next batch from minibatchqueue
        [X,gtBox,gtClass,gtMask] = next(mbqTrain);

        % Evaluate the model gradients and loss using dlfeval
        [gradients,loss,state] = dlfeval(@networkGradients,X,gtBox,gtClass,gtMask,dlnet,params);
        dlnet.State = state;

        % Compute the learning rate for the current iteration
        learnRate = initialLearnRate/(1 + decay*iteration);

        if(~isempty(gradients) && ~isempty(loss))
            [dlnet.Learnables,velocity] = sgdmupdate(dlnet.Learnables,gradients,velocity,learnRate,momentum);
        else
            continue;
        end

        helper.displayVerboseOutputEveryEpoch(start,learnRate,epoch,iteration,loss);

        % Plot loss/accuracy metric
        D = duration(0,0,toc(start),'Format','hh:mm:ss');
        addpoints(lossPlotter,numdetectMaskRCNN,Iteration,double(gather(extractdata(loss))))
        subplot(2,1,2)
        title(strcat("Epoch: ",num2str(epoch),", Elapsed: "+string(D)))
        drawnow
        iteration = iteration + 1;
    end
end

net = dlnet;

% Save the trained network
modelDateTime = string(datetime('now','Format',"yyyy-MM-dd-HH-mm-ss"));
save(strcat("trainedMaskRCNN-",modelDateTime,"-Epoch-",num2str(numEpochs),".mat"),'net');
First, "numdetectMaskRCNN" is not defined anywhere. Judging from the loop variables, I assume the plotting call was meant to be something like the line below (my guess, not taken from the example):
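% Presumed intent (my guess): plot the loss against the loop's own iteration counter
addpoints(lossPlotter, iteration, double(gather(extractdata(loss))))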
For now I simply deleted the extra arguments and re-executed the section. It then showed the following error:
Error using nnet.internal.cnn.dlnetwork/forward (line 239)
Layer 'bn2a_branch2a': Invalid input data. The value of 'Variance' is invalid. Expected input to be positive.
Error in nnet.internal.cnn.dlnetwork/CodegenOptimizationStrategy/propagateWithFallback (line 122)
[varargout{1:nargout}] = fcn(net, X, layerIndices, layerOutputIndices);
Error in nnet.internal.cnn.dlnetwork/CodegenOptimizationStrategy/forward (line 62)
[varargout{1:nargout}] = propagateWithFallback(strategy, functionSlot, @forward, net, X, layerIndices, layerOutputIndices);
Error in nnet.internal.cnn.dlnetwork/DefaultOptimizationStrategy/propagate (line 143)
[varargout{1:nargout}] = inferenceMethod(strategy.CodegenStrategyOriginal,...
Error in nnet.internal.cnn.dlnetwork/DefaultOptimizationStrategy/forward (line 77)
[varargout{1:nargout}] = propagate(strategy, net, X, ...
Error in dlnetwork/forward (line 503)
[varargout{1:nargout}] = strategy.forward(net.PrivateNetwork, x, layerIndices, layerOutputIndices);
Error in networkGradients (line 21)
[YRPNRegDeltas, proposal, YRCNNClass, YRCNNReg, YRPNClass, YMask, state] = forward(...
Error in deep.internal.dlfeval (line 18)
[varargout{1:nout}] = fun(x{:});
Error in dlfeval (line 41)
[varargout{1:nout}] = deep.internal.dlfeval(fun,varargin{:});
I am wondering whether I misunderstood something, since the code does not work for me.
It would be of great help if this could be figured out or fixed. Thank you!
  4 Comments
Claudia De Clemente on 26 Apr 2023
Hello, did you find a solution for the high computational cost? I am working with a self-made dataset of about 8k images, each 256 x 256 x 3. I have estimated that I will need more than a week to complete 30 epochs; it's crazy...


Accepted Answer

Yi-Ping Hsueh on 29 Mar 2021
(copied from my previous comment to myself...)
I figured out a solution to this issue from another resource.
The problem comes from negative values returned in "state". The original code is as follows:
[gradients,loss,state] = dlfeval(@networkGradients,X,gtBox,gtClass,gtMask,dlnet,params);
dlnet.State = state;
Replace the last line (dlnet.State = state;) with the following to ensure that all variance values assigned to "dlnet.State" are positive:
% Find the State table rows that hold the batch normalization variances
idx = dlnet.State.Parameter == "TrainedVariance";
% Clamp each variance to at least single-precision eps so it stays positive
boundAwayFromZero = @(X) max(X, eps('single'));
dlnet.State(idx,:) = dlupdate(boundAwayFromZero, dlnet.State(idx,:));
This makes the code run.
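To double-check, a quick sanity assertion like this sketch of mine (not part of the example) can be run after the update:
% Sketch: verify that every TrainedVariance entry in the updated State
% is strictly positive before the next forward pass.
varIdx = dlnet.State.Parameter == "TrainedVariance";
varVals = dlnet.State.Value(varIdx);   % cell array of dlarray values
assert(all(cellfun(@(v) all(extractdata(v) > 0, 'all'), varVals)), ...
    "Some TrainedVariance values are still non-positive.")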
But now I am facing another problem: the training process takes a very long time (days), probably because the network is really large. I thought my GPU should be good enough, but it turns out that even a mini-batch size of 2 requires more GPU memory than I have, so for now only the CPU can perform the computation (see the sketch after the GPU details below for one way to fall back automatically).
My GPU is as follows:
Name: 'GeForce GTX 1080'
Index: 1
ComputeCapability: '6.1'
SupportsDouble: 1
DriverVersion: 11.2000
ToolkitVersion: 11
MaxThreadsPerBlock: 1024
MaxShmemPerBlock: 49152
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
SIMDWidth: 32
TotalMemory: 8.5899e+09
AvailableMemory: 7.4505e+09
MultiprocessorCount: 20
ClockRateKHz: 1771000
ComputeMode: 'Default'
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 1
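If it helps, here is a rough sketch (my own workaround, not from the example) for choosing the execution environment automatically; the 4 GB threshold and the executionEnvironment variable name are only illustrative guesses:
% Sketch: fall back to CPU training when the GPU lacks free memory.
% The 4e9 threshold is an illustrative guess, not a measured requirement.
if canUseGPU
    g = gpuDevice;                        % currently selected GPU
    useGPU = (g.AvailableMemory > 4e9);   % enough free memory for training?
else
    useGPU = false;
end
if useGPU
    executionEnvironment = "gpu";
else
    executionEnvironment = "cpu";
end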
Hope this information helps those who want to train their own Mask R-CNN in MATLAB.

More Answers (1)

Aditya Patil on 29 Mar 2021
I have brought the issue to the attention of the relevant developers. It might be fixed in an upcoming release.
