
Why is my compressed Convolutional Neural Network (CNN) showing poor performance after projection in MATLAB R2023b?

9 views (last 30 days)

I have trained a convolutional neural network (CNN) and a fully connected network (FCN) on the same task. I would like to apply projection to reduce the size and computational cost of these networks, and then fine-tune them after projection to recover high performance.
I referenced this documentation page to understand how projection works:
https://www.mathworks.com/company/technical-articles/compressing-neural-networks-using-network-projection.html
and referenced this documentation example while projecting my models:
https://www.mathworks.com/help/releases/R2023b/deeplearning/ug/compress-neural-network-using-projection.html
Here is the documentation for the projection function I am using:
https://www.mathworks.com/help/releases/R2023b/deeplearning/ref/compressnetworkusingprojection.html
I find that the projected and fine-tuned CNN shows a large performance degradation compared to the unprojected CNN, as measured by the network's loss, even with a small learnables reduction during projection. This is a problem because I cannot use the projected and fine-tuned CNN with such low performance. The projected and fine-tuned FCN, by contrast, does not exhibit such a large degradation compared to the unprojected FCN.
I also find that, during the CNN fine-tuning process, the training and validation losses diverge.
Is the performance degradation of the projected and fine-tuned CNN expected?

Accepted Answer

MathWorks Support Team on 4 Apr 2024
The projection operation acts on a layer by performing the underlying operation in a lower-dimensional space, with that space chosen by projection matrices computed from statistics of the data. The rank of these projection matrices is directly governed by the "ExplainedVarianceGoal" or, equivalently, the "LearnablesReductionGoal" name-value argument of the "compressNetworkUsingProjection" function. The larger the "LearnablesReductionGoal", the smaller the rank of the projection matrices, and thus the more information you are throwing away in the projected layer operation. The CNN and FCN are fundamentally different architectures, so the projection compression explores different lower-dimensional spaces and different projected operations within the layers.
At a high level, this behavior is expected: for a small "LearnablesReductionGoal" the networks retain most of their accuracy, but as the goal increases, accuracy degrades. The observation that you are seeing more degradation in the CNN than in the FCN for the same "LearnablesReductionGoal" could simply be a consequence of the CNN being less projection-compressible, with fewer redundancies present than in the FCN. A very crude indication is the number of learnable parameters in each network: the FCN trained for the task described above likely has far more than the CNN, which may well lead to more success with projection. There are also other factors, such as the fact that a convolution operation differs from a fully connected operation; a convolution can be thought of as a fully connected operation with a symmetry imposed on the weight matrix.
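To see how the reduction goal controls the trade-off, a minimal sketch of the projection workflow is shown below. It assumes "net" is a trained dlnetwork and "mbqTrain" is a minibatchqueue over representative training data; these variable names and the 0.2 goal are illustrative, not from the original post.

```matlab
% Analyze neuron activations once; the neuronPCA object can be reused
% for several projections with different goals.
npca = neuronPCA(net, mbqTrain);

% Start conservatively: a small learnables reduction keeps higher-rank
% projection matrices and so throws away less information.
[netProjected, info] = compressNetworkUsingProjection(net, npca, ...
    LearnablesReductionGoal=0.2);

% Inspect how much was removed and how much variance was retained
% before committing to fine-tuning.
disp(info.LearnablesReduction)
disp(info.ExplainedVariance)
```

Sweeping "LearnablesReductionGoal" over a few values and plotting the resulting validation loss is a quick way to find where accuracy starts to collapse for a given architecture.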
However, there are several ways you could try to improve the performance of the CNN (and, in general, of any network you are projecting):
When projecting, the "compressNetworkUsingProjection" function has a "LayerNames" name-value argument. By default, all layers are projected together, which, when several layers are compressed simultaneously, can degrade performance more than projecting and compressing the layers incrementally. One thing to try is to take the convolutional layer with the most learnable parameters, project it first by specifying it in the "LayerNames" argument, and then fine-tune the network. Then repeat for the next largest layer, and so on. Fine-tuning layer by layer can help stabilize the compression. This approach can also reveal which layers cause the most accuracy degradation; such a layer has poor projectability, and it is advisable not to project it. Note that, in general, not every layer has to (or should) be compressed.
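The incremental approach above can be sketched as a loop. The layer names, data variables, and training options here are hypothetical placeholders; "trainnet" (introduced in R2023b) stands in for whatever fine-tuning routine you are already using.

```matlab
% Project the convolutional layers one at a time, largest first
% (ordering by learnable parameter count is illustrative).
layersBySize = ["conv_3" "conv_2" "conv_1"];

for name = layersBySize
    % Recompute activation statistics on the current, partially
    % projected network before each projection step.
    npca = neuronPCA(net, mbqTrain);
    net  = compressNetworkUsingProjection(net, npca, ...
        LayerNames=name, LearnablesReductionGoal=0.2);

    % Fine-tune to recover accuracy before moving to the next layer.
    % Monitor validation loss here: a step that will not recover
    % indicates a layer with poor projectability.
    net = trainnet(XTrain, TTrain, net, "crossentropy", options);
end
```

If one layer consistently causes the validation loss to diverge during its fine-tuning step, exclude that layer from "LayerNames" and leave it unprojected.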

More Answers (0)
