Optimising LSTM training on GPU for sequence classification

5 views (last 30 days)
I'm classifying time sequences using an LSTM. I have a massive dataset, and training is unfeasibly slow despite using a high-performance GPU with 11 GB of RAM (GTX 1080 Ti). The GPU runs at only about 20% utilisation most of the time, and I suspect transfer from memory is slowing it down. I've moved the 5 GB input array (a cell array of time-series arrays) to the GPU using nndata2gpu. I can't move the target/response array to the GPU because it is a categorical column vector, as required by an LSTM layer with 'OutputMode' set to 'last'.

When I attempt to train the network it won't recognise the response variable and generates an error message saying that 'a column vector of categorical data' is required (which is what the array is). When I use the same response array without first moving the input array to the GPU, the network trains fine (albeit slowly).

So, is there a way of training a network on the GPU, using training data stored on the GPU, while the response array is in RAM? Or is there another way to train a multi-feature sequence classification network with a response variable in a format that can reside on the GPU?
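For reference, the baseline that trains fine (albeit slowly) looks roughly like this; the variable names and sizes below are placeholders rather than my actual values:

numFeatures    = 12;        % placeholder: number of features per time step
numHiddenUnits = 100;       % placeholder: LSTM state size
numClasses     = numel(categories(YTrain));   % YTrain: categorical column vector of labels

layers = [
    sequenceInputLayer(numFeatures)
    lstmLayer(numHiddenUnits, 'OutputMode', 'last')
    fullyConnectedLayer(numClasses)
    softmaxLayer
    classificationLayer];

options = trainingOptions('adam', ...
    'ExecutionEnvironment', 'gpu', ...   % trainNetwork copies each mini-batch to the GPU itself
    'MiniBatchSize', 128, ...            % placeholder value
    'Shuffle', 'every-epoch', ...
    'Plots', 'training-progress');

% XTrain: N-by-1 cell array, each cell a numFeatures-by-numTimeSteps matrix, kept in host RAM
net = trainNetwork(XTrain, YTrain, layers, options);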
5 Comments
Paul Siefert on 16 Aug 2018
Glad to hear you were able to increase the speed. My comment aimed to point out that your mainboard may be able to use only x8 of the x16 lanes in the slot where your 1080 Ti sits, if you have another PCI card in the other/second PCIe x16 slot (i.e. it runs 1x16 or 2x8). You can check the available lanes with GPU-Z. All the best.
Joss Knight on 16 Aug 2018
Can you please give some example code? This could be a bug, but equally, if you are still using nndata2gpu, it is just a misunderstanding about how to move data to the device.
If you are getting better efficiency by increasing the mini-batch size, it means you have a relatively small and fast network, so if you don't process enough data at once the performance characteristics are dominated by the overheads of the MATLAB interpreter, file I/O and so forth. Note that, as a general rule, if you increase the mini-batch size you can increase the learning rate in proportion; this should give faster convergence.
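A minimal sketch of that rule of thumb, using illustrative values only: if the mini-batch size goes up by some factor, scale the initial learning rate by the same factor.

baseBatch = 128;      % illustrative baseline mini-batch size
baseLR    = 0.001;    % illustrative baseline learning rate

miniBatch = 512;                               % 4x larger mini-batch ...
learnRate = baseLR * (miniBatch / baseBatch);  % ... so scale the learning rate by 4x as well

options = trainingOptions('adam', ...
    'ExecutionEnvironment', 'gpu', ...
    'MiniBatchSize', miniBatch, ...
    'InitialLearnRate', learnRate);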


Answers (0)
