Best way to deal with large data for deep learning?

5 views (last 30 days)
Hi, I have been trying image classification with CNNs. I have some 350,000 images that I read and stored in a 4D array of size 170 x 170 x 3 x 350,000 in a data.mat file, using matfile to keep appending new images. The resulting file is almost 20 GB.
The problem now is that I cannot access the saved images because I run out of memory.
Does anyone have suggestions for more efficient ways to build large datasets for deep learning?
One solution would be to split the data and train two networks, one with weights initialized from the other's final weights, but I'd rather not take that route!
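For reference, the incremental matfile-based approach described above could be sketched roughly like this (variable and file names are illustrative, not the poster's actual code; matfile requires explicit index ranges when the on-disk variable does not yet exist):

```matlab
% Sketch: append images to a 4D variable in data.mat without holding
% the whole array in memory. imageFiles is a hypothetical cell array
% of image paths; X grows along the 4th dimension on disk.
m = matfile('data.mat', 'Writable', true);
for k = 1:numel(imageFiles)
    img = imresize(imread(imageFiles{k}), [170 170]);
    m.X(1:170, 1:170, 1:3, k) = img;   % indexed assignment grows X on disk
end
```

Writing this way avoids loading the array while building it, but reading it all back for training still requires the full array in RAM, which is where the out-of-memory error occurs.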
  2 Comments
KSSV on 22 Jun 2016
Do you want to process the whole array (170 x 170 x 3 x 350,000) at once, or are you using only one 170 x 170 x 3 image at each step?
Mona on 22 Jun 2016
Yes, I am classifying the images using a CNN:
trainNetwork(Xtrain, Ytrain, layers, opt)
where Xtrain is supposed to contain all the training examples. So yes, I wish to pass the entire 170 x 170 x 3 x 350,000 array to the network!
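A quick back-of-the-envelope calculation (my numbers, based only on the dimensions quoted in this thread) shows why that full array won't fit in typical RAM:

```matlab
% Rough memory footprint of the full 4D array.
n = 170 * 170 * 3 * 350000;    % total elements
gbUint8  = n * 1 / 1e9;        % stored as uint8: ~30.3 GB
gbSingle = n * 4 / 1e9;        % if converted to single for training: ~121.4 GB
fprintf('uint8: %.1f GB, single: %.1f GB\n', gbUint8, gbSingle)
```

Either way, the array is far larger than the memory of a typical workstation, so an out-of-core approach is needed.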


Accepted Answer

Mona on 22 Jun 2016
OK, I found a way around it. Instead of reading/writing the images to .mat files, I used imageDatastore.
What I did: I processed all my images (resized them to 200 x 200, then took random 170 x 170 crops) and wrote all the processed images out as .jpg files.
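That preprocessing step could be sketched as follows (the folder paths, source file extension, and mirrored class-subfolder layout are my assumptions; the recursive `dir('**')` and `files(k).folder` syntax need R2016b or newer):

```matlab
% Sketch: resize each source image to 200 x 200, take one random
% 170 x 170 crop, and save it as a .jpg under a per-class folder
% so that imageDatastore can later label by 'foldernames'.
srcDir = 'F:\raw_images';          % hypothetical input folder
dstDir = 'F:\All_train_images';
files  = dir(fullfile(srcDir, '**', '*.png'));
for k = 1:numel(files)
    img = imresize(imread(fullfile(files(k).folder, files(k).name)), [200 200]);
    r = randi(200 - 170 + 1);      % random top-left corner of the crop
    c = randi(200 - 170 + 1);
    crop = img(r:r+169, c:c+169, :);
    % Mirror the source subfolder structure under dstDir
    outDir = fullfile(dstDir, files(k).folder(numel(srcDir)+2:end));
    if ~exist(outDir, 'dir'), mkdir(outDir); end
    [~, base] = fileparts(files(k).name);
    imwrite(crop, fullfile(outDir, [base '.jpg']));
end
```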
Then I used imageDatastore as:
imds = imageDatastore('F:\All_train_images', 'IncludeSubfolders', true, ...
    'FileExtensions', '.jpg', 'LabelSource', 'foldernames');
and finally trained the network with:
trainNetwork(imds, layers, opt)
It turned out that writing the images as .jpg files is even faster and takes less disk space than saving them in .mat files.
Thanks, Dr. Siva Srinivas Kolukula, for attempting to help!

More Answers (0)
