Memory Usage and blockproc
Mohammad Abouali
on 20 Aug 2014
Commented: Ashish Uthama on 22 Aug 2014
Hi,
Consider this command:
blockproc('inputImage.tif', blockSize, function_handle, ...
    'UseParallel', true, ...
    'Destination', BigTiff_Adapter);
As the command suggests, the input is a huge TIFF image (44 GB+), so there is no way I can load it into memory on a machine that has 16 GB of memory in total.
The output of function_handle has the same number of rows and columns as the input, but only two bands of single-precision values computed from the input. As mentioned, the input is a 44 GB+ 4-band TIFF where each band is uint8 (4 bytes per pixel), so the full output would be twice that, i.e. 88 GB+ (2 bands x 32 bits per pixel per band = 8 bytes per pixel). So it is also not possible to hold the output as one matrix on a machine with 16 GB of memory.
Since the data are geolocated I need to store the output as TIFF, and since it is too big for a regular TIFF I definitely need to write it as BigTIFF; hence the custom BigTiff_Adapter. blockSize is set to the tile size (my input TIFF is tiled, 256x256 pixels per tile). So essentially one tile is loaded, processed via function_handle, and then written to the output BigTIFF, tile by tile.
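For reference, here is a minimal sketch of what such a write-only BigTIFF adapter might look like. This is not the OP's actual adapter: the class name BigTiffAdapter, the constructor arguments, and the tag choices are illustrative. It subclasses the Image Processing Toolbox ImageAdapter class and opens the output with the Tiff class in 'w8' (BigTIFF) mode:

classdef BigTiffAdapter < ImageAdapter
    properties
        TiffObj  % underlying Tiff object, opened in BigTIFF mode
    end
    methods
        function obj = BigTiffAdapter(fileName, imageSize, tileSize)
            % imageSize = [rows cols bands], tileSize = [256 256]
            obj.ImageSize = imageSize;
            obj.TiffObj   = Tiff(fileName, 'w8');  % 'w8' = write BigTIFF
            setTag(obj.TiffObj, 'ImageLength',         imageSize(1));
            setTag(obj.TiffObj, 'ImageWidth',          imageSize(2));
            setTag(obj.TiffObj, 'SamplesPerPixel',     imageSize(3));
            setTag(obj.TiffObj, 'BitsPerSample',       32);
            setTag(obj.TiffObj, 'SampleFormat',        Tiff.SampleFormat.IEEEFP);
            setTag(obj.TiffObj, 'Photometric',         Tiff.Photometric.MinIsBlack);
            % second band is a plain data band, so mark it as an extra sample
            setTag(obj.TiffObj, 'ExtraSamples',        Tiff.ExtraSamples.Unspecified);
            setTag(obj.TiffObj, 'PlanarConfiguration', Tiff.PlanarConfiguration.Chunky);
            setTag(obj.TiffObj, 'TileLength',          tileSize(1));
            setTag(obj.TiffObj, 'TileWidth',           tileSize(2));
        end
        function writeRegion(obj, regionStart, regionData)
            % With blockSize equal to the tile size, each region maps onto
            % exactly one TIFF tile. (Check 0- vs 1-based coordinates of
            % computeTile against your MATLAB release; writeEncodedTile
            % pads edge tiles that are smaller than the tile size.)
            tileNum = computeTile(obj.TiffObj, regionStart);
            writeEncodedTile(obj.TiffObj, tileNum, single(regionData));
        end
        function data = readRegion(~, ~, ~)
            data = [];  % write-only adapter; the destination is never read
        end
        function close(obj)
            close(obj.TiffObj);
        end
    end
end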
Here is the part I don't get. When I set 'UseParallel' to false, everything works just fine, except that it takes a long time. You would think that parallel processing should improve that. However, once I set 'UseParallel' to true, all my memory is consumed and the computation appears to be even slower than with 'UseParallel' set to false. Literally, the serial version is much faster.
If you are thinking of communication cost, don't bother. The computation in function_handle is trivial and needs only the data within one pixel: the output pixel is o(i,j,:) = K*reshape(I(i,j,:),[],1), where o(i,j,:) holds the two output values at row i, column j, K is a 2x4 matrix, and reshape(I(i,j,:),[],1) is the 4x1 column vector of input band values at that pixel. As you can see, I don't need any information from neighboring blocks or even neighboring pixels, so there is absolutely no inter-block communication (it seems a heaven for parallel processing). All this said, it is not communication that slows down the parallel version, or at least no communication happens within function_handle (I don't know what MATLAB does under the hood).
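For concreteness, a per-pixel linear map like this vectorizes over an entire block at once. Here is a minimal sketch of such a block function (the weight matrix K and its values are made up for illustration; bs is the block struct that blockproc passes to the function):

% Hypothetical weights: 2 output bands from 4 input bands.
K = single([0.25 0.25 0.25 0.25;
            0.50 0.30 0.15 0.05]);

% bs.data is an M-by-N-by-4 uint8 block. Flatten it to pixels-by-bands,
% apply K to every pixel at once, then reshape back to M-by-N-by-2 single.
fun = @(bs) reshape( single(reshape(bs.data, [], size(bs.data,3))) * K.', ...
                     size(bs.data,1), size(bs.data,2), size(K,1) );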
Any idea why turning UseParallel on increases the memory usage and causes a slower calculation? On some machines I even get java.lang.OutOfMemoryError.
I should add that with an image of, say, 8 or 9 GB everything works just fine, with parallel set to either true or false. But when I trace the code, memory usage climbs as if the entire image were being loaded. I am controlling the number of workers, varying it between 4 and 12. If each block is assigned to one worker, then at most 12 blocks of 256x256x4 uint8 need to be in memory at any time, i.e. 3 MB; the corresponding output is twice that, 6 MB; and there are not many intermediate variables during the computation, but say they take another 24 MB. So in practice I shouldn't see more than about 40 MB of memory usage, yet I see at least 5-6 GB.
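A quick back-of-the-envelope check of that estimate (nothing here comes from the actual code; the numbers just restate the sizes above, with nWorkers assumed to be 12):

nWorkers  = 12;
inBlock   = 256*256*4*1;        % uint8 input block:   256 KB
outBlock  = 256*256*2*4;        % single output block: 512 KB
perWorker = inBlock + outBlock; % ignoring intermediates
fprintf('%.1f MB across %d workers\n', nWorkers*perWorker/2^20, nWorkers);
% prints: 9.0 MB across 12 workers -- nowhere near the 5-6 GB observed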
So, what's going on?
Accepted Answer
Ashish Uthama
on 21 Aug 2014
blockproc will not read the full image; it should only read one block at a time. However, since your file system is serial, the final write has to be done in a serial fashion to your storage device. If you have a large number of workers, each processing its block in a short time, then the one process that serializes the results to the final output could be the main bottleneck. In this extreme case the problem is IO-bound rather than compute-bound, so parallel processing might actually hurt. I know this explanation might not really 'help'.
Is there more processing you need to do on this file? You might see a parallel advantage if you can combine all your future processing into one block-processing function, as sketched below. That way you might even out the IO/compute balance.
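For example, folding several steps into a single pass means each tile is read and written only once (stepA, stepB, and stepC are placeholder functions standing in for whatever future processing is needed, not a real API):

combined = @(bs) stepC(stepB(stepA(bs.data)));
blockproc('inputImage.tif', blockSize, combined, ...
    'UseParallel', true, 'Destination', BigTiff_Adapter);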
5 Comments
Ashish Uthama
on 22 Aug 2014
That makes sense to me. The overall system memory uptick might be the OS caching parts of the file (read-ahead?). In the parallel case the master worker is the 'stitcher', so the increase and fluctuations make sense: it's probably buffering completed tiles while waiting for the disk output to go through. (I hope you already use SSDs/a RAID array, or have the budget for them!)