speed up for loop

Question

Mina Mino on 7 Apr 2023

https://it.mathworks.com/matlabcentral/answers/1943344-speed-up-for-loop

Commented: 埃博拉酱 on 9 Apr 2023

Hi everyone. I have a problem with my code, which is written in the following form: it takes a very long time to run.

In this code, I need to read 700 .hgt files, each of size 3600*3600, and extract distinct 3*3 matrices from each of them.

I would be thankful if anyone could help me. Thanks in advance for your attention and help.

clc
clear all
close all
format long g
fid=fopen('list2.txt','r');
if fid==-1
    disp('there is an error')
else
end
S = textscan(fid,'%s');
fclose(fid);
i=1;
fnames= S{i,i};
tic
NA=[];
for fiile=1:700
    varargout = readhgt(fnames{fiile});
    LAT=varargout.lat;LAT(end)=[];
    LON=varargout.lon;LON(end)=[];
    Z=varargout.z;Z(:,end)=[];
    Z(end,:)=[];
    %        Z_vec=reshape(Z,[12967201,1]);
    ROUGH=[];
    for row=1:3:3597
        for col=1:3:3597
            A=[Z(row,col) Z(row,col+1) Z(row,col+2);Z(row+1,col) Z(row+1,col+1) Z(row+1,col+2);Z(row+2,col) Z(row+2,col+1) Z(row+2,col+2)];
            Dif=(double(A)-mean(mean(A))).^2;
            rough=sqrt(mean(mean(Dif)));
            ROUGH=[ROUGH;mean([LAT(row) LAT(row+1) LAT(row+2)]) mean([LON(col) LON(col+1) LON(col+2)]) rough];
        end 
    end 
    NA=[NA;ROUGH];
end 
toc

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Actually, the following part takes most of the time. In this part I extract independent (non-overlapping) 3*3 matrices from a file with 3600*3600 elements.

for row=1:3:3597
    for col=1:3:3597
        A=[Z(row,col) Z(row,col+1) Z(row,col+2);Z(row+1,col) Z(row+1,col+1) Z(row+1,col+2);Z(row+2,col) Z(row+2,col+1) Z(row+2,col+2)];
        Dif=(double(A)-mean(mean(A))).^2;
        rough=sqrt(mean(mean(Dif)));
        ROUGH=[ROUGH;mean([LAT(row) LAT(row+1) LAT(row+2)]) mean([LON(col) LON(col+1) LON(col+2)]) rough];
    end 
end 


Answer 1

埃博拉酱 on 7 Apr 2023

https://it.mathworks.com/matlabcentral/answers/1943344-speed-up-for-loop#answer_1211609

Edited: 埃博拉酱 on 7 Apr 2023

When it comes to file I/O operations, parfor is usually not very helpful because the disk is always single-threaded. In contrast, eliminating for loops with vectorized methods can effectively improve performance.

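% Read the list of .hgt file names (one per line)
fid=fopen('list2.txt');
if fid==-1
	error('Could not open list2.txt');
end
S = textscan(fid,'%s');
fclose(fid);
fnames= S{1};
tic
NA=cell(700,1); % one cell per file, concatenated once at the end
for file=1:700
	varargout = readhgt(fnames{file});
	% Drop the last row/column; convert to double in case z is an integer class (std needs floating-point input)
	Z=double(varargout.z(1:3600,1:3600));
	% Split into 3x3 blocks and take the population standard deviation of each block
	Z=std(reshape(Z',3,size(Z,1)/3,3,size(Z,2)/3),1,[1,3]);
	% Mean latitude and longitude of each block
	[LAT,LON]=meshgrid(mean(reshape(varargout.lat(1:3600),3,[]),1),mean(reshape(varargout.lon(1:3600),3,[]),1));
	NA{file}=[LAT(:),LON(:),Z(:)];
end
NA=vertcat(NA{:});
toc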

Above I assume your files all contain 3601×3601 Z data. You can do more corrections and optimizations with your real data.
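To make the reshape trick above easier to trust, here is a minimal sketch that checks it against the explicit double loop on a small random matrix. The size n and the random test data are assumptions for illustration only, and the std call with a vector of dimensions should need R2018b or later as far as I know.

```matlab
% Compare block-wise population std via reshape with an explicit loop.
rng(0);                          % reproducible random test data
n = 12;                          % hypothetical small size, divisible by 3
Z = randi(100, n, n);            % stand-in for the elevation matrix

% Vectorised: dims 1 and 3 of the 4-D array index the 3x3 block entries
R = std(reshape(double(Z'), 3, n/3, 3, n/3), 1, [1, 3]);
R = R(:);                        % column blocks vary fastest, as in the loop below

% Reference: the original row/col loop with preallocated output
Rloop = zeros((n/3)^2, 1);
k = 0;
for row = 1:3:n
    for col = 1:3:n
        k = k + 1;
        A = double(Z(row:row+2, col:col+2));
        Rloop(k) = sqrt(mean((A(:) - mean(A(:))).^2));
    end
end

maxDiff = max(abs(R - Rloop));   % should be at rounding-error level
```

The transpose before reshape is what makes the output order match the loop order (row block outer, column block inner); without it the blocks come out in column-major order.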

6 Comments

Mina Mino on 7 Apr 2023

Edited: Walter Roberson on 8 Apr 2023

Thanks for your nice help. It really works for my code. I have the same problem with the following for loop: it is too time consuming. Could you please help me with this as well? Thanks in advance for your consideration.

%%
% begining the code%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
FT=[];step=18/110;
parfor  d=1:12*10^(6) 
    isamap_36=find(DOY==day_of_year(d));
    isamap_9=find(DOY_R==day_of_year(d));
    
    Lon_d=Lon(isamap_36);
    Lat_d=Lat(isamap_36);
    VWC_SMAP_d=VWC(isamap_36);
    roughness_d=roughness(isamap_36);
    TM_d=TM(isamap_36);
    SM_d=SM_36(isamap_36);
    
    LAT_R_d=LAT_R(isamap_9);
    LON_R_d= LON_R(isamap_9);
    VWC_R_9_d=VWC_R(isamap_9);  
    SM_R_9_d=SM_R(isamap_9);
    ROUGH_R_9_d=ROUGH_R(isamap_9);
    
    if (~isempty(isamap_36) && ~isempty(isamap_9))
        
        lon_smap_9_index= find(LON_R_d <=lon_cy(d)+step & LON_R_d>=lon_cy(d)-step);
        lat_smap_9_index= find(LAT_R_d <=lat_cy(d)+step & LAT_R_d>=lat_cy(d)-step);
        intsec_smap_9=intersect(lon_smap_9_index,lat_smap_9_index); 
        VWC_SMAP_g_9=VWC_R_9_d(intsec_smap_9);roughness_g_9=ROUGH_R_9_d(intsec_smap_9);SM_g_9=SM_R_9_d(intsec_smap_9);
        
        lon_smap_36_index= find(Lat_d <=lat_cy (d)+step & Lat_d>=lat_cy(d)-step);
        lat_smap_36_index= find(Lon_d <=lon_cy(d)+step & Lon_d>=lon_cy(d)-step);
        intsec_smap_36=intersect(lon_smap_36_index,lat_smap_36_index); 
        
        VWC_SMAP_g_36=VWC_SMAP_d(intsec_smap_36);roughness_g_36=roughness_d(intsec_smap_36);TM_d_36=TM_d(intsec_smap_36) ;SM_g_36=SM_d(intsec_smap_36);
        
        point_1=[lat_cy(d) lon_cy(d)];
        ptCloud_1=[latitu longi];
        idx_land  = knnsearch(ptCloud_1,point_1);
        
        ptCloud_2=[clay(:,1) clay(:,2)];
        idx_clay  = knnsearch(ptCloud_2,point_1);
        
        
        if (~isempty(intsec_smap_36)&& ~isempty(intsec_smap_9)) 
            
            FT=[day_of_year(d) lon_cy(d) lat_cy(d) inc_cy(d) ref_cy(d) mean(VWC_SMAP_g_9) mean(roughness_g_9) mean(SM_g_9) mean(VWC_SMAP_g_36) mean(roughness_g_36) mean(TM_d_36) mean(SM_g_36) clay(idx_clay,3) land_cover(idx_land)];
            S=FT;
            parsave(sprintf('output%d.mat', d),S );
        end
    end
end

Walter Roberson on 8 Apr 2023

"When it comes to file I/O operations, parfor is usually not very helpful because the disk is always single-threaded"

Disk controllers can have multiple DMA lanes. The operating system might be single threaded in disk I/O at some level, but it can launch a DMA and then return -- so multiple DMA transfers can be happening simultaneously. The controllers have room for several commands and buffers, and for spinning drives will actively re-order commands to improve performance. Meanwhile, each controller can be handling multiple drives, each of which can support queuing of multiple commands. Each drive can be DMA'ing with its respective controller. The connection to the individual drives does not necessarily support multiple DMA lanes.

So one connection at a time between a drive and a controller, but the controller is dealing with multiple drives simultaneously, and is quite possibly handling multiple communications simultaneously -- DMA'ing into multiple buffers getting ready to flip them to the O/S for example.

Does having multiple drives per controller channel help? Maybe. In a RAID kind of scenario, especially if you have mirrored drives then multiple drives on the channel can lead to opportunities to dispatch an I/O operation to whichever drive will be ready for it first. But on one channel, you might not be able to have multiple drives transferring data in the same direction.

This leads to the general principle that it is typically safe and productive to have one drive per controller channel and multiple controllers: the infrastructure can do simultaneous I/O in such circumstances.

For any particular drive, the general principle is that having 2 to 3 requests queued at the same time is often optimal, allowing the controller to reorder the requests according to what is spinning by.

On high performance systems, it might be practical to have one I/O stream per drive head.

Beyond that, for any one file, unless you go to a lot of trouble, you tend to get conflicts when doing multiple I/O operations on the same file, as the same head might be needed (because the data is on the same track). But different files tend to be on different tracks.

... which is to say that having a modest number of different files per drive being accessed by parallel workers can improve performance, and as you spread the load out over different drives you can definitely improve performance. The single threading is not as bad a restriction as you might think.

(Single threading can be necessary in order to provide "atomic" operations, such as the fact that operating systems must guarantee that file renames must at all user-accessible times either give the old name or the new name, that there is no user-accessible case in which both names or neither name is accessible.)

埃博拉酱 on 9 Apr 2023

@Walter Roberson For multiple files uniformly distributed across multiple drives, parallel I/O can theoretically improve performance, but this may require elaborate load-balancing algorithms for those drives, which is very tricky. Judging from the code shown by OP, this is probably far beyond his programming ability.
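Whether parallel reads help on a given machine is easiest to settle by measuring. The following is only a rough sketch under stated assumptions: the file count, file size, and temporary file names are all hypothetical, and the files are small synthetic binaries rather than real .hgt data.

```matlab
% Hypothetical benchmark: read the same set of files serially and with parfor.
nFiles = 8;                        % hypothetical number of test files
nVals  = 1e5;                      % hypothetical number of int16 values per file
tmpDir = tempname;                 % unique temporary folder name
mkdir(tmpDir);
names = cell(nFiles, 1);
for k = 1:nFiles
    names{k} = fullfile(tmpDir, sprintf('f%d.bin', k));
    fid = fopen(names{k}, 'w');
    fwrite(fid, int16(k) * ones(nVals, 1, 'int16'), 'int16');
    fclose(fid);
end

% Serial read
tic
sSerial = zeros(nFiles, 1);
for k = 1:nFiles
    fid = fopen(names{k}, 'r');
    data = fread(fid, Inf, 'int16');   % fread returns double by default
    fclose(fid);
    sSerial(k) = sum(data);
end
tSerial = toc;

% Parallel read (runs as an ordinary loop if no parallel pool is available)
tic
sPar = zeros(nFiles, 1);
parfor k = 1:nFiles
    fid = fopen(names{k}, 'r');
    data = fread(fid, Inf, 'int16');
    fclose(fid);
    sPar(k) = sum(data);
end
tPar = toc;

rmdir(tmpDir, 's');                % clean up the temporary files
fprintf('serial: %.3f s, parfor: %.3f s\n', tSerial, tPar);
```

Treat the timings with care: the first parfor call may include pool start-up time, and freshly written files are likely served from the operating system's cache rather than the disk, so a fair comparison would use the real files and repeated runs.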


Answer 2

chicken vector on 7 Apr 2023

https://it.mathworks.com/matlabcentral/answers/1943344-speed-up-for-loop#answer_1211529

Try parallel computing:

% Get the CPU's number of cores:
[~, numberOfCores] = evalc('feature(''numcores'')');
% Start parallel pool:
parpool(numberOfCores)
% Parallelise <for> loop:
parfor file = 1 : 700
    
   % dostuff 
end

Just be careful and read the documentation about how to save results from parfor iterations, as you need to pre-allocate the output variable and you will have some other constraints.

I suggest you read the documentation and look at some examples.
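As a minimal sketch of one common way to collect parfor results, each iteration below writes only to its own cell, and the cells are concatenated once after the loop. The loop body is a placeholder assumption standing in for the per-file work, and nFiles is a hypothetical small count.

```matlab
% Collect per-iteration results from parfor in a preallocated cell array.
nFiles = 8;                          % hypothetical number of files
results = cell(nFiles, 1);           % one slot per iteration
parfor k = 1:nFiles
    % Placeholder for the real per-file computation
    results{k} = [k, k^2];           % indexed only by the loop variable
end
NA = vertcat(results{:});            % single concatenation after the loop
```

Indexing the output only by the loop variable lets parfor treat it as a sliced variable, so iterations do not depend on each other. Growing a shared array inside the loop, as in NA = [NA; ROUGH], is the pattern to avoid here.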

3 Comments

chicken vector on 7 Apr 2023

Edited: chicken vector on 7 Apr 2023

The first piece of advice I can give you is, as MATLAB itself suggests, to preallocate your variable ROUGH before the for loop.

This is particularly important when dealing with big datasets, because at each loop iteration the size of ROUGH changes, so a larger contiguous block of memory has to be allocated and the existing contents of ROUGH copied into it.

Now imagine how expensive this is when you do it (3597/3)^2 times. You can save this time by "reserving" the memory before the loop, by substituting:

ROUGH = [];

With:

ROUGH = zeros((3597/3)^2, 3); % one row per 3x3 block

The second piece of advice is to use vectorisation as much as possible, because it is much faster than using for loops.

Vectorisation can be seen as operating on whole arrays at once rather than on one element at a time in sequence.

Consider as an example that I want to obtain the element-wise difference between two matrices. The following 3 methods obtain the same result, but performances are quite different:

N = 500;
x = rand(N);
y = rand(N);
% For loop without initial allocation:
tic
z = [];
for i = 1 : N
    for j = 1 : N
        z(i,j) = x(i,j) - y(i,j);
    end
end
T1 = toc
% For loop with initial allocation:
tic
z = zeros(N);
for i = 1 : N
    for j = 1 : N
        z(i,j) = x(i,j) - y(i,j);
    end
end
T2 = toc
% Vectorisation:
tic
z = x - y;
T3 = toc

On my computer I obtain:

T1 = 0.0147
T2 = 0.0064
T3 = 0.0005

To exploit vectorisation in your code, take a look at reshape.

The first thing that comes to my mind is to re-organise your Z, LAT, LON variables into matrices of size [3, (3597^2)/3] so that you can already vectorise along one dimension.
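A minimal sketch of that idea for the coordinate vectors: reshaping a vector into 3 rows puts each group of three consecutive values in one column, so a single mean call replaces the per-block mean. The LAT values here are made-up placeholders, not real data.

```matlab
% Block means of a coordinate vector via reshape, checked against a loop.
LAT = linspace(35, 36, 3601);        % hypothetical latitude vector
LAT = LAT(1:3597);                   % keep a length divisible by 3

latBlocks = mean(reshape(LAT, 3, []), 1);   % one mean per group of 3 values

% Reference: the loop form used in the question
latLoop = zeros(1, 3597/3);
for k = 1:3597/3
    r = 3*(k-1) + 1;
    latLoop(k) = mean([LAT(r) LAT(r+1) LAT(r+2)]);
end

maxErr = max(abs(latBlocks - latLoop));     % should be at rounding-error level
```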

Nonetheless, this might be a bit complex, so I suggest you begin with the preallocation of ROUGH.

In my example above, preallocation halves the computation time for a 500x500 matrix, which has only about 2% of the elements of a 3597x3597 matrix (the element count scales with the square of the dimension).

TLDR:

Try this:

ROUGH = zeros((3597/3)^2, 3); % one row per 3x3 block
select = (0:2);
current_rough_row = 0;
for row = 1 : 3 : 3597
    for col = 1 : 3 : 3597
        current_rough_row = current_rough_row + 1;
        A = double(Z(row + select, col + select)); % 3x3 block, as double to avoid integer arithmetic
        Dif = (A - mean(A,'all')).^2;
        rough = sqrt(mean(Dif,'all'));
        ROUGH(current_rough_row,:) = [mean(LAT(row + select)) mean(LON(col + select)) rough];
    end
end

And do the same with NA:

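nBlocks = (3597/3)^2; % number of rows of ROUGH per file
NA = zeros(nBlocks*700, 3); % Before the file loop, instead of: NA = [];
NA(nBlocks*(fiile-1)+1 : nBlocks*fiile, :) = ROUGH; % Inside the file loop, instead of: NA = [NA;ROUGH];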

Mina Mino on 7 Apr 2023

Thanks for your response and help.
