Converting parallel CPU processing into GPU processing

I am trying to convert code that ran in parallel on CPU cores into parallel processing on the GPU.
I would like to process the matrices in a cell array on the GPU in parallel, using as many GPU cores as are available. However, the GPU version is significantly slower than the parallel CPU version on 4 cores (25 cells processed in 30 minutes on 4 CPU cores; 5 cells have so far taken over 45 minutes on the GPU and are still not finished). I'm very new to GPU computing, and nothing obvious stood out as a way to speed this up.
GPU properties:
Data to be processed:
  • series is a 568x1 cell array
  • each cell is a 60x60 double (each entry is a value between -1 and 1)
Start processing
tic % test
for i = 1:5
    cell_array{i} = gpuArray(cleanSeries{i});
end
Determine size of matrix within the first cell, equivalent to number of biological cells recorded
numCells = gpuArray(length(cell_array{1}));
Preallocate arrays for data
clust_mean = gpuArray(NaN(length(cell_array{1}),length(cell_array)));
clust_std = gpuArray(NaN(length(cell_array{1}),length(cell_array)));
clust_random_mean = gpuArray(NaN(length(cell_array{1}),length(cell_array)));
clust_random_std = gpuArray(NaN(length(cell_array{1}),length(cell_array)));
Initiate the processing
parfor cellNumber = 1:length(cell_array)
    threshold_clust = gpuArray(NaN(numCells,100));
    random_clust = gpuArray(NaN(numCells,100));
    % process data over proportional thresholds, from the strongest 25% of
    % connections up to fully connected (100%), in 25% steps: 25%, 50%, 75%, 100%
    for threshold = 25:25:100
        threshold_matrix = (threshold_proportional(cell_array{cellNumber}, threshold/100)); % proportional threshold matrix - custom function
        % clustering requires that all values be between 0 and 1, so remove
        % any negatives
        threshold_matrix(threshold_matrix < 0) = 0;
        % ensure that randomizing the matrix is possible
        [rowi,coli] = find(tril(threshold_matrix));
        bothi = [rowi coli];
        c = bothi(1,1);
        d = bothi(1,2);
        e=find(c==bothi);
        f=find(d==bothi);
        if length(e)==length(bothi)||length(f)==length(bothi)
            disp(['One cell has all the connections, skipping ', int2str(threshold), '% threshold.'])
            threshold_clust(:,threshold) = NaN(numCells,1);
            random_clust(:,threshold) = NaN(numCells,1);
        elseif length(bothi) <=3
            threshold_clust(:,threshold) = NaN(numCells,1);
            random_clust(:,threshold) = NaN(numCells,1);
        else
            % create random matrix - custom function
            random_matrix = latmio_und(threshold_matrix,1000);
            % clustering coefficient per matrix - custom function
            threshold_clust(:,threshold) = clustering_coef_wu(threshold_matrix);
            random_clust(:,threshold) = clustering_coef_wu(random_matrix);
        end % if logic end
    end % for loop end
    % summarize over thresholds
    clust_mean(:,cellNumber) = mean(threshold_clust,2,'omitnan');
    clust_std(:,cellNumber) = std(threshold_clust,0,2,'omitnan');
    clust_random_mean(:,cellNumber) = mean(random_clust,2,'omitnan');
    clust_random_std(:,cellNumber) = std(random_clust,0,2,'omitnan');
end % parfor loop end
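% gather returns the host copy; assign it, otherwise the result is discarded
clust_mean = gather(clust_mean);
clust_std = gather(clust_std);
clust_random_std = gather(clust_random_std);
clust_random_mean = gather(clust_random_mean);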
toc
6 Comments
Douglas Miller on 12 Mar 2022
According to the other post, it sounds like running in parallel isn't feasible on the GPU the way I was hoping. But I had never considered that zeros would process quicker. That will definitely help optimize the code. Thank you so much!
Walter Roberson on 12 Mar 2022
For operations other than pure copying, NaN has to go through a special "Abort" path in all calculations; calculations with it cannot stream the normal way. There also has to be special checking to see if the NaN is a "signalling NaN" as signalling NaN are required to raise exceptions whenever they occur.
inf cannot readily stream either... but I guess a bit more readily than NaN.
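If the goal is to avoid NaN placeholders altogether, one option is to preallocate with zeros and record in a logical mask which thresholds actually produced a result. The following is only a minimal sketch under stated assumptions: the sizes mirror the question, the `rand` call is a placeholder for the output of `clustering_coef_wu`, the `k ~= 2` test is a hypothetical stand-in for the skip logic, and it assumes the clustering function itself never returns NaN entries. It also stores one column per threshold rather than 100 columns.

```matlab
% Hypothetical sizes mirroring the question: 60 biological cells, 4 thresholds.
numCells = 60;
thresholds = 25:25:100;
nT = numel(thresholds);

% Preallocate with zeros, one column per threshold.
threshold_clust = zeros(numCells, nT);
valid = false(1, nT);   % marks which thresholds produced a result

for k = 1:nT
    if k ~= 2           % stand-in for the "skip this threshold" checks
        % placeholder for clustering_coef_wu(threshold_matrix)
        threshold_clust(:, k) = rand(numCells, 1);
        valid(k) = true;
    end
end

% Statistics over the valid columns only, so 'omitnan' is not needed.
clust_mean = mean(threshold_clust(:, valid), 2);
clust_std  = std(threshold_clust(:, valid), 0, 2);
```

Because the original code marks whole columns as skipped, selecting the valid columns gives the same statistics as `'omitnan'` under the assumption stated above.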

Answers (1)

Matt J on 12 Mar 2022
Edited: Matt J on 12 Mar 2022
"I would like to process matrices in a cell array on the GPU in parallel for how many cores are present on the gpu."
No, GPU cores cannot act like parpool workers. They are a completely different animal.
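To illustrate the difference, here is a minimal sketch of the kind of workload a GPU handles well: one large array and vectorized operations, rather than one task per core. The `cleanSeries` variable is filled with random stand-in data here, and `strength` is an illustrative name, not something from the original code. Whether the custom functions in the question (`threshold_proportional`, `latmio_und`, `clustering_coef_wu`) can be rewritten in this style is not something this thread establishes.

```matlab
% Stand-in data: five 60x60 matrices with entries in [-1, 1].
cleanSeries = arrayfun(@(k) 2*rand(60) - 1, 1:5, 'UniformOutput', false);

% Stack the cells into one 60x60x5 array (one page per cell).
stack = cat(3, cleanSeries{:});

% Move the whole stack in a single transfer when a GPU is usable;
% otherwise the same code simply runs on the CPU.
if canUseGPU()
    stack = gpuArray(stack);
end

% Element-wise work is applied to all pages at once; no parfor involved.
stack(stack < 0) = 0;                % same clipping step as in the question
strength = squeeze(sum(stack, 2));   % 60x5: row sums for every page

strength = gather(strength);         % bring the result back to host memory
```

The GPU parallelizes inside each array operation, so the gain comes from making the arrays large and the operations vectorized, not from assigning one matrix to each core.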

Release

R2021b
