
Efficiently Deleting Matrix Columns/Rows

Any tips on efficiently and quickly deleting a given row/column from a Matrix?
I initially assumed that deleting the last column of a matrix would be cheaper than deleting the first, and that column operations in general would be cheaper than row operations (given MATLAB's column-major memory layout), which I was able to confirm through testing. However, the performance I actually got was disappointing.
someB = rand(4,50000);
someC = someB.';
tic
while size(someB,2) > 2
someB(:,size(someB,2)) = [];
end
toc
tic
while size(someC,1) > 2
someC(size(someC,1),:) = [];
end
toc
%Elapsed time is 13.869280 seconds.
%Elapsed time is 10.198270 seconds.
I did a quick search, and in this thread I found hope that an external C MEX function could delete the last column of a matrix efficiently. The code is below.
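#include "mex.h"
// You may need to uncomment the next line
//#define mwSize int
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    mwSize n;
    if( nrhs != 1 )
        mexErrMsgTxt("One input required.");
    // Shrink the column count of the input in place; the data itself is not copied.
    if( (n = mxGetN(prhs[0])) != 0 )
        mxSetN((mxArray *) prhs[0], n - 1);
}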
However, I was not able to get this code running myself. The results the author reported show remarkable performance. I'm not very good at MEX; would anyone know how to fix the code above so that it runs, or know of MEX or MATLAB code with equally good (or nearly as good) performance?
Thanks!
  1 Comment
James Tursa on 15 Jun 2016
Any operation at the m-file level that causes the number of elements in a variable to change will cause a data copy to take place, which can chew up your performance if the variable is large. The time taken to do this can dominate your run times.
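One way to sidestep the repeated copy described in this comment is to leave the array at full size, track how many columns are still in use, and resize only once at the end. This is a minimal sketch, assuming the loop only ever drops trailing columns; `nActive` is a hypothetical name and not part of the original post.

```matlab
someB = rand(4, 50000);
nActive = size(someB, 2);      % hypothetical counter of columns still in use

while nActive > 2
    % ... read or write individual columns up to nActive here as needed ...
    nActive = nActive - 1;     % "delete" the last column logically; no resize, no copy
end

someB = someB(:, 1:nActive);   % one real resize (and one data copy) at the end
```

The element count of `someB` changes only in the final line, so the cost of the copy is paid once rather than on every iteration.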


Accepted Answer

James Tursa on 15 Jun 2016
The above code still works fine for me in R2015a Win32. You might change mwSize to size_t, but unless the dimension is really large I don't see how this could make a difference. E.g. on my machine:
>> format debug
>> M = reshape(1:24,4,6)
M =
Structure address = 8a4fa88
m = 4
n = 6
pr = 33126c60
pi = 0
1 5 9 13 17 21
2 6 10 14 18 22
3 7 11 15 19 23
4 8 12 16 20 24
>> M_copy = M
M_copy =
Structure address = 8a4fd60
m = 4
n = 6
pr = 33126c60
pi = 0
1 5 9 13 17 21
2 6 10 14 18 22
3 7 11 15 19 23
4 8 12 16 20 24
>> deletelastcolumn(M)
>> M
M =
Structure address = 8a4fa88
m = 4
n = 5
pr = 33126c60
pi = 0
1 5 9 13 17
2 6 10 14 18
3 7 11 15 19
4 8 12 16 20
>> deletelastcolumn(M)
>> M
M =
Structure address = 8a4fa88
m = 4
n = 4
pr = 33126c60
pi = 0
1 5 9 13
2 6 10 14
3 7 11 15
4 8 12 16
>> M_copy
M_copy =
Structure address = 8a4fd60
m = 4
n = 6
pr = 33126c60
pi = 0
1 5 9 13 17 21
2 6 10 14 18 22
3 7 11 15 19 23
4 8 12 16 20 24
So things still seem to be working OK. There are no hacks in this simple code ... just an in-place dimension change. The data pointer stays the same throughout as expected. And the in-place dimension change on M does not affect the dimension of the shared data copy M_copy which has the same data pointer. Again, all is as expected. So I don't see why it would not work on any MATLAB version. What exactly are you seeing?
  6 Comments
James Tursa on 16 Jun 2016
Glad to help. FYI, the reason the sdc version gets around the apparent internal protections on mxSetN is because the shared data copy we created is of type TEMPORARY (created inside the mex routine) instead of type NORMAL (incoming from the argument list). This variable type is a field of the mxArray itself, so maybe mxSetN (and friends like mxSetM) simply check that field and return without doing anything if the variable type is NORMAL. This is just a guess on my part. The undocumented API function mxCreateSharedDataCopy has been around for a long time. TMW tried to get rid of it a couple of years ago but got inundated with complaints from mex programmers who used it, so they put it back in the library and hopefully will keep it there in the future. It is an extremely useful function for advanced mex programmers who know what they are doing.
Jan on 2 Apr 2019
Some years later: it is confusing that the MEX function, which removes the last column, is compared with code that deletes the first row:
for j=1:ncols-1
Mslow(1,:) = [];
end
Processing rows is much more expensive.
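A like-for-like comparison would apply each kind of deletion to the same matrix the same number of times. The following is only a sketch: the sizes `n` and `k` are assumptions chosen so that it runs quickly, and the timings will vary with machine and MATLAB release.

```matlab
n = 400;  k = 100;                 % assumed sizes, small enough to run quickly
M = rand(n);

A = M;  tic
for j = 1:k, A(:, end) = []; end   % delete the last column k times
tLastCol = toc;

B = M;  tic
for j = 1:k, B(:, 1) = []; end     % delete the first column k times
tFirstCol = toc;

C = M;  tic
for j = 1:k, C(1, :) = []; end     % delete the first row k times
tFirstRow = toc;

fprintf('last col %.4f s | first col %.4f s | first row %.4f s\n', ...
        tLastCol, tFirstCol, tFirstRow);
```

For steadier numbers, each loop could be wrapped in a function and measured with `timeit` instead of `tic`/`toc`.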


More Answers (1)

Gianluca Tabella on 26 Mar 2019
Edited: Gianluca Tabella on 26 Mar 2019
I experienced the same disappointment while running a while-loop on a large matrix (10000x8000) which, in the end, should have been around 8x8000 after removing some specific rows.
What I did was this:
Q=[1 1 1 2 3 4 3 3 2 1 1 2 3 4 4 5 6 6 6 6 3 3 2 3 4];
no_thr=length(Q);
big_matrix=rand(no_thr);
i=2;
j=2;
% Copy the first row up front: the counters start at 2 because the condition
% compares the i-th element with the (i-1)-th, so row 1 is never visited.
big_matrix_new(1,:)=big_matrix(1,:);
while i<=no_thr
    if Q(i)==Q(i-1) % a certain condition that, if true, removes an element of an array
        Q(i)=[];
        no_thr=length(Q);
    else
        % If the condition were true, the i-th row of the matrix should be removed too.
        % Instead, when the condition is false, that row is copied into a new matrix.
        big_matrix_new(j,:)=big_matrix(i,:);
        j=j+1;
        i=i+1;
    end
end
big_matrix=big_matrix_new; % reuse the old name for the new matrix, which holds only the wanted rows
clearvars big_matrix_new % clear the temporary so that only the short matrix with the original name remains
In short, instead of removing the unwanted rows, I copied the wanted rows into a newly created matrix that then replaces the original.
This method made my script usable.
  2 Comments
Steven Lord on 26 Mar 2019
If possible, try identifying and recording the rows to be kept or deleted inside the loop, then actually modify the matrix after the loop.
M = magic(20)
rowsToKeep = false(size(M, 1), 1);
for therow = 1:size(M, 1)
if M(therow, 1) > 200
rowsToKeep(therow) = true;
end
end
M2 = M(rowsToKeep, :) % If you don't need M after this line,
% M = M(rowsToKeep, :) % you can use this instead
M2 contains only those rows from M whose first element is greater than 200. Alternately:
M = magic(20);
rowsToDiscard = true(size(M, 1), 1);
for therow = 1:size(M, 1)
if M(therow, 1) > 200
rowsToDiscard(therow) = false;
end
end
M3 = M; % Keep M around in case you want to use it later to check
M3(rowsToDiscard, :) = [] % If you don't need to keep M around,
% M(rowsToDiscard, :) = [] % you can use this instead
Check:
isequal(M2, M3) % true
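When the condition depends only on the data and not on earlier loop iterations, the mask can also be built without a loop. The following sketch reuses the same `magic(20)` example and is offered as one possible shortening, not as part of the original comment.

```matlab
M = magic(20);
rowsToKeep = M(:, 1) > 200;        % logical column vector, one entry per row of M

M2 = M(rowsToKeep, :);             % keep the selected rows

M3 = M;
M3(~rowsToKeep, :) = [];           % or delete the complement in a single step

isequal(M2, M3)
```

Either way the matrix is resized once instead of once per row.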
Jan on 2 Apr 2019
Edited: Jan on 2 Apr 2019
@Gianluca Tabella: It looks like your code is overly complex. Wouldn't this do the same?
index = 1 + sum(diff(Q)~=0);
result = big_matrix(1:index, :);
This would avoid iteratively growing or shrinking the arrays, which is a common cause of poor performance.
By the way, are you sure that this code does what you want? It counts how many times neighboring elements of Q have different values, and copies that many rows plus one from the input to the output. My first guess was that you want:
index = [true, diff(Q)~=0];
result = big_matrix(index, :);
to copy the rows of the input at which the value of Q changes.
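The difference between the two variants can be seen by running both on the `Q` from the answer. This is a sketch for illustration only; `nKeep`, `mask`, `result1` and `result2` are names chosen here.

```matlab
Q = [1 1 1 2 3 4 3 3 2 1 1 2 3 4 4 5 6 6 6 6 3 3 2 3 4];
big_matrix = rand(length(Q));

% Variant 1: what the loop effectively computes (the first nKeep rows)
nKeep   = 1 + sum(diff(Q) ~= 0);
result1 = big_matrix(1:nKeep, :);

% Variant 2: the rows at which Q changes value (the first row is always kept)
mask    = [true, diff(Q) ~= 0];
result2 = big_matrix(mask, :);

% Both have nKeep rows, but they are generally different rows of big_matrix
size(result1, 1) == size(result2, 1)
isequal(result1, result2)
```

Both results always have the same number of rows, because `nnz(mask)` equals `nKeep`; they select the same rows only when `Q` has no repeated neighbors before position `nKeep`.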
