Deleting duplicates based on conditions of multiple columns

Question


Hi,

I have a large dataset (100 million rows x 40 columns) and I would like to delete any row that duplicates another row on a few specific columns. See the example below:

A = [1 10 4; 1 10 4; 1 11 5; 1 11 5; 1 12 6; 1 12 7; 1 13 8; 2 4 25; 2 10 28; 2 10 28; 3 5 33; 4 25 23; 4 23 24];

I would like to delete every row whose values in those three columns are identical to another row's values in the same three columns. So in this example, rows 2, 4 and 9 would be deleted because, e.g., rows 1 and 2 have the same values in each of the three columns, so I'd want to delete one of the two (it doesn't matter which one).

I suspect the answer involves unique and logical indexing, but I haven't managed to figure it out. Any help would be much appreciated. (I'm using MATLAB R2018b.)
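Since the question mentions unique and logical indexing, here is a minimal sketch of that route, assuming the key columns are 1:3 (keyCols is a hypothetical name; change it to the three columns that matter in the 40-column data). It uses the second and third outputs of unique to mark every row whose key already appeared in an earlier row, then deletes those rows in place, so the remaining rows keep their original order:

```matlab
% Example data from the question: 13 rows x 3 columns
A = [1 10 4; 1 10 4; 1 11 5; 1 11 5; 1 12 6; 1 12 7; 1 13 8; ...
     2 4 25; 2 10 28; 2 10 28; 3 5 33; 4 25 23; 4 23 24];

% Hypothetical choice of key columns (assumption: adjust for your data)
keyCols = 1:3;

% ia(k) = first row of A holding the k-th unique key
% ic(i) = which unique key row i of A belongs to
[~, ia, ic] = unique(A(:, keyCols), 'rows');

% A row is a repeat when it is not the first row holding its key
isRepeat = ia(ic) ~= (1:size(A, 1))';

% Logical indexing removes the repeats and keeps the original row order
A(isRepeat, :) = [];
```

Only the key columns are passed to unique, so any other columns simply travel along with the row that is kept.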

Thanks

3 Comments

Nick on 28 Dec 2020

Thanks for this, but unfortunately I think this would only work for this sample. The actual dataset has 40 columns and I'd like to remove rows based on duplicates in 3 columns only, rather than in all of them.

Nick on 28 Dec 2020


Just found the answer. This way you can find the unique rows among a subset of columns (in this case, columns 1, 2 and 3) and then rebuild the original matrix without the duplicate rows.

[C,ia] = unique(A(:,1:3),'rows')
A_new = A(ia,:)
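For the real 40-column data the same idea works with any set of key columns, contiguous or not. The sketch below uses a small hypothetical matrix X and hypothetical key columns [1 3 4]; adding 'stable' keeps the surviving rows in their original order, and each kept row brings along all of its non-key columns from its first occurrence:

```matlab
% Hypothetical wider matrix: 6 rows x 5 columns
X = [1 7 2 9 0;
     1 8 2 9 1;
     3 7 2 9 2;
     1 9 2 9 3;
     3 6 2 9 4;
     5 5 5 5 5];

% Hypothetical key columns; they do not need to be adjacent
keyCols = [1 3 4];

% ia: first row of X for each distinct key, in order of first appearance
[~, ia] = unique(X(:, keyCols), 'rows', 'stable');

% Keep those rows with all of their columns
X_new = X(ia, :);
```

Here rows 2 and 4 repeat the key [1 2 9] of row 1 and row 5 repeats the key [3 2 9] of row 3, so only rows 1, 3 and 6 remain, even though their non-key columns differ.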


Answer 1 (accepted answer)

Nick on 28 Dec 2020

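% C holds the distinct rows of columns 1-3 (sorted); ia holds the index
% of the first row of A in which each of those rows appears
[C,ia] = unique(A(:,1:3),'rows')

% Keep those rows of A, with all of their columns
A_new = A(ia,:)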
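One thing to be aware of: without 'stable', unique returns the distinct keys in sorted order, so A_new comes back sorted by columns 1 to 3 rather than in the original row order. A small comparison on the example matrix from the question (iaSorted, iaStable, A_sorted and A_stable are illustrative names):

```matlab
% Example data from the question
A = [1 10 4; 1 10 4; 1 11 5; 1 11 5; 1 12 6; 1 12 7; 1 13 8; ...
     2 4 25; 2 10 28; 2 10 28; 3 5 33; 4 25 23; 4 23 24];

% Default behaviour: keys are returned in sorted order
[~, iaSorted] = unique(A(:, 1:3), 'rows');
A_sorted = A(iaSorted, :);

% 'stable': keys are returned in order of first appearance
[~, iaStable] = unique(A(:, 1:3), 'rows', 'stable');
A_stable = A(iaStable, :);
```

Both versions keep the same ten rows; they differ only in the last two, because [4 23 24] sorts before [4 25 23] even though it appears later in A.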


Answer 2

Akash kumar on 31 Jul 2022

% With index numbers: shows which row of A each unique row was taken from.
% I think this can help you.
A = [1 10 4; 1 10 4; 1 11 5; 1 11 5; 1 12 6; 1 12 7; 1 13 8; 2 4 25; 2 10 28; 2 10 28; 3 5 33; 4 25 23; 4 23 24];
% 'stable' keeps the unique rows in the order they first appear in A
[B, index] = unique(A(:,1:3), 'rows', 'stable')
B = 10×3
     1    10     4
     1    11     5
     1    12     6
     1    12     7
     1    13     8
     2     4    25
     2    10    28
     3     5    33
     4    25    23
     4    23    24
index = 10×1
     1
     3
     5
     6
     7
     8
     9
    11
    12
    13
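With index in hand it is easy to list which rows were dropped and to rebuild the matrix. A minimal sketch on the question's example data; removedRows and A_new are illustrative names:

```matlab
% Example data from the question
A = [1 10 4; 1 10 4; 1 11 5; 1 11 5; 1 12 6; 1 12 7; 1 13 8; ...
     2 4 25; 2 10 28; 2 10 28; 3 5 33; 4 25 23; 4 23 24];

% index: first occurrence of each distinct key, in original order
[~, index] = unique(A(:, 1:3), 'rows', 'stable');

% Rows of A that were treated as duplicates (both inputs are row vectors)
removedRows = setdiff(1:size(A, 1), index.');

% Matrix without the duplicate rows, original order preserved
A_new = A(index, :);
```

Because unique keeps the first occurrence of each key, row 10 (not row 9) is the one dropped from the [2 10 28] pair, which the question says is acceptable.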
