
Unique function not deleting duplicate rows.

luc on 4 May 2015
Answered: Robert on 17 Oct 2018
I have attached my matrix "M", and here is my code.
[trash,idx] = unique(M,'rows');
pleb=M(idx,:)
gg=sort(pleb)
When inspecting gg we see that there are still duplicate rows.
I've also tried to do it in different ways, for example:
[~, III, ~] = unique(M,'first','rows'); %removing double points
III = sort(III);
pleb = M(III,:);
gg=sort(pleb);
But these attempts either delete non-duplicate data or delete too little data.
What am I doing wrong?
  2 Comments
Stephen23 on 4 May 2015
Edited: Stephen23 on 4 May 2015
"What am I doing wrong": not clicking on both buttons to attach the data: you need to first click Choose file and then Attach file. Please try attaching your data again.
luc on 4 May 2015
I attached it the right way now.


Accepted Answer

Stephen23 on 4 May 2015
Edited: Stephen23 on 4 May 2015
It is likely that the data are floating point and that they are not actually equal, which confuses many beginners and people not used to working with numeric data. Although what is displayed on the command window might look the same, floating point values can differ at the low end of their significand, so testing for equality (like unique does) does not work.
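A minimal sketch of this effect, using made-up values rather than the attached M: the two rows below display identically in the default short format but differ in the last bit, so unique keeps both. uniquetol (available from R2015a) compares within a tolerance instead. The tolerance 1e-12 is an arbitrary choice for illustration, not a recommendation.

```matlab
% Illustrative data (not the attached M): 0.1+0.2 is not exactly 0.3 in double precision.
A = [0.3 1; 0.1+0.2 1];

exactRows = unique(A, 'rows');                    % exact comparison: both rows are kept
tolRows   = uniquetol(A, 1e-12, 'ByRows', true);  % tolerance comparison: the rows are merged

fprintf('unique keeps %d rows, uniquetol keeps %d\n', size(exactRows, 1), size(tolRows, 1));
```

Whether a tolerance is appropriate depends on the data; for exact duplicates unique is still the right tool.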
To understand more about this topic read these:
Alternatively, if the data are strings, trailing spaces are often overlooked by users...
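For the string case, a short sketch with hypothetical names (none of these come from the attached data): strtrim removes leading and trailing whitespace, so entries that differ only by a trailing space compare equal afterwards.

```matlab
% Hypothetical names; the second one has a trailing space.
names = {'alpha', 'alpha ', 'beta'};

rawUnique     = unique(names);           % 'alpha' and 'alpha ' count as different strings
trimmedUnique = unique(strtrim(names));  % whitespace stripped before comparing

fprintf('%d unique before trimming, %d after\n', numel(rawUnique), numel(trimmedUnique));
```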
  8 Comments
luc on 4 May 2015
Thanks Stephen,
I think the sorting part had me confused.
nice explanation, I learned something.
:)
Stephen23 on 5 May 2015
@luc: I'm glad to be able to help!


More Answers (3)

Titus Edelhofer on 4 May 2015
Hi Luc,
I don't see duplicate data, but the data do change sign. Take the last 4 rows of pleb:
19.4558 -4.1355 -2.0906
19.4558 -4.1355 2.0906
19.4558 4.1355 -2.0906
19.4558 4.1355 2.0906
They look similar, but all 4 are different, as long as -2.0906 is different from 2.0906 ;-).
Similar for the other "4-row-blocks".
When you take the abs then the story is different,
Titus
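A small sketch of the abs point, using only the four rows quoted above (the variable name P is just for illustration): with signs kept all four rows are distinct, and with abs applied they collapse to one.

```matlab
% The four rows quoted above.
P = [19.4558 -4.1355 -2.0906
     19.4558 -4.1355  2.0906
     19.4558  4.1355 -2.0906
     19.4558  4.1355  2.0906];

nSigned   = size(unique(P, 'rows'), 1);       % signs matter: every row is distinct
nUnsigned = size(unique(abs(P), 'rows'), 1);  % signs ignored: the rows coincide

fprintf('%d distinct rows with signs, %d without\n', nSigned, nUnsigned);
```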
  3 Comments
Stephen23 on 4 May 2015
Edited: Stephen23 on 4 May 2015
@luc: There is no reason why those rows would be removed, as
  • all rows of M are already unique
  • sort(M) sorts each column independently, so there is no reason why these rows should be unique (or removed) either.
You need to actually describe what you are trying to achieve.
Titus Edelhofer on 4 May 2015
Edited: Titus Edelhofer on 4 May 2015
Indeed. As I wrote in my comment, if you sort while keeping each row intact, i.e., using
sortrows(M)
then you would see that there are no duplicate rows.
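To make the difference between sort and sortrows concrete, here is a sketch on the four rows quoted earlier in this thread, deliberately shuffled (the name P is illustrative). sort works on each column independently, so it can produce rows that never existed in the input, including apparent duplicates. sortrows only reorders whole rows.

```matlab
% The four rows quoted earlier, in shuffled order.
P = [19.4558 -4.1355  2.0906
     19.4558  4.1355 -2.0906
     19.4558 -4.1355 -2.0906
     19.4558  4.1355  2.0906];

colSorted = sort(P);      % each column sorted on its own: rows get mixed up
rowSorted = sortrows(P);  % whole rows reordered: each row stays intact

nOriginal  = size(unique(P, 'rows'), 1);          % distinct rows in the input
nColSorted = size(unique(colSorted, 'rows'), 1);  % fewer: sort created duplicate rows
nRowSorted = size(unique(rowSorted, 'rows'), 1);  % same as the input

fprintf('distinct rows: input %d, after sort %d, after sortrows %d\n', ...
        nOriginal, nColSorted, nRowSorted);
```

This matches the symptom in the question: the duplicates appear only after calling sort on the output of unique.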



John D'Errico on 4 May 2015
Edited: John D'Errico on 4 May 2015
There are NO equal rows. I checked. They are different in sign. There are no rows that are even that close to each other, although the nearest neighbor is not uniformly close.
The check that I made was to find the point for each row that was closest in distance. I.e., the nearest neighbor. There ARE no essentially zero distances.
The overall closest pair of points is 1.7291 units apart.
Mu = unique(M,'rows');
D = ipdm(Mu,'subset','smallestfew','limit',1)
D =
(87,95) 1.7291
D = ipdm(Mu,'subset','nearest')
D =
(2,1) 4.1811
(1,2) 4.1811
(13,3) 4.1811
(14,4) 4.1811
(6,5) 4.1811
(5,6) 4.1811
(8,7) 4.1811
(7,8) 4.1811
(15,9) 4.1811
(16,10) 4.1811
(17,11) 4.1811
(18,12) 4.1811
(3,13) 4.1811
(4,14) 4.1811
(9,15) 4.1811
(10,16) 4.1811
(11,17) 4.1811
(12,18) 4.1811
(26,25) 4.1811
(25,26) 4.1811
(28,27) 4.1811
(27,28) 4.1811
(35,29) 4.1811
(36,30) 4.1811
(37,31) 4.1811
(38,32) 4.1811
(19,33) 4.1811
(20,34) 4.1811
(29,35) 4.1811
(30,36) 4.1811
(31,37) 4.1811
(32,38) 4.1811
(21,39) 4.1811
(22,40) 4.1811
(23,41) 4.1811
(24,42) 4.1811
(33,47) 4.1811
(53,47) 3.3826
(55,47) 3.3826
(34,48) 4.1811
(54,48) 3.3826
(56,48) 3.3826
(43,49) 4.1811
(44,50) 4.1811
(45,51) 4.1811
(46,52) 4.1811
(39,53) 4.1811
(47,53) 3.3826
(40,54) 4.1811
(48,54) 3.3826
(41,55) 4.1811
(42,56) 4.1811
(58,57) 4.1811
(57,58) 4.1811
(60,59) 4.1811
(59,60) 4.1811
(69,61) 4.1811
(70,62) 4.1811
(71,63) 4.1811
(72,64) 4.1811
(49,65) 4.1811
(91,65) 3.3826
(50,66) 4.1811
(92,66) 3.3826
(51,67) 4.1811
(93,67) 3.3826
(52,68) 4.1811
(94,68) 3.3826
(61,69) 4.1811
(62,70) 4.1811
(63,71) 4.1811
(64,72) 4.1811
(79,77) 1.7291
(81,77) 1.7291
(80,78) 1.7291
(82,78) 1.7291
(77,79) 1.7291
(78,80) 1.7291
(73,83) 4.1811
(74,84) 4.1811
(75,85) 4.1811
(76,86) 4.1811
(95,87) 1.7291
(96,88) 1.7291
(97,89) 1.7291
(98,90) 1.7291
(65,91) 3.3826
(83,91) 4.1811
(105,91) 3.3826
(66,92) 3.3826
(84,92) 4.1811
(106,92) 3.3826
(67,93) 3.3826
(85,93) 4.1811
(107,93) 3.3826
(68,94) 3.3826
(86,94) 4.1811
(108,94) 3.3826
(87,95) 1.7291
(101,95) 1.7291
(88,96) 1.7291
(102,96) 1.7291
(89,97) 1.7291
(103,97) 1.7291
(90,98) 1.7291
(104,98) 1.7291
(99,101) 4.1811
(100,102) 4.1811
(109,105) 4.1811
(110,106) 4.1811
(111,107) 4.1811
(112,108) 4.1811
(113,109) 4.1811
(114,110) 4.1811
(115,111) 4.1811
(116,112) 4.1811
(121,117) 4.1811
(122,118) 4.1811
(123,119) 4.1811
(124,120) 4.1811
(117,121) 4.1811
(118,122) 4.1811
(119,123) 4.1811
(120,124) 4.1811
(126,125) 4.1811
(125,126) 4.1811
(133,127) 1.7291
(134,128) 1.7291
(135,129) 1.7291
(136,130) 1.7291
(132,131) 4.1811
(131,132) 4.1811
(127,133) 1.7291
(128,134) 1.7291
(129,135) 1.7291
(130,136) 1.7291
(139,137) 1.7291
(140,138) 1.7291
(137,139) 1.7291
(138,140) 1.7291
(145,141) 4.1811
(149,141) 3.3826
(146,142) 4.1811
(150,142) 3.3826
(147,143) 4.1811
(151,143) 3.3826
(148,144) 4.1811
(152,144) 3.3826
(153,145) 4.1811
(154,146) 4.1811
(155,147) 4.1811
(156,148) 4.1811
(141,149) 3.3826
(165,149) 4.1811
(142,150) 3.3826
(166,150) 4.1811
(143,151) 3.3826
(167,151) 4.1811
(144,152) 3.3826
(168,152) 4.1811
(159,157) 3.3826
(177,157) 4.1811
(160,158) 3.3826
(178,158) 4.1811
(157,159) 3.3826
(179,159) 4.1811
(158,160) 3.3826
(180,160) 4.1811
(169,161) 4.1811
(170,162) 4.1811
(171,163) 4.1811
(172,164) 4.1811
(181,165) 4.1811
(182,166) 4.1811
(183,167) 4.1811
(184,168) 4.1811
(161,169) 4.1811
(162,170) 4.1811
(163,171) 4.1811
(164,172) 4.1811
(174,173) 4.1811
(173,174) 4.1811
(176,175) 4.1811
(175,176) 4.1811
(189,177) 4.1811
(190,178) 4.1811
(191,179) 4.1811
(192,180) 4.1811
(193,185) 4.1811
(194,186) 4.1811
(195,187) 4.1811
(196,188) 4.1811
(185,193) 4.1811
(186,194) 4.1811
(187,195) 4.1811
(188,196) 4.1811
(198,197) 4.1811
(197,198) 4.1811
(200,199) 4.1811
(199,200) 4.1811
(205,201) 4.1811
(206,202) 4.1811
(207,203) 4.1811
(208,204) 4.1811
(201,205) 4.1811
(202,206) 4.1811
(203,207) 4.1811
(204,208) 4.1811
(210,209) 4.1811
(209,210) 4.1811
(212,211) 4.1811
(211,212) 4.1811
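For readers without ipdm, the same nearest-neighbor check can be sketched in base MATLAB. This is an illustrative alternative, not the code used to produce the output above, and the points below are hypothetical stand-ins for the attached M.

```matlab
% Hypothetical points, one per row; for real data use Mu = unique(M, 'rows').
Mu = unique([0 0 0; 3 4 0; 0 0 1; 0 0 0], 'rows');

n = size(Mu, 1);
nearestDist = zeros(n, 1);
for k = 1:n
    % Euclidean distance from row k to every row of Mu
    d = sqrt(sum(bsxfun(@minus, Mu, Mu(k, :)).^2, 2));
    d(k) = Inf;                 % ignore the distance from a row to itself
    nearestDist(k) = min(d);    % distance to the nearest other row
end

closestPair = min(nearestDist); % a value near zero would indicate near-duplicate rows
fprintf('closest pair is %.4f units apart\n', closestPair);
```

The loop costs O(n^2) distance evaluations, which is fine for a few hundred rows such as here.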
  6 Comments
Sean de Wolski on 4 May 2015
First, your screenshot is too small to see.
Second, here's a good exercise to explain the small differences in floating point: Run this:
>> format hex
Then rerun the command. See! They're different, even if just by a little.
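A programmatic variant of the same exercise, sketched with made-up values: num2hex returns the IEEE 754 bit pattern of a double as a hex string, so two values that print the same can be compared bit for bit.

```matlab
x = 0.3;
y = 0.1 + 0.2;

fprintf('%.4f  %.4f\n', x, y);  % both print the same to four decimal places

hx = num2hex(x);                % 16 hex digits encoding the double
hy = num2hex(y);
sameBits = strcmp(hx, hy);      % false here: the bit patterns differ

fprintf('%s\n%s\nidentical bits: %d\n', hx, hy, sameBits);
```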
luc on 4 May 2015
Hey Sean,
You can click on the screenshot to enlarge it.
But I think Stephen solved my problem. The sort function sorts each column independently, not each row as a whole.
Thanks guys!



Robert on 17 Oct 2018
If anyone encounters truly duplicate rows in the output of unique like I did, this may be caused by NaN in your data being treated as distinct values. See this question for more info.
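A minimal sketch of the NaN behavior, with made-up data. Because NaN is never equal to NaN, unique keeps every row that contains one. The workaround below swaps NaN for a placeholder before comparing. Using Inf as the placeholder is an assumption that only holds if Inf does not otherwise occur in the data.

```matlab
% Made-up data: rows 1 and 2 look identical but contain NaN.
A = [1 NaN; 1 NaN; 2 3];

U = unique(A, 'rows');          % NaN ~= NaN, so both NaN rows are kept

% Workaround sketch: compare on a copy where NaN is replaced by a placeholder.
B = A;
B(isnan(B)) = Inf;              % assumption: Inf is not a legitimate value in A
[~, idx] = unique(B, 'rows');   % indices of one representative per distinct row
U2 = A(sort(idx), :);           % rows taken from the original data, in original order

fprintf('unique keeps %d rows, workaround keeps %d\n', size(U, 1), size(U2, 1));
```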
