Finding 'duplicate' rows which have different values in the same pattern
For my purposes,
[1 2 2 3 1] is equivalent to
[2 3 3 1 2].
What can I do to have MATLAB recognize and eliminate these 'duplicates'?
Physically, I have 17 populations that I want to group into proportionate voting districts. One population can be 'split' to elect two representatives, and populations can be combined to elect one or more representatives, but a split population cannot be combined with any other population.
I'm using RAND to create all possible groupings, storing unique groupings in C. Unfortunately, C grows too large and I run out of memory. I can reduce the storage requirement by storing only "useful" combinations (some are very disproportionate, and I could check and drop those possibilities within the loop). However, a bigger memory problem is that one type of 'duplicate' is not recognized by UNIQUE, that is, rows that match in their pattern but with entries of different values.
I later plan to sort the list by standard deviation in population of each group, and to evaluate the top 50 or so possibilities on qualitative factors.
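A minimal sketch of that ranking step, on a toy case. The population vector `pop` and the two candidate rows are hypothetical, not real data, and the sketch assumes every group elects exactly one representative, so it does not yet handle split populations or multi-representative groups.

```matlab
% Hypothetical populations for a 5-population toy case (not real data).
pop = [10 20 30 40 50];

% Two candidate groupings; entry k is the group label of population k.
C = [1 1 2 2 3;
     1 2 2 3 1];

spread = zeros(size(C, 1), 1);
for r = 1:size(C, 1)
    groupPop = accumarray(C(r, :).', pop(:));  % total population per label
    groupPop = groupPop(groupPop > 0);         % drop labels unused in this row
    spread(r) = std(groupPop);                 % spread of the group totals
end

[~, order] = sort(spread);   % most even groupings first
ranked = C(order, :);
```

Here the second grouping has group totals 60, 50 and 40, so it ranks ahead of the first, whose totals are 30, 70 and 50.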
I would also appreciate any big-picture ideas you have for solving my problem, particularly for accommodating the possibility that populations or groups of populations could be permitted to elect more than one representative.
I initially tried to tackle this problem as a methodical, iterative grouping, but switched to the random approach after realizing the scope of the possibilities because I only need to run this code once and my coding time is limited while my processing time is not.
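C = zeros(0,17);      % unique groupings found so far, one row per grouping
i = 0;                % consecutive batches that added nothing new
while i ~= 3          % stop after three batches in a row with no new rows
    sizea = size(C,1);
    blockrows = ceil(rand(100000,17).*17);   % 100000 random rows, labels 1..17
    C = unique([C;blockrows],'rows');        % merge and drop exact duplicates
    sizeb = size(C,1);
    if sizeb-sizea == 0
        i = i+1;
    else
        i = 0;
    end
end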
I am using R2012a, but can access a lab with R2014a if that would help.
Answers (2)
Why not sort each row before passing them to unique?
tempC = [C; blockrows];
[~, idx] = unique(sort(tempC, 2), 'rows');
C = tempC(idx, :);
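As a quick check of what a sorted-row key treats as equal, here is a small illustration. The first two rows come from the question; the third row is a made-up example added for comparison.

```matlab
rows = [1 2 2 3 1;    % example from the question
        2 3 3 1 2;    % same pattern, relabelled
        1 2 1 2 3];   % different pattern, same labels as row 1

sorted = sort(rows, 2);              % sort within each row
[~, idx] = unique(sorted, 'rows');   % one index per distinct sorted row
kept = rows(sort(idx), :);           % surviving rows, in original order
```

Rows 1 and 3 collapse to the same sorted row while rows 1 and 2 stay distinct, so this key detects reordering of the same labels rather than relabelling of the same pattern.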
4 Comments
Anthony
on 13 Nov 2014
I'm not sure what you mean about positioning. If, as you describe in your original question, [1 2 2 3 1] is equivalent to [2 3 3 1 2], then which is the right order?
In any case, my solution only uses the sorting to find which rows to retain. These rows are then returned in their original ordering.
Your addendum about classification seems completely different from your original question. It now appears that similarity is based on the difference between consecutive elements.
Is this what you want?
tempC = [C; blockrows];
dt = mod(diff(tempC, 1, 2)-1, 16); %-1 and 16 to shift to zero-based range.
[~, idx] = unique(dt, 'rows');
C = tempC(idx, :);
Note that in your example I don't understand why the last elements are equivalent. I would have thought 17 would have been equivalent to 8, not 4.
Anthony
on 13 Nov 2014
Guillaume
on 14 Nov 2014
Yes, sorry, I didn't pay enough attention to the fact that your original example wasn't just a reordering of numbers.
I believe my diff answer (with a bug corrected) does what you want, though:
[~, idx] = unique(mod(diff(C, 1, 2) - 1, 16), 'rows');
C = C(idx, :);
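For comparison, here is a different technique from the diff-based key above: a sketch of a first-occurrence key. It assumes two rows count as 'duplicates' exactly when one can be turned into the other by renaming labels, as with [1 2 2 3 1] and [2 3 3 1 2]. The variable names `key` and `sameAsJ` and the third toy row are made up for the illustration.

```matlab
% Toy rows: the first two share a pattern, the third does not.
tempC = [1 2 2 3 1;
         2 3 3 1 2;
         1 2 1 2 3];

% Key: replace every entry by the column of its first occurrence in its row.
key = zeros(size(tempC));
for j = size(tempC, 2):-1:1
    sameAsJ = bsxfun(@eq, tempC, tempC(:, j));  % entries equal to column j, row by row
    key(sameAsJ) = j;                           % lower j overwrites, so the first occurrence wins
end

[~, idx] = unique(key, 'rows');   % one representative per pattern
C = tempC(sort(idx), :);          % keep original rows, in original order
```

Both of the first two rows get the key [1 2 2 4 1], so only one of them survives. Which of the two is kept may differ between MATLAB releases, since the index returned by unique for repeated rows is not the same in all versions.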
Star Strider
on 13 Nov 2014
I am not certain what you’re doing, but consider using perms or one of its friends (links at the end of the documentation page) to generate your permutations.