Speed up search for matching strings

Randy Hessel on 5 Jun 2021
Commented: Randy Hessel on 5 Jun 2021
The following code works, but I would like to make it faster. Here, bowl_nodes and tmp_nodes are 1D arrays of the same size. Assume numel(bowl_nodes) = numel(tmp_nodes) = 1000. node_string is a string array. Each string in bowl_nodes matches one and only one string in tmp_nodes. For example, the string in node_string( bowl_nodes(1) ) could be the same as the string in node_string( tmp_nodes(5) ).

When matching strings are found, node_match stores the matching array indices and the element in tmp_nodes is removed. This way, every time the for-loop over m executes, it runs over fewer elements. When the code is done executing, tmp_nodes is empty.

Can anyone recommend a way to speed up this code? Its purpose is to fill the node_match array, which links elements of bowl_nodes to elements of tmp_nodes and vice versa. (Note: tmp_nodes is a copy of a different array, so even though its elements are removed by this code, the original array is still intact and is used later in the program.) Thanks.
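% Pair each bowl node with the tmp node that carries the same string.
% numel is used for the loop bounds: size() returns a vector, and the
% colon operator would silently use only its first element.
for n = 1:numel(bowl_nodes)
    for m = 1:numel(tmp_nodes)
        if node_string(bowl_nodes(n)) == node_string(tmp_nodes(m))
            node_match(bowl_nodes(n)) = tmp_nodes(m);
            node_match(tmp_nodes(m)) = bowl_nodes(n);
            tmp_nodes(m) = [];   % drop the matched element so later searches are shorter
            break;               % exactly one match per bowl node, so stop searching
        end
    end
end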

Accepted Answer

dpb on 5 Jun 2021
You only need one loop, and I'd guess the code as written above would be faster if you didn't bother to remove the found elements. I suspect the memory reallocation is far more expensive than the extra looping and searching, but I didn't time it to see.
The code snippet shown also doesn't preallocate the index array, so that array is being reallocated while going through the loop as well, unless the preallocation is in the actual code and just not shown in the posting.
Try something like
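% idx(k) is the position in tmp_nodes whose string matches that of bowl_nodes(k)
idx=arrayfun(@(s)find(matches(node_string(tmp_nodes),s)),node_string(bowl_nodes));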
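To make the suggestion concrete, here is a minimal sketch on made-up data. The contents of node_string, bowl_nodes and tmp_nodes are assumptions for illustration, not the poster's data, and idx and tmp_strings are illustrative names. The sketch relies on the condition stated in the question: every bowl string has exactly one partner among the tmp strings, so find returns a scalar each time.

```matlab
% Toy data (assumed for illustration): 6 nodes whose strings pair up 1-to-1.
node_string = ["a" "b" "c" "b" "c" "a"];
bowl_nodes  = [1 2 3];      % indices into node_string
tmp_nodes   = [4 5 6];      % indices into node_string

% Preallocate so node_match is never regrown while it is being filled.
node_match = zeros(1, numel(node_string));

% Extract the candidate strings once, outside the anonymous function.
tmp_strings = node_string(tmp_nodes);

% idx(k) = position in tmp_nodes whose string equals that of bowl_nodes(k).
idx = arrayfun(@(s) find(matches(tmp_strings, s)), node_string(bowl_nodes));

% Fill the two-way lookup table with vectorized assignments.
node_match(bowl_nodes)     = tmp_nodes(idx);
node_match(tmp_nodes(idx)) = bowl_nodes;

disp(node_match)   % node_match is [6 4 5 2 3 1]
```

Nothing is deleted from tmp_nodes here, so no copy of the original array is needed. The search is still quadratic in the number of nodes, since matches scans all of tmp_strings for every bowl string.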
  1 Comment
Randy Hessel on 5 Jun 2021
dpb, thank you for your quick response. Interestingly, removing the found elements cut the run time approximately in half compared with not removing them. After some rethinking, I tried this code:
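% Sorting both sets of strings puts each matching pair at the same rank,
% because every string in one set has exactly one partner in the other.
[~, bowl_indices] = sort(node_string(bowl_nodes));
[~, opposed_indices] = sort(node_string(opposed_nodes));
% Link the nodes rank by rank, in both directions.
node_match(bowl_nodes(bowl_indices)) = opposed_nodes(opposed_indices);
node_match(opposed_nodes(opposed_indices)) = bowl_nodes(bowl_indices);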
This code requires no loops and takes a fraction of a second to run. There is also no longer any need for the intermediate 'tmp_nodes'; the original array 'opposed_nodes' can be used directly. But thanks again for your response, it is much appreciated.
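For readers who want to try the sort-based pairing, here is a small self-contained sketch. The data values are assumptions for illustration only. The ismember cross-check at the end is an addition and not part of the code above; it is there because the sort approach depends entirely on the one-to-one condition, and if a string in one set had no partner in the other, the ranks would shift and nodes would be paired incorrectly without any error.

```matlab
% Toy data (assumed for illustration): strings pair up 1-to-1.
node_string   = ["p7" "p2" "p9" "p9" "p7" "p2"];
bowl_nodes    = [1 2 3];    % indices into node_string
opposed_nodes = [4 5 6];    % indices into node_string

node_match = zeros(1, numel(node_string));   % preallocate

% Sorting both string sets puts matching strings at the same rank.
[~, bowl_indices]    = sort(node_string(bowl_nodes));
[~, opposed_indices] = sort(node_string(opposed_nodes));

node_match(bowl_nodes(bowl_indices))       = opposed_nodes(opposed_indices);
node_match(opposed_nodes(opposed_indices)) = bowl_nodes(bowl_indices);

% Cross-check with ismember: loc(k) is the position in opposed_nodes
% whose string equals the string of bowl_nodes(k).
[found, loc] = ismember(node_string(bowl_nodes), node_string(opposed_nodes));
assert(all(found));
assert(isequal(node_match(bowl_nodes), opposed_nodes(loc)));

disp(node_match)   % node_match is [5 6 4 3 1 2]
```

The two sort calls replace the nested search entirely, which is consistent with the large speed-up reported above.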

Release

R2021a
