Matching combinations of strings

Marcus Glover on 17 Jun 2024
Edited: DGM on 22 Jun 2024
I have a table TT with a string variable TT.name. I want to return true if TT.name matches any entry in another table variable, OK.name. However, there are some complications that I am having a hard time handling.
Many of the strings in TT.name are combinations of strings that appear in OK.name, and I want to count these as true matches. Sometimes the parts are joined with a + symbol, sometimes with just a space. Further complicating matters, the table OK contains some entries with spaces; I want to treat each of those as a single entry and not break it up at the spaces.
I believe I will usually have a combination of 2 strings only, though 3 and 4 may be possible.
TT = table(["Green"; "Red"; "Blue"; "Black Blue"; "Black"; "Blue Green"; "Red + Blue"; "Red Orange"; "Red + White"; "Black Blue Red"], 'VariableNames', {'name'})
TT = 10x1 table
          name
    ________________
    "Green"
    "Red"
    "Blue"
    "Black Blue"
    "Black"
    "Blue Green"
    "Red + Blue"
    "Red Orange"
    "Red + White"
    "Black Blue Red"
OK = table(["Red"; "Green"; "Blue"; "Black Blue"], 'VariableNames', {'name'})
OK = 4x1 table
        name
    ____________
    "Red"
    "Green"
    "Blue"
    "Black Blue"
This is the output I would want, but without manually changing rows 6, 7, and 10:
TT.match=ismember(TT.name,OK.name);
TT.match([6 7 10])=1
TT = 10x2 table
          name          match
    ________________    _____
    "Green"             true
    "Red"               true
    "Blue"              true
    "Black Blue"        true
    "Black"             false
    "Blue Green"        true
    "Red + Blue"        true
    "Red Orange"        false
    "Red + White"       false
    "Black Blue Red"    true
In the example, "Blue Green" and "Red + Blue" are true matches, because "Blue", "Green", and "Red" all appear as entries in OK.name.
Similarly, "Black Blue Red" is OK because it is a combination of "Black Blue" and "Red".
"Black" is not a match, because the only entry in OK.name that contains it is "Black Blue", and I do not want to separate the words of entries from this table.
"Red Orange" and "Red + White" are not matches because only "Red" is in the OK table.
  2 Comments
Stephen23 on 18 Jun 2024
Edited: Stephen23 on 18 Jun 2024
The task is ill-defined, and most likely impossible in a general sense: this is due to the same delimiters being used to separate words in OK as well as to separate combinations from TT. Consider:
TT = "black blue" + "red" -> "black blue red"
OK = ["black", "blue red"]
Also note that a naive approach considering all permutations of OK will quickly become intractable.
Questions:
  • what size is OK ?
  • what size is TT ?
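The ambiguity in the example above can be reproduced directly. This is only an illustrative sketch; the variable names `fromFirst` and `fromSecond` are made up for the demonstration and do not come from the thread.

```matlab
% Two different ways of building the same combined name.
fromFirst  = join(["black blue", "red"], " ");   % entries "black blue" and "red"
fromSecond = join(["black", "blue red"], " ");   % entries "black" and "blue red"

% Both give "black blue red", so the combined string alone cannot
% tell us which pair of entries it was built from.
sameString = isequal(fromFirst, fromSecond);
disp(sameString)
```

Once the space serves as both the within-entry and the between-entry separator, the original split cannot be recovered from the string itself.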
Marcus Glover on 18 Jun 2024
Edited: Marcus Glover on 18 Jun 2024
I think the size of OK (~250) is indeed going to make this intractable. (TT is hundreds of thousands of entries) The solution is to fix the issue with delimiters in the data.
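To put rough numbers on the brute-force idea, the sketch below counts the ordered selections of 2, 3, and 4 distinct entries from a list of 250. It assumes exactly 250 entries and no repeated entries within a combination, both of which are simplifications of what is described in this thread.

```matlab
nOK = 250;   % approximate size of OK given above (assumed exact here)
k   = 2:4;   % combination lengths mentioned in the question

% Ordered selections of kk distinct entries: n * (n-1) * ... * (n-kk+1)
counts = arrayfun(@(kk) prod(nOK-kk+1:nOK), k);
total  = sum(counts);

fprintf('%d-entry combinations: %.0f\n', [k; counts]);
fprintf('total: %.0f\n', total);
```

Under these assumptions the total is about 3.8 billion candidate strings, which is why enumerating every combination up front is unattractive.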


Answers (1)

Umar on 18 Jun 2024
Hi Marcus,

To achieve this, you can combine string splitting with a membership test in MATLAB. Here's a step-by-step approach to solving this problem:

1. Iterate through each row of `TT.name`.
2. For each row, split the string into individual words at spaces or the "+" symbol.
3. Check whether each individual word exists as an entry in `OK.name`.
4. If all words in the split string are found in `OK.name`, consider it a match.
5. Update the `TT.match` column accordingly.

Here's some MATLAB code that implements this logic:

```matlab
TT.match = false(size(TT, 1), 1);
for i = 1:size(TT, 1)
    % Split at spaces and "+"; strsplit collapses consecutive delimiters by default
    words = strsplit(TT.name{i}, {' ', '+'});
    % Count how many of the words appear in OK.name
    match_count = sum(ismember(words, OK.name));
    if match_count == numel(words)
        TT.match(i) = true;
    end
end
```

This identifies combinations of single-word entries automatically, without manually changing rows. Note that because every name is split at each space, a multi-word entry in `OK.name` such as "Black Blue" is also broken into separate words, so this code does not treat it as a single entry: "Black Blue" and "Black Blue Red" are reported as non-matches.
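One way to keep multi-word entries such as "Black Blue" intact is to test whether the words of each name can be divided into consecutive groups that each equal an entry of `OK.name`. The following is a minimal sketch under stated assumptions, not a tested solution for the real data: the helper name `canSegment` is hypothetical, words inside `OK` entries are assumed to be separated by single spaces, and matching is case-sensitive.

```matlab
TT = table(["Green"; "Red"; "Blue"; "Black Blue"; "Black"; "Blue Green"; ...
            "Red + Blue"; "Red Orange"; "Red + White"; "Black Blue Red"], ...
           'VariableNames', {'name'});
OK = table(["Red"; "Green"; "Blue"; "Black Blue"], 'VariableNames', {'name'});

% Check each distinct name once, then map the results back to every row.
[uniqueNames, ~, idx] = unique(TT.name);
isOK = arrayfun(@(s) canSegment(s, OK.name), uniqueNames);
TT.match = isOK(idx);

function tf = canSegment(name, okNames)
    % Hypothetical helper: true if NAME can be cut into consecutive
    % word groups that each equal one entry of OKNAMES.
    % Runs of whitespace and "+" are treated as a single separator.
    words = split(strip(regexprep(name, "[\s+]+", " ")), " ");
    n = numel(words);
    maxLen = max(count(okNames, " ")) + 1;   % longest OK entry, in words
    reachable = false(1, n + 1);             % reachable(j+1): first j words are covered
    reachable(1) = true;                     % the empty prefix is always covered
    for stop = 1:n
        for len = 1:min(maxLen, stop)
            start = stop - len + 1;
            if reachable(start) && ismember(join(words(start:stop), " "), okNames)
                reachable(stop + 1) = true;
                break
            end
        end
    end
    tf = reachable(end);
end
```

On the example tables this reproduces the `match` column requested in the question. It does not enumerate combinations of `OK` entries; the work per name grows with the number of words in the name times the length of the longest `OK` entry. It also reports only whether some valid split exists, so it does not resolve the delimiter ambiguity raised in the comments on the question.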
  9 Comments
Umar on 22 Jun 2024
Apology accepted
DGM on 22 Jun 2024
Edited: DGM on 22 Jun 2024
It's okay. You're still free to think of me as a jerk. I mean, it's fair. Just please try to work on the formatting and stuff.
FWIW, also if you don't have MATLAB, I'm pretty sure you can use MATLAB Online for free for something like 20h a month. It doesn't have as many toolboxes installed as the forum editor, but it does allow the use of certain things (interactive tools) that the forum editor can't use.

Release

R2023b
