Counting occurrences of an exact word in a string array

Hi,
I am having trouble counting how many times a certain word appears in a string. I need to count the 'exact match' only, ignoring substrings. I tried 'count' and some other options but couldn't figure this out. The strings are chemical equations and the words are chemical species. Here's an example:
R1 = ["Ar* + Ar* "," Ar+ + Ar + e "]; % chemical reaction (string array; col1=reactants, col2=products)
S = {'Ar';'Ar*';'Ar+';'e'}; % species (cell array)
I want to get two variables as a result (below).
NumR = [0, 2, 0, 0]; % Number of occurrences of each species as reactants
NumP = [1, 0, 1, 1]; % Number of occurrences of each species as products
% each column represents the number of occurrences of Ar, Ar*, Ar+, e respectively
I tried the 'count' function but it gives me the wrong result. For instance, if I try to count the number of occurrences of 'Ar', MATLAB also counts 'Ar+' and 'Ar*' as 'Ar' even though they are different species. How do I ignore substrings (i.e. 'Ar+' and 'Ar*' in this case)?
Thank you!

Accepted Answer

Stephen23 on 4 Oct 2020
>> C = {'Ar* + Ar* ',' Ar+ + Ar + e '};
>> S = {'Ar';'Ar*';'Ar+';'e'};
>> rgx = strcat('(?<=\s|^)',regexptranslate('escape',S),'(?=\s|$)');
>> NumR = cellfun(@numel,regexp(C{1},rgx))
NumR =
0
2
0
0
>> NumP = cellfun(@numel,regexp(C{2},rgx))
NumP =
1
0
1
1
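If row vectors are wanted, as in the question, and the input is the original string array R1 rather than a cell array, something along these lines might work. This is only a sketch: the helper name countIn is made up for illustration, and the char() conversion is a precaution rather than something the answer above requires.

```matlab
R1 = ["Ar* + Ar* "," Ar+ + Ar + e "];   % string array from the question
S  = {'Ar';'Ar*';'Ar+';'e'};             % species from the question

% One regular expression per species, built as in the answer above
rgx = strcat('(?<=\s|^)',regexptranslate('escape',S),'(?=\s|$)');

% countIn is a hypothetical helper: count whole-word matches of every
% species in one piece of text and return the counts as a row vector
countIn = @(txt) cellfun(@numel,regexp(char(txt),rgx)).';

NumR = countIn(R1(1))   % reactants: 0 2 0 0
NumP = countIn(R1(2))   % products:  1 0 1 1
```

The transpose at the end of countIn only changes the orientation; the counts themselves are the same as in the column output shown above.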
2 Comments
Tae Lim on 4 Oct 2020
Thank you for your response. This works great! Could you explain further how 'rgx' works? I do not follow the logic behind it.
Stephen23 on 4 Oct 2020
Edited: Stephen23 on 4 Oct 2020
"Could you explain further how 'rgx' works?"
rgx is a cell array of regular expressions, one for each corresponding cell in S.
A simple break-down of the regular expression, where XXX is the escaped string from S:
'(?<=\s|^)XXX(?=\s|$)'
'(?<=\s|^)           ' % lookbehind: must be whitespace or start of string
'         XXX        ' % string from S (special characters are escaped first)
'            (?=\s|$)' % lookahead: must be whitespace or end of string
These regular expressions are used by regexp to search the strings in C. When a substring matches this regular expression, regexp returns the indices. Then cellfun is used to count those indices (i.e. how many matches).
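To make the break-down concrete, here is a small sketch for a single species, using the products string from the question. The variable names (txt, pat, idxExact, idxLoose) are just for illustration.

```matlab
txt = ' Ar+ + Ar + e ';                      % products side from the question

% Escaping turns regex metacharacters into literals
escStar = regexptranslate('escape','Ar*')    % 'Ar\*'

% Whole-word pattern for the species 'Ar' (nothing to escape here)
pat = ['(?<=\s|^)','Ar','(?=\s|$)'];

idxExact = regexp(txt,pat)    % 8: only the stand-alone 'Ar'
idxLoose = regexp(txt,'Ar')   % [2 8]: also hits the 'Ar' inside 'Ar+'

nExact = numel(idxExact)      % 1 match, i.e. the count for 'Ar'
```

Without the lookbehind and lookahead the match at index 2 (the start of 'Ar+') is counted as well, which is the substring problem described in the question.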


More Answers (1)

KSSV on 4 Oct 2020
R1 = ["Ar* + Ar* "," Ar+ + Ar + e "]; % chemical reaction (string array; col1=reactants, col2=products)
s = "Ar" ;
n = nnz(strfind(R1,s))
1 Comment
Tae Lim on 4 Oct 2020
Thank you for your response. I actually get an error saying that I have either missing or incorrect argument for nnz. The 'strfind(R1,s)' gives me 1x2 cell array. Is there a way to fix this?
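One possible way around the error: since strfind returns a cell array when given a string array, the matches can be counted per cell with cellfun instead of passing the cell array to nnz. Note that this still counts substrings, so it does not give the exact-match counts asked for in the question. The second half of the sketch swaps in a different technique that is not from this thread (splitting on whitespace and comparing whole tokens); the variable names are made up for illustration.

```matlab
R1 = ["Ar* + Ar* "," Ar+ + Ar + e "];   % string array from the question
s  = "Ar";

% strfind returns one cell of start indices per string, so count per cell
nSub = cellfun(@numel,strfind(R1,s))    % [2 2]: 'Ar*' and 'Ar+' are counted too

% Alternative (not from the thread): split into whitespace-separated
% tokens and compare each whole token with the species name
tokR = split(strtrim(R1(1)));           % ["Ar*";"+";"Ar*"]
tokP = split(strtrim(R1(2)));           % ["Ar+";"+";"Ar";"+";"e"]
nExactR = nnz(tokR == s)                % 0
nExactP = nnz(tokP == s)                % 1
```

The token comparison assumes that species are always separated by whitespace, which holds for the example strings here.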
