How to convert a line of text into numbers

Wisam on 21 Sep 2014
Commented: Wisam on 22 Sep 2014
I am trying to read this text and put it into a vector. Some of the elements must be repeated according to the number before the * symbol; for example, the first five elements should have a value of 10, and so on:
5*10 6*10.65 4.82 5*10.65 6*10.91 12.62 6*10.91 6*10.74 11.51 5*10.74 6*16.57
perm_i = [];
fid = fopen(file_name_out);
% skip the header lines; this call also reads (and discards) line row_permi_start
textscan(fid, '%s', 1, 'delimiter', '\n', 'headerlines', row_permi_start-1);
for j = 1:row_permi_end-row_permi_start
    c = textscan(fid, '%s', 1, 'delimiter', '\n');
    astring = c{1}{1};                  % the current line as a char vector
    ind1 = find(astring == '*');        % positions of the '*' symbols
    ind_temp = [];                      % characters of the "n*" prefixes to delete
    num_loc = [];                       % position of each repeated value in the list
    num_1 = [];                         % number of extra copies of each value
    if ~isempty(ind1)
        indspace = find(astring == ' ');
        for k = 1:length(ind1)
            indspace1 = indspace(indspace < ind1(k));
            num_loc(k) = length(indspace1) + 1;    % count the spaces BEFORE keeping only the last one
            if isempty(indspace1)
                indspace1 = 0;
            else
                indspace1 = indspace1(end);        % last space before this '*'
            end
            num_1(k) = str2double(astring(indspace1+1:ind1(k)-1)) - 1;
            ind_temp = [ind_temp, indspace1+1:ind1(k)];
        end
        astring(ind_temp) = [];         % strip the "n*" prefixes
    end
    acell = textscan(astring, '%f');
    var_temp = acell{1,1};
    % insert the extra copies, shifting the following values to the right
    for k = 1:length(ind1)
        var_temp(num_loc(k)+num_1(k):end+num_1(k)) = var_temp(num_loc(k):end);
        var_temp(num_loc(k)+1:num_loc(k)+num_1(k)) = var_temp(num_loc(k));
        num_loc = num_loc + num_1(k);   % later values have moved right
    end
    perm_i = [perm_i; var_temp];        % append this line's values
end
fclose(fid);
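For comparison, the whole read-and-expand loop can also be written with fgetl and strsplit instead of character-index bookkeeping. This is a minimal sketch, not the poster's method: it writes a small made-up sample file first so it runs on its own, and it assumes the data occupy rows row_permi_start through row_permi_end inclusive and that each row holds only space-separated `value` or `count*value` tokens. The file contents and row numbers are hypothetical; adjust the row range to match your file.

```matlab
% Hypothetical sample file standing in for the real data file.
file_name_out = [tempname '.txt'];
fid = fopen(file_name_out, 'w');
fprintf(fid, 'header line\n5*10 6*10.65 4.82\n2*12.62 11.51\n');
fclose(fid);
row_permi_start = 2;   % assumed: first data row (1-based, inclusive)
row_permi_end = 3;     % assumed: last data row (inclusive)

perm_i = [];
fid = fopen(file_name_out, 'r');
for row = 1:row_permi_end
    tline = fgetl(fid);                    % returns -1 (not a char) at end of file
    if ~ischar(tline)
        break;
    end
    tline = strtrim(tline);                % also removes a trailing \r
    if row < row_permi_start || isempty(tline)
        continue;                          % header row or blank row
    end
    groups = strsplit(tline);              % split the row at whitespace
    for g = 1:numel(groups)
        parts = strsplit(groups{g}, '*');  % {'value'} or {'count', 'value'}
        value = str2double(parts{end});
        if numel(parts) == 1
            perm_i(end+1) = value;
        else
            n = str2double(parts{1});
            perm_i(end+1:end+n) = value;   % append n copies of the value
        end
    end
end
fclose(fid);
delete(file_name_out);
```

Here perm_i ends up as a row vector; transpose it if you need a column like the original code produces.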
2 Comments
John on 21 Sep 2014
I have not tried the above solutions/suggestions, but this is a natural job for regular expressions. MATLAB provides extensive regular expression (regex) functionality. It is not as rich as Perl's, but it is more than enough to collapse those loops into a few lines of regex code.
To get you started on regex in MATLAB:
Some of the regex functions you will likely have to use to craft a concise solution: regexp, regexprep
You will have to do a bit of reading and practising to get the hang of it. To give you an idea of how regex can help you parse and manipulate the string, consider these few lines of code, which give you the starting indices of the tokens you would probably want to manipulate (whether or not they have a multiplier prepended):
myString = '5*10 6*10.65 4.82 5*10.65 6*10.91 12.62 6*10.91 6*10.74 11.51 5*10.74 6*16.57'
regexQuery = '((\d+)\*)?\d+(\.\d+)?'
indices = regexp(myString, regexQuery)
% indices = 1 6 14 19 27 35 41 49 57 63 71
The elements of indices are the starting positions of the tokens you are interested in. To repeat the numbers that have a multiplier prepended, you would have to look into the more advanced features of regexprep.
These tools, rather than code that parses string tokens by hand, are more likely to give you elegant solutions that are readable and maintainable.
You may find MATLAB's string functions useful as well.
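To go from those start indices to actual numbers, the same pattern can also return the matched text. The sketch below is an illustration, not part of the original comment: it asks regexp for the 'match' and 'start' outputs (returned in the order the selectors are listed) and then expands each matched token with a plain loop.

```matlab
myString = '5*10 6*10.65 4.82 5*10.65 6*10.91 12.62 6*10.91 6*10.74 11.51 5*10.74 6*16.57';
regexQuery = '((\d+)\*)?\d+(\.\d+)?';
% 'match' gives the token text, 'start' the same indices as above
[tokens, starts] = regexp(myString, regexQuery, 'match', 'start');
v = [];
for k = 1:numel(tokens)
    star = find(tokens{k} == '*');
    if isempty(star)
        v = [v, str2double(tokens{k})];                   % plain value, used once
    else
        n = str2double(tokens{k}(1:star-1));              % multiplier before '*'
        v = [v, repmat(str2double(tokens{k}(star+1:end)), 1, n)];
    end
end
```

For the example string this yields a 1-by-48 row vector, with 4.82 at position 12 and 12.62 at position 24.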
Wisam on 22 Sep 2014
I appreciate your support, thanks

Accepted Answer

Guillaume on 21 Sep 2014
Edited: Guillaume on 22 Sep 2014
I've not looked at your code (which is badly formatted), but to convert your example into a vector of numbers I would do:
str = '5*10 6*10.65 4.82 5*10.65 6*10.91 12.62 6*10.91 6*10.74 11.51 5*10.74 6*16.57';
v = [];
for group = strsplit(str)  %split string at spaces into groups
    groupparts = strsplit(group{1}, '*');  %split group at * (if no *, no split)
    if numel(groupparts) == 1
        v = [v str2double(groupparts{1})];
    else
        v = [v repmat(str2double(groupparts{2}), 1, str2double(groupparts{1}))];
    end
end
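A variant of the loop above that avoids growing v on every iteration: collect one count and one value per group, then expand them in a single call. This is a sketch rather than the answerer's code, and it assumes repelem is available (MATLAB R2015a or later); on older releases the loop version works as is.

```matlab
str = '5*10 6*10.65 4.82 5*10.65 6*10.91 12.62 6*10.91 6*10.74 11.51 5*10.74 6*16.57';
groups = strsplit(str);              % {'5*10', '6*10.65', '4.82', ...}
counts = ones(1, numel(groups));     % groups without '*' are used once
values = zeros(1, numel(groups));
for k = 1:numel(groups)
    parts = strsplit(groups{k}, '*');
    values(k) = str2double(parts{end});      % the value is always the last part
    if numel(parts) == 2
        counts(k) = str2double(parts{1});    % multiplier before '*'
    end
end
v = repelem(values, counts);         % repeat values(k) counts(k) times
```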
Or, as I said in my comment on John's answer, if you want a regexprep one-liner:
v = str2num(regexprep(str, '([^ ]+)\*([^ ]+)', '${repmat([$2 '' ''], 1, str2double($1))}'));
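To see what the one-liner does, it can be split into its two steps and run on a shorter, made-up input. Inside ${...}, MATLAB evaluates the expression with $1 and $2 standing for the captured token text as character vectors, and the doubled quotes '' in the replacement literal become a single quote. The sketch swaps str2num for sscanf (which does not go through eval); that substitution is an assumption of this example, not part of the answer. Dynamic expressions are a MATLAB regexprep feature and may not be supported elsewhere (for example in Octave).

```matlab
str = '2*10 4.82 3*1.5';
% Step 1: replace every "count*value" with the value repeated count times
expanded = regexprep(str, '([^ ]+)\*([^ ]+)', '${repmat([$2 '' ''], 1, str2double($1))}');
% expanded is now '10 10  4.82 1.5 1.5 1.5 ' (extra spaces are harmless)
% Step 2: parse the whitespace-separated numbers into a row vector
v = sscanf(expanded, '%f').';
```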

More Answers (1)

John on 21 Sep 2014
Edited: John on 21 Sep 2014
As mentioned before, regular expressions provide more intuitive solutions (once you get the hang of the basics). The short snippet below, which returns the answer as a numeric vector, seems to work:
str = '5*10 6*10.65 4.82 5*10.65 6*10.91 12.62 6*10.91 6*10.74 11.51 5*10.74 6*16.57';
% the '*' is part of the optional group, so '12.62' is not read as pre='1', post='2.62'
regexQuery = '((?<pre>\d+)\*)?(?<post>\d+(\.\d+)?)';
matches = regexp(str, regexQuery, 'names');
res = '';
for i = 1:numel(matches)
    if isempty(matches(i).pre)
        matches(i).pre = '1';   % no multiplier: use the value once
    end
    res = [res repmat([' ' matches(i).post ' '], [1 str2double(matches(i).pre)])];
end
res = str2num(res)
It calls regexp once and feeds the matches into a simple loop that builds up the string. I would still consider this a crude solution (if it actually works :-) ) with a lot of superfluous code. My guess is that exploiting named captures and the dynamic expression (command substitution) feature of regexprep could collapse all of that into 2 or 3 commands.
1 Comment
Guillaume on 22 Sep 2014
Edited: Guillaume on 22 Sep 2014
I would argue that regular expressions are overkill in this case, considering you only need two strsplit calls: one to break the string at every space and one to break each resulting group at the '*'.
You could indeed do it with a single-line regexprep, but this involves a dynamic regular expression replacement string, which is not particularly cheap in terms of computation time (and not particularly easy to comprehend). For the record, the one-liner is:
v = str2num(regexprep(str, '([^ ]+)\*([^ ]+)', '${repmat([$2 '' ''], 1, str2double($1))}'));
Edit: On the other hand, the regexprep one-liner is much faster than my strsplit solution.
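To check the speed comparison on your own machine, the two approaches can be timed side by side. This is a rough sketch using tic/toc; nRuns is an arbitrary choice, the timings will vary with MATLAB version and hardware, and the regexprep part relies on MATLAB's dynamic expressions.

```matlab
str = '5*10 6*10.65 4.82 5*10.65 6*10.91 12.62 6*10.91 6*10.74 11.51 5*10.74 6*16.57';
nRuns = 1000;   % arbitrary repetition count

tic;
for r = 1:nRuns
    v1 = [];
    for group = strsplit(str)
        groupparts = strsplit(group{1}, '*');
        if numel(groupparts) == 1
            v1 = [v1 str2double(groupparts{1})];
        else
            v1 = [v1 repmat(str2double(groupparts{2}), 1, str2double(groupparts{1}))];
        end
    end
end
tSplit = toc;

tic;
for r = 1:nRuns
    v2 = str2num(regexprep(str, '([^ ]+)\*([^ ]+)', '${repmat([$2 '' ''], 1, str2double($1))}'));
end
tRegexp = toc;

assert(isequal(v1, v2));   % both approaches should build the same vector
fprintf('strsplit: %.3f s, regexprep: %.3f s\n', tSplit, tRegexp);
```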
