How to extract specific data from a .txt file

Hi everyone,
I am trying to extract location-specific data from a .txt file. The text file is not uniformly formatted and contains all kinds of characters.
In my case, I want to extract the values of Fx and Fy in row 17 of FILE.txt. I have attached the .txt file.
I look forward to getting this resolved.
Thank you and regards,

Accepted Answer

type FILE.TXT % show the contents of FILE.TXT, for reference
*******************************************************************************
BCASE=1, RPM: 0.1000E+05 FREQ: 0.1667 [Hz] PS,PA: 2.758 2.758 [bar]
*******************************************************************************

-------------------------------------------------
Option (32): Ex,Ey: Centered or Off-centered seal
-------------------------------------------------
+-----------------------------------------------------------------------------+
|HsealH=> SOLN.FOUND IN 86 Iter |
| Ex=0.050,Ey=-.050; Ec=0.071; Ax= 0.0000E+00,Ay= 0.0000E+00 ZO= 0.1511E-01|
| rpm= 0.10000E+05 PS= 0.27579E+06 PA= 0.27579E+06 Cseal= 0.00000E+00 |
+-----------------------------------------------------------------------------+
| Exo=0.05000, Eyo=-.05000 Preload d/C=0.00000 |
| Mass Flow= 0.22167E-06 Kg/s = 0.13300E-04 Kg/min |
| Mass Flow/Rhos= 0.16729E-04 lt/min |
+-----------------------------------------------------------------------------+
| FX= 0.21993E+03 N ; FY= 0.21822E+03 N LOAD= 0.30982E+03 N;Angle= 224.78D|
+-----------------------------------------------------------------------------+
| MX=-0.11444E-01 Nm; MY= 0.11704E-01 Nm |
+-----------------------------------------------------------------------------+
| Torque on Film Lands= 0.42679E+00 N-m |
+-----------------------------------------------------------------------------+
...............................................................................
data = readlines('FILE.TXT'); % read the file
Here's one way to get the values of FX and FY from line 17:
line_to_get = 17;
result = regexp(data{line_to_get},'(\w+)= ?([\.\dE+-]+)','tokens');
result = vertcat(result{:})
result = 4×2 cell array
    {'FX'   }    {'0.21993E+03'}
    {'FY'   }    {'0.21822E+03'}
    {'LOAD' }    {'0.30982E+03'}
    {'Angle'}    {'224.78'     }
result = str2double(result(ismember(result(:,1),{'FX','FY'}),2))
result = 2×1
  219.9300
  218.2200
The same regular expression can be used to get other stuff out of the file too:
line_to_get = 10;
result = regexp(data{line_to_get},'(\w+)= ?([\.\dE+-]+)','tokens');
result = vertcat(result{:})
result = 6×2 cell array
    {'Ex'}    {'0.050'     }
    {'Ey'}    {'-.050'     }
    {'Ec'}    {'0.071'     }
    {'Ax'}    {'0.0000E+00'}
    {'Ay'}    {'0.0000E+00'}
    {'ZO'}    {'0.1511E-01'}
line_to_get = 11;
result = regexp(data{line_to_get},'(\w+)= ?([\.\dE+-]+)','tokens');
result = vertcat(result{:})
result = 4×2 cell array
    {'rpm'  }    {'0.10000E+05'}
    {'PS'   }    {'0.27579E+06'}
    {'PA'   }    {'0.27579E+06'}
    {'Cseal'}    {'0.00000E+00'}
You would use str2double to convert what you need from the second columns of those cell arrays into numbers.
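As a minimal sketch of that conversion step, the snippet below parses the text of line 17 (copied from the listing above, so the original spacing is an assumption) and looks up values by name. The variable names `line17`, `names`, `values`, `Fx`, and `Fy` are only illustrative.

```matlab
% Text of line 17 as shown in the file listing (spacing assumed)
line17 = '| FX= 0.21993E+03 N ; FY= 0.21822E+03 N LOAD= 0.30982E+03 N;Angle= 224.78D|';

% Same regular expression as in the answer: capture name and value tokens
tok = regexp(line17,'(\w+)= ?([\.\dE+-]+)','tokens');
tok = vertcat(tok{:});            % N-by-2 cell: column 1 names, column 2 value text

names  = tok(:,1);
values = str2double(tok(:,2));    % convert the whole second column at once

% Look up individual parameters by name
Fx = values(strcmp(names,'FX'));
Fy = values(strcmp(names,'FY'));
```

Keeping `names` and `values` side by side makes it easy to pull out any other parameter on the same line with another `strcmp` lookup.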

7 Comments

Or use the same regular expression to parse what you can from the file all at once:
fid = fopen('FILE.TXT');
data = fread(fid,'*char').';
fclose(fid);
result = regexp(data,'(\w+)= ?([\.\dE+-]+)','tokens');
result = vertcat(result{:})
result = 23×2 cell array
    {'BCASE'}    {'1'           }
    {'Ex'   }    {'0.050'       }
    {'Ey'   }    {'-.050'       }
    {'Ec'   }    {'0.071'       }
    {'Ax'   }    {'0.0000E+00'  }
    {'Ay'   }    {'0.0000E+00'  }
    {'ZO'   }    {'0.1511E-01'  }
    {'rpm'  }    {'0.10000E+05' }
    {'PS'   }    {'0.27579E+06' }
    {'PA'   }    {'0.27579E+06' }
    {'Cseal'}    {'0.00000E+00' }
    {'Exo'  }    {'0.05000'     }
    {'Eyo'  }    {'-.05000'     }
    {'C'    }    {'0.00000'     }
    {'Flow' }    {'0.22167E-06' }
    {'Rhos' }    {'0.16729E-04' }
    {'FX'   }    {'0.21993E+03' }
    {'FY'   }    {'0.21822E+03' }
    {'LOAD' }    {'0.30982E+03' }
    {'Angle'}    {'224.78'      }
    {'MX'   }    {'-0.11444E-01'}
    {'MY'   }    {'0.11704E-01' }
    {'Lands'}    {'0.42679E+00' }
And then put everything into a struct of parameter values:
result(:,2) = num2cell(str2double(result(:,2)));
result = result.';
parameters = struct(result{:})
parameters = struct with fields:
    BCASE: 1
       Ex: 0.0500
       Ey: -0.0500
       Ec: 0.0710
       Ax: 0
       Ay: 0
       ZO: 0.0151
      rpm: 10000
       PS: 275790
       PA: 275790
    Cseal: 0
      Exo: 0.0500
      Eyo: -0.0500
        C: 0
     Flow: 2.2167e-07
     Rhos: 1.6729e-05
       FX: 219.9300
       FY: 218.2200
     LOAD: 309.8200
    Angle: 224.7800
       MX: -0.0114
       MY: 0.0117
    Lands: 0.4268
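Once the values are in a struct, individual parameters can be read by field name. The sketch below runs the same steps on a hypothetical two-line excerpt (`txt`) rather than the real file, and uses `isfield` to guard against a parameter that was not found. Note that `struct` needs every captured name to be a valid, unique field name, so a file that repeats a parameter would probably need extra handling.

```matlab
% Hypothetical two-line excerpt standing in for the full file text
txt = sprintf('%s\n%s', ...
    '| FX= 0.21993E+03 N ; FY= 0.21822E+03 N |', ...
    '| MX=-0.11444E-01 Nm; MY= 0.11704E-01 Nm |');

tok = regexp(txt,'(\w+)= ?([\.\dE+-]+)','tokens');
tok = vertcat(tok{:});                       % N-by-2 cell: names, value text
tok(:,2) = num2cell(str2double(tok(:,2)));   % value text -> numeric scalars
tok = tok.';                                 % 2-by-N so tok{:} alternates name, value
p = struct(tok{:});                          % scalar struct, one field per parameter

% Read parameters by name, guarding against missing ones
if isfield(p,'FX') && isfield(p,'FY')
    F = [p.FX, p.FY];
end
hasTorque = isfield(p,'Torque');             % no such parameter in this excerpt
```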
Style notes:
The technical difference between using fileread() and fopen/fread/fclose is that fileread() always treats the input as characters, whereas fread() treats the input as characters only when you use a 'char' precision such as '*char' (its default precision reads uint8 bytes). fread() therefore gives you the option of reading bytes instead of characters, and fopen() lets you designate a different character encoding. It would be uncommon to need this, but it can happen, especially if the original is an ISO-8859-* encoding other than ISO-8859-1.
For large files, explicit fopen() and read as uint8->char can potentially be faster, as doing so can avoid the overhead of having MATLAB scan a ways into the file to try to figure out what the character encoding is -- though I see that these days fileread() supports an encoding parameter.
For smaller files, the convenience of fileread() usually wins out over fopen()/fread()/fclose().
Thanks, Walter. Good points.
Another advantage of fileread() over fread() is that fileread() returns a row vector of characters whereas fread() returns a column vector, so transposing the result in order to be able to use any function that expects char vectors to be rows (like regexp, strfind or whatever) is not necessary with fileread().
To me, avoiding that transpose alone is worth using fileread(); I'll have to remember it next time I'm about to fread() a text file.
data = fread(fid, [1 inf], '*char');
saves doing the transpose.
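To see the three variants side by side, here is a small sketch that writes a hypothetical scratch file (created with `tempname`, not the poster's FILE.TXT) and reads it back each way. The file contents and variable names are made up for illustration.

```matlab
% Write a small scratch file (hypothetical contents, plain ASCII)
fname = [tempname '.txt'];
fid = fopen(fname,'w');
fprintf(fid,'FX= 1.5\nFY= 2.5\n');
fclose(fid);

% 1) fileread returns a row vector of characters
a = fileread(fname);

% 2) fread with a [1 inf] size also returns a row vector, no transpose needed
fid = fopen(fname,'r');
b = fread(fid,[1 inf],'*char');
fclose(fid);

% 3) fread without a size returns a column vector, which needs .' before regexp
fid = fopen(fname,'r');
c = fread(fid,'*char');
fclose(fid);

delete(fname);                    % clean up the scratch file
```

For plain ASCII text like this, all three hold the same characters; only the orientation differs.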
Of course! Very good.
Hi Voss,
Your method seems really helpful. Can you explain this line
result = regexp(data{line_to_get},'(\w+)= ?([\.\dE+-]+)','tokens');
Also, I tried it on line 18, but it didn't work. Can you please explain? Thank you!
result = regexp(data{line_to_get},'(\w+)= ?([\.\dE+-]+)','tokens');
Uses regexp to match the regular expression, '(\w+)= ?([\.\dE+-]+)'.
Here's a breakdown of that regular expression:
% (\w+)= ?([\.\dE+-]+)
%  ^^^                   match 1 or more alphabetic, numeric, or underscore character(s)
%      ^                 followed by an equal sign
%       ^^               followed by an optional space
%          ^^^^^^^^^^    followed by one or more: periods (\.), decimal digits (\d), capital "E"s, plus signs (+), and/or minus signs (-)
% ^   ^                  use parentheses to capture and return the parameter name
%         ^          ^   use parentheses to capture and return the parameter value
Line 18 looks like this:
+-----------------------------------------------------------------------------+
so it should not match that regular expression. In other words, it's not surprising that regexp returned no matches with that line.
Note that you can use that same regular expression on the contents of the entire file to capture all the parameter names and values, as I showed in the first comment to my answer.
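If you loop over lines, it may help to check for an empty result before indexing into it. A rough sketch, using the text of lines 17 and 18 copied from the listing (spacing assumed) and illustrative variable names:

```matlab
expr = '(\w+)= ?([\.\dE+-]+)';

% Line 17 holds name=value pairs; line 18 is only a border
line17 = '| FX= 0.21993E+03 N ; FY= 0.21822E+03 N LOAD= 0.30982E+03 N;Angle= 224.78D|';
line18 = '+-----------------------------------------------------------------------------+';

t17 = regexp(line17,expr,'tokens');   % one token pair per match
t18 = regexp(line18,expr,'tokens');   % no "=" on this line, so nothing matches

% Guard before using the result, since vertcat of an empty cell gives []
if isempty(t18)
    pairs18 = cell(0,2);              % keep a consistent N-by-2 shape
else
    pairs18 = vertcat(t18{:});
end
pairs17 = vertcat(t17{:});
```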


More Answers (1)

filename = 'FILE.TXT';
S = fileread(filename);
XY = str2double(regexp(S, '(?<=F[XY]= )\S+', 'match'))
XY = 1×2
219.9300 218.2200

3 Comments

Hi Walter,
Thank you for your kind response.
Can you please explain this following line:
XY = str2double(regexp(S, '(?<=F[XY]= )\S+', 'match'))
Thank you!
[XY] matches a single character that is either X or Y, so F[XY]= followed by a space matches F, then either X or Y, then =, then a space. The (?<= ) construct is a lookbehind: FX= or FY= (plus the space) must appear immediately before the match position, but that text is not included in what the function returns. It positions the match without extracting anything itself.
The \S+ matches one or more non-whitespace characters.
So the code looks for FX= or FY= in the input and returns the whitespace-delimited field that follows it. The search repeats for every occurrence in the input, so both FX and FY are extracted.
The return from regexp() with 'match' is going to be a cell array of character vectors. Those character vectors are passed to str2double() to be converted to numeric form.
A B C FX= 1325.3 Q= -5 FY= -23.2 P ZFY= 83
The expression as coded does not know to look for whitespace before the variable name, so the ZFY= would be matched as an occurrence of FY=. So the regexp() would return {'1325.3', '-23.2', '83'} in this particular case. That could be adjusted if it mattered (but it doesn't matter in your case).
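For reference, one possible adjustment is sketched below. It is an assumption on my part rather than something from the thread: it uses the `\<` start-of-word anchor and a captured token in place of the lookbehind, so that ZFY= no longer counts as FY=.

```matlab
% Sample input from the comment above, including the stray ZFY= entry
S = 'A B C FX= 1325.3 Q= -5 FY= -23.2 P ZFY= 83';

% Original pattern: the lookbehind does not check what precedes the F
loose = str2double(regexp(S,'(?<=F[XY]= )\S+','match'));

% Stricter variant: \< requires the F to start a word, and the value is
% returned through a capture group instead of a lookbehind
tok = regexp(S,'\<F[XY]= (\S+)','tokens');
strict = str2double([tok{:}]);    % flatten nested cells, then convert
```

Here `loose` picks up three values, while `strict` keeps only the ones that follow a standalone FX= or FY=.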
Okay thank you!
I highly appreciate your efforts. It makes a lot of sense now.



Asked: 31 Jan 2023

Commented: 1 Feb 2023
