Read data from .txt with regexp

I want to load numeric data from a .txt file that contains both string and numeric data in a non-standard layout. I am attaching an example file for you to see.
The data is divided into sections, but all the numeric data I am interested in has the same structure: '->' + 'some string' + ':' + 'numeric value with format %.4f' (lines in the .txt that do not contain this expression should be ignored).
I tried several approaches (readfile, textscan, etc.), but because of the complexity of the input file I could not achieve my goal.
I think the most appropriate way to solve the problem is to use regexp and tell MATLAB exactly which expression to look for in the .txt file.
I am not familiar with regexp and do not understand how to write the expression I need, so I would greatly appreciate any help with that.
Thank you very much!
============================================================================
| DATA VALUES FOR INPUT VARIABLES |
============================================================================
Model: Example
(data file produced by on 29-Mar-2022 17:30:14)
============================================================================
FIRST DATA SECTION:
============================================================================
-> z [val]: 13.0000
-> m [in mm]: 3.0000
-> Type [str]: Ball
-> c_s [valval]: 0 0 0
----------------------------------------------------------------------------
SECOND DATA SECTION:
============================================================================
-> r_b [strandnumber]: C50
-> s0_r [in mm]: 0.0000
-> I_r [val]: 15.0000
-> Only text line
----------------------------------------------------------------------------
THIRD DATA SECTION:
============================================================================
-> Values found = 0.
----------------------------------------------------------------------------
FOURTH DATA SECTION:
============================================================================
-> n_1 [val_1]: 1.0000

5 Comments

Jan
Jan on 29 Mar 2022
Edited: Jan on 29 Mar 2022
The person who invented this file format hates programmers. It looks smart with the pile of horizontal lines, but it is as hard to read as mud.
What is the desired output?
Do you have documentation that explains the magic keywords: val, valval, val_1, str, in mm?
I detest violence. Please give the inventor of this format a respectful hint that there are standard output formats such as XML, which avoid the troubles you have.
Aurea94
Aurea94 on 29 Mar 2022
@Jan I guess you are right, and this .txt file is surely not the most appropriate format for programmers. However, I cannot change that, but I will keep in mind the problems it causes when I am the one writing this type of file.
[val, valval, val_1, str, in mm] are just dummy text. The descriptions were deleted for confidentiality reasons.
Jan
Jan on 30 Mar 2022
We can be glad, that Stephen used his artistic power to solve the problem!
Aurea94
Aurea94 on 30 Mar 2022
Yes that was really really fantastic!!
Rik
Rik on 30 Mar 2022
For basic data like this, even JSON should be preferred. Just about every half-decent programming language can read it. Composing matrices is a bit tricky, but as long as you stick to scalars or vectors everything should be fine.
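To illustrate the JSON suggestion, here is a minimal sketch of a round trip with jsonencode and jsondecode. The struct below is hypothetical: its field names and values are only borrowed from the example file, and the confidential format itself is not reproduced.

```matlab
% Hypothetical data mirroring the example file (names and values are illustrative).
data = struct('z', 13, 'm', 3, 'Type', 'Ball', 'c_s', [0 0 0]);

% Serialize to JSON text, then parse it back into a struct.
txt  = jsonencode(data);
back = jsondecode(txt);

% Scalars and text survive unchanged; the vector is compared via (:)
% because its orientation may differ after decoding.
disp(txt)
disp(isequal(back.c_s(:), data.c_s(:)))
```

Scalars and character vectors come back unchanged. As far as I know, jsondecode returns numeric JSON arrays as column vectors, so c_s may change orientation in the round trip, which is one small instance of the "matrices are a bit tricky" caveat.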


Accepted Answer

Stephen23
Stephen23 on 29 Mar 2022
Edited: Stephen23 on 29 Mar 2022
% Capture the name after '->' and everything that follows '[...]:' on the same line.
rgx = '^\s*->\s*(\w+)\s*\[[\w\s]+\]:\s*([^\r\n]+)';
str = fileread('ExampleTextFile.txt');
tkn = regexp(str,rgx,'tokens','lineanchors');
tkn = vertcat(tkn{:})
tkn = 8×2 cell array
    {'z'   }    {'13.0000'}
    {'m'   }    {'3.0000' }
    {'Type'}    {'Ball'   }
    {'c_s' }    {'0 0 0'  }
    {'r_b' }    {'C50'    }
    {'s0_r'}    {'0.0000' }
    {'I_r' }    {'15.0000'}
    {'n_1' }    {'1.0000' }
% Convert the values that parse as scalar numbers; leave the rest as text.
vec = str2double(tkn(:,2));
idx = ~isnan(vec);
tkn(idx,2) = num2cell(vec(idx))
tkn = 8×2 cell array
    {'z'   }    {[   13]}
    {'m'   }    {[    3]}
    {'Type'}    {'Ball' }
    {'c_s' }    {'0 0 0'}
    {'r_b' }    {'C50'  }
    {'s0_r'}    {[    0]}
    {'I_r' }    {[   15]}
    {'n_1' }    {[    1]}
% Use the names as field names of a scalar struct.
out = cell2struct(tkn(:,2),tkn(:,1),1)
out = struct with fields:
       z: 13
       m: 3
    Type: 'Ball'
     c_s: '0 0 0'
     r_b: 'C50'
    s0_r: 0
     I_r: 15
     n_1: 1
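The question asked only for values in the %.4f format, so a stricter variant of the pattern may be useful. The following is a minimal sketch, not part of the original answer: the variable names (txt, rgxNum, names, values) are made up for illustration, and the inline text stands in for the file, assuming the same line layout as the example.

```matlab
% Sample text standing in for the file contents (assumed layout, as in the example).
txt = sprintf('%s\n', ...
    '-> z [val]: 13.0000', ...
    '-> Type [str]: Ball', ...
    '-> c_s [valval]: 0 0 0', ...
    '-> s0_r [in mm]: 0.0000', ...
    '-> Only text line');

% Capture the name, then a value that is exactly one number with four decimals.
rgxNum = '^\s*->\s*(\w+)\s*\[[^\]]*\]:\s*([-+]?\d+\.\d{4})\s*$';
tok = regexp(txt, rgxNum, 'tokens', 'lineanchors');
tok = vertcat(tok{:});          % N-by-2 cell array: {name, value text}

names  = tok(:,1);              % names of the lines that matched
values = str2double(tok(:,2));  % numeric column vector
```

With this sample text only the z and s0_r lines match, because 'Ball' and '0 0 0' do not have the single-number, four-decimal form. If no line matches at all, tok is empty and the indexing tok(:,1) would need a guard.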

4 Comments

Very nice, @Stephen! You might want to also convert the c_s field to double like this:
out.c_s = sscanf(out.c_s, '%f')
At least it appears to me like this field was meant to be numeric.
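If more fields than c_s hold space-separated numbers, the same idea can be applied to every text field. This is only a sketch under that assumption: the struct below is a shortened stand-in for the answer's output, and text such as 'C50' or 'Ball' is deliberately left untouched.

```matlab
% Stand-in for the struct produced by the answer (values copied from its output).
out = struct('z', 13, 'Type', 'Ball', 'c_s', '0 0 0', 'r_b', 'C50');

fn = fieldnames(out);
for k = 1:numel(fn)
    val = out.(fn{k});
    if ischar(val)
        % Split on whitespace and try to convert every piece to a number.
        num = str2double(strsplit(strtrim(val)));
        if ~any(isnan(num))
            out.(fn{k}) = num;   % all pieces were numeric: store a row vector
        end
    end
end
```

Note that a field whose text is literally 'NaN' would be left as text by this check, which seems acceptable for data like this.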
Aurea94
Aurea94 on 29 Mar 2022
Edited: Aurea94 on 29 Mar 2022
@Stephen this is just incredible! Thank you so much! It works perfectly for my file.
You saved me hours of work!!!
@Stephen, can I get your email? I need a favour from you.
Rik
Rik on 18 Apr 2022
I expect your chances would be much better if you posted your request as a separate question and posted the link here. People tend to be protective of their inbox.


More Answers (0)

Release

R2022a

Asked: 29 Mar 2022

Commented: Rik on 18 Apr 2022
