Read file with non-uniform lines?

Question

bene1 on 25 Oct 2020

Direct link to this question

https://it.mathworks.com/matlabcentral/answers/625818-read-file-with-non-uniform-lines

Commented: bene1 on 27 Oct 2020

Hi. I'm a Matlab newbie. I would like to read in a file where the lines have different formats, as below.

% Coordinates
%   Code    ID      X         Y
    C       101     0.001     0.001
    C       102     1.002     0.002
    C       103     1.003     1.003
    C       104     0.004     1.004
% Distances
%   Code    ID      From      To      Dist
    D       201     101       103     1.417
    D       202     102       104     1.414

If the first character is C, use...

A = textscan(fid,'%c %d %f %f')

If the first character is D, use...

A = textscan(fid,'%c %d %d %d %f')

After, I'd like to assign the data to structs (c.id, c.x, c.y, d.id, d.from, d.to, d.dist), but first I think I just need to get it scanned in. Is it possible to apply some logic to reading the file? Thank you.
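
One way to apply the first-character logic described above is to read the file line by line and branch on the code letter. The following is only a minimal sketch, assuming the sample layout shown in the question. The temporary file and the variable names cvals and dvals are made up for illustration, and sscanf stands in for the two textscan calls.

```matlab
% Write a shortened copy of the sample data to a temporary file
% so that the sketch is self-contained (illustrative data only).
fname = [tempname, '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '%% Coordinates\n');
fprintf(fid, '    C       101     0.001     0.001\n');
fprintf(fid, '    C       102     1.002     0.002\n');
fprintf(fid, '%% Distances\n');
fprintf(fid, '    D       201     101       103     1.417\n');
fclose(fid);

cvals = zeros(0, 3);   % rows of [id, x, y]
dvals = zeros(0, 4);   % rows of [id, from, to, dist]

fid = fopen(fname, 'r');
tline = fgetl(fid);
while ischar(tline)                 % fgetl returns -1 at end of file
    tline = strtrim(tline);         % drop leading blanks and any trailing CR
    if ~isempty(tline)
        switch tline(1)
            case 'C'
                cvals(end+1, :) = sscanf(tline(2:end), '%f', [1 3]);
            case 'D'
                dvals(end+1, :) = sscanf(tline(2:end), '%f', [1 4]);
            % lines starting with anything else (such as '%') are skipped
        end
    end
    tline = fgetl(fid);
end
fclose(fid);
delete(fname);

% Assign the columns to the structs named in the question.
c.id = cvals(:, 1);  c.x = cvals(:, 2);  c.y = cvals(:, 3);
d.id = dvals(:, 1);  d.from = dvals(:, 2);  d.to = dvals(:, 3);  d.dist = dvals(:, 4);
```

Growing the arrays inside the loop is acceptable for small files; for large files a regexp or textscan approach on the whole text is likely to be faster.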

5 Comments

Walter Roberson on 26 Oct 2020

'^\s*C.*$', 'dotexceptnewline', 'lineanchors'

or

'(?<=(^|\n))\s*C[^\n]*'

with no additional options needed
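
A minimal sketch of how either pattern might be used, assuming the goal is to pull out the lines whose first non-blank character is C by means of the 'match' option of regexp. FileText here is a shortened, made-up copy of the sample data rather than the result of fileread.

```matlab
% Shortened sample text (illustrative; normally this would come from fileread).
FileText = sprintf([ ...
    '%% Coordinates\n', ...
    '    C       101     0.001     0.001\n', ...
    '    C       102     1.002     0.002\n', ...
    '%% Distances\n', ...
    '    D       201     101       103     1.417\n']);

% Variant 1: let ^ and $ anchor at line boundaries and stop '.' at newlines.
C1 = regexp(FileText, '^\s*C.*$', 'match', 'lineanchors', 'dotexceptnewline');

% Variant 2: look behind for start-of-text or a newline; no extra options.
C2 = regexp(FileText, '(?<=(^|\n))\s*C[^\n]*', 'match');

% Both variants return a cell array with one character vector per C line.
disp(C1(:));
disp(C2(:));
```

Note that \s also matches newlines, so a blank line directly before a C line would be swallowed into the match; the sample data has no such lines.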

bene1 on 26 Oct 2020

Great, thanks again. Now have...

C =
  4×1 cell array
    {'    C       101     0.001     0.001←'}
    {'    C       102     1.002     0.002←'}
    {'    C       103     1.003     1.003←'}
    {'    C       104     0.004     1.004←'}

With C as a 4x1, I believe my next step is to extract out the columns. My first thought was

A = textscan(C,'%c %d %f %f')

but I see I can't do that. Looking into cell2struct?
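
textscan also accepts a character vector in place of a file identifier, so one possible workaround, sketched here under that assumption, is to join the cell array back into a single text first. The strtrim call is there to drop the trailing carriage return shown as ← in the display above, and %s replaces %c so that leading blanks are not an issue. Both choices are illustrative and not taken from the thread.

```matlab
% Lines as extracted by regexp, typed in here for illustration.
C = {'    C       101     0.001     0.001'
     '    C       102     1.002     0.002'
     '    C       103     1.003     1.003'
     '    C       104     0.004     1.004'};

C = strtrim(C);                      % remove leading blanks and trailing CR
txt = strjoin(C(:).', newline);      % one character vector, one record per line

% Scan the joined text instead of a file identifier.
A = textscan(txt, '%s %d %f %f');

% A{1} holds the code letters, A{2} the IDs (int32), A{3} and A{4} the coordinates.
c.id = A{2};
c.x  = A{3};
c.y  = A{4};
```

With the columns already separated by textscan, cell2struct is not needed for this step.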

Answer 1

Walter Roberson on 26 Oct 2020

Direct link to this answer

https://it.mathworks.com/matlabcentral/answers/625818-read-file-with-non-uniform-lines#answer_524468

Named tokens, I said. Do not extract the lines ahead of time.

FileText = fileread(YourFileName);
Ctokens = regexp(FileText, '^\s*C\s+(?<ID>\d+)\s+(?<X>\S+)\s+(?<Y>\S+)', 'names', 'lineanchors');
%Ctokens will now be a struct array with field names ID, X, and Y, each of which are character vectors.
C.ID = str2double({Ctokens.ID});
C.X = str2double({Ctokens.X});
C.Y = str2double({Ctokens.Y});
Dtokens = regexp(FileText, '^\s*D\s+(?<ID>\d+)\s+(?<From>\d+)\s+(?<To>\d+)\s+(?<Dist>\S+)', 'names', 'lineanchors');
%Dtokens will now be a struct array with field names ID, From, To, Dist, each of which are character vectors.
D.ID = str2double({Dtokens.ID});
D.From = str2double({Dtokens.From});
D.To = str2double({Dtokens.To});
D.Dist = str2double({Dtokens.Dist});

The amount of processing work is pretty minimal. Pretty much all of the effort is in figuring out the proper regexp patterns to use (which can be pretty tricky when there are variant lines).
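
Since the conversion step is the same for every field, it can be written generically once the pattern is settled. The sketch below is an illustration only: it reuses the C pattern from this answer on a shortened, made-up copy of the sample data and fills a struct c with the lower-case field names asked for in the question (id, x, y).

```matlab
% Shortened sample text (illustrative; normally this would come from fileread).
FileText = sprintf([ ...
    '%% Coordinates\n', ...
    '%%   Code    ID      X         Y\n', ...
    '    C       101     0.001     0.001\n', ...
    '    C       102     1.002     0.002\n']);

% Named tokens give a struct array with one element per matching line.
Ctokens = regexp(FileText, ...
    '^\s*C\s+(?<ID>\d+)\s+(?<X>\S+)\s+(?<Y>\S+)', 'names', 'lineanchors');

% Convert every token field to a numeric row vector in one loop.
c = struct();
names = fieldnames(Ctokens);
for k = 1:numel(names)
    c.(lower(names{k})) = str2double({Ctokens.(names{k})});
end
```

The same loop would work unchanged for the D pattern, because the field names are taken from whatever named tokens the pattern defines.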

1 Comment

bene1 on 27 Oct 2020

Cool, thank you kindly!

Answer 2

per isakson on 26 Oct 2020

Direct link to this answer

https://it.mathworks.com/matlabcentral/answers/625818-read-file-with-non-uniform-lines#answer_524413

>> S = cssm( 'd:\m\cssm\cssm.txt' )
S = 
  1×2 struct array with fields:
    header
    colhead
    Code
    data
>> S(1)
ans = 
  struct with fields:
     header: "Coordinates"
    colhead: ["Code"    "ID"    "X"    "Y"]
       Code: [4×1 string]
       data: [4×3 double]
>> S(2)
ans = 
  struct with fields:
     header: "Distances"
    colhead: ["Code"    "ID"    "From"    "To"    "Dist"]
       Code: [2×1 string]
       data: [2×4 double]

where

function    sas = cssm( ffs )
    
    chr = fileread( ffs );
    str = string( chr );
    str = replace( str, char([13,10]), newline );   % get rid of the carriage return
   
    % split the string into blocks. Use the block header as delimiter. 
    [blk,del] = strsplit( str, '(?m)^\x20*%\x20\w+\x20*\n'  ...      
                        , 'DelimiterType','RegularExpression' );
                    
    blk(1) = [];  % remove empty block before the first delimiter                    
    
    len = numel( del );
    sas(1,len) = struct( 'header',"", 'colhead',"", 'Code',"", 'data',nan );
    
    for jj = 1 : len    % loop over all blocks
        
        sas(jj).header = regexp( del(jj), '\w+', 'match','once' );  % match the name
        
        cac = textscan( blk(jj), "%[^\n]", 1 ); % read the first row
        tmp = strsplit( string(cac{1}) );       % split the row into column headers
        tmp(1) = [];                            % remove the comment character, "%"
        sas(jj).colhead = tmp;
        
        cac = textscan( blk(jj), ['%s',repmat('%f',1,numel(tmp)-1)] ...
                    ,   'Headerlines',1, 'CollectOutput',true );
        sas(jj).Code = string(cac{1});
        sas(jj).data = cac{2};
    end
end

and where cssm.txt contains the data given in your question.

1 Comment

bene1 on 27 Oct 2020

Thank you for the idea. :-)
