Read file with non-uniform lines?

7 visualizzazioni (ultimi 30 giorni)
bene1
bene1 il 25 Ott 2020
Commentato: bene1 il 27 Ott 2020
Hi. I'm a Matlab newbie. I would like to read in a file where the lines have different formats, as below.
% Coordinates
% Code ID X Y
C 101 0.001 0.001
C 102 1.002 0.002
C 103 1.003 1.003
C 104 0.004 1.004
% Distances
% Code ID From To Dist
D 201 101 103 1.417
D 202 102 104 1.414
If the first character is C, use...
A = textscan(fid,'%c %d %f %f')
If the first character is D, use...
A = textscan(fid,'%c %d %d %d %f')
After, I'd like to assign the data to structs (c.id, c.x, c.y, d.id, d.from, d.to, d.dist), but first I think I just need to get it scanned in. Is it possible to apply some logic to reading the file? Thank you.
  5 Commenti
Walter Roberson
Walter Roberson il 26 Ott 2020
'^\s*C.*$', 'dotexceptnewline', 'lineachors'
or
'(?<=(^|\n))\s*C[^\n]*'
with no additional options needed
bene1
bene1 il 26 Ott 2020
Great, thanks again. Now have...
C =
4×1 cell array
{' C 101 0.001 0.001←'}
{' C 102 1.002 0.002←'}
{' C 103 1.003 1.003←'}
{' C 104 0.004 1.004←'}
With C as a 4x1, I believe my next step is to extract out the columns. My first thought was
A = textscan(C,'%c %d %f %f')
but I see I can't do that. Looking into cell2struct?

Accedi per commentare.

Risposta accettata

Walter Roberson
Walter Roberson il 26 Ott 2020
Named tokens, I said. Do not extract the lines ahead of time.
FileText = fileread(YourFileName);
Ctokens = regexp(FileText, '^\s*C\s+(?<ID>\d+)\s+(?<X>\S+)\s+(?<Y>\S+)', 'names', 'lineanchors');
%Ctokens will now be a struct array with field names ID, X, and Y, each of which are character vectors.
C.ID = str2double({Ctokens.ID});
C.X = str2double({Ctokens.X});
C.Y = str2double({Ctokens.Y});
Dtokens = regexp(FileText, '^\s*D\s+(?<ID>\d+)\s+(?<From>\d+)\s+(?<To>\d+)\s+(?<Dist>\S+)', 'names', 'lineanchors');
%Dtokens will now be a struct array with field names ID, From, To, Dist, each of which are character vectors.
D.ID = str2double({Dtokens.ID});
D.From = str2double({Dtokens.From});
D.To = str2double({Dtokens.To});
D.Dist = str2double({Dtokens.Dist});
Amount of processing work is pretty minimial. Pretty much all of the effort is in figuring out the proper regexp patterns to use (which can be pretty tricky when there are variant lines.)

Più risposte (1)

per isakson
per isakson il 26 Ott 2020
>> S = cssm( 'd:\m\cssm\cssm.txt' )
S =
1×2 struct array with fields:
header
colhead
Code
data
>> S(1)
ans =
struct with fields:
header: "Coordinates"
colhead: ["Code" "ID" "X" "Y"]
Code: [4×1 string]
data: [4×3 double]
>> S(2)
ans =
struct with fields:
header: "Distances"
colhead: ["Code" "ID" "From" "To" "Dist"]
Code: [2×1 string]
data: [2×4 double]
where
function sas = cssm( ffs )
chr = fileread( ffs );
str = string( chr );
str = replace( str, char([13,10]), newline ); % get rid of the carriage return
% split the string into blocks. Use the block header as delimiter.
[blk,del] = strsplit( str, '(?m)^\x20*%\x20\w+\x20*\n' ...
, 'DelimiterType','RegularExpression' );
blk(1) = []; % remove empty block before the first delimiter
len = numel( del );
sas(1,len) = struct( 'header',"", 'colhead',"", 'Code',"", 'data',nan );
for jj = 1 : len % loop over all blocks
sas(jj).header = regexp( del(jj), '\w+', 'match','once' ); % match the name
cac = textscan( blk(jj), "%[^\n]", 1 ); % read the first row
tmp = strsplit( string(cac{1}) ); % split the row into column headers
tmp(1) = []; % remove the comment character, "%"
sas(jj).colhead = tmp;
cac = textscan( blk(jj), ['%s',repmat('%f',1,numel(tmp)-1)] ...
, 'Headerlines',1, 'CollectOutput',true );
sas(jj).Code = string(cac{1});
sas(jj).data = cac{2};
end
end
and where cssm.txt contains the data given in of your question.

Categorie

Scopri di più su Large Files and Big Data in Help Center e File Exchange

Prodotti


Release

R2020a

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by