Extracting data using regular expression
Shuvashish Roy on 20 May 2021
Commented: Shuvashish Roy on 21 May 2021
Hi,
I have the attached text file. I want to extract all the columns, starting from line 1472 (as counted in Notepad), named "Physics", "Time", "dt", "Progress", "Nonlinear Iteration", "Linear Iterations"...."Nodes After Adaption". I don't know how to specify the header names so that only the numeric values under those headers are extracted, in a dataframe or matrix format. Thanks a lot for your help.
Input file format:
Unnecessary lines with text
Unnecessary lines with text
................................
many unnecessary lines............
adh_run_func :: tfinal = 12513600.000000
Physics Time dt Progress Nonlinear Iteration Linear Iteration Max Resid Norm ... Nodes After Adaption
HYD_1 11908800 5 0 1 ........ ...65926
HYD_1 11908800 5 0 2 ...... ...65926
............................................................................................. ................................
............................................................................................. ................................
100% COMPLETE
Output file format:
Physics Time dt Progress Nonlinear Iteration Linear Iteration Max Resid Norm ... Nodes After Adaption
HYD_1 11908800 5 0 1 ........ ...65926
HYD_1 11908800 5 0 2 ...... ...65926
............................................................................................. ................................
Accepted Answer
per isakson on 21 May 2021
Edited: per isakson on 21 May 2021
I read "all the columns [...] named "Physics", "Time", "dt", "Progress", "Nonlinear Iteration" "Linear Iterations"...."Nodes After Adaption"" as all the columns, none excluded.
There is a choice: readtable() or textscan()? I don't think readtable() can handle this file without relying on the critical line numbers, which I hesitate to do. It is, however, possible to determine the needed line numbers in a separate step and then call readtable(). textscan() can parse a 1-D character array, which readtable() cannot. Only TMW knows why.
I choose textscan().
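Because textscan() works on an in-memory character array, the regexp trimming and the parsing can be tried on a small made-up string before running the script on the real file. A minimal sketch, assuming tab-delimited rows; the sample text and its values are hypothetical, not taken from the attached file:

```matlab
% Hypothetical miniature of the log file (made-up values, tab-delimited rows)
chr = sprintf([ ...
    'unnecessary line with text\n', ...
    'adh_run_func :: tfinal = 12513600.000000\n', ...
    'Physics\tTime\tdt\n', ...
    'HYD_1\t11908800\t5\n', ...
    'HYD_1\t11908805\t5\n', ...
    '100%% COMPLETE\n', ...
    'summary line\n' ]);

% Drop everything before the line that begins with 'adh_run_func :: tfinal'
pos = regexp(chr, '^adh_run_func :: tfinal', 'once', 'lineanchors');
chr(1:pos-1) = [];

% Drop the summary that starts at the 'NNN% COMPLETE' line
pos = regexp(chr, '^\d+[% ]+COMPLETE', 'once', 'lineanchors');
chr(pos:end) = [];

% Header line: the lazy .+? stops at the first end of line after 'Physics'
txt = regexp(chr, '^Physics.+?$', 'match', 'once', 'lineanchors');
column_headers = strsplit(txt, '\t');

% Parse the rows directly from the char array, skipping the two remaining header lines
cac = textscan(chr, ['%s', repmat('%f', 1, numel(column_headers)-1)], ...
    'HeaderLines', 2, 'Delimiter', '\t', 'CollectOutput', true);
Physics = cac{1};   % 2x1 cell array of text
matrix = cac{2};    % 2x2 double matrix
```

The same three regexp calls appear in the script; only the input text differs.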
%% Read file
chr = fileread('AR_20base_201214_adh.txt');
%% Remove meta data
% Using 'adh_run_func :: tfinal' feels more robust than using the line number
pos = regexp( chr, '^adh_run_func :: tfinal', 'once', 'lineanchors' );
chr(1:pos-1) = []; % remove everything before the first line that begins with 'adh_run_func :: tfinal'
%% Remove the summary lines at the end
pos = regexp( chr, '^\d+[\% ]+COMPLETE', 'once', 'lineanchors' );
chr(pos:end) = [];
%% Get the column headers
txt = regexp( chr, '^Physics.+?$', 'match', 'once', 'lineanchors' );
column_headers = strsplit( txt, '\t' );
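%% Parse the data: one text column (Physics) followed by numeric columns
cac = textscan( chr, ['%s',repmat('%f',1,numel(column_headers)-1)] ...
    , 'HeaderLines' , 2 ... the 'adh_run_func' line and the header line remain after the metadata is removed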
, 'Delimiter' , '\t' ...
, 'Whitespace' , ' %' ... ignore the %-sign in Progress
, 'CollectOutput' , true );
Physics = cac{1};
matrix = cac{2};
whos Physics matrix column_headers
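Since the question asks for a dataframe-like result, `Physics` and `matrix` can be combined into one MATLAB table. A minimal sketch, assuming the variables produced by the code above; small made-up stand-ins are defined here so the snippet runs on its own. Headers such as 'Nonlinear Iteration' contain spaces, so they are converted with `matlab.lang.makeValidName`:

```matlab
% Hypothetical stand-ins for the outputs of the code above (made-up values)
Physics = {'HYD_1'; 'HYD_1'};
matrix = [11908800 5 1; 11908800 5 2];
column_headers = {'Physics', 'Time', 'dt', 'Nonlinear Iteration'};

% Turn header text into valid variable names, e.g. 'Nonlinear Iteration' -> 'NonlinearIteration'
names = matlab.lang.makeValidName(column_headers);

% Text column first, then one numeric variable per remaining header
T = [table(Physics, 'VariableNames', names(1)), ...
     array2table(matrix, 'VariableNames', names(2:end))];
```

Valid identifiers keep dot access such as `T.NonlinearIteration` simple.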