Read a text file after a specific text line, skipping only the next line

Hello, I am collecting the data that follow the line "# HHE HHN HHZ" (I only copied the first 3 rows after "# HHE HHN HHZ" as an example; there could be hundreds), and the order of these columns can vary. I have written a script for one specific text file (see Example 1).
Example 1:
#
# 4. COMMENTS
# BASELINE CORRECTED
#
# 5. ACCELERATION DATA
# HHE HHN HHZ
-0.02104708 -0.02134472 0.00412299
-0.00340606 0.08357343 0.02083563
-0.02940362 0.00093856 0.00505147
The script below handles one combination of columns. Each combination is defined as textline1, textline2, and so on; these are necessary so that the data can be unified (rearranged) into a fixed column order in the output:
textline1 = '# HNE HNN HNZ';
%First mixed data%
if index==0
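index = strcmp(tline,textline1); %%EO NS UD (becomes 1 once the header line is found)
if index ==1; index=1; end % arrangement 1 (the other arrangements set 2, 3, ...)
elseif index ==1
tmp=sscanf(tline,'%f'); % numbers on the current data line
tmp1 = [tmp(1); tmp(2); tmp(3)]; % rearrange to EO=X NS=Y UD=Z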
Output = [Output; tmp1'];
end
However, some records use the following format, in which a "T" line sits between "# HHE HHN HHZ" and the data to be collected:
#
# 4. COMMENTS
# BASELINE CORRECTED
#
# 5. ACCELERATION DATA
# HHE HHN HHZ
T
-0.02104708 -0.02134472 0.00412299
-0.00340606 0.08357343 0.02083563
-0.02940362 0.00093856 0.00505147
Any help adapting the code to handle that case would be appreciated. Thank you very much.
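A minimal sketch of one way to tolerate the optional "T" line within the existing line-by-line approach: once the header has been found, every later line is parsed with sscanf and kept only if it yields exactly three numbers. A "T" line parses to nothing, so it is skipped instead of causing an indexing error at tmp(1). This assumes three columns per data row and that only data follows the header; the temporary file and its contents are made up for the demonstration, and the header is compared after strtrim in case of trailing blanks.

```matlab
% Build a small sample file (hypothetical content mimicking the layout above)
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '# 5. ACCELERATION DATA\n# HNE HNN HNZ\nT\n');
fprintf(fid, '-0.02104708 -0.02134472 0.00412299\n');
fprintf(fid, '-0.00340606 0.08357343 0.02083563\n');
fclose(fid);

header = '# HNE HNN HNZ';   % column-header line that starts the data section
Output = [];
inData = false;             % becomes true once the header line has been seen

fid = fopen(fname, 'r');
tline = fgetl(fid);
while ischar(tline)
    if ~inData
        inData = strcmp(strtrim(tline), header);  % strtrim guards against trailing blanks
    else
        tmp = sscanf(tline, '%f');                % a "T" line (or any non-numeric line) gives []
        if numel(tmp) == 3
            Output = [Output; tmp(:)'];           %#ok<AGROW> one row per data record
        end
    end
    tline = fgetl(fid);
end
fclose(fid);
delete(fname);
```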

11 Comments

I would suggest that it might be easiest to read the entire file as a character vector and then use text manipulation such as regexp() to extract the parts you need.
I would also suggest attaching a couple of sample files instead of just a snippet from each, and explaining precisely what the end objective is...
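For example, a sketch of the regexp route, assuming the column-header line is the only line of the form "# Hxx Hxx Hxx" and that only numeric data (optionally preceded by a "T" line) follows it. The text below stands in for what fileread(filename) would return; its contents are hypothetical.

```matlab
% Stand-in for txt = fileread(filename); the contents are hypothetical
txt = sprintf(['# 5. ACCELERATION DATA\n# HNE HNN HNZ\nT\n' ...
               '-0.02104708 -0.02134472 0.00412299\n' ...
               '-0.00340606 0.08357343 0.02083563\n']);

% Capture the three channel names on the header line and all text after it
tok = regexp(txt, '# (H\w\w) (H\w\w) (H\w\w)\s*\n(.*)', 'tokens', 'once', 'dotall');
chans = tok(1:3);    % channel order as written in the file
body  = tok{4};      % optional "T" line followed by the numeric records

% Drop a leading stand-alone "T" line if there is one, then convert everything at once
body = regexprep(body, '^\s*T\s*\n', '', 'once');
data = reshape(sscanf(body, '%f'), 3, []).';   % one row per record, file column order
```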
From the two snippets above, one could simply use
data=readmatrix(fullfile('YourDataDir','YourFileName'),'CommentStyle',{'T','#'});
Unfortunately, there's probably other stuff in the file, too, but we can't see that to know what could be done efficiently...
Hi, thank you for your reply. I have attached an example file.
The objective is to get the data after
# HHE HHN HHZ
T
as indicated above. Thank you.
Well, those show that there isn't anything else in the file after that so the above would work just fine...
Thank you for your reply. I tried this code as you kindly suggested
%First mixed data%
if index==0
index = strcmp(tline,textline1); %%EO NS UD
if index ==1; index=1; end
elseif index ==1
tmp=readmatrix(fullfile(textline1,filename'),'CommentStyle',{'T','#'});
%%tmp=sscanf(tline,'%f %f %f %f');
tmp1 = [tmp(1); tmp(2); tmp(3)]; % rearrange to EO=X NS=Y UD=Z
Output = [Output; tmp1'];
end
However, "tmp" is empty when I run it.
tmp1, tmp2, tmp3, and so on are chosen according to the column arrangement given in textline1 = '# HNE HNN HNZ'; textline2 = '# HNE HNZ HNN'; textline3 = '# HNN HNE HNZ'; and so on, as found in the text file. For instance, the code for the second column arrangement, which occurs in another text file, is the following:
textline2 = '# HNE HNZ HNN';
%Second mixed data%
if index==0
index = strcmp(tline,textline2); %% EO UD NS
if index ==1; index=2; end
elseif index==2
tmp=readmatrix(fullfile(textline2,filename'),'CommentStyle',{'T','#'});
tmp=sscanf(tline,'%f %f %f %f');
tmp2 = [tmp(1); tmp(3); tmp(2)]; % rearrange to EO=X NS=Y UD=Z
Output = [Output; tmp2'];
end
Thank you for your response. It works perfectly when I run it alone. However, when I put all the code together there are some issues: the output is still empty. You can see the whole function below, where "filename" is the name of the file. This function extracts three pieces of information for each record: fs, Output, and STATION. Output holds the acceleration records extracted after "# HHE HHN HHZ" and "T".
function [fs, Output, STATION] = import_from_CISMID_new4(filename)
textline1 = '# HNE HNN HNZ';
textline2 = '# HNE HNZ HNN';
textline3 = '# HNN HNE HNZ';
textline4 = '# HNN HNZ HNE';
textline5 = '# HNZ HNE HNN';
textline6 = '# HNZ HNN HNE';
fid = fopen(filename,'r');
tline = fgetl(fid);
i = 1;
Output = [];
index = 0;
index_fs = 0;
index_station = 0; %index added for stations
while ischar(tline)
%new condition added for stations
if index_station == 0
if strfind(tline,'# STATION: ') > 0
index_station = 1;
t = extractAfter(tline,"# STATION: ");
if length(t)> 10
index_braket = strfind(tline,"("); %include character until where it should be considered the name of the station
STATION = tline(length('# STATION: ')+1 : index_braket-2);
else
STACODE = extractAfter(tline,"# STATION: ");
STATION = STACODE(find(~isspace(STACODE)));
end
end
end
%find sampling frequency
if length(tline)>27 %%COUNT NUMBER OF CHARACTERS AND CHANGE IT AFTER >%%%%%%
index_fs = strcmp(tline(1:27),'# SAMPLING FREQUENCY (Hz): ');
if index_fs == 1
str_output = remove_letters_1(tline);
fs = str2double(str_output);
index_fs = 0;
end
end
%Getting acceleration
%First mixed data%
if index==0
index = strcmp(tline,textline1); %%EO NS UD
if index ==1; index=1; end
elseif index ==1
%fid=fopen(filename,'r'); % open the file for low-level i/o
n=0; % initialize line counter
tline=''; % preset line content to nothing
while ~contains(tline,'ACCELERATION DATA') % look for the acceleration data section
tline=fgetl(fid);
n=n+1;
end
for ii=1:3 % after finding it, look for the data, with or without a "T" record
tline=fgetl(fid);
if ~isempty(sscanf(tline,'%f',1)); break; end % first numeric record marks the beginning of data (also for rows starting with "-")
n=n+1;
end
%fid=fclose(fid); % ok, close the file and do high-level read
data=readmatrix(filename,'NumHeaderLines',n);
whos data
tmp=data(1:end,:);
tmp1 = [tmp(:,1) tmp(:,2) tmp(:,3)]; % file columns EO NS UD; rearrange to EO=X NS=Y UD=Z
Output = [Output; tmp1];
end
%Second mixed data%
if index==0
index = strcmp(tline,textline2); %% EO UD NS
if index ==1; index=2; end
elseif index==2
%fid=fopen(filename,'r'); % open the file for low-level i/o
n=0; % initialize line counter
tline=''; % preset line content to nothing
while ~contains(tline,'ACCELERATION DATA') % look for the acceleration data section
tline=fgetl(fid);
n=n+1;
end
for ii=1:3 % after finding it, look for the data, with or without a "T" record
tline=fgetl(fid);
if ~isempty(sscanf(tline,'%f',1)); break; end % first numeric record marks the beginning of data (also for rows starting with "-")
n=n+1;
end
%fid=fclose(fid); % ok, close the file and do high-level read
data=readmatrix(filename,'NumHeaderLines',n);
whos data
tmp=data(1:end,:);
tmp2 = [tmp(:,1) tmp(:,3) tmp(:,2)]; % file columns EO UD NS; rearrange to EO=X NS=Y UD=Z
Output = [Output; tmp2];
end
%Third mixed data%
if index==0
index = strcmp(tline,textline3); %% NS EO UD
if index ==1; index=3; end
elseif index==3
%fid=fopen(filename,'r'); % open the file for low-level i/o
n=0; % initialize line counter
tline=''; % preset line content to nothing
while ~contains(tline,'ACCELERATION DATA') % look for the acceleration data section
tline=fgetl(fid);
n=n+1;
end
for ii=1:3 % after finding it, look for the data, with or without a "T" record
tline=fgetl(fid);
if ~isempty(sscanf(tline,'%f',1)); break; end % first numeric record marks the beginning of data (also for rows starting with "-")
n=n+1;
end
%fid=fclose(fid); % ok, close the file and do high-level read
data=readmatrix(filename,'NumHeaderLines',n);
whos data
tmp=data(1:end,:);
tmp3 = [tmp(:,2) tmp(:,1) tmp(:,3)]; % file columns NS EO UD; rearrange to EO=X NS=Y UD=Z
Output = [Output; tmp3];
end
%Fourth mixed data%
if index==0
index = strcmp(tline,textline4); % NS UD EO
if index ==1; index=4; end
elseif index==4
%fid=fopen(filename,'r'); % open the file for low-level i/o
n=0; % initialize line counter
tline=''; % preset line content to nothing
while ~contains(tline,'ACCELERATION DATA') % look for the acceleration data section
tline=fgetl(fid);
n=n+1;
end
for ii=1:3 % after finding it, look for the data, with or without a "T" record
tline=fgetl(fid);
if ~isempty(sscanf(tline,'%f',1)); break; end % first numeric record marks the beginning of data (also for rows starting with "-")
n=n+1;
end
%fid=fclose(fid); % ok, close the file and do high-level read
data=readmatrix(filename,'NumHeaderLines',n);
whos data
tmp=data(1:end,:);
tmp4 = [tmp(:,3) tmp(:,1) tmp(:,2)]; % file columns NS UD EO; rearrange to EO=X NS=Y UD=Z
Output = [Output; tmp4];
end
%Fifth mixed data%
if index==0
index = strcmp(tline,textline5); % UD EO NS
if index ==1; index=5; end
elseif index==5
%fid=fopen(filename,'r'); % open the file for low-level i/o
n=0; % initialize line counter
tline=''; % preset line content to nothing
while ~contains(tline,'ACCELERATION DATA') % look for the acceleration data section
tline=fgetl(fid);
n=n+1;
end
for ii=1:3 % after finding it, look for the data, with or without a "T" record
tline=fgetl(fid);
if ~isempty(sscanf(tline,'%f',1)); break; end % first numeric record marks the beginning of data (also for rows starting with "-")
n=n+1;
end
%fid=fclose(fid); % ok, close the file and do high-level read
data=readmatrix(filename,'NumHeaderLines',n);
whos data
tmp=data(1:end,:);
tmp5 = [tmp(:,2) tmp(:,3) tmp(:,1)]; % file columns UD EO NS; rearrange to EO=X NS=Y UD=Z
Output = [Output; tmp5];
end
%Sixth mixed data%
if index==0
index = strcmp(tline,textline6); % UD NS EO
if index ==1; index=6; end
elseif index==6
%fid=fopen(filename,'r'); % open the file for low-level i/o
n=0; % initialize line counter
tline=''; % preset line content to nothing
while ~contains(tline,'ACCELERATION DATA') % look for the acceleration data section
tline=fgetl(fid);
n=n+1;
end
for ii=1:3 % after finding it, look for the data, with or without a "T" record
tline=fgetl(fid);
if ~isempty(sscanf(tline,'%f',1)); break; end % first numeric record marks the beginning of data (also for rows starting with "-")
n=n+1;
end
%fid=fclose(fid); % ok, close the file and do high-level read
data=readmatrix(filename,'NumHeaderLines',n);
whos data
tmp=data(1:end,:);
tmp6 = [tmp(:,3) tmp(:,2) tmp(:,1)]; % file columns UD NS EO; rearrange to EO=X NS=Y UD=Z
Output = [Output; tmp6];
end
tline = fgetl(fid);
i = i+1;
end
fclose(fid);
end
Encapsulate the pieces that do the various parts as functions; don't repeat the same code over and over again inline. That is very time-consuming to write initially and makes the code nearly impossible to maintain, modify, or debug later...
I asked for the complete requirements initially and got nothing back except the request to read the numeric array after the given header; as suspected, more than that is needed.
Don't build the data into the code; read the data and use it to make the decisions. Start by locating the pieces of information needed and build a table record for each file that identifies it, including reading the channel record. You can then reorder the columns of each file, from the order found in the file into one specific order, to build a consistent dataset for analysis. Depending on how the analyses will be carried out, you could save either the Nx3 array as is or three Nx1 vectors, one per channel name.
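A sketch of the reordering step, assuming the channel names have already been read from the file (for example from the column-header line) into a cell array; the names and values here are made up. ismember returns, for each channel in the desired order, its column position in the file, so one indexing operation replaces the six hard-coded branches.

```matlab
% Channel order as read from the file's header line (hypothetical example)
fileChans = {'HNZ','HNE','HNN'};
% Two made-up records in that file order: columns are Z, E, N
raw = [0.3 0.1 0.2;
       0.6 0.4 0.5];

% Target order for the unified dataset: E (EO = X), N (NS = Y), Z (UD = Z)
target = {'HNE','HNN','HNZ'};

% For each target channel, locate its column in the file
[found, col] = ismember(target, fileChans);
assert(all(found), 'Header is missing one of the expected channels.');
data = raw(:, col);      % columns now in E, N, Z order
```

If different stations use different band codes (HHE versus HNE, as in the two snippets in the question), one option is to compare only the last letter (E, N, Z) of each name instead of the full name.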
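function [chn,idx]=getChannels(fid) % presume file already open, pass handle
% find channel record of form
% # CHANNEL: HNE HNN HNZ
% return identified channels and alphabetical order to rearrange data columns by
MATCHSTR='# CHANNEL: ';
l=fgetl(fid);
while ischar(l) && ~startsWith(l,MATCHSTR) % stop at the channel record or at end of file (fgetl returns -1)
l=fgetl(fid);
end
if ~ischar(l), error('getChannels:notFound','No channel record found.'); end
chn=strtrim(extractAfter(l,MATCHSTR)); % e.g. 'HNE HNN HNZ'
chn=split(chn); % cell array, one channel name per element
[chn,idx]=sort(chn); % alphabetical order; data(:,idx) puts the columns in E, N, Z order
end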
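The sort index returned by getChannels can be applied directly to the data columns, because the alphabetical order of the channel names (E, N, Z as last letters) happens to be the desired EO, NS, UD order. A small illustration with made-up values, using strsplit on a hypothetical channel string instead of reading it from a file:

```matlab
chn = strsplit('HNZ HNE HNN');  % hypothetical channel string, already split
[chn, idx] = sort(chn);         % alphabetical: HNE, HNN, HNZ
raw = [0.3 0.1 0.2];            % one made-up record in file order Z, E, N
data = raw(:, idx);             % columns now in E, N, Z order
```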
When this is done, move on to finding the sampling frequency in a similar fashion. While the given file shows it as the next record, and it likely always will be, don't presume that is always the case; the file structure looks somewhat flexible, so some files may contain other information as well (unless there is a document describing the format that says otherwise).
I'd probably choose to save the date/time data as well as the magnitude and locations; you will likely want them later, if only for annotation.
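A sketch of searching for the sampling-frequency record wherever it appears, rather than assuming its position, using regexp on the file text (as read with fileread). The label '# SAMPLING FREQUENCY (Hz): ' is taken from the question's code; the sample lines and the value 100 are hypothetical. This also avoids the remove_letters_1 helper, which is not shown in the thread.

```matlab
% Stand-in for txt = fileread(filename); lines and value are hypothetical
txt = sprintf(['# 4. COMMENTS\n# BASELINE CORRECTED\n' ...
               '# SAMPLING FREQUENCY (Hz): 100\n# 5. ACCELERATION DATA\n']);

% Look for the record anywhere in the header; \( and \) escape the parentheses
tok = regexp(txt, '# SAMPLING FREQUENCY \(Hz\):\s*([\d.]+)', 'tokens', 'once');
if isempty(tok)
    fs = NaN;                    % record absent from this file
else
    fs = str2double(tok{1});     % numeric text after the label
end
```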
You might choose to also return the channel string as it exists before splitting/sorting; you could then use that as the key to find the beginning of the acceleration data.
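A sketch of using the channel string as the key, assuming the column-header line has the form '# ' followed by that string (as in '# HNE HNN HNZ') and that the optional line contains only "T". It computes the number of lines to skip, which could then be passed to readmatrix(filename,'NumHeaderLines',nHeader) as in the accepted answer. The file text is hypothetical; CollapseDelimiters is set to false so blank lines still count toward the line number.

```matlab
% Stand-in for the file text; contents are hypothetical
txt = sprintf(['# CHANNEL: HNE HNN HNZ\n# 5. ACCELERATION DATA\n' ...
               '# HNE HNN HNZ\nT\n 0.1 0.2 0.3\n']);
% One cell per physical line; keep empty lines so indices equal line numbers
lines = strsplit(txt, sprintf('\n'), 'CollapseDelimiters', false);

chanStr = 'HNE HNN HNZ';                   % channel string as read from '# CHANNEL: '
key = ['# ' chanStr];                      % assumed form of the column-header line
k = find(strcmp(strtrim(lines), key), 1);  % first exact match (strtrim also drops a trailing CR)
assert(~isempty(k), 'Column-header line not found.');

nHeader = k;                               % skip everything through the header line
if k < numel(lines) && strcmp(strtrim(lines{k+1}), 'T')
    nHeader = k + 1;                       % and also the "T" line when it is present
end
```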
The question of whether the "T" exists in each file is still open: do some files have it and some not? The key difficulty is that you can't search for something that isn't there except by an exhaustive search that fails, which is very expensive. You can, of course, always first presume it isn't there, try to convert the first record, and catch the failure. The pain with reading data record by record is that there isn't a very convenient way to resynchronize back to the beginning of the record just read, so that the whole set of data can be read in one fscanf operation. When the "T" does exist and the conversion fails, the records from the next one on are the data and it's easy; when it doesn't exist and the conversion succeeds, read the rest and concatenate that result to the first record.
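A sketch of the try-to-convert approach just described, run on two small hypothetical files (one with the "T" line, one without) to show that both yield the same array. It assumes the file pointer has already been positioned on the channel-header line and that each record has three columns.

```matlab
results = {};
for withT = [true false]
    % Write a small hypothetical file, with or without the "T" line
    fname = [tempname '.txt'];
    fid = fopen(fname, 'w');
    fprintf(fid, '# HNE HNN HNZ\n');
    if withT
        fprintf(fid, 'T\n');
    end
    fprintf(fid, ' 0.1 0.2 0.3\n 0.4 0.5 0.6\n');
    fclose(fid);

    fid = fopen(fname, 'r');
    fgetl(fid);                             % channel-header line (assumed already located)
    first = sscanf(fgetl(fid), '%f').';     % presume there is no "T": try to convert
    rest  = fscanf(fid, '%f', [3 Inf]).';   % read all remaining records in one call
    fclose(fid);
    delete(fname);

    if numel(first) == 3
        data = [first; rest];               % conversion worked: first record was data
    else
        data = rest;                        % conversion failed: that line was the "T"
    end
    results{end+1} = data;                  %#ok<AGROW>
end
```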
You'll have much better success if you factorize the code into small pieces, each of which does its one task and then hands off to the next.
Thank you very much for your detailed explanation; I really appreciate it. I am going to modify the code as you suggested and try to fix the issue of getting more data.
NOTA BENE: In the initial code above there was a typo/mismatch between the returned index variable and the variable used as the return value in the sort call. I fixed it above, but the original would have had an issue...


Accepted Answer

fn='https://www.mathworks.com/matlabcentral/answers/uploaded_files/1376874/CISMID_SC_SCARQ_NEW_TOCHECH.txt';
data=readmatrix(fn,'CommentStyle',{'#','T'});
whos data
  Name          Size              Bytes  Class     Attributes
  data      49626x4             1588032  double
[data(1:5,:); nan(1,size(data,2)) ; data(end-4:end,:)]
ans = 11×4
       NaN       NaN       NaN       NaN
       NaN       NaN       NaN       NaN
       NaN       NaN       NaN       NaN
       NaN       NaN       NaN       NaN
       NaN  -20.2870       NaN       NaN
       NaN       NaN       NaN       NaN
    0.0448   -0.0758   -0.0541       NaN
   -0.0259   -0.0098    0.0058       NaN
   -0.0848    0.0277   -0.0031       NaN
   -0.0596    0.0094   -0.0153       NaN
Well, that's a spectacular failure: the published/documented comment style didn't seem to work well at all. I would have to delve into that some more, but it may be worthy of a support ticket if I can't find an obvious cause just by looking at the file in the browser.
BTW, since there isn't anything after the section, you could shorten the file significantly before posting and not lose anything; I was presuming there were probably other sections after the data.
Anyways, let's do something a little different...
opt=detectImportOptions(fn,'ReadVariableNames',0,'ExpectedNumVariables',3)
opt = 
  DelimitedTextImportOptions with properties:

   Format Properties:
                    Delimiter: {'\t' ' '}
                   Whitespace: '\b'
                   LineEnding: {'\n' '\r' '\r\n'}
                 CommentStyle: {}
    ConsecutiveDelimitersRule: 'join'
        LeadingDelimitersRule: 'ignore'
       TrailingDelimitersRule: 'ignore'
                EmptyLineRule: 'skip'
                     Encoding: 'UTF-8'

   Replacement Properties:
                  MissingRule: 'fill'
              ImportErrorRule: 'fill'
             ExtraColumnsRule: 'ignore'

   Variable Import Properties: Set types by name using setvartype
                VariableNames: {'Var1', 'Var2', 'Var3'}
                VariableTypes: {'double', 'double', 'double'}
        SelectedVariableNames: {'Var1', 'Var2', 'Var3'}
              VariableOptions: Show all 3 VariableOptions
    Access VariableOptions sub-properties using setvaropts/getvaropts
           VariableNamingRule: 'modify'

   Location Properties:
                    DataLines: [15 Inf]
            VariableNamesLine: 0
               RowNamesColumn: 0
            VariableUnitsLine: 0
     VariableDescriptionsLine: 0
    To display a preview of the table, use preview
data=readmatrix(fn,opt);
whos data
  Name          Size              Bytes  Class     Attributes
  data      49631x3             1191144  double
data(1:5,:)
ans = 5×3
       NaN       NaN       NaN
       NaN    2.0000       NaN
       NaN       NaN       NaN
       NaN       NaN       NaN
       NaN       NaN  -20.2870
Well, now we've again illustrated that the import detection tool isn't all that great sometimes, particularly for text files. I always like to try the higher-level tools first, but when they don't work, revert to brute force to find the header.
fid=fopen('CISMID_SC_SCAR...W_TOCHECH.txt','r'); % open the file for low-level i/o
n=0; % initialize line counter
l=''; % preset line content to nothing
while ~contains(l,'ACCELERATION DATA') % look for the acceleration data section
l=fgetl(fid);
n=n+1;
end
for i=1:3 % after finding it, look for the data, with or without a "T" record
l=fgetl(fid);
if ~isempty(sscanf(l,'%f',1)); break; end % first numeric record marks the beginning of data (also for rows starting with "-")
n=n+1;
end
l = '# HNE HNN HNZ'
l = 'T'
l = ' 0.02165885 -0.06615625 0.00254670'
ans = 32
n = 37
fid=fclose(fid); % ok, close the file and do high-level read
data=readmatrix(fn,'NumHeaderLines',n);
whos data
  Name          Size              Bytes  Class     Attributes
  data      49608x3             1190592  double
data(1:5,:)
ans = 5×3
    0.0217   -0.0662    0.0025
    0.1372   -0.0853   -0.0040
    0.0745   -0.0395    0.0133
   -0.0195    0.0550    0.0390
   -0.0766    0.0929    0.0681
I could also use a low-level read to scan the rest of the file from that point on, but it's somewhat of a pain to resync the file pointer to the beginning of the previous record to rescan it, so I just saved the header line count and read with the high-level routine.
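For reference, one way to do that resync is to record ftell before each fgetl and fseek back to that offset once the first numeric line has been found; fscanf can then read the remaining records in one call. A sketch on a hypothetical file, assuming three numeric columns per record; the numeric test also accepts a first data row that begins with "-".

```matlab
% Hypothetical sample file with the "T" line present
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '# 5. ACCELERATION DATA\n# HNE HNN HNZ\nT\n-0.1 0.2 0.3\n 0.4 -0.5 0.6\n');
fclose(fid);

fid = fopen(fname, 'r');
pos = ftell(fid);                     % byte offset of the line about to be read
l = fgetl(fid);
while ischar(l) && isempty(sscanf(l, '%f', 1))
    pos = ftell(fid);                 % remember where the next line starts
    l = fgetl(fid);
end
fseek(fid, pos, 'bof');               % rewind to the start of the first numeric record
data = fscanf(fid, '%f', [3 Inf]).';  % scan the rest of the file in one call
fclose(fid);
delete(fname);
```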

More Answers (0)


Release

R2021a

