Read text file after a specific text line but avoiding only the next line
Hello, I am collecting the data that follows the line "# HHE HHN HHZ". (Only the first 3 rows after that line are copied here as an example; there could be hundreds.) The order of these columns can vary. I have written a script for one specific text file layout (see Example 1).
Example 1:
#
# 4. COMMENTS
# BASELINE CORRECTED
#
# 5. ACCELERATION DATA
# HHE HHN HHZ
-0.02104708 -0.02134472 0.00412299
-0.00340606 0.08357343 0.02083563
-0.02940362 0.00093856 0.00505147
The script below handles one combination of columns. Each combination is defined as textline1, textline2 and so on, which is necessary so that the data can be rearranged into one fixed column order in the output:
textline1 = '# HNE HNN HNZ';
% First channel combination
if index == 0
    index = strcmp(tline, textline1); % header line found: EO NS UD
elseif index == 1
    tmp = sscanf(tline, '%f %f %f');
    tmp1 = [tmp(1); tmp(2); tmp(3)]; % rearrange to EO=X NS=Y UD=Z
    Output = [Output; tmp1'];
end
However, some records use the following text format, where there is a line containing only "T" before the data to be collected (after "# HHE HHN HHZ"):
#
# 4. COMMENTS
# BASELINE CORRECTED
#
# 5. ACCELERATION DATA
# HHE HHN HHZ
T
-0.02104708 -0.02134472 0.00412299
-0.00340606 0.08357343 0.02083563
-0.02940362 0.00093856 0.00505147
Any help fixing the code for that case would be appreciated. Thank you very much.
11 Comments
Walter Roberson
on 7 May 2023
I would suggest to you that it might be easiest to read the entire file as a character vector and then do text manipulation such as regexp() to extract parts from it.
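A minimal sketch of that idea, assuming the header text is known in advance and that the optional extra line contains only "T". The sample text stands in for the result of fileread on a real file; it is an illustration under those assumptions, not a complete parser for the format.

```matlab
% Sample text standing in for txt = fileread('record.txt') (hypothetical file name).
txt = sprintf(['# 5. ACCELERATION DATA\n# HHE HHN HHZ\nT\n', ...
               '-0.02104708 -0.02134472 0.00412299\n', ...
               '-0.00340606 0.08357343 0.02083563\n', ...
               '-0.02940362 0.00093856 0.00505147\n']);

header = '# HHE HHN HHZ';               % header line that precedes the data
pos = strfind(txt, header);             % start index of the header text
if isempty(pos)
    error('Header line not found.');
end
rest = txt(pos(1) + numel(header):end); % everything after the header

% Drop an optional line holding only "T"; text without it is left unchanged.
rest = regexprep(rest, '^\s*T[ \t]*\r?\n', '', 'once');

% Read all remaining numbers, three per row.
data = sscanf(rest, '%f', [3 Inf]).';
```

Because the "T" line is removed only when it is present, the same code should handle both kinds of file shown in the question.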
dpb
on 7 May 2023
I would suggest you attach a couple sample files instead of just a snippet from each and explain precisely what is the end objective...
From the two snippets above, one could simply use
data=readmatrix(fullfile('YourDataDir','YourFileName'),'CommentStyle',{'T','#'});
Unfortunately, there's probably other stuff in the file, too, but we can't see that to know what could be done efficiently...
Jorge Luis Paredes Estacio
on 7 May 2023
Jorge Luis Paredes Estacio
on 7 May 2023
Edited: Walter Roberson
on 7 May 2023
Jorge Luis Paredes Estacio
on 7 May 2023
Jorge Luis Paredes Estacio
on 7 May 2023
Edited: per isakson
on 8 May 2023
Jorge Luis Paredes Estacio
on 7 May 2023
Edited: per isakson
on 8 May 2023
Encapsulate the pieces that do the various parts as functions; don't repeat the same code inline over and over. That is very time-consuming to write initially and makes the code nearly impossible to maintain, modify, or debug later...
I asked for the complete requirements initially and got nothing back except the request to read the numeric array after the given header. As suspected, more than that is needed.
Don't build the data into the code; read the data and use it to make the decisions. Start by locating the pieces of information needed and build a table record for each file that identifies it, including the channel record. You can then reorder the columns of each file from the order found in the file into one specific order, which gives a consistent dataset for analysis. Depending on how the analyses will be carried out, you could save either the Nx3 array as a whole or three Nx1 vectors, one per channel name.
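function [chn,idx]=getChannels(fid) % presumes file already open; pass the file handle
% Find the channel record of the form
%   # CHANNEL: HNE HNN HNZ
% and return the identified channels in alphabetical order, plus the
% index vector needed to rearrange the data columns the same way.
MATCHSTR='# CHANNEL: ';
l=fgetl(fid);
while ischar(l) && ~startsWith(l,MATCHSTR) % fgetl returns -1 at end of file
  l=fgetl(fid);
end
if ~ischar(l), error('getChannels:notFound','No channel record found.'); end
chn=strtrim(extractAfter(l,MATCHSTR));
chn=split(chn);
[chn,idx]=sort(chn);
end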
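As a small illustration of how the sort index can be used, here is a sketch with made-up numbers. The channel order 'HNZ HNE HNN' and the data values are hypothetical, and strsplit is used in place of split only to keep the snippet short.

```matlab
chanLine = 'HNZ HNE HNN';          % hypothetical channel order found in one file
data = [3 1 2; 6 4 5];             % made-up samples, columns in file order

chn = strsplit(strtrim(chanLine)); % {'HNZ','HNE','HNN'}
[chn, idx] = sort(chn);            % alphabetical order: HNE, HNN, HNZ
data = data(:, idx);               % columns now follow the sorted channel names
```

For these channel names the alphabetical order E, N, Z happens to coincide with the EO, NS, UD arrangement asked for in the question.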
Once this is done, move on to finding the sampling frequency in a similar fashion. Although the given file shows it as the next record, and it likely always will be, don't presume that; the file structure looks somewhat flexible, so some files may contain other information as well (unless a document describing the format says otherwise).
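The same search-by-prefix idea could look like the sketch below. The record label '# SAMPLING_FREQUENCY_HZ: ' and the sample header lines are assumptions made for illustration only; the actual label has to be taken from the real files.

```matlab
% Header lines as they might be read from a file (hypothetical content).
hdr = {'# CHANNEL: HNE HNN HNZ', ...
       '# SAMPLING_FREQUENCY_HZ: 200', ...
       '# 5. ACCELERATION DATA'};

key = '# SAMPLING_FREQUENCY_HZ: ';             % assumed record label
hit = find(strncmp(hdr, key, numel(key)), 1);  % first line starting with the key
if isempty(hit)
    error('Sampling frequency record not found.');
end
fs = str2double(hdr{hit}(numel(key)+1:end));   % numeric value after the label
```

Searching every header line rather than taking "the line after the channel record" keeps the code working if other records are inserted in between.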
I'd probably save the date/time data as well as the magnitude and locations; you will likely want them later, for annotation if nothing else.
You might also choose to return the channel string as it exists before splitting and sorting; you could then use it as the key to find the beginning of the acceleration data.
The question about whether the "T" exists in each file is still open: is it the case that some files have it and some don't? The key difficulty is that you can't search for what isn't there except by an exhaustive search that fails, which is very expensive. You can, of course, first presume it isn't there, try to convert the first record, and detect when that conversion fails. The pain with reading data record by record is that there isn't a very convenient way to get back to the beginning of the record just read, so that the whole set of data could be read in one fscanf operation. When the "T" does exist and the conversion fails, the data start at the next record and it's easy; when it doesn't exist and the conversion succeeds, read the rest and concatenate that result onto the first record.
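A minimal sketch of that try-the-first-record approach follows. It writes a small sample file so that it is self-contained; the header text and the temporary file are assumptions for illustration. Note that sscanf does not raise an error on a non-numeric line; it returns an empty array, so the sketch checks for that instead of catching an error.

```matlab
% Write a small sample file (with the "T" line) so the sketch can run on its own.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '# 5. ACCELERATION DATA\n# HHE HHN HHZ\nT\n');
fprintf(fid, '%.8f %.8f %.8f\n', ...
        [-0.02104708 -0.00340606; -0.02134472 0.08357343; 0.00412299 0.02083563]);
fclose(fid);

fid = fopen(fname, 'r');
l = fgetl(fid);
while ischar(l) && ~strcmp(strtrim(l), '# HHE HHN HHZ')
    l = fgetl(fid);                       % skip lines until the header is found
end
if ~ischar(l)
    fclose(fid);
    error('Header line not found.');
end

l = fgetl(fid);                           % first record after the header
if ischar(l)
    first = sscanf(l, '%f').';            % empty when the line is "T"
else
    first = [];                           % file ended right after the header
end
rest = fscanf(fid, '%f', [3 Inf]).';      % all remaining numeric records
fclose(fid);
delete(fname);

data = [first; rest];                     % an empty first record adds nothing
```

When the file has no "T" line, first holds the first data row and is simply stacked on top of the rest, so no rewinding of the file is needed.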
You'll have much better success if you factor the code into small pieces, each of which does its one task and then hands off to the next.
Jorge Luis Paredes Estacio
on 8 May 2023
dpb
on 8 May 2023
NOTA BENE: In the initial code above there was a typo/mismatch between the returned indexing variable and the variable used as the return value in the sort call. I fixed it above, but the original would have had an issue...