How to loop fopen/fread/fseek to analyze large binary files in shorter segments?
I have a large bin file (10 GB) that contains binary data from 16 channels in a single array, plus one value for the time variable (so a single sample is made up of 17 binary values). The size of each sample should be 8bit.
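To make the layout concrete, here is a small offset calculation. It is a sketch that assumes every value is stored as an 8-byte double (as the fread(...,'double') call in the code below implies) rather than as 8 bits. Under that assumption, one hour of data comes to 9,792,000,000 bytes, which is close to the ~10 GB quoted.

```matlab
% Sketch: byte offsets in an interleaved [t, ch1..ch16] file.
% Assumption: every value is stored as an 8-byte double (not 8 bits).
n_channels = 16;
Fs = 20000;                                   % sampling frequency (Hz)
values_per_sample = n_channels + 1;           % time + 16 channels
bytes_per_value = 8;                          % 'double'; would be 1 for 'uint8'
bytes_per_sample = values_per_sample * bytes_per_value;
samples_per_segment = 60 * Fs;                % 1-minute segment
bytes_per_segment = samples_per_segment * bytes_per_sample;

seg_minute = 45;                              % segment covering minutes 45 to 46
start_byte = seg_minute * bytes_per_segment;  % fseek/ftell work in BYTES
total_1h = 60 * bytes_per_segment;            % size of a 1-hour recording

fprintf('bytes per sample : %d\n', bytes_per_sample);
fprintf('bytes per segment: %d\n', bytes_per_segment);
fprintf('minute %d starts at byte %d\n', seg_minute, start_byte);
fprintf('1 hour of data   : %d bytes\n', total_1h);
```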
I am already able to extract a segment of this file and analyze it channel by channel. I achieved this by loading the whole file, reading it, and creating an appropriately sized sub-array.
[logfname, pathname] = uigetfile('*.mat','Pick a log file');
logfullpathname = [pathname logfname];
load(logfullpathname);
datafullpathname = [pathname datafilename];
FID = fopen(datafullpathname,'r');
fwrite(FID,[(start_seg*60*Fs):(end_seg*60*Fs-1)]); %Fs = sampling freq = 20 kHz (20000)
%%load selected data segment and select channel to analyze
currData = fread(FID,'double'); %%<- THIS IS THE COMMAND THAT SLOWS EVERYTHING DOWN
time = currData((1:(n_channels+1):end), 1);
SIGNAL = currData(((activechannel+1):(n_channels+1):end), 1);
trace = [time SIGNAL];
crop = trace((start_seg*60*Fs):(end_seg*60*Fs-1),:); % select data range of active channel to analyze
seg = transpose(crop(:, 2)); % should output column 2 values (SIGNAL) within the time range defined by "crop"
But this takes about a minute for each segment (each segment is 1 minute long).
What I would like to do is load one segment at a time and loop the analysis over the segments (the analysis itself does not take long).
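For reference, a minimal sketch of loading only the requested minutes instead of the whole file: seek to the byte offset of start_seg, then ask fread for exactly the number of values needed. It assumes the interleaved [t, ch1..ch16] layout stored as 8-byte doubles; the synthetic file and Fs = 10 are only there so the example runs quickly.

```matlab
% Sketch: read only minutes [start_seg, end_seg) instead of the whole file.
% Assumptions: 17 interleaved 8-byte doubles per sample ([t, ch1..ch16]);
% a tiny synthetic 3-minute file and Fs = 10 stand in for the real data.
n_channels = 16; n_vals = n_channels + 1;
Fs = 10;
N = 3 * 60 * Fs;                              % samples in the demo file
data = [(0:N-1) / Fs; bsxfun(@plus, (1:n_channels)' * 1e4, 0:N-1)];
datafullpathname = [tempname '.bin'];
fid = fopen(datafullpathname, 'w'); fwrite(fid, data, 'double'); fclose(fid);

start_seg = 1; end_seg = 2;                   % one minute: [1, 2)
activechannel = 5;
n_samples = (end_seg - start_seg) * 60 * Fs;

FID = fopen(datafullpathname, 'r');           % read access only, no fwrite
fseek(FID, start_seg * 60 * Fs * n_vals * 8, 'bof');   % offset in BYTES
currData = fread(FID, [n_vals, n_samples], 'double');  % 17-by-n_samples
fclose(FID); delete(datafullpathname);

time = currData(1, :);                        % row 1 = time
seg = currData(activechannel + 1, :);         % row vector, as in the question
```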
However, I am stuck: I am using ftell(FID) to get the positions in my bin file, just for testing purposes, but it always gives me the same number and it does not loop.
This is my code for a 1-hour recording file, in which I select the segment from the 45th to the 46th minute. I have tried both a "while" and a "for" loop:
This is the WHILE loop:
%%WHILE CYCLE
file=('C:\Users\Admin\Documents\MATLAB\EOD examples\Examples 16ch\copy.mat');
fileID = fopen(file,'r');
feof(fileID)
index = 0;
n_channels = 16;
Fs = 20000; %sampling frequency
seg = Fs*60*(n_channels+1); %in the bin file data are organized in a vector [t(i),Ch(1)..Ch(i)], with t = time
size_of_double = 8;
while ~feof(fileID)
fseek(fileID,index*size_of_double,'bof'); %this should look for the first data point in the file
position = ftell(fileID); %this should report the current index
position
currData = fread(fileID, seg,'double');
currData
index=index+seg;
end
This is the FOR loop:
%% FOR CYCLE
file=('C:\Users\Admin\Documents\MATLAB\EOD examples\Examples 16ch\copy.mat');
Fs = 20000;
n_channels = 16; %number of active channels
index=2; %index of the first data point
segsize = 60*Fs*(n_channels+1); % this is the length of 1 min segment, 20.000 samples per second
% get filesize
fileID = fopen(file,'r');
fseek(fileID, 0, 1); % move to end
file_length_in_byte = ftell(fileID); % read end position
size_of_double = 8;
file_length_in_double_elements = file_length_in_byte / size_of_double;
feof(fileID);
%step = 1; % in elements
for i = 1:segsize:file_length_in_double_elements-1; % until end of file
fileindex = (i-1)*segsize*size_of_double; % this is where we wanna go
fseek(fileID,fileindex,'bof'); % go to i-th data point
current_index = ftell(fileID); % get file index
current_index % this is where we went
currData = fread(fileID, 2,'double');
currData % here is what we read
end
fclose(fileID);
Basically, before adding the analysis part, I would like the output to be the positions in the file corresponding to the starting point of each segment. I am stuck and I do not know where the issue is: the first loop reports a sequence of values that does not seem to have the right interval (which should be 20000*60*17 if the segment is 1 minute long). The second prints just one number, as if the loop ran only once. Thanks in advance for your help!
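A likely explanation, offered as a hedged reading of the two loops: ftell and fseek count bytes, not doubles, so consecutive segment starts should differ by 20000*60*17*8 = 163,200,000 bytes rather than 20,400,000. In the FOR loop, i already steps by segsize, and fileindex = (i-1)*segsize*size_of_double multiplies by segsize a second time, so every position after the first lands far off the segment grid. Also, if the test file holds less than one segment, 1:segsize:file_length_in_double_elements-1 has a single element and the body runs once. The sketch below loops over segment numbers instead; the small synthetic file and Fs = 10 are assumptions to keep it fast.

```matlab
% Sketch: list the byte position where each segment starts.
% Assumptions: 8-byte doubles, 17 interleaved values per sample; a small
% synthetic file replaces the real recording so the example runs anywhere.
n_channels = 16;
Fs = 10;                                      % demo rate; the real one is 20000
size_of_double = 8;
segsize = 60 * Fs * (n_channels + 1);         % segment length in ELEMENTS

fname = [tempname '.bin'];
fid = fopen(fname, 'w');
fwrite(fid, zeros(1, 3.5 * segsize), 'double');   % 3.5 segments of dummy data
fclose(fid);

fid = fopen(fname, 'r');
fseek(fid, 0, 'eof');                         % move to end of file
file_length_in_double_elements = ftell(fid) / size_of_double;
n_segments = floor(file_length_in_double_elements / segsize);  % full segments only

starts = zeros(1, n_segments);
for n = 0:n_segments-1                        % loop over segment NUMBERS, not elements
    fseek(fid, n * segsize * size_of_double, 'bof');
    starts(n + 1) = ftell(fid);               % position in BYTES
end
fclose(fid);
delete(fname);

disp(starts);
disp(diff(starts));                           % each step is segsize*8 bytes
```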
5 Comments
A couple of comments and some questions...
...
FID = fopen(datafullpathname,'r');
fwrite(FID,[(start_seg*60*Fs):(end_seg*60*Fs-1)]);
You opened the file for read access and then try to write to it. What's up with that? What are you trying to do here?
Your description of the file format says "The size of each sample should be 8bit." but then you write
currData = fread(FID,'double'); %%COMMAND THAT SLOWS EVERYTHING DOWN
which reads the file as 8-BYTE floating-point doubles.
So which is it, a floating point double or an 8-bit A/D sampled data stream?
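The answer matters because the precision argument fixes how many bytes each value occupies, and therefore every offset passed to fseek. A small sketch comparing the two readings, assuming that if the data really are 8-bit they are unsigned ('uint8'; use 'int8' if they are signed):

```matlab
% Sketch: the precision string decides how many bytes each value occupies.
% Assumption: if the samples really are 8-bit, they are unsigned ('uint8').
vals = uint8(0:16);                           % one sample: 17 values

f8 = [tempname '.bin'];
fid = fopen(f8, 'w'); fwrite(fid, vals, 'uint8'); fclose(fid);
fdbl = [tempname '.bin'];
fid = fopen(fdbl, 'w'); fwrite(fid, double(vals), 'double'); fclose(fid);

d8 = dir(f8); ddbl = dir(fdbl);
fprintf('17 values as uint8 : %d bytes\n', d8.bytes);    % 17
fprintf('17 values as double: %d bytes\n', ddbl.bytes);  % 136

fid = fopen(f8, 'r');
back = fread(fid, 17, '*uint8');              % '*uint8' keeps the uint8 class
fclose(fid);
delete(f8); delete(fdbl);
```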
Also, the reason that fread statement takes a while is that it reads the entire file. If you want to process a subset of the file (which is easily enough done once we ascertain the actual data format), you have to specify how many elements to read with the optional second sizeA parameter.
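Besides sizeA, fread also accepts a skip argument (the number of bytes to skip after each value read). The sketch below combines an fseek offset, a count, and a skip to pull one channel out of one segment. It assumes the interleaved 17-value, 8-byte-double layout and uses a small synthetic file.

```matlab
% Sketch: read only one channel of one segment using fread's count and skip.
% Assumptions: 17 interleaved 8-byte doubles per sample; tiny synthetic file.
n_channels = 16; n_vals = n_channels + 1; bpv = 8;   % bytes per value
Fs = 10; seg_samples = 60 * Fs;
N = 3 * seg_samples;                          % 3 one-minute segments

data = [(0:N-1) / Fs; bsxfun(@plus, (1:n_channels)' * 1e4, 0:N-1)];
fname = [tempname '.bin'];
fid = fopen(fname, 'w'); fwrite(fid, data, 'double'); fclose(fid);

k = 1;                 % segment number (0-based)
activechannel = 3;     % channel 1..16; the time value sits in front of channel 1
fid = fopen(fname, 'r');
offset = (k * seg_samples * n_vals + activechannel) * bpv;  % first value of that channel
fseek(fid, offset, 'bof');
seg = fread(fid, seg_samples, 'double', (n_vals - 1) * bpv); % skip the other 16 values
fclose(fid); delete(fname);
```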
Walter Roberson
on 13 Sep 2018
while ~feof(fileID)
fseek(fileID,index*size_of_double,'bof'); %this should look for the first data point in the file
position = ftell(fileID); %this should report the current index
position
That code risks an infinite loop. You are seeking to a constant location relative to the beginning of the file. If feof is not true before the loop, and it is not set by the first fseek, then you get an infinite loop.
POSIX specifically says that fseek clears the end-of-file indicator. POSIX permits seeking past the end of the file because it permits extending files by seeking and then writing.
But MATLAB does not follow POSIX behavior in this regard; it positions to the end of the file instead. Whether it would set feof in that case is less clear.
So your code either loops forever or relies on an edge case, together with relying on the file not being too large...
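One way to avoid depending on feof at all is to read the file sequentially and stop when fread returns nothing; fread advances the file position itself, so no fseek is needed inside the loop. A sketch on a synthetic file of 2.5 segments, assuming 17 interleaved 8-byte doubles per sample:

```matlab
% Sketch: stop the loop on what fread actually returned, not on feof.
% Assumptions: 17 interleaved 8-byte doubles per sample; synthetic file with
% 2.5 segments so the last, partial segment is exercised.
n_vals = 17; Fs = 10;
segsize = 60 * Fs * n_vals;                   % elements per segment
fname = [tempname '.bin'];
fid = fopen(fname, 'w'); fwrite(fid, 1:2.5*segsize, 'double'); fclose(fid);

fid = fopen(fname, 'r');
positions = [];
counts = [];
while true
    pos = ftell(fid);                         % byte position before this read
    [block, count] = fread(fid, segsize, 'double');  % advances the position itself
    if count == 0
        break;                                % nothing left: done
    end
    positions(end + 1) = pos;                 %#ok<AGROW>
    counts(end + 1) = count;                  %#ok<AGROW>
    % ... analyze block here; a partial last segment has count < segsize ...
end
fclose(fid); delete(fname);
```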
LO
on 14 Sep 2018
Jan
on 14 Sep 2018
@Livio Oboti: What exactly is the problem?
How to loop fopen/fread/fseek to analyze large binary files in shorter segments?
This should be easy. With segsize = 60*Fs*(n_channels+1) it should be trivial to use fseek(fid, n * segsize * 8) to move the file pointer to the wanted position.
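A sketch of that suggestion, looping over minutes start_seg to end_seg-1 and checking the return value of fseek (0 on success, -1 on failure). The origin is passed explicitly as 'bof'. The 17-value double layout, the synthetic 4-minute file, Fs = 10, and the use of mean() as a stand-in for the real analysis are all assumptions.

```matlab
% Sketch of fseek(fid, n*segsize*8, 'bof') in a loop over a range of minutes.
% Assumptions: 17 interleaved 8-byte doubles per sample; small Fs and a
% synthetic 4-minute file; mean() stands in for the real analysis.
n_channels = 16; n_vals = n_channels + 1;
Fs = 10;
segsize = 60 * Fs * n_vals;                   % elements per 1-minute segment
N = 4 * 60 * Fs;                              % samples in the demo file
data = [(0:N-1) / Fs; bsxfun(@plus, (1:n_channels)' * 1e4, 0:N-1)];
fname = [tempname '.bin'];
fid = fopen(fname, 'w'); fwrite(fid, data, 'double'); fclose(fid);

start_seg = 1; end_seg = 3;                   % minutes [1, 3), i.e. two segments
activechannel = 2;
result = zeros(1, end_seg - start_seg);
fid = fopen(fname, 'r');
for n = start_seg:end_seg-1
    if fseek(fid, n * segsize * 8, 'bof') ~= 0
        error('fseek failed for segment %d: %s', n, ferror(fid));
    end
    block = fread(fid, [n_vals, segsize / n_vals], 'double');  % 17-by-samples
    signal = block(activechannel + 1, :);
    result(n - start_seg + 1) = mean(signal); % placeholder analysis
end
fclose(fid); delete(fname);
```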
LO
on 14 Sep 2018
Accepted Answer