Remove specific data sequence when reading .bin file

How do I remove/skip the specific data sequence [1024 0 2240 -24500] when reading the .bin file in batch? thanks!
filename='rf.bin'; % filename='./rf.bin';
fid=fopen(filename,'r');
dataHeader=fread(fid,123,'uint8'); % skipping the header for .bin
NsperBatch = 1e3; % number of sample per batch
K=100; % Average every K set of values, K=100 in this case
magSpectrumMat=[];
while ~feof(fid)
magSpectrum=0;
for k=1:K
data=fread(fid,NsperBatch*2,'int16','b');
dataIQ=data(1:2:end)+1i*data(2:2:end);
dataSpectrum=fftshift(fft(dataIQ));
magSpectrum=magSpectrum+abs(dataSpectrum).^2;
end
magSpectrum = magSpectrum/K;
magSpectrumMat = [magSpectrumMat magSpectrum];
end
magSpectrumMat_dB=pow2db(magSpectrumMat);

3 Commenti

Where in the file can it occur? Can it appear just anywhere, like in the middle of a block of NsperBatch*2 values, and you want the code to notice and drop that from the sequence of values and read four more values to make up for the ignored data?
Ivy Chen
Ivy Chen il 12 Ott 2017
Modificato: Ivy Chen il 12 Ott 2017
Yes, it can occur anywhere. What you described above is the preferred way to just drop the [1024 0 2240 -24500] data sequence and read four more values in the block. This way, it will make the calculation and matrix clean. Thanks for the help.
Is there any possibility that it could occur inside the 123 byte header? Is there any possibility it could start inside the 123 byte header but end outside the header?

Accedi per commentare.

 Risposta accettata

Okay, here it is, with skips accounted for, and with automatic padding in case the data is the wrong size.
As you indicated the words to skip could occur "anywhere" after I asked about that, I assumed that the words to skip might even occur during that 123 byte header.
I assumed that if there were not a full K batch that you wanted to take the mean of what was available in the last partial batch rather than dividing by K specifically.
I vectorized a lot of the computation.
I was not completely sure of the order of data you wanted to output. I think your existing code is putting the results for averaging into column vectors in a matrix; that is the output format I create here.
The below is not tested as I do not happen to have your data file.
NsperBatch = 1e3;
header_size = 123;
K=100; % Average every K set of values, K=100 in this case
pattern_to_skip = int16([1024 0 2240 -24500]); %magic sequence of words to ignore
filename = 'rf.bin'; % filename='./rf.bin';
pattern_to_skip = typecast( swapbytes(pattern_to_skip), 'uint8'); %big endian
PL = length(pattern_to_skip);
fid = fopen(filename,'r');
bytes = reshape( fread(fid, inf, '*uint8'), 1, []); %row vector
fclose(fid);
orig_num_bytes = length(bytes);
skiplocs = strfind(bytes, pattern_to_skip);
for idx = fliplr(skiplocs)
bytes(idx:idx+PL-1) = []; %delete bytes
end
postskip_num_bytes = length(bytes);
fprintf('%d groups were skipped\n', (orig_num_bytes - postskip_num_bytes) / PL );
dataHeader = bytes(1:header_size);
bytes = bytes(header_size+1:end);
data_length = length(bytes);
if mod(data_length, 2) ~= 0
fprintf('warning: data is odd number of bytes long, padding\n');
bytes(end+1) = 0;
end
if mod(data_length, 4) ~= 0
fprintf('warning: data is odd number of words long, padding\n');
bytes(end+1:end+2) = 0;
end
words = typecast(bytes, 'int16');
all_dataIQ = double( complex( words(1:2:end), words(2:2:end) ) );
num_dataIQ = length(all_dataIQ);
target_num_dataIQ = NsperBatch * ceil( num_dataIQ / NsperBatch);
if num_dataIQ ~= target_num_dataIQ
fprintf('warning: complex data is not a multiple of %d samples long, padding\n', numNsperBatch);
all_dataIQ(target_num_dataIQ) = 0; %zero fill automatically
end
magSpectra = abs(fftshift( fft( reshape(all_dataIQ, NsperBatch, []) ) )).^2; %do it all at once!
num_spectra = size(magSpectra, 2);
num_full_batches = floor(num_spectra / K);
num_leftover = num_spectra - K * num_full_batches;
num_batches = num_full_batches + (num_leftover ~= 0);
magSpectrumMat = zeros(NsperBatch, num_batches);
for batch_idx = 1 : num_full_batches
bstart = (batch_idx - 1) * K + 1;
bend = bstart + K - 1;
magSpectrum = mean( magSpectra(:, bstart : bend ), 2 );
magSpectrumMat(:, batch_idx) = magSpectrum;
end
if num_leftover ~= 0
magSpectrum = mean( magSpectra(:, end-num_leftover+1 : end), 2 );
magSpectrumMat(:, end) = magSpectrum;
end

8 Commenti

thanks so much for such detailed explanation and complete codes. I am incorporating these changes to test out.
If you do not want an incomplete group to be done, then
num_batches = num_full_batches;
and do not run the "if num_leftover ~= 0" branch.
Excellent, thanks soooo much! I almost done of incorporating everything!
Walter: all runs well and I received a couple suggestions and would like to consult with you.
1. When using typecast in code. How do we know it is using the big endian? As I did not see any argument being specified such as 'b' being used normally in the fread(fid, NesperBath*2, 'b').
2. As our group would like to emphasize more on differences, we want to normalize the "magSpectra". I did a bit research, and looks like there are a couple of options:
  • magSpectra_Norm= magSpectra/norm(magSpectra);
  • magSpectra_Norm= magSpectra/max(abs(magSpectra(:)));
which one will you recommend? Thanks!
MATLAB has only been supported on x86 and x64 architecture since R2010a. x86 and x64 architectures are little endian. On little-endian systems, the a sequence of bytes, int16([1024 0 2240 -24500]) is not hex 0400 0000 08C0 A04C, it is 0004 0000 C008 4CA0, so to search for int16([1024 0 2240 -24500]) in a sequence of bytes, you need to byte-swap it.
Normalizing by max(abs) guarantees that the result peaks at either +1 or -1. In some contexts, that is a good property, if you are trying to compare relative proportions from magSpectra to magSpectra. It is, however, not good if you need to be able to compare absolute proportions. For example, if the signal were to suddenly cut in half for everything, normalizing by max(abs) would not notice.
You also need to be aware of the fact that your values being normalized are fft results. The first entry in any fft result is the total power of the signal, sum() of the time series values, and if you are not working with a signal that has an expected mean of 0 and you are not subtracting off the mean before doing the fft, then it is much more common than not for that entry to dominate the output -- and so max(abs()) is going to be picking out that entry most of the time. But the rest of your entries are effectively relative frequency components and after you normalize by the total power they can be pretty small... If you want to show relative proportions of signal, then you should either subtract off the mean before doing the fft or else set the corresponding array element to 0 before doing the max(abs()). It is usually easier to set it before doing the fftshift().
Normalizing by norm() will give signal levels typically much below normalizing by max(abs)
There are many cases where it is best to normalize to mean 0 std 1, by subtracting off the mean and dividing the result by the standard deviation.
There are also many cases where it is best to calculate the normalization factors over all of the data rather than spectrum by spectrum.
You need to think more about what you would like to see.
For example suppose that you had two spectra, nearly the same except that one had a narrow band frequency injected, so giving another peak compared to the first. If you normalize by norm() then because the total of the squares of the samples increased, the norm would increase, so dividing by the norm would reduce the relative heights of the second signal. If you normalize by max(abs) then the relative heights would not change unless the injected signal happened to be the highest peak (but then it would...).
On the other hand, if you had two signals that were the same except that one had twice the magnitude around the mean that the other did, and the mean was 0, then dividing by max(abs) would show exactly the same results. But if the mean was not 0, then the total power is probably going to dominate the fft() and you end up normalizing by that, effectively scaling the graphs except for a spike up to 1 where the power is being plotted...
thanks, Walter! Based on the explanation, I understand how it is handled for the
pattern_to_skip = typecast( swapbytes(pattern_to_skip), 'uint8'); %big endian
For the line
words = typecast(bytes, 'int16');
do we need to do something similarly? or do we need to modify the line in
bytes = reshape( fread(fid, inf, '*uint8'), 1, []); %row vector
with "b", so it carries the big-endian read "bytes" when convert typecast to "words"?
For the normalized "magSpectra", I have run with both options and did not see significant differences in comparisons. Therefore, like you suggested, I will need to think about this a bit more and discuss with my team on the options. Your explanations are really helpful for me to understand the basic differences and the typical cases in using each of options. thanks again!
Do not modify the bytes = line. Your header is defined by an odd number of bytes, and if you swap at the time you read them in, you would move byte 123 to the position of byte 124 and would then be ignoring the wrong byte. So you have to scan as bytes and delete the garbage as bytes (unless you are sure the garbage never occurs in the headers), and once have scrubbed the garbage you need to trim off the first 123 bytes of what is left.
Once you have trimmed off the header, there is a possibility that you need to byte swap: it depends on how the data was stored.
When you described the data values to remove, I assumed you had read through the data stream and had found those particular numeric values after reading as int16, with the implication that the bytes were in the other order (because the native order on whatever host you are using is little-endian.) But it is possible that you were told the sequence of bytes by someone else who assumed you were using big endian, in which case the byteswap would not be needed... Do you have a sample file known to have the sequence of bytes in it that you could process with byteswap or not on the match for deletion, to check to see which is happening in practice?
If the data is written as big-endian then you would need to byteswap the int16, which you would do by changing
words = typecast(bytes, 'int16');
to
words = byteswap( typecast(bytes, 'int16') );
It is the last case that the IQ data is big endian. I will add the byteswap to complete the process. thanks again!

Accedi per commentare.

Più risposte (0)

Categorie

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by