Auto Detect different file types?
Hello,
I am trying to edit a program so that it can automatically detect different kinds of text files. Currently, I am using two different programs to open and report the separate text files, using the following bits of code:
Program 1:
filespec=[fpath char(fnameALL(2))]; TD=filespec;
delimiter='[\t]';comment='';quotes='';options='numeric';
[TDdata, ~]= readtext(filespec, delimiter, comment, quotes, options);
pVelocity = TDdata(6,62)*100; pCadence = (TDdata(6,21)+TDdata(6,48))/2;
pStride = TDdata(6,65)*100; pStepWidth = TDdata(6,68)*100;
pGSR=(pCadence/60)/(pVelocity/100);
pRTO = (TDdata(6,39)/TDdata(6,33))*100; pLTO = (TDdata(6,12)/TDdata(6,9))*100;
pRSS = (TDdata(6,30)/TDdata(6,33))*100; pLSS = (TDdata(6,57)/TDdata(6,9))*100;
pRSTEP = TDdata(6,42)*100; pLSTEP = TDdata(6,15)*100;
pROTO = (TDdata(6,36)/TDdata(6,33))*100; pLOTO = (TDdata(6,60)/TDdata(6,9))*100;
ToeOff = [pRTO pLTO];
filespec=[fpath char(fnameALL(3))]; TD=filespec;
delimiter='[\t]';comment='';quotes='';options='numeric';
[TDalldata, result]= readtext(filespec, delimiter, comment, quotes, options);
Num_trialstd=(length(TDalldata(1,:))-1)/68;
Program 2:
[fname fpath]=uigetfile('*.txt','Please select the _td file');
conditionid=input('Enter the condition (no spaces): ','s');
cd(fpath);
[dataALL,results]=readtext(fname,';','','','numeric');
[row, col]=find(dataALL(:,3)>0);
data=dataALL(row:length(dataALL),:);
What I am wondering is if there is a function I am unaware of that would automatically be able to distinguish the differences between text files?
If what I'm asking is unclear, I can provide clarification.
Thank you.
2 Comments
Voss
on 21 Jan 2022
Is the idea that you have a set of text files, each of which may have one format or another, but you can't tell what the format is beforehand? If so, you may try something along the lines of:
try
    read_file_method_1(file_name);
catch
    read_file_method_2(file_name);
end
But you'd have to be sure that attempting to read any file with format 2 using method 1 will generate an error, i.e., you don't want to be able to "successfully" call read_file_method_1() on a file with format 2 and get nonsense results. You may have to do some sanity-check in read_file_method_1() that makes sure everything looks good and if not, throw an error to trigger the catch block, which will call read_file_method_2().
Is that more-or-less the situation here? If not, please explain more about what the situation is, and maybe attach a couple of sample text files.
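To make the sanity-check idea concrete, here is a minimal sketch. The two readers are stand-in anonymous functions that return fixed matrices so the snippet runs without any files; in practice they would wrap your actual reading code. The rule "empty or all NaN means wrong format" and the name method_used are assumptions for illustration only.

```matlab
% Stand-in readers so the sketch runs without any files (hypothetical).
% In practice these would wrap the real reading code for each format.
read_file_method_1 = @(file_name) nan(3, 2);        % pretends format 1 parsing failed
read_file_method_2 = @(file_name) [1 2; 3 4; 5 6];  % pretends format 2 parsing worked

file_name = 'my_file.txt';   % placeholder name, never opened here
try
    data = read_file_method_1(file_name);
    % Sanity check (assumed rule): an empty or all-NaN result means wrong format.
    if isempty(data) || all(isnan(data(:)))
        error('reader:badData', 'Method 1 returned unusable data for %s.', file_name);
    end
    method_used = 1;
catch
    % Reached when method 1 errors on its own or fails the sanity check.
    data = read_file_method_2(file_name);
    method_used = 2;
end
fprintf('read %s with method %d\n', file_name, method_used);
```

Throwing the error inside the try block is what lets a "successful but nonsense" read fall through to the second method.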
Accepted Answer
Voss
on 24 Jan 2022
Since the only difference between the way the two file types are read with readtext() is the delimiter, you can try different delimiters until you find one that works. With those two files you posted, I found that readtext() returns all NaNs if you use the wrong delimiter, so I'm using that as the condition that determines whether the file was read correctly or not. (If you have any file that returns all NaNs which needs to be considered valid, then you'd have to use a different condition.)
The following code loops over a set of files and for each file tries readtext() with each different delimiter (in this case just '[\t]' and ';' but the code will work for any number of delimiters) until one gives something that's not all NaNs. Then, for the next file, the delimiter that worked is tried first.
my_files = {'111111111d_test_HT_td.txt' '111111111e_ss.txt'};
delimiters = {'[\t]' ';'};
n_delimiters = numel(delimiters);
delimiter_idx = 1;
for i = 1:numel(my_files)
    fprintf('preparing to read file %s:\n',my_files{i});
    tried_delimiters = false(1,n_delimiters);
    success = false;
    while any(~tried_delimiters)
        fprintf('\ttrying readtext() with delimiter ''%s'' ...\n',delimiters{delimiter_idx});
        data = readtext(my_files{i},delimiters{delimiter_idx},'','','numeric');
        tried_delimiters(delimiter_idx) = true;
        if all(isnan(data(:)))
            fprintf('\tfailed\n');
            delimiter_idx = mod(delimiter_idx,n_delimiters)+1; % switch to the next delimiter
            continue
        end
        success = true;
        break
    end
    if success
        % successfully read my_files{i} with delimiter delimiters{delimiter_idx}
        fprintf('\tsuccess\n');
    else
        % couldn't figure out how to read this file
        fprintf('\tall delimiters failed. couldn''t read the file\n');
        continue
    end
    if delimiter_idx == 1
        % do file type 1 stuff
    else
        % do file type 2 stuff
    end
end
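If an all-NaN result can be legitimate for some of your files, the failure test is the part of the loop to change. Here is a rough sketch of one alternative, assuming that a wrong delimiter yields a matrix that is empty or almost entirely NaN. The name read_failed and the 0.9 threshold are made up for illustration and are not part of readtext(); the matrices are stand-ins rather than real file contents.

```matlab
% Hypothetical replacement for the all(isnan(data(:))) test:
% a read counts as failed when the result is empty or when more than
% max_nan_fraction of its entries are NaN.
max_nan_fraction = 0.9;   % assumed threshold, tune it for your files
read_failed = @(data) isempty(data) || mean(isnan(data(:))) > max_nan_fraction;

% Stand-in matrices (not real file contents).
wrong_delim = nan(5, 1);           % everything NaN
right_delim = [1 2 NaN; 4 5 6];    % mostly numeric, one NaN

disp(read_failed(wrong_delim));    % 1 (treated as failed)
disp(read_failed(right_delim));    % 0 (treated as read correctly)
```

In the loop above, `if all(isnan(data(:)))` would then become `if read_failed(data)`.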
6 Comments
Voss
on 1 Feb 2022
When you say you've "read these files into a cell array", I assume that means you've modified the code I posted so that when it finds a delimiter that works, it stores the data variable in a cell array with one cell per file. (Which seems like a reasonable thing to do.) Is that what you mean?
If that's the case, and now you want to know how to figure out from the contents of each cell whether it was a type 1 or type 2 file, then you'd have to be able to distinguish between the two file types based on what comes from readtext() for each file type. readtext() returns a matrix, so you'd have to know something about the size of possible matrices returned by readtext() in each case or the possible locations of the NaNs in the matrix, etc. I have no idea about the range of possibilities for what those files could contain, so I wouldn't be able to put any conditions on the matrices from readtext() in order to distinguish one type from another. But you may know more about what the possibilities are for those file types and hence what the matrices from readtext() should look like, so you may be able to come up with some condition to distinguish the two types.
However, I think it may be easier to just keep track of each file's type when it is successfully read, rather than trying to go back after the fact and figure it out from the matrices you end up with. That would look something like this minor modification to the code above:
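Purely as an illustration of what such a condition might look like: the Num_trialstd line in the question divides (number of columns - 1) by 68, which hints that one kind of file has 1 + 68*k columns. Assuming that holds for type 1 and never for type 2 (I can't confirm either), a check could be sketched like this. The name looks_like_type1 and the stand-in matrices are hypothetical.

```matlab
% Hypothetical rule: a type 1 matrix has 1 + 68*k columns for some k >= 1.
looks_like_type1 = @(data) size(data, 2) > 1 && mod(size(data, 2) - 1, 68) == 0;

% Stand-in matrices, not real file contents.
stored = {zeros(10, 137), zeros(10, 12)};   % 137 = 1 + 68*2
guessed_type = zeros(1, numel(stored));
for i = 1:numel(stored)
    if looks_like_type1(stored{i})
        guessed_type(i) = 1;
    else
        guessed_type(i) = 2;
    end
end
disp(guessed_type);   % 1 2
```

A rule like this breaks as soon as a type 2 file happens to have a matching column count, which is why it is only a guess.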
my_files = {'111111111d_test_HT_td.txt' '111111111e_ss.txt'};
N = numel(my_files);
file_type = zeros(1,N);
file_data = cell(1,N);
delimiters = {'[\t]' ';'};
n_delimiters = numel(delimiters);
delimiter_idx = 1;
for i = 1:N
    fprintf('preparing to read file %s:\n',my_files{i});
    tried_delimiters = false(1,n_delimiters);
    success = false;
    while any(~tried_delimiters)
        fprintf('\ttrying readtext() with delimiter ''%s'' ...\n',delimiters{delimiter_idx});
        data = readtext(my_files{i},delimiters{delimiter_idx},'','','numeric');
        tried_delimiters(delimiter_idx) = true;
        if all(isnan(data(:)))
            fprintf('\tfailed\n');
            delimiter_idx = mod(delimiter_idx,n_delimiters)+1; % switch to the next delimiter
            continue
        end
        file_type(i) = delimiter_idx;
        file_data{i} = data;
        success = true;
        break
    end
    if success
        % successfully read my_files{i} with delimiter delimiters{delimiter_idx}
        fprintf('\tsuccess\n');
    else
        % couldn't figure out how to read this file
        fprintf('\tall delimiters failed. couldn''t read the file\n');
    end
end
Then you could run through your subsequent operations with the data from the files like this:
for i = 1:N
    if file_type(i) == 1
        % do file type 1 stuff with file_data{i}
    elseif file_type(i) == 2
        % do file type 2 stuff with file_data{i}
    end
end
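If the branches grow, one option is to keep one function handle per file type and index into them with file_type. This is only a sketch: the two handlers borrow a line each from the question (pVelocity from row 6, column 62; cropping to the rows where column 3 is positive), and the data below are stand-in matrices rather than real file contents. The names type1_handler, type2_handler, handlers, and results are hypothetical.

```matlab
% Hypothetical per-type handlers, each borrowing one step from the question.
type1_handler = @(d) d(6, 62) * 100;                  % pVelocity
type2_handler = @(d) d(find(d(:, 3) > 0, 1):end, :);  % keep rows from the first positive column 3
handlers = {type1_handler, type2_handler};

% Stand-in values; file_type 0 marks a file that could not be read.
file_type = [1 2 0];
file_data = {ones(6, 68), [0 0 0; 1 1 1; 2 2 2], []};

results = cell(1, numel(file_data));
for i = 1:numel(file_data)
    if file_type(i) == 0
        continue   % skip files where every delimiter failed
    end
    results{i} = handlers{file_type(i)}(file_data{i});
end
```

Files that were never read keep file_type 0 from the zeros() initialization, so they are skipped rather than passed to a handler.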
I'm not sure if that answers your question. If not, let me know.