Auto Detect different file types?

Stuart Nezlek
Stuart Nezlek on 21 Jan 2022
Edited: Stuart Nezlek on 2 Feb 2022
Hello,
I am trying to edit a program so that it can automatically detect different text file formats. Currently, I am using two different programs to open and report the separate text files, using the following bits of code:
Program 1:
filespec=[fpath char(fnameALL(2))]; TD=filespec;
delimiter='[\t]';comment='';quotes='';options='numeric';
[TDdata, ~]= readtext(filespec, delimiter, comment, quotes, options);
pVelocity = TDdata(6,62)*100; pCadence = (TDdata(6,21)+TDdata(6,48))/2;
pStride = TDdata(6,65)*100; pStepWidth = TDdata(6,68)*100;
pGSR=(pCadence/60)/(pVelocity/100);
pRTO = (TDdata(6,39)/TDdata(6,33))*100; pLTO = (TDdata(6,12)/TDdata(6,9))*100;
pRSS = (TDdata(6,30)/TDdata(6,33))*100; pLSS = (TDdata(6,57)/TDdata(6,9))*100;
pRSTEP = TDdata(6,42)*100; pLSTEP = TDdata(6,15)*100;
pROTO = (TDdata(6,36)/TDdata(6,33))*100; pLOTO = (TDdata(6,60)/TDdata(6,9))*100;
ToeOff = [pRTO pLTO];
filespec=[fpath char(fnameALL(3))]; TD=filespec;
delimiter='[\t]';comment='';quotes='';options='numeric';
[TDalldata, result]= readtext(filespec, delimiter, comment, quotes, options);
Num_trialstd=(length(TDalldata(1,:))-1)/68;
Program 2:
[fname fpath]=uigetfile('*.txt','Please select the _td file');
conditionid=input('Enter the condition (no spaces): ','s');
cd(fpath);
[dataALL,results]=readtext(fname,';','','','numeric');
[row, col]=find(dataALL(:,3)>0);
data=dataALL(row:length(dataALL),:);
What I am wondering is whether there is a function I am unaware of that can automatically distinguish between the two text file formats?
If what I'm asking is unclear, I can provide clarification.
Thank you.
  2 Comments
Voss
Voss on 21 Jan 2022
Is the idea that you have a set of text files, each of which may have one format or another, but you can't tell what the format is beforehand? If so, you may try something along the lines of:
try
    read_file_method_1(file_name);
catch
    read_file_method_2(file_name);
end
But you'd have to be sure that attempting to read any file with format 2 using method 1 will generate an error, i.e., you don't want to be able to "successfully" call read_file_method_1() on a file with format 2 and get nonsense results. You may have to do some sanity-check in read_file_method_1() that makes sure everything looks good and if not, throw an error to trigger the catch block, which will call read_file_method_2().
Is that more-or-less the situation here? If not, please explain more about what the situation is, and maybe attach a couple of sample text files.
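The sanity check described above could be sketched roughly as follows. This is only an illustration: validate_numeric_read and the error identifier are hypothetical names, and "empty or all NaN" is an assumed failure condition that may not suit every file.

```matlab
function data = validate_numeric_read(data, file_name)
% VALIDATE_NUMERIC_READ  Hypothetical helper (not part of readtext).
% Raises an error when a numeric read produced nothing usable, so that a
% surrounding try/catch falls through to the next reading method.
%
% Possible use, assuming readtext() is on the path:
%   try
%       data = validate_numeric_read(readtext(file_name,'[\t]','','','numeric'), file_name);
%   catch
%       data = validate_numeric_read(readtext(file_name,';','','','numeric'), file_name);
%   end

    % Assumption: an empty or all-NaN matrix means the wrong format was used.
    if isempty(data) || all(isnan(data(:)))
        error('fileread:noNumericData', ...
              'No numeric data parsed from %s.', file_name);
    end
end
```

Without a check like this, the first method could return a matrix of NaNs without raising any error, and the catch block would never run.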
Stuart Nezlek
Stuart Nezlek on 24 Jan 2022
In theory, yes, that's what I am attempting to do. I've never used try/catch in MATLAB before, so I am wondering: since I would try to open file 1 with the first read, would it automatically try to read file 2 in the same manner? I've attached the two different text files I'm working with as an example.
My idea so far was to use the different endings of the files to differentiate the two and do this for however many files I specify via input. Does that make sense?
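The file-ending idea could look something like the sketch below. It assumes that the suffixes _td.txt and _ss.txt (taken from the two sample file names in this thread) reliably mark the two formats, which may not hold for other data sets; the variable names are illustrative only.

```matlab
% Sketch: classify files by the ending of their names.
% Assumption: '_td.txt' and '_ss.txt' always correspond to the two formats.
my_files = {'111111111d_test_HT_td.txt', '111111111e_ss.txt'};

file_type = zeros(1, numel(my_files));   % 0 means "ending not recognized"
for i = 1:numel(my_files)
    if endsWith(my_files{i}, '_td.txt')
        file_type(i) = 1;                % treat as file type 1
    elseif endsWith(my_files{i}, '_ss.txt')
        file_type(i) = 2;                % treat as file type 2
    end
end
```

endsWith() requires R2016b or later. Any file left with type 0 would need to be handled some other way, for example by inspecting its contents.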

Accepted Answer

Voss
Voss on 24 Jan 2022
Since the only difference between the way the two file types are read with readtext() is the delimiter, you can try different delimiters until you find one that works. With those two files you posted, I found that readtext() returns all NaNs if you use the wrong delimiter, so I'm using that as the condition that determines whether the file was read correctly or not. (If you have any file that returns all NaNs which needs to be considered valid, then you'd have to use a different condition.)
The following code loops over a set of files and for each file tries readtext() with each different delimiter (in this case just '[\t]' and ';' but the code will work for any number of delimiters) until one gives something that's not all NaNs. Then, for the next file, the delimiter that worked is tried first.
my_files = {'111111111d_test_HT_td.txt' '111111111e_ss.txt'};
delimiters = {'[\t]' ';'};
n_delimiters = numel(delimiters);
delimiter_idx = 1;
for i = 1:numel(my_files)
    fprintf('preparing to read file %s:\n',my_files{i});
    tried_delimiters = false(1,n_delimiters);
    success = false;
    while any(~tried_delimiters)
        fprintf('\ttrying readtext() with delimiter ''%s'' ...\n',delimiters{delimiter_idx});
        data = readtext(my_files{i},delimiters{delimiter_idx},'','','numeric');
        tried_delimiters(delimiter_idx) = true;
        if all(isnan(data(:)))
            fprintf('\tfailed\n');
            delimiter_idx = mod(delimiter_idx,n_delimiters)+1; % switch to the next delimiter
            continue
        end
        success = true;
        break
    end
    if success
        % successfully read my_files{i} with delimiter delimiters{delimiter_idx}
        fprintf('\tsuccess\n');
    else
        % couldn't figure out how to read this file
        fprintf('\tall delimiters failed. couldn''t read the file\n');
        continue
    end
    if delimiter_idx == 1
        % do file type 1 stuff
    else
        % do file type 2 stuff
    end
end
preparing to read file 111111111d_test_HT_td.txt:
    trying readtext() with delimiter '[\t]' ...
    success
preparing to read file 111111111e_ss.txt:
    trying readtext() with delimiter '[\t]' ...
    failed
    trying readtext() with delimiter ';' ...
    success
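As a possible alternative to reading each file once per delimiter, the raw text could be inspected first. The sketch below is a different technique from the code above: it counts how often each candidate character occurs in the file and picks the most frequent one. guess_delimiter is a hypothetical helper, and the approach assumes the wrong delimiter character rarely or never appears in a file.

```matlab
function idx = guess_delimiter(file_name, candidates)
% GUESS_DELIMITER  Hypothetical helper, not part of readtext().
% candidates is a cell array of single characters, e.g. {sprintf('\t'), ';'}.
% Returns the index of the candidate that occurs most often in the file,
% or 0 if none of them occurs at all.

    txt = fileread(file_name);                        % whole file as one char vector
    counts = cellfun(@(c) sum(txt == c), candidates); % occurrences per candidate
    [n, idx] = max(counts);
    if n == 0
        idx = 0;                                      % no candidate found
    end
end
```

The returned index could then select the matching readtext() delimiter, for example '[\t]' for index 1 and ';' for index 2.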
  6 Comments
Stuart Nezlek
Stuart Nezlek on 31 Jan 2022
Benjamin,
I wanted to ask one additional question that maybe you can help me with.
Now that I've read these files into a cell array, is there some way to read the contents of each cell and use the delimiters to determine whether the file is type 1 or type 2? I have tried strcmp as well as strfind, but neither returned any matches.
Essentially, what I'm trying to automate is this: if a file uses delimiter type 1, assign it a value of 1. If it uses delimiter type 2, assign it a value of 2, and have that value written to a new array.
Voss
Voss on 1 Feb 2022
When you say you've "read these files into a cell array", I assume that means you've modified the code I posted so that when it finds a delimiter that works, it stores the data variable in a cell array with one cell per file. (Which seems like a reasonable thing to do.) Is that what you mean?
If that's the case, and now you want to know how to figure out from the contents of each cell whether it was a type 1 or type 2 file, then you'd have to be able to distinguish between the two file types based on what comes from readtext() for each file type. readtext() returns a matrix, so you'd have to know something about the size of possible matrices returned by readtext() in each case or the possible locations of the NaNs in the matrix, etc. I have no idea about the range of possibilities for what those files could contain, so I wouldn't be able to put any conditions on the matrices from readtext() in order to distinguish one type from another. But you may know more about what the possibilities are for those file types and hence what the matrices from readtext() should look like, so you may be able to come up with some condition to distinguish the two types.
However, I think it may be easier to just keep track of each file's type when it is successfully read, rather than trying to go back after the fact and figure it out from the matrices you end up with. That would look something like this minor modification to the code above:
my_files = {'111111111d_test_HT_td.txt' '111111111e_ss.txt'};
N = numel(my_files);
file_type = zeros(1,N);
file_data = cell(1,N);
delimiters = {'[\t]' ';'};
n_delimiters = numel(delimiters);
delimiter_idx = 1;
for i = 1:N
    fprintf('preparing to read file %s:\n',my_files{i});
    tried_delimiters = false(1,n_delimiters);
    success = false;
    while any(~tried_delimiters)
        fprintf('\ttrying readtext() with delimiter ''%s'' ...\n',delimiters{delimiter_idx});
        data = readtext(my_files{i},delimiters{delimiter_idx},'','','numeric');
        tried_delimiters(delimiter_idx) = true;
        if all(isnan(data(:)))
            fprintf('\tfailed\n');
            delimiter_idx = mod(delimiter_idx,n_delimiters)+1; % switch to the next delimiter
            continue
        end
        file_type(i) = delimiter_idx;
        file_data{i} = data;
        success = true;
        break
    end
    if success
        % successfully read my_files{i} with delimiter delimiters{delimiter_idx}
        fprintf('\tsuccess\n');
    else
        % couldn't figure out how to read this file
        fprintf('\tall delimiters failed. couldn''t read the file\n');
    end
end
Then you could run through your subsequent operations with the data from the files like this:
for i = 1:N
    if file_type(i) == 1
        % do file type 1 stuff with file_data{i}
    elseif file_type(i) == 2
        % do file type 2 stuff with file_data{i}
    end
end
I'm not sure if that answers your question. If not, let me know.

More Answers (1)

Stuart Nezlek
Stuart Nezlek on 2 Feb 2022
Edited: Stuart Nezlek on 2 Feb 2022
"When you say you've "read these files into a cell array", I assume that means you've modified the code I posted so that when it finds a delimiter that works, it stores the data variable in a cell array with one cell per file. (Which seems like a reasonable thing to do.) Is that what you mean?"
  • Correct. I've changed it so that I get back a 1-by-N cell array, where N is the number of files read in with the two different types of delimiters.
"However, I think it may be easier to just keep track of each file's type when it is successfully read, rather than trying to go back after the fact and figure it out from the matrices you end up with. That would look something like this minor modification to the code above"
  • I've implemented this small change, and after reading your example I see where I was going wrong! I kept getting the same error that the cell contents couldn't be read, but that was because I was reading the files without storing the data anywhere, so the data was overwritten on every pass over the cell array. By assigning variables (as your example did), I have now been able to define the different data file types and can continue to work with the data.
Again, thank you for pointing out my simple errors! You have helped me a lot.