Auto Detect different file types?

Question

Stuart Nezlek on 21 Jan 2022

https://it.mathworks.com/matlabcentral/answers/1633490-auto-detect-different-file-types

Edited: Stuart Nezlek on 2 Feb 2022

Hello,

I am trying to edit a program so that it can automatically detect different types of text files. Currently, I am using two different programs to open and report the separate text files, using the following bits of code:

Program 1:

 filespec=[fpath char(fnameALL(2))]; TD=filespec;
        delimiter='[\t]';comment='';quotes='';options='numeric';
        [TDdata, ~]= readtext(filespec, delimiter, comment, quotes, options);
        pVelocity = TDdata(6,62)*100; pCadence = (TDdata(6,21)+TDdata(6,48))/2;
        pStride = TDdata(6,65)*100; pStepWidth = TDdata(6,68)*100;
        pGSR=(pCadence/60)/(pVelocity/100);
        pRTO = (TDdata(6,39)/TDdata(6,33))*100; pLTO = (TDdata(6,12)/TDdata(6,9))*100;
        pRSS = (TDdata(6,30)/TDdata(6,33))*100; pLSS = (TDdata(6,57)/TDdata(6,9))*100;
        pRSTEP = TDdata(6,42)*100; pLSTEP = TDdata(6,15)*100;
        pROTO = (TDdata(6,36)/TDdata(6,33))*100; pLOTO = (TDdata(6,60)/TDdata(6,9))*100;
        
        ToeOff = [pRTO pLTO];
        
        filespec=[fpath char(fnameALL(3))]; TD=filespec;
        delimiter='[\t]';comment='';quotes='';options='numeric';
        [TDalldata, result]= readtext(filespec, delimiter, comment, quotes, options);
        Num_trialstd=(length(TDalldata(1,:))-1)/68;
        

Program 2:

        
     [fname fpath]=uigetfile('*.txt','Please select the _td file');
    conditionid=input('Enter the condition (no spaces):  ','s');
    cd(fpath);
    [dataALL,results]=readtext(fname,';','','','numeric');
    [row, col]=find(dataALL(:,3)>0);
    data=dataALL(row:length(dataALL),:);

What I am wondering is whether there is a function I am unaware of that can automatically distinguish between the different text files.

If what I'm asking is unclear, I can provide clarification.

Thank you.

2 Comments

Voss on 21 Jan 2022

Is the idea that you have a set of text files, each of which may have one format or another, but you can't tell what the format is beforehand? If so, you might try something along the lines of:

try
    read_file_method_1(file_name);
catch
    read_file_method_2(file_name);
end

But you'd have to be sure that attempting to read any file with format 2 using method 1 will generate an error; i.e., you don't want to be able to "successfully" call read_file_method_1() on a file with format 2 and get nonsense results. You may have to add a sanity check in read_file_method_1() that makes sure everything looks good and, if not, throws an error to trigger the catch block, which will call read_file_method_2().
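As a minimal sketch of that sanity-check idea, the loop below treats an all-NaN result as a failed read and throws an error so that the catch block takes over. It assumes, purely as a stand-in for the two reader functions, that splitting the raw text with strsplit() and converting with str2double() is an acceptable substitute for readtext(); the temporary sample files and the error identifier 'reader:wrongFormat' are made up for illustration.

```matlab
% Create two small sample files so the sketch is self-contained
% (hypothetical contents, not the files attached to the question).
tab_file  = [tempname '.txt'];
semi_file = [tempname '.txt'];
fid = fopen(tab_file,  'w'); fprintf(fid, '1\t2\t3\n4\t5\t6\n'); fclose(fid);
fid = fopen(semi_file, 'w'); fprintf(fid, '1;2;3\n4;5;6\n');     fclose(fid);

file_names  = {tab_file, semi_file};
method_used = zeros(1, numel(file_names));   % 1 or 2, per file

for k = 1:numel(file_names)
    txt = strtrim(fileread(file_names{k}));
    try
        % method 1: split on tabs and newlines, then convert to numbers
        data = str2double(strsplit(txt, {char(9), char(10)}));
        % sanity check: a wrong delimiter leaves nothing but NaNs
        if all(isnan(data))
            error('reader:wrongFormat', 'File does not look tab-delimited.');
        end
        method_used(k) = 1;
    catch
        % method 2: split on semicolons and newlines instead
        data = str2double(strsplit(txt, {';', char(10)}));
        method_used(k) = 2;
    end
end

disp(method_used)
delete(tab_file, semi_file);   % clean up the temporary files
```

Without the explicit error() call, method 1 would "succeed" on the semicolon file and return only NaNs, which is exactly the nonsense-result case described above.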

Is that more-or-less the situation here? If not, please explain more about what the situation is, and maybe attach a couple of sample text files.

Stuart Nezlek on 24 Jan 2022

In theory, yes, that's what I am attempting to do. I've never used try/catch in MATLAB before, so I am wondering: since I would try to open file 1 with the first read, would it automatically try to read file 2 in the same manner? I've attached the two different text files I'm working with as an example.

My idea so far was to use the different endings of the files to differentiate the two and do this for however many files I specify via input. Does that make sense?
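The filename-ending idea could be sketched roughly as follows. This assumes that the suffixes '_td.txt' and '_ss.txt' (taken from the two attached file names) reliably indicate the format, which only holds if the files are named consistently; 'notes.txt' is a hypothetical extra file included to show the unrecognized case.

```matlab
% File names: the first two are from this thread, the third is hypothetical.
my_files = {'111111111d_test_HT_td.txt', '111111111e_ss.txt', 'notes.txt'};

file_type = zeros(1, numel(my_files));          % 0 = unrecognized ending
file_type(endsWith(my_files, '_td.txt')) = 1;   % treat as file type 1
file_type(endsWith(my_files, '_ss.txt')) = 2;   % treat as file type 2

disp(file_type)
```

endsWith() requires R2016b or later. A file with a misleading name would be classified wrongly, so this is best combined with a check on the data that is actually read.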

Answer 1

Voss on 24 Jan 2022

https://it.mathworks.com/matlabcentral/answers/1633490-auto-detect-different-file-types#answer_880780

Since the only difference between the way the two file types are read with readtext() is the delimiter, you can try different delimiters until you find one that works. With those two files you posted, I found that readtext() returns all NaNs if you use the wrong delimiter, so I'm using that as the condition that determines whether the file was read correctly or not. (If you have any file that returns all NaNs which needs to be considered valid, then you'd have to use a different condition.)

The following code loops over a set of files and for each file tries readtext() with each different delimiter (in this case just '[\t]' and ';' but the code will work for any number of delimiters) until one gives something that's not all NaNs. Then, for the next file, the delimiter that worked is tried first.

my_files = {'111111111d_test_HT_td.txt' '111111111e_ss.txt'};
delimiters = {'[\t]' ';'};
n_delimiters = numel(delimiters);
delimiter_idx = 1;
for i = 1:numel(my_files)
    
    fprintf('preparing to read file %s:\n',my_files{i});
    
    tried_delimiters = false(1,n_delimiters);
    success = false;
    
    while any(~tried_delimiters)
        
        fprintf('\ttrying readtext() with delimiter ''%s'' ...\n',delimiters{delimiter_idx});
        
        data = readtext(my_files{i},delimiters{delimiter_idx},'','','numeric');
        tried_delimiters(delimiter_idx) = true;
        
        if all(isnan(data(:)))
            fprintf('\tfailed\n');
            delimiter_idx = mod(delimiter_idx,n_delimiters)+1; % switch to the next delimiter
            continue
        end
        
        success = true;
        break
        
    end
    
    if success
        % successfully read my_files{i} with delimiter delimiters{delimiter_idx}
        fprintf('\tsuccess\n');
    else
        % couldn't figure out how to read this file
        fprintf('\tall delimiters failed. couldn''t read the file\n');
        continue
    end
    
    if delimiter_idx == 1
        % do file type 1 stuff
    else
        % do file type 2 stuff
    end
    
end

Output:

preparing to read file 111111111d_test_HT_td.txt:
	trying readtext() with delimiter '[\t]' ...
	success
preparing to read file 111111111e_ss.txt:
	trying readtext() with delimiter '[\t]' ...
	failed
	trying readtext() with delimiter ';' ...
	success

6 Comments

Voss on 25 Jan 2022

Edited: Voss on 25 Jan 2022

Running this

my_files = uigetdir('C:\User\Desktop\Text Files','Please select a folder.');

and selecting a directory will give you a character vector my_files, containing the name of a folder (nothing about the contents of that folder), e.g., my_files might be:

'C:\User\Desktop\Text Files'

But to run the code above (which loops over my_files like "for i = 1:numel(my_files)"), my_files needs to be a cell array of character vectors corresponding to the names of the relevant files located within the selected folder. So, you need to tell the program how to do that.

It might be something like this:

% select a folder:
my_folder = uigetdir('C:\User\Desktop\Text Files','Please select a folder.');
% get info about all the txt files in the folder:
% (if you only want particular txt files, you'll have to modify this)
my_file_info = dir(fullfile(my_folder,'*.txt'));
% generate the full paths to the txt files in my_folder:
my_files = fullfile(my_folder,{my_file_info.name});
% then loop over my_files as above

Stuart Nezlek on 31 Jan 2022

Benjamin,

I wanted to ask one additional question that maybe you can help me with.

Now that I've read these files into a cell array, is there some way I can read the contents of each cell and use the delimiters to determine whether the file is type 1 or type 2? I have tried strcmp as well as strfind, but both returned no values.

Essentially, what I'm trying to automate is this: if a file uses delimiter type 1, assign it a value of 1; if it uses delimiter type 2, assign it a value of 2, and have those values written to a new array.

Voss on 1 Feb 2022

When you say you've "read these files into a cell array", I assume that means you've modified the code I posted so that when it finds a delimiter that works, it stores the data variable in a cell array with one cell per file. (Which seems like a reasonable thing to do.) Is that what you mean?

If that's the case, and now you want to know how to figure out from the contents of each cell whether it was a type 1 or type 2 file, then you'd have to be able to distinguish between the two file types based on what comes from readtext() for each file type. readtext() returns a matrix, so you'd have to know something about the size of possible matrices returned by readtext() in each case or the possible locations of the NaNs in the matrix, etc. I have no idea about the range of possibilities for what those files could contain, so I wouldn't be able to put any conditions on the matrices from readtext() in order to distinguish one type from another. But you may know more about what the possibilities are for those file types and hence what the matrices from readtext() should look like, so you may be able to come up with some condition to distinguish the two types.
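
If the delimiter itself is the distinguishing feature, another option is to inspect the raw text of each file rather than the numeric matrix, which no longer contains any delimiter characters (a likely reason strfind and strcmp found nothing). The sketch below is not part of the code in this thread: it assumes the only candidate delimiters are tab and semicolon, and it picks whichever occurs more often in the file. The sample file is created only so the example is self-contained.

```matlab
% Hypothetical sample file using ';' as its delimiter.
sample_file = [tempname '.txt'];
fid = fopen(sample_file, 'w');
fprintf(fid, '1;2;3\n4;5;6\n');
fclose(fid);

txt = fileread(sample_file);               % raw text, delimiters intact
candidates = {char(9), ';'};               % type 1 = tab, type 2 = semicolon

% count how often each candidate delimiter appears in the file
counts = cellfun(@(d) sum(txt == d), candidates);
[max_count, file_type] = max(counts);
if max_count == 0
    file_type = 0;                         % neither delimiter was found
end

fprintf('detected file type: %d\n', file_type);
delete(sample_file);                       % clean up the temporary file
```

A file that mixes both characters (for example, semicolon-delimited data with tabs inside a header) could be misclassified, so the counts are only a heuristic.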

However, I think it may be easier to just keep track of each file's type when it is successfully read, rather than trying to go back after the fact and figure it out from the matrices you end up with. That would look something like this minor modification to the code above:

my_files = {'111111111d_test_HT_td.txt' '111111111e_ss.txt'};
N = numel(my_files);
file_type = zeros(1,N);
file_data = cell(1,N);
delimiters = {'[\t]' ';'};
n_delimiters = numel(delimiters);
delimiter_idx = 1;
for i = 1:N
    
    fprintf('preparing to read file %s:\n',my_files{i});
    
    tried_delimiters = false(1,n_delimiters);
    success = false;
    
    while any(~tried_delimiters)
        
        fprintf('\ttrying readtext() with delimiter ''%s'' ...\n',delimiters{delimiter_idx});
        
        data = readtext(my_files{i},delimiters{delimiter_idx},'','','numeric');
        tried_delimiters(delimiter_idx) = true;
        
        if all(isnan(data(:)))
            fprintf('\tfailed\n');
            delimiter_idx = mod(delimiter_idx,n_delimiters)+1; % switch to the next delimiter
            continue
        end
        
        file_type(i) = delimiter_idx;
        file_data{i} = data;
        
        success = true;
        break
        
    end
    
    if success
        % successfully read my_files{i} with delimiter delimiters{delimiter_idx}
        fprintf('\tsuccess\n');
    else
        % couldn't figure out how to read this file
        fprintf('\tall delimiters failed. couldn''t read the file\n');
    end
    
end

Then you could run through your subsequent operations with the data from the files like this:

for i = 1:N
    if file_type(i) == 1
        % do file type 1 stuff with file_data{i}
    elseif file_type(i) == 2
        % do file type 2 stuff with file_data{i}
    end
end
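
Because file_type is initialized with zeros, any file that could not be read with either delimiter keeps the value 0. As a small sketch, logical indexing on file_type can then group the file names by type; the values below are example inputs rather than results from real files, and 'unreadable.txt' is a hypothetical name.

```matlab
% Example inputs: two names from this thread plus one hypothetical file
% that failed with every delimiter (file_type stays 0).
my_files  = {'111111111d_test_HT_td.txt', '111111111e_ss.txt', 'unreadable.txt'};
file_type = [1 2 0];

type1_files  = my_files(file_type == 1);   % read with the first delimiter
type2_files  = my_files(file_type == 2);   % read with the second delimiter
unread_files = my_files(file_type == 0);   % no delimiter worked

fprintf('%d type 1, %d type 2, %d unread\n', ...
    numel(type1_files), numel(type2_files), numel(unread_files));
```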

I'm not sure if that answers your question. If not, let me know.

Answer 2

Stuart Nezlek on 2 Feb 2022

https://it.mathworks.com/matlabcentral/answers/1633490-auto-detect-different-file-types#answer_887270

Edited: Stuart Nezlek on 2 Feb 2022

"When you say you've "read these files into a cell array", I assume that means you've modified the code I posted so that when it finds a delimiter that works, it stores the data variable in a cell array with one cell per file. (Which seems like a reasonable thing to do.) Is that what you mean?"

Correct. I've changed it so that I get back a 1-by-N cell array, where N is the number of files read in with the two different types of delimiters.

"However, I think it may be easier to just keep track of each file's type when it is successfully read, rather than trying to go back after the fact and figure it out from the matrices you end up with. That would look something like this minor modification to the code above"

I've implemented this small change, and after reading your example I see where I was going wrong! I kept getting the same error that the cell contents couldn't be read, but that was because I was reading the files without storing the data anywhere, so the data was being overwritten on each pass through the cell array. By assigning variables (as your example did), I have now been able to define the different data file types and can continue to work with the data.

Again, thank you for pointing out my simple errors! You have helped me a lot.
