detecting Excel files without Excel extension
5 visualizzazioni (ultimi 30 giorni)
Mostra commenti meno recenti
I need to be able to identify an Excel file without using xlsfinfo. It is a VERY slow function. Without it, my algorithm runs faster. Any ideas?
Background: Our test setup outputs text files that are named with the ".xls" extension. Back in Excel 2003 and before, if you named a tab delimited text file with "xls", Excel would open the file no problem and properly interpret it. Now, it is causing me headaches with importing data into MATLAB. MATLAB uses the ".xls" extension to check for wrong input to "textscan" and "importdata." Here is where it gets interesting. When you open a text file with the ".xls" extension and then save it, it is now an excel file and no longer tab delimited text file. Basically, I need to be able to tell the difference between text and Excel files without xlsfinfo because xlsfinfo is very slow.
0 Commenti
Risposta accettata
Iain
il 23 Mag 2013
You can try something like
fid = fopen(filename,'r');
fifty_bytes = fread(fid,50,'uint8'); fclose(fid);
Valid_characters = ['A':'Z' 'a':'z' ',./\ 0123456789' 9 10 13 '!"£$%^&*()[]{}-=_+ ... and any other valid characters you expect in your text files. ]
excel_file = false;
for i = 1:50
if any(fifty_bytes ~= Valid_characters)
excel_file = true;
end
end
3 Commenti
Più risposte (2)
José-Luis
il 23 Mag 2013
Modificato: José-Luis
il 23 Mag 2013
You could always check the file signature. You would need to read the file as a binary file and look into specific locations, using hexadecimal values.
Here is a snippet that will give you a string of the hexadecimal values:
fid = fopen('/path/to/test.xls','r');
uid = fread(fid,4,'ushort','b');
fseek(fid,520,'bof');
ftype = fread(fid,4,'ushort','b');
fclose(fid);
s1 = sprintf('%X',uid)
s2 = sprintf('%X',ftype)
If the file is an office document, s1 should be:
D0 CF 11 E0 A1 B1 1A E1 without the spaces
It it is an excel spreasheet, then s2 should be:
FD FF FF FF nn xx where the last two values (nn and xx) can change within a certain range.
It would be a matter of using strcmp(). You could of also compare hexadecimal values directly with fi(). For your purpose, it sounds like it would be enough to check if it is an Office document.
Text files carry no file identifier.
4 Commenti
José-Luis
il 24 Mag 2013
What are you getting? Maybe it is a little endian system? You could try with:
uid = fread(fid,4,'ushort','l');
Same thing for ftype.
Vedere anche
Categorie
Scopri di più su Spreadsheets in Help Center e File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!