Read and separate CSV data

54 views (last 30 days)
Dan Howard
Dan Howard on 8 Feb 2016
Commented: Noam Greenboim on 21 May 2020
Hello everyone,
I'm back again with a simple question which I can't fathom out.
I have a CSV file for which I'm writing a script to:
  1. open and read the file
  2. separate the data, i.e. have tail in column A, type_ref in column B, date in column C, etc.
  3. plot amp against date
% [data, txtdata] = xlsread('Radial.csv');
nrows = size(data,1);   % number of rows of numeric data (avoids shadowing the built-in length)
rowoffset = 11;         % offset from the first csv row of numerical data
I hope this makes sense as I don't know how else to explain this:
The first line above puts the csv into a [64x1] cell but does not separate the data.
eg.
'tail,type_ref,datetime,seq,seq_no,sys_ref,ref,reg_ref,res_seq,ci_ref,chan_ref,component_ref,passband_ref,amp'
'FPG123,FLYDATA,2013-12-16 11:21:51.110,3,3,FLYDATA Engines,FLYDATA 170 No 1 Engine (9),FLYDATA Engine Ground,7,FLYDATA SERIAL1,FLYDATA No 1 Engine Radial,FLYDATA No 1 Engine Shaft ,FLYDATA No Freq Passband,0.72825962305069'
The above example shows all the headings in one row and column, while all the other data follows underneath from row two onwards.
I have been trying to get the data to show in separate cells, i.e.:
tail type_ref datetime seq etc......... ( 14 columns altogether)
FPG123 FLYDATA 2013-12-16 11:21:51.110 3
FPG126 FLYDATA 2014-12-17 11:28:32.101 3
FPG128 FLYDATA 2015-12-02 11:28:47.100 3
I have tried using the delimiter option, calling each heading as in:
% [tail type_ref datetime seq] = csvimport('Radial.csv', columns, [1:4],'delimiter',',') % only for 4 headings and data
or to show all the data on one sheet. I'd really appreciate some help on this. Many thanks.
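For reference, on a CSV where every value really sits in its own comma-separated field, the whole task (read the file, split it into named columns, plot amp against date) can be sketched in a few lines with readtable. This is only a hedged illustration of the goal; the answers below explain why this particular file needs extra handling first.
T = readtable('Radial.csv');   % one table variable per column, header row as names
% If the datetime column is imported as text, convert it before plotting:
t = datetime(T.datetime, 'InputFormat', 'yyyy-MM-dd HH:mm:ss.SSS');
plot(t, T.amp, '.');
xlabel('datetime'); ylabel('amp');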
  5 Comments
Stephen23
Stephen23 on 9 Feb 2016
Edited: Stephen23 on 9 Feb 2016
Here is an almost identical question, where each line was written as one string between quotation marks (and the user also thought it was a CSV file):
What tool creates these terrible file formats? Who writes something that can't even get a CSV file right, and is so buggy that it places the whole line within one set of quotation marks (and ignores fields that really were strings)? Very puzzling.
dpb
dpb on 9 Feb 2016
Edited: dpb on 9 Feb 2016
I don't know, but my first guess would be Excel, if somebody pasted the whole thing record-by-record into a column without using the import tool to split columns... I think that's what it would then do on output to csv, as each cell would have to be a string.
Well, let's just try... hmmm, OK, that's what happens, except that this version of Excel, at least, used double quotes instead of single ones as the delimiter. Not sure if that's a user-settable option or dependent on locale, maybe.


Answers (2)

Walter Roberson
Walter Roberson on 15 Feb 2016
The problem is that your file (at least the fixedtest.csv that you posted) is UTF-16 Little Endian, which is an encoding that MATLAB does not support, so it needs to be translated. I wrote a routine that detects the UTF encoding and returns a conversion function from bytes to characters; I have attached the source.
Using that routine:
input_file = 'fixedtest.csv';
[file_encoding, bytes_per_char, BOM_size, bytes2char] = detect_UTF_encoding(input_file);
if isempty(file_encoding)
    error('No usable input file');
end
fid = fopen(input_file,'rt');
fread(fid, [1, BOM_size], '*uint8'); %skip the Byte Order Mark
thisbuffer = fgets(fid);
extra = mod(length(thisbuffer), bytes_per_char);
if extra ~= 0
    %in little-endian modes, newline would be found in first byte and the 0's after need to be read
    thisbuffer = [thisbuffer, fread(fid, [1, bytes_per_char - extra], '*uint8')];
end
thisline = bytes2char(thisbuffer);
data_cell = textscan(thisline, '%s', 'delimiter', ','); %will ignore the end of lines
header_fields = reshape(data_cell{1}, 1, []);
num_field = length(header_fields);
thisbuffer = fread(fid, [1 inf], '*uint8');
extra = mod(length(thisbuffer), bytes_per_char);
if extra ~= 0
    thisbuffer = [thisbuffer, zeros(1, bytes_per_char - extra, 'uint8')];
end
thisline = bytes2char(thisbuffer);
fmt = repmat('%s', 1, num_field);
data_cell = textscan(thisline, fmt, 'delimiter', ',');
data_fields_text = horzcat(data_cell{:});
data_fields = data_fields_text;
data_fields(:,3) = num2cell(datenum(data_fields_text(:,3))); %or use datetime() instead of datenum
data_fields(:,4) = num2cell(str2double(data_fields_text(:,4)));
data_fields(:,5) = num2cell(str2double(data_fields_text(:,5)));
data_fields(:,9) = num2cell(str2double(data_fields_text(:,9)));
data_fields(:,13) = num2cell(str2double(data_fields_text(:,13)));
data_table = cell2table(data_fields, 'VariableNames', header_fields);
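A hedged alternative to the byte-level handling above (this is not Walter's attached routine): if the file is known to be UTF-16 little-endian and native2unicode on your platform accepts the 'UTF-16LE' name, the raw bytes can be converted to characters in one step and then handed to textscan exactly as in the code above. The file name and variable names here are only illustrative.
fid = fopen('fixedtest.csv', 'r');
raw = fread(fid, [1 inf], '*uint8');   % read every byte of the file
fclose(fid);
txt = native2unicode(raw, 'UTF-16LE'); % bytes -> char
if ~isempty(txt) && txt(1) == char(65279)
    txt = txt(2:end);                  % drop the byte order mark (U+FEFF), if present
end
% txt can now be split with textscan using the same formats as above.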
  4 Comments
Noam Greenboim
Noam Greenboim on 21 May 2020
Here is a simple function I made to detect and write a BOM for Unicode files:
Usage:
CurrentBOM = BOM(UnicodeFileName, Encoding, FileToWrite)

dpb
dpb on 8 Feb 2016
Edited: dpb on 8 Feb 2016
Your basic problem is that the file was created as a text file of strings which include the comma delimiters within them, not, strictly speaking, as a csv file.
But, you can deal with it fairly simply...
>> fid=fopen('dan.csv'); % open a copy of your two lines
>> hdr=textscan(fid,'%s',14,'delimiter',',','whitespace',[' \b\t' ''''],'collectoutput',1)
hdr =
{14x1 cell}
>> hdr{:}
ans =
'tail'
'type_ref'
'datetime'
...[elided middle values for brevity]...
'passband_ref'
'amp'
>> dat=textscan(fid,[repmat('%s',1,13) '%f'],'delimiter',',','whitespace',[' \b\t' ''''],'collectoutput',1)
dat =
{1x13 cell} [7.2826e-01]
>> fid=fclose(fid);
The only tricky part is the addition of the single quote mark to the whitespace list; otherwise you'll end up with an extra tick on the first and last header names, and the last floating point value will potentially error (although I didn't test that). I'm sure it'll work for the first record; whether the internal file position will be OK for subsequent lines I don't know for certain, so the sure way is to ensure the quote is skipped, as above.
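A hedged follow-on sketch (not part of dpb's answer): once hdr and dat exist as above, they can be combined into a table and plotted; the variable names used here are assumptions based on the snippets in this answer.
names = strtrim(hdr{1});                 % 14x1 cell array of header names
T = cell2table([dat{1}, num2cell(dat{2})], 'VariableNames', names(:)');
t = datetime(T.datetime, 'InputFormat', 'yyyy-MM-dd HH:mm:ss.SSS');
plot(t, T.amp, '.'); xlabel('datetime'); ylabel('amp');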
  16 Comments
Dan Howard
Dan Howard on 18 Feb 2016
Ok, thanks dpb.
I think I have a better understanding of what you have said.
Using your method below does exactly what you said: it opens and reads the file and writes a new one:
f=textread('fixedtest.csv','%s','delimiter','\n'); % cell string array
f=strrep(f,'''',''); % remove the single quotes
fid=fopen('newfile.csv','wt'); % make a new file
for i=1:size(f,1)
    fprintf(fid,'%s\n',char(f(i))); % write each line to the new file
end
fid=fclose(fid);
Can I ask, when I get the errors below, whether this is due to the file (fixedtest.csv) not having quote marks around each string and numeric value?
1. When I run the code above I get an error stating:
Error using strrep
Cell elements must be character arrays.
Error in filerun (line 18)
f=strrep(f,'''','');
2. If I comment out
%f=strrep(f,'''','');
This gives an error but still creates newfile.csv.
Error using char
Cell elements must be character arrays.
Error in filerun (line 21)
fprint(fid,'%s\n',char(f(i)));
So I thought I would try using readtable with newfile.csv:
>> readtable('newfile.csv')
ans =
empty 0-by-0 table
I have included each step in what I have done.
Do you know why this is returning an empty table?
dpb
dpb on 18 Feb 2016
Edited: dpb on 18 Feb 2016
See Walter Roberson's Answer; he recognized a problem I didn't, as I had just opened the above file in Excel to see whether or not it had the quote marks shown in your very first post. That encoding keeps MATLAB's normal I/O functions from working correctly.
I can't say why you're getting the strrep error; it works here without error, but when trying it on the actual file, textread only returns a few characters instead of the whole file.
Looking back, I note that you used xlsread originally and successfully read the text; the problem was that the data weren't separated by cell in Excel. So xlsread can read the file as it is stored, and you can either use Walter's converter function or stick with an Excel solution.
Probably the simplest route, given that you have an Excel spreadsheet and your apparent level of familiarity with the tools (assuming you do have Excel and not just the file), is to open the file in Excel, use its import tool to convert the single cell per row into a regular spreadsheet layout, save that, and then read that file as a "real" .xls file. Then xlsread will return the values separated by column, ready to be imported into the table class.
Or, once you've done the conversion (presuming, as noted, that it's not something about your OS/locale that creates the specific file encoding), if you save the spreadsheet as a .csv file where each value is in its own cell, then the direct readtable solution should work.
Either way, you've got to work around the MATLAB limitation on the file encoding that Walter discovered.
I'm presuming that the resulting file will not also be UTF-16 Little Endian; if that encoding is inherent in your system for some reason, owing to a locale setting or somesuch, it would seem you're going to have much difficulty with MATLAB.
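A hedged sketch of the xlsread route dpb describes, assuming the data has been split into one value per cell in Excel and saved as 'Radial_fixed.xls' (an assumed file name):
[~, ~, raw] = xlsread('Radial_fixed.xls');                 % raw cell array, header row first
T = cell2table(raw(2:end, :), 'VariableNames', raw(1, :)); % import into the table class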
