Importing a .csv to matlab, delimiter problem

125 visualizzazioni (ultimi 30 giorni)
I need to import .csv file with data saved as lists. But Matlab does not recognize the comma as a delimiter. I am thinking the problem is because the list has "", but I do not know how to fix it. The same happens in Excel. I am using the matlab built-in import function, to create the script.
%%Initialize variables.
filename = 'C:\Users\Gert Lõhmus\Downloads\ShipData.csv';
delimiter = {',',' '};
%%Format string for each line of text:
% column1: text (%q)
% For more information, see the TEXTSCAN documentation.
formatSpec = '%q%[^\n\r]';
%%Open the text file.
fileID = fopen(filename,'r');
%%Read columns of data according to format string.
% This call is based on the structure of the file used to generate this
% code. If an error occurs for a different file, try regenerating the code
% from the Import Tool.
dataArray = textscan(fileID, formatSpec, 'Delimiter', delimiter, 'ReturnOnError', false);
%%Close the text file.
fclose(fileID);
%%Post processing for unimportable data.
% No unimportable data rules were applied during the import, so no post
% processing code is included. To generate code which works for
% unimportable data, select unimportable cells in a file and regenerate the
% script.
%%Create output variable
ShipData = table(dataArray{1:end-1}, 'VariableNames', {'RegisterNumberIMONumberShipNameCallSignTypeserviceSystemShipTyp'});
%%Clear temporary variables
clearvars filename delimiter formatSpec fileID dataArray ans;
  5 Commenti
Stephen23
Stephen23 il 30 Gen 2016
Modificato: Stephen23 il 30 Gen 2016
@Gert Lõhmus: the file has some rather strange features, for example:
  • double quotes at the beginning and end of each line. Normally this would indicate text fields that needs to be kept together, but in this case it appears to be a mistake, as the first line includes header strings, indicating that there should actually be multiple columns, not just one (string) column.
  • significantly different numbers of commas per line, which seems to be because of erroneous encoding of the string fields without string delimiters (e.g. double quotes).
Judging by the column names and the content of the file it seems whatever tool encoded the file has completely messed up creating a useful CSV file. Fields that should have double quotes to keep them together are missing them entirely, and some fields seem to be missing but this is not indicated by two consecutive delimiter characters.
Here are a few lines of the file, to which I have added space character to indicate which fields seem to be containing the same information. However you will see that other fields are missing completely, and some fields contain commas (without double quotes).
"Register Number,IMO Number, Ship Name,Call Sign, Type & service,...,"
" 12536D, 9180401, CEMSOL, C4WH2, General cargo ship,...,"
" 14968W, 9376036, RUTH, 5BEG2, Container ship,...,"
...
" 961S32, 9063665,MURCIA CARRIER, HAINAN BAOSHA 011 (2011), LOUIS PASTEUR (1996), 3FJZ2, Refrigerated cargo ship,Other cargo ship,...,"
" 11930U, SAND CARRIER 101, Hopper barge,...,"
There is no sensible way to "recreate" missing fields, or identify which ones need to be aligned. The simplest solution would be to find the tool that defined that file and export the data again, this time using a standard CSV file (with double quote string delimiters, etc).

Accedi per commentare.

Risposta accettata

Star Strider
Star Strider il 29 Gen 2016
This will read it:
fidi = fopen('Gert Lõhmus ShipData.csv','r');
L = fgets(fidi);
k1 = 0;
while ~(feof(fidi))
k1 = k1 + 1;
L = fgetl(fidi);
fields = length(regexp(L, ','))+1;
Data{k1} = textscan(L, repmat('%s', 1, fields), 'Delimiter',',');
end
It produces an array of 633 cells (for this file) of varying lengths, because each line has a different length. After that, it will be up to you as to how you want to process the results.
  2 Commenti
N/A
N/A il 30 Gen 2016
Thanks, this really does create the job. But how could I turn the cell array to table? cell2table(Data) creates a 1x633 table. Thank you.
Star Strider
Star Strider il 30 Gen 2016
As always, my pleasure.
I doubt you could turn it into a table in the form my code reads it in, because the number of columns varies in each line. Putting it into a format for a table would at least require significant pre-processing.
I have no idea what the individual fields mean, so I have no idea how to characterise them, determine the ones that are missing, and parse them into columns, probably with NaN entries in the empty columns. Most of them do not seem to me to have identifiable repeated formats or content that would make that task straightforward. If there were empty cells in the original file so all the lines had equal numbers of columns, that would make this task immeasurably easier.
I would search to see if the source of the information in that file has a version that has equal-length lines (the same number of columns — including empty columns — in every line), and then read that file. (My code would work with it as well, but then a single textscan call would likely be able to also.)

Accedi per commentare.

Più risposte (1)

John BG
John BG il 30 Gen 2016
try to save the file as SHipData.xlsx and the following:
filename3='ShipData.xlsx'
[num,txt,raw]=xlsread(filename3)
the data you want is all in the cell txt
John

Prodotti

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by