Converting a large semicolon-separated .txt file to a string array

Hello everyone, I have a large text file that looks like the example below (two sample lines are shown).
There are about 200,000 lines with ~40 to ~500 columns each.
My problem is that a block of fields at the end of each line (an empty field, a date, and a field of space-separated numbers) repeats up to 135 times per line, each time with different content, but it can also occur only once. In that case the file simply has no ";" delimiters for the missing repetitions, so the lines contain different numbers of fields.
____________________
"117";"117344";"AAAA";"BBBB";;;"Test1234";;"20150209 ";"2709026500 2709033401 2709040700";;"20000310 ";"1234027600 1114033401 9994040700"
"117";"117344";"AAAA";"BBBB";;;"Test1234";;"20150209 ";"2709026500 2709033401 2709040700"
____________________
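A quick way to see why the rows end up with different widths is to split the two example lines on ";" and count the fields. This is a minimal sketch; the values nFixed = 7 and groupSize = 3 are assumptions read off these two lines only, not a description of the real file.

```matlab
% The two example lines from the question (char vectors; the embedded
% double quotes need no escaping inside single quotes)
exampleLines = {
    '"117";"117344";"AAAA";"BBBB";;;"Test1234";;"20150209 ";"2709026500 2709033401 2709040700";;"20000310 ";"1234027600 1114033401 9994040700"'
    '"117";"117344";"AAAA";"BBBB";;;"Test1234";;"20150209 ";"2709026500 2709033401 2709040700"'
};

% Split each line on ';'. Unlike strsplit's default behaviour, regexp keeps
% the empty fields produced by ';;', so every delimiter counts.
fields = regexp(exampleLines, ';', 'split');
numFields = cellfun(@numel, fields)            % 13 for line 1, 10 for line 2

% ASSUMPTION (from these two lines only): 7 leading fields, then repeating
% groups of 3 fields: an empty field, a date, and a list of numbers.
nFixed = 7;
groupSize = 3;
numGroups = (numFields - nFixed) / groupSize   % 2 for line 1, 1 for line 2
```

Each extra repetition adds three fields, which is why padding every line to the widest one leaves most cells empty when the repeat count ranges from 1 to 135.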
My goal is to parse this text into columns, and so far this works using regexp and cellfun:
filestr = fileread(i_filename);                      % read the whole file into one char vector
filebyline = regexp(filestr, '\r?\n', 'split');      % break it into lines (works for \n and \r\n line endings)
filebyline( cellfun('isempty',filebyline) ) = [];    % remove empty lines
filebyfield = regexp(filebyline, ';', 'split');      % split by fields; empty fields from ";;" are kept
clear filebyline
numfields = cellfun(@length, filebyfield);           % number of fields in each line
maxfields = max(numfields);
fieldpattern = repmat({[]}, 1, maxfields);           % row of empty cells used for padding
firstN = @(S,N) S(1:N);                              % keep only the first N elements
filebyfield = cellfun(@(S) firstN([S,fieldpattern], maxfields), filebyfield, 'UniformOutput', false); % pad each line to maxfields fields
fieldarray_base = vertcat(filebyfield{:});           % switch from cell vector of cell vectors into a 2D cell array
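If the end goal is a string array anyway, one option is to build it directly instead of going through the nested cell array and its padded copy. The sketch below assumes R2016b or later (for string, strings, strlength and count), and the sample text stands in for fileread(i_filename). It counts the fields first, preallocates a string matrix, and fills it row by row. Whether this lowers peak memory enough for the real file is an assumption worth checking with whos.

```matlab
% Stand-in for: filestr = fileread(i_filename);
filestr = sprintf('%s\n%s\n', ...
    '"117";"117344";"AAAA";"BBBB";;;"Test1234";;"20150209 ";"2709026500 2709033401 2709040700";;"20000310 ";"1234027600 1114033401 9994040700"', ...
    '"117";"117344";"AAAA";"BBBB";;;"Test1234";;"20150209 ";"2709026500 2709033401 2709040700"');

% Split into lines (handles \n and \r\n) and drop empty lines
fileLines = string(regexp(filestr, '\r?\n', 'split'));
fileLines = fileLines(strlength(fileLines) > 0);
nLines = numel(fileLines);

% First pass: the number of fields in a line is the number of ';' plus one
numFields = count(fileLines, ";") + 1;
maxFields = max(numFields);

% Preallocate; fields beyond a line's own length stay as "" (empty string)
result = strings(nLines, maxFields);

% Second pass: split one line at a time and copy it into its row, so only
% a single line's fields exist as a temporary cell array at any moment
for k = 1:nLines
    parts = regexp(char(fileLines(k)), ';', 'split');   % keeps empty fields
    result(k, 1:numel(parts)) = string(parts);
end
```

The result has the same rectangular shape as fieldarray_base, but as a string array, so no separate cell-to-string conversion is needed afterwards.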
But my code generates a huge cell array, which I later want to convert to a string array. Overall, it needs too much memory.
Maybe someone can give me a hint towards an efficient solution. Thanks!
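Because the width varies so much between lines (a single repeating group versus up to 135), another option is to avoid padding altogether and store the repeating groups in "long" form: one row per group, plus the index of the line it came from. This is a swapped-in technique, not something from the question. nFixed and groupSize are hypothetical values read off the two example lines and would have to be adapted to the real file, which has 40 to 500 columns.

```matlab
% HYPOTHETICAL layout, inferred from the two example lines only:
nFixed = 7;      % leading fields that every line has
groupSize = 3;   % fields per repeating group: empty field, date, numbers

% Stand-in for: filestr = fileread(i_filename);
filestr = sprintf('%s\n%s\n', ...
    '"117";"117344";"AAAA";"BBBB";;;"Test1234";;"20150209 ";"2709026500 2709033401 2709040700";;"20000310 ";"1234027600 1114033401 9994040700"', ...
    '"117";"117344";"AAAA";"BBBB";;;"Test1234";;"20150209 ";"2709026500 2709033401 2709040700"');

fileLines = string(regexp(filestr, '\r?\n', 'split'));
fileLines = fileLines(strlength(fileLines) > 0);
nLines = numel(fileLines);

fixedPart = strings(nLines, nFixed);   % one row per line
groupsPerLine = cell(nLines, 1);       % each cell: (number of groups) x groupSize
lineOfGroup = cell(nLines, 1);         % each cell: line index for every group row

for k = 1:nLines
    parts = string(regexp(char(fileLines(k)), ';', 'split'));
    nRest = numel(parts) - nFixed;
    assert(nRest >= 0 && mod(nRest, groupSize) == 0, ...
        'Line %d does not match the assumed layout.', k);
    fixedPart(k, :) = parts(1:nFixed);
    % Column-major reshape puts each group in a column; transpose to rows
    groupsPerLine{k} = reshape(parts(nFixed+1:end), groupSize, []).';
    lineOfGroup{k} = repmat(k, nRest / groupSize, 1);
end

groupFields = vertcat(groupsPerLine{:});   % total groups x groupSize string array
groupLine = vertcat(lineOfGroup{:});       % which line each group row belongs to
```

Rows of groupFields can be linked back to fixedPart through groupLine. Only the groups that are actually present are stored, instead of every line being padded to the maximum width.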

Answers (0)

Release

R2018b
