Yet another TEXTSCAN question...

Question

0 votes

Example string:

s = ['"1","2","3"' 10 '"","2","3"' 10 '"1","","3"' 10 '"1","2",""' 10 '"","",""' 10]
s =
    '"1","2","3"
     "","2","3"
     "1","","3"
     "1","2",""
     "","",""
     '

I want to extract columns as either cellstring or as numbers, using textscan (because it is fast). I can cheat and do this with the following:

t=textscan(strrep(s,'"',''),'%f%f%f','Delimiter',','); [t{:}] %as number
ans =
     1     2     3
   NaN     2     3
     1   NaN     3
     1     2   NaN
   NaN   NaN   NaN
t=textscan(strrep(s,'"',''),'%s%s%s','Delimiter',','); [t{:}] %as string
ans =
  5×3 cell array
    {'1'     }    {'2'     }    {'3'     }
    {0×0 char}    {'2'     }    {'3'     }
    {'1'     }    {0×0 char}    {'3'     }
    {'1'     }    {'2'     }    {0×0 char}
    {0×0 char}    {0×0 char}    {0×0 char}

But how can I do this without strrep, so that textscan can operate on the file ID directly?

I have spent hours on this, thinking I had almost got it; a million permutations later, still no joy. :'(

2 Comments

Rik on 27 May 2018

Is strrep so much slower that it is not feasible?

Serge on 28 May 2018

Performance-wise this is OK because textscan is doing 90% of the work, but it does require the whole file to be read in first, even if, say, you only want one of 50 fields in the data. Using textscan directly on the file ID would have been neater. It looks like it is not possible with this file format...
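
For the "one field out of many" case, a minimal sketch of reading straight from a file ID is shown here. It assumes every field is wrapped in double quotes and that %q strips them (as a later comment in this thread suggests); the temporary file and the names fname and col2 are made up for illustration. A `*` in a conversion (here `%*q`) tells textscan to skip that field.

```matlab
% Write a small quoted CSV to a temporary file (hypothetical sample data).
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, '"1","2","3"\n"","2","3"\n"1","","3"\n');
fclose(fid);

% Read only the second field: %*q skips a quoted field, %q keeps one.
fid = fopen(fname, 'r');
c = textscan(fid, '%*q%q%*q', 'Delimiter', ',');
fclose(fid);
delete(fname);

% Convert the kept column to numbers; empty text becomes NaN.
col2 = str2double(c{1});
```

Only the kept field is returned, so the format string grows with the number of fields but the output does not.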


Answer 1

dpb on 27 May 2018

Edited: dpb on 27 May 2018

1 vote

Let textscan do the equivalent of strrep for you...

>> fmt1=repmat('%f',1,3);
>> t=cell2mat(textscan(s,fmt1,'delim',',','collectout',1,'whitespace','"'))
t =
     1     2     3
   NaN     2     3
     1   NaN     3
     1     2   NaN
   NaN   NaN   NaN
>> fmt2=repmat('%s',1,3);
>> t=textscan(s,fmt2,'delim',',','collectout',1,'whitespace','"')
t =
  1×1 cell array
    {5×3 cell}
>> t{:}
ans =
  5×3 cell array
    {'1"'    }    {'2"'    }    {'3"'    }
    {0×0 char}    {'2"'    }    {'3"'    }
    {'1"'    }    {0×0 char}    {'3"'    }
    {'1"'    }    {'2"'    }    {0×0 char}
    {0×0 char}    {0×0 char}    {0×0 char}
>>

7 Comments (5 older comments not shown)

dpb on 28 May 2018

+1 Stephen; forgot about '%q'.

Serge, it works to reproduce your suggested/requested output for the strings case; the alternative posted works for numeric.

Serge on 28 May 2018

Perhaps this is a better example:

s = ['"","",""' 10 '"a","2",""' 10 '"a","","c"' 10 '"","2","c"' 10]
s =
    '"","",""
     "a","2",""
     "a","","c"
     "","2","c"
     '

Here any value can be of any length or empty, and any column may be numeric; the column types are described by the file header.

I almost got it working with this ugly thing:

t = textscan(s,'%q%f%s','delim',{'","' '"'})

ISSUE: the first column cannot be numeric; it must be a string read with %q, while subsequent string columns must use %s.

I would love to see other suggestions. Because this is ugly and has issues, I'll stay with the strrep cheat.
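
A possible way around the per-column format restrictions, sketched under the assumption that every field is double-quoted: read all columns with %q and the plain ',' delimiter, then convert the numeric ones afterwards with str2double. The isNumeric mask is a hypothetical stand-in for the type information that would come from the file header.

```matlab
% Sample data from the comment above (10 is the newline character).
s = ['"","",""' 10 '"a","2",""' 10 '"a","","c"' 10 '"","2","c"' 10];

% Hypothetical column-type mask; in practice this would come from the file header.
isNumeric = [false true false];

% Read every column as a quoted string; %q drops the surrounding double quotes.
fmt = repmat('%q', 1, numel(isNumeric));
c = textscan(s, fmt, 'Delimiter', ',');

% Convert the numeric columns; str2double maps empty text to NaN.
for k = find(isNumeric)
    c{k} = str2double(c{k});
end
```

The numeric conversion happens after the scan rather than inside it, so whether this beats the strrep approach would need timing on real data.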


Answer 2

Jeremy Hughes on 29 May 2018

0 votes

If the numbers are always surrounded by double quotes, try this:

t = textscan(s,'"%f""%f""%f"','Delimiter',',')

or,

t = textscan(s,'%f%f%f', 'Delimiter',',','Whitespace',' \t"')

There are a lot of knobs in textscan. If you have a file with this kind of data, I suggest:

opts = detectImportOptions(filename)   % inspect or adjust the detected options
t = readtable(filename, opts)          % pass the options on to readtable

HTH,

Jeremy

1 Comment

Serge on 29 May 2018

Thank you Jeremy,

I think I have tried every permutation under the sun :/

This one fails when a value is empty (,"",):

t = textscan(s,'"%f""%f""%f"','Delimiter',',')

And this one keeps the trailing " at the end of strings, e.g. 'a"':

s = ['"a","","c"' 10 '"","2","c"' 10]
t = textscan(s,'%s%f%q','Delimiter',',','Whitespace',' \t"')

dpb said it looks like a bug in textscan, and I am inclined to agree.
