Speed Up String Conversion

Stephen Gray on 1 May 2024
Hi all. I am trying to speed up the string conversion of a table field, as below:
GoingUC=string(table2cell(Inps(:,5)));
Inps is a table with approximately 730,000 records and 13 fields. I have 6 categorical fields to convert, and it is taking over 2.5 hours, so I wondered if there is a quicker way to do this. I need a string array for the following code, which converts the categorical strings to numbers using a map (which is quick):
[Unique_GoingU,~,GoingU_Numeric_Cats] = unique(GoingUC);
CTNM_GoingU=containers.Map(Unique_GoingU,num2cell(1:length(Unique_GoingU)));
NTD_GoingU=cell2mat(values(CTNM_GoingU,num2cell(GoingUC)));
It all works perfectly for my purposes, but it would be great if I could speed it up.
Steve Gray
2 Comments
Voss on 1 May 2024
The third output from unique is the same as the end result (or the transpose of the end result, if GoingUC is a row vector), so using a Map is unnecessary.
GoingUC = string(randi(10,10000,1))
GoingUC = 10000x1 string array
"9" "6" "2" "3" "9" "1" "10" "5" "4" "9" "4" "10" "10" "3" "10" "8" "7" "2" "9" "7" "2" "2" "3" "7" "8" "9" "7" "1" "1" "6"
[Unique_GoingU,~,GoingU_Numeric_Cats] = unique(GoingUC)
Unique_GoingU = 10x1 string array
"1" "10" "2" "3" "4" "5" "6" "7" "8" "9"
GoingU_Numeric_Cats = 10000x1
10 7 3 4 10 1 2 6 5 10
CTNM_GoingU=containers.Map(Unique_GoingU,num2cell(1:length(Unique_GoingU)));
NTD_GoingU=cell2mat(values(CTNM_GoingU,num2cell(GoingUC)))
NTD_GoingU = 10000x1
10 7 3 4 10 1 2 6 5 10
isequal(GoingU_Numeric_Cats,NTD_GoingU)
ans = logical
1
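Since the fields are already categorical, one further option worth checking is to skip the string conversion altogether: double applied to a categorical array returns the category index of each element, and categories returns the category names. The following is only a minimal sketch using made-up values rather than the poster's data. The codes follow the category order of the categorical array, which matches the sorted order from unique in this example but need not in general (for instance, if the categories have been reordered).

```matlab
% Hypothetical stand-in for one categorical field of Inps
C = categorical(["soft"; "firm"; "good"; "firm"; "soft"]);

% Category index of each element (undefined elements would become NaN)
codes = double(C);

% Category names, in the order the codes refer to
names = string(categories(C));

% The same indices via the string route used in the question
[uniq, ~, idx] = unique(string(C));

% Compare the two sets of codes and names
sameCodes = isequal(codes, idx);
sameNames = isequal(names, uniq);
```

If sameCodes and sameNames are both true for the real data, the string array and the Map are not needed to obtain the numeric codes.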
Stephen Gray on 1 May 2024
Thanks!


Accepted Answer

Voss on 1 May 2024
Avoid using table2cell for this; instead, access the table data directly, using curly braces {} or, even better, dot indexing.
% 100000x1 table of categoricals
Inps = table(categorical(randi(10,100000,1)))
Inps = 100000x1 table
    Var1
    ____
    7
    10
    8
    5
    7
    3
    10
    8
    4
    2
    6
    6
    2
    10
    10
    9
% using table2cell
tic
str1 = string(table2cell(Inps(:,1)));
toc
Elapsed time is 1.676799 seconds.
% using curly brace indexing
tic
str2 = string(Inps{:,1});
toc
Elapsed time is 0.013733 seconds.
% using dot indexing
tic
str3 = string(Inps.(1));
toc
Elapsed time is 0.005515 seconds.
Accessing the table data directly is more than 100 times faster in this test (roughly 120 times with curly braces and 300 times with dot indexing), and produces the same result:
isequal(str1,str2,str3)
ans = logical
1
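The same direct access extends to several categorical fields at once. The following is a minimal sketch, assuming a small made-up table; the variable names GoingU, Speed and Course are hypothetical, as is the choice to collect the results in a struct. It finds the categorical variables, converts each one with dot indexing, and keeps the third output of unique as the numeric codes.

```matlab
% Hypothetical stand-in for the poster's table (names are illustrative)
Inps = table(categorical(randi(10,1000,1)), rand(1000,1), ...
    categorical(randi(5,1000,1)), ...
    'VariableNames', {'GoingU','Speed','Course'});

% Logical row vector marking which table variables are categorical
isCat = varfun(@iscategorical, Inps, 'OutputFormat', 'uniform');
catNames = Inps.Properties.VariableNames(isCat);

codes = struct();   % numeric codes per categorical field
labels = struct();  % unique strings per categorical field
for k = 1:numel(catNames)
    name = catNames{k};
    % Dot indexing with a dynamic name avoids table2cell entirely
    str = string(Inps.(name));
    [labels.(name), ~, codes.(name)] = unique(str);
end
```

After the loop, codes.GoingU holds the same kind of index vector as GoingU_Numeric_Cats in the question, and labels.GoingU holds the matching unique strings.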
3 Comments
Voss on 1 May 2024
Edited: Voss on 1 May 2024
You're welcome!
table2cell could be useful for collecting multiple variables of a table into a cell array, particularly if the variables contain different classes of data, although I would most likely just keep the data in table form.
T = table(rand(10,1),cellstr(char(65+randi([0,9],10,5))),string(rand(10,1)))
T = 10x3 table
     Var1        Var2          Var3
    _______    _________    __________
    0.23051    {'ADAJB'}    "0.15424"
    0.46691    {'FACCA'}    "0.49046"
    0.60176    {'BFJGB'}    "0.12775"
    0.97235    {'IBGBJ'}    "0.93042"
    0.26794    {'GCCAI'}    "0.42212"
    0.13361    {'GABEB'}    "0.094709"
    0.12238    {'EEFBH'}    "0.14285"
    0.24268    {'CDDDG'}    "0.42503"
    0.69713    {'IGHGF'}    "0.075316"
    0.59503    {'JFEBG'}    "0.36855"
% table to cell keeps the data classes as they are in the table
C = table2cell(T(:,[1 2 3]))
C = 10x3 cell array
    {[0.2305]}    {'ADAJB'}    {["0.15424" ]}
    {[0.4669]}    {'FACCA'}    {["0.49046" ]}
    {[0.6018]}    {'BFJGB'}    {["0.12775" ]}
    {[0.9724]}    {'IBGBJ'}    {["0.93042" ]}
    {[0.2679]}    {'GCCAI'}    {["0.42212" ]}
    {[0.1336]}    {'GABEB'}    {["0.094709"]}
    {[0.1224]}    {'EEFBH'}    {["0.14285" ]}
    {[0.2427]}    {'CDDDG'}    {["0.42503" ]}
    {[0.6971]}    {'IGHGF'}    {["0.075316"]}
    {[0.5950]}    {'JFEBG'}    {["0.36855" ]}
% but the concatenation required when accessing directly converts
% numeric and cell char to string, in order to combine the
% numeric and cell char table variables with the string variable
T{:,[1 2 3]}
ans = 10x3 string array
    "0.23051"    "ADAJB"    "0.15424"
    "0.46691"    "FACCA"    "0.49046"
    "0.60176"    "BFJGB"    "0.12775"
    "0.97235"    "IBGBJ"    "0.93042"
    "0.26794"    "GCCAI"    "0.42212"
    "0.13361"    "GABEB"    "0.094709"
    "0.12238"    "EEFBH"    "0.14285"
    "0.24268"    "CDDDG"    "0.42503"
    "0.69713"    "IGHGF"    "0.075316"
    "0.59503"    "JFEBG"    "0.36855"
C = [T.(1) T.(2) T.(3)]
C = 10x3 string array
    "0.23051"    "ADAJB"    "0.15424"
    "0.46691"    "FACCA"    "0.49046"
    "0.60176"    "BFJGB"    "0.12775"
    "0.97235"    "IBGBJ"    "0.93042"
    "0.26794"    "GCCAI"    "0.42212"
    "0.13361"    "GABEB"    "0.094709"
    "0.12238"    "EEFBH"    "0.14285"
    "0.24268"    "CDDDG"    "0.42503"
    "0.69713"    "IGHGF"    "0.075316"
    "0.59503"    "JFEBG"    "0.36855"
Stephen Gray on 1 May 2024
Cool, understood.
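On the point about keeping the data in table form: if the goal is only to have the categorical variables as strings, convertvars (available since R2018b) can possibly do that without leaving the table. The following is a minimal sketch on a small made-up table, not the poster's data; whether it is faster than dot indexing on 730,000 rows has not been measured here.

```matlab
% Hypothetical table with two categorical variables and one numeric one
T = table(categorical(["a"; "b"; "a"]), [1; 2; 3], ...
    categorical(["x"; "x"; "y"]));

% Convert every categorical variable to string, leaving the rest untouched
T2 = convertvars(T, @iscategorical, 'string');

% Class of each variable after conversion
classes = varfun(@class, T2, 'OutputFormat', 'cell');
```

Here the function handle @iscategorical selects which variables to convert, so the numeric variable keeps its class.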

Release

R2024a
