How can I speed up my script? I'm using loops and contains function and I have to screen 14K X 167K variables

Question

Alan Cesar Pilon Miro on 24 Apr 2018


https://it.mathworks.com/matlabcentral/answers/397056-how-can-i-speed-up-my-script-i-m-using-loops-and-contains-function-and-i-have-to-screen-14k-x-167k

Commented: Guillaume on 25 Apr 2018

Hello, I'm doing text mining in an attempt to organize my data. I have a table with more than 200K rows and 12 columns, and I want to extract some information from one of the columns. Specifically, I'm looking for names that match my reference table (approx. 14K names), using the contains function. The search uses two loops: the first fixes one of the 14K names and the second looks for that name in the 200K rows. This takes a very long time. Could you help me speed up my script? Thanks.

Here is the code:

% First loop: reference name table (about 14K names)
for k = 2:14045
    test = DNP(k,10);
    virgula = ',';
    Space = ' ';
    Genr = Space + test + Space;   % name padded with a space on each side
    % Second loop: raw table with more than 200K rows (only rows 15001 to 16000 here)
    for i = 15001:16000
        BiolSource = DNP(i,3);
        Presence = contains(BiolSource, Genr, 'IgnoreCase', true);
        if Presence
            A = DNP(i,13);
            B = DNP(k,11);
            DNP(i,13) = A + virgula + Space + test + Space + B;
            C = DNP(i,13);
            DNP(i,13) = erase(C, "0, ");   % remove every occurrence of "0, "
        end
    end
end
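
For comparison, one small change that keeps the same logic is to let contains scan the whole text column in a single call: it accepts a string array and returns a logical vector, so the inner loop is no longer needed. The following is only a minimal sketch on a hypothetical four-column miniature of DNP; the column layout and the values are assumptions made for illustration, not the real data.

```matlab
% Hypothetical miniature of DNP (string array):
% column 1 = source text, column 2 = genus, column 3 = family,
% column 4 = accumulated matches (starts as "0").
DNP = ["extract of Penicillium sp. culture", "Penicillium",  "Trichocomaceae",    "0";
       "marine sponge",                      "Streptomyces", "Streptomycetaceae", "0"];

for k = 1:size(DNP, 1)
    test = DNP(k, 2);
    Genr = " " + test + " ";   % same space padding as the original loop
    % A single contains call over the whole column replaces the inner loop.
    hit = contains(DNP(:, 1), Genr, 'IgnoreCase', true);
    % Append "genus family" to every matching row, then remove "0, ".
    DNP(hit, 4) = erase(DNP(hit, 4) + ", " + test + " " + DNP(k, 3), "0, ");
end

disp(DNP(:, 4))
```

This still runs one pass per reference name, so it reduces the number of function calls rather than the amount of text scanned.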

5 Comments

Guillaume on 24 Apr 2018

Edited: Guillaume on 24 Apr 2018

So, to be clear, you want to identify which term in column C of Ask3 is the genus? All possible genera are stored in column A of Ask2?

Assumption: there is always one and only one genus per row of column C.

Alan Cesar Pilon Miro on 24 Apr 2018

Yes, I want to find which of the genera in column A of ASk2 are present in column C. There can be more than one genus per row of column C.


Answer 1

Guillaume on 24 Apr 2018


https://it.mathworks.com/matlabcentral/answers/397056-how-can-i-speed-up-my-script-i-m-using-loops-and-contains-function-and-i-have-to-screen-14k-x-167k#answer_316957


reference = readtable('MATLAB ASk2.xls', 'ReadVariableNames', false, 'Range', 'A:B');
raw = readtable('MATLAB ASk3.xls');
genera = lower(reference.Var1);  %convert everything to lower case for easier comparison
matched_genera = rowfun(@(t) {genera(ismember(genera, strsplit(lower(t))))}, raw, ...
    'InputVariables', 'Text', 'ExtractCellContent', true, ...
    'OutputVariableNames', 'matched');

Each row of matched_genera is a cell array containing 0, 1 or more of the genera found in 'Text' (case insensitive). You can concatenate that with the original table if you wish:

newraw = [raw, matched_genera]
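
To see what the call above produces without the spreadsheets, here is a minimal sketch on toy data. The genus names and the three text rows are made-up stand-ins for the contents of the two files; only the rowfun call mirrors the answer.

```matlab
% Toy stand-ins for the two spreadsheets (hypothetical values).
genera = lower({'Aspergillus'; 'Penicillium'; 'Streptomyces'});
raw = table({'Isolated from Penicillium sp. and Aspergillus niger'; ...
             'Marine sponge extract'; ...
             'Culture of STREPTOMYCES griseus'}, ...
            'VariableNames', {'Text'});

% For each row: lower-case the text, split it on whitespace, and keep the
% genera that appear among the resulting words.
matched_genera = rowfun(@(t) {genera(ismember(genera, strsplit(lower(t))))}, raw, ...
    'InputVariables', 'Text', 'ExtractCellContent', true, ...
    'OutputVariableNames', 'matched');

newraw = [raw, matched_genera];
disp(newraw.matched{1})   % genera found in the first row
```

Because strsplit splits on whitespace only, a genus followed directly by punctuation (for example "Penicillium,") would not be matched by this sketch; stripping punctuation first would be one way to handle that if the real data needs it.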

2 Comments

Alan Cesar Pilon Miro on 24 Apr 2018

Could you include column 2 (family) from the reference table as well in the newraw?

Guillaume on 25 Apr 2018


The easiest way to do that is to create a separate .m file for the function passed to rowfun:

In its own match_raw.m file:

function [matched_genera, family] = match_raw(raw_row, genera_lower, family)
     ismatch = ismember(genera_lower, strsplit(lower(raw_row)));
     matched_genera = {genera_lower(ismatch)};
     family = {family(ismatch)};
end

The rowfun call then becomes:

matches = rowfun(@(t) match_raw(t, genera, reference.Var2), raw, 'InputVariables', 'Text', 'ExtractCellContent', true, 'OutputVariableNames', {'matched_genera', 'family'})
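
If a single text entry per row is wanted, similar to the "genus family" strings built by the original loop, the two cell columns can be joined afterwards. The following is a minimal sketch; the matches table is constructed by hand with hypothetical values so that the example runs on its own, and the summary variable is an illustrative name, not part of the answer.

```matlab
% Hand-built stand-in for the output of the rowfun call (hypothetical values).
matches = table( ...
    {{'aspergillus'; 'penicillium'}; cell(0, 1); {'streptomyces'}}, ...
    {{'Trichocomaceae'; 'Trichocomaceae'}; cell(0, 1); {'Streptomycetaceae'}}, ...
    'VariableNames', {'matched_genera', 'family'});

% Build one "genus family, genus family, ..." character vector per row.
summary = cell(height(matches), 1);
for r = 1:height(matches)
    g = matches.matched_genera{r};
    f = matches.family{r};
    if isempty(g)
        summary{r} = '';   % no genus found in this row
    else
        % Pair each genus with its family, then join the pairs with ", ".
        summary{r} = strjoin(strcat(g(:)', {' '}, f(:)'), ', ');
    end
end

disp(summary{1})
```

The loop here runs once per raw row over already-matched results, so it does not reintroduce the 14K by 200K search.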
