Optimizing code: one line takes 86% of the running time.

Hello Everyone! I am developing a classifier for handwritten signs and digits using logistic regression and the softmax function.
I have a problem with one line in one function because it takes too much time:
function [grad] = logistic_cost_function(xTrain, yTrain, w)
% xTrain = [30134 x 11760]   (samples x features)
% yTrain = [30134 x 1]       (class labels 1..K)
% w      = [36 x 11760]      (one weight row per class)
N = size(xTrain, 1);
K = 36;
grad = zeros(size(w));
ARG = w * xTrain';                      % [K x N] class scores for all samples
for n = 1 : N
    xRow = xTrain(n, :);
    Probability = softmax(ARG(:, n));   % [K x 1] class probabilities for sample n
    label_n = yTrain(n);
    for k = 1 : K
        IND = label_n == k;             % indicator: is k the true class?
        prob_k = Probability(k);
        grad(k, :) = grad(k, :) + (IND - prob_k) * xRow;  % <-- the expensive line
    end
end
grad = -grad / N;
end
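Written out, the double loop accumulates the usual softmax-regression gradient for each class k (a reconstruction from the code above; the notation is not from the original post):

$$\nabla_{\mathbf{w}_k} E = -\frac{1}{N}\sum_{n=1}^{N}\left(\mathbb{1}[y_n = k] - p_k(\mathbf{x}_n)\right)\mathbf{x}_n$$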
That is the time measured over only 2 epochs; my goal is to run a few thousand epochs.
Does anyone have an idea how to improve the algorithm?
Best Regards

3 Comments

Indeed, that line is called over a million times! Nested loops have a tendency to do that. But I'm glad you're using the profiler--it's a great tool for finding problem lines of code.
Can you explain in words what the overall goal is? What is grad?
IND is a logical value, a 0 or 1. Are you sure you want to be subtracting prob_k from that? That would be likely to give you a negative value from the subtraction for the cases where the two are not equal.
I believe that might be the point. Don't think of it as a negative probability, think of IND as an indicator function. You either subtract expectation by the probability of hitting the case, or add expectation by the probability of not hitting the case. I may be wrong though as I have no idea what he is actually trying to do.


Answers (1)

Ahmet Cecen on 14 May 2016
Edited: Ahmet Cecen on 14 May 2016
for k = 1 : K
    IND = label_n == k;
    prob_k = Probability(k);
    grad(k,:) = grad(k,:) + (IND - prob_k) * xRow;
end
Here is a way out of this loop (well hopefully, as I can't actually run the code to verify):
IND = zeros(K,1); IND(label_n) = 1;   % one-hot column: 1 at the true class
prob_k = Probability;
% (IND - prob_k) IS A COLUMN VECTOR, SO OUTER PRODUCT.
grad = grad + (IND - prob_k) * xRow;
Hmm... turns out no need for bsxfun.
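Taking this one step further, the outer loop over n can be removed as well by building the whole one-hot matrix at once. A sketch in the same spirit (also unverified against the real data; the _vec name is just for illustration, and it assumes the shapes from the question and a softmax that, like the Neural Network Toolbox one, works column-wise on a matrix):

function grad = logistic_cost_function_vec(xTrain, yTrain, w)
% xTrain [N x D], yTrain [N x 1] with labels 1..K, w [K x D]
N = size(xTrain, 1);
K = size(w, 1);
ARG = w * xTrain';                               % [K x N] scores, as in the original
P = softmax(ARG);                                % [K x N] probabilities, column-wise
IND = full(sparse(yTrain(:), (1:N)', 1, K, N));  % [K x N] one-hot label matrix
grad = -(IND - P) * xTrain / N;                  % [K x D], same result as the loops
end

Checking it against the looped version on a small random problem (say N = 100, D = 50, K = 10) before committing to thousand-epoch runs is cheap insurance.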

4 Comments

You were right: IND is an indicator function. The thing we have to compute is grad. grad is the gradient of the cost function (FOTO1); the gradient of this function is FOTO2. I am also giving you the softmax function, which shows how the probability is computed (FOTO3).
Why do I need grad? It is necessary for the stochastic gradient descent algorithm to minimize the cost function.
I will try your solution in a few hours, when I am home :)
[Image FOTO1: cost function]
[Image FOTO2: gradient of the cost function]
[Image FOTO3: softmax function]
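Since the images did not survive, these are presumably the standard formulas (a reconstruction, not the original figures; the gradient, FOTO2, is written out after the code in the question above):

$$E(\mathbf{w}) = -\frac{1}{N}\sum_{n=1}^{N}\sum_{k=1}^{K}\mathbb{1}[y_n = k]\,\ln p_k(\mathbf{x}_n) \quad \text{(FOTO1, cost)}$$

$$p_k(\mathbf{x}) = \frac{\exp(\mathbf{w}_k^\top \mathbf{x})}{\sum_{j=1}^{K}\exp(\mathbf{w}_j^\top \mathbf{x})} \quad \text{(FOTO3, softmax)}$$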
function w = stochastic_gradient_descent(obj_fun, xTrain, yTrain, w0, epochs, eta, mini_batch)
w = w0;
M = floor(size(xTrain, 1) / mini_batch);  % number of full mini-batches; leftover samples are skipped
for ep = 1 : epochs
    for m = 1 : M
        low_bord = (m-1) * mini_batch + 1;
        upp_bord = m * mini_batch;
        grad = obj_fun(xTrain(low_bord:upp_bord, :), yTrain(low_bord:upp_bord, :), w);
        w = w - eta * grad;
    end
end
end
That is the function from which I call logistic_cost_function; it is given to the SGD routine as a function handle:
obj_fun = @(X,Y,w)(logistic_cost_function(X,Y,w));
As I mentioned before, xTrain is [30134 x 11760]. mini_batch is about 60-80, and epochs as many as possible; so far it is about 2000 per 24 h.
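For completeness, a full call with these numbers might look like this (eta and the exact mini-batch size below are illustrative placeholders, not values from the thread):

K = 36; D = 11760;
w0 = zeros(K, D);                     % initial weights, one row per class
obj_fun = @(X, Y, w) logistic_cost_function(X, Y, w);
% 2000 epochs and mini_batch = 64 match the ranges mentioned above;
% eta = 0.01 is purely illustrative and would need tuning.
w = stochastic_gradient_descent(obj_fun, xTrain, yTrain, w0, 2000, 0.01, 64);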
Ahmet - bsxfun is not needed here because * between a column and row vector is well defined as a matrix product already. That outer product has always worked.
Yeah, in a previous version of my answer I didn't see that operation was an outer product for some reason and used a weird combination of bsxfun and repmat. Sometimes when I focus on trying to understand other people's code the simple stuff manages to elude me.


