Regularization for Naive Bayes

Question

I have a data set with far more features than examples: the input X is a 50 * 5000 matrix, where 50 is the number of examples and 5000 is the number of features, and Y is the label vector with two classes, 1 or 0. I want to classify this data with a Naive Bayes classifier.
Because there are many more features than examples, the classifier over-fits and the results are very poor. I have already applied the lasso to this data and got pretty good classification results, and now I want to compare it against Naive Bayes as a baseline. However, NB performs too badly to make a persuasive comparison. So I'm wondering whether I can add regularization to Naive Bayes, as the lasso does, to overcome this over-fitting problem.
Below is my Naive Bayes code. Can anyone help me revise it so that it includes regularization?
Thanks a lot!

  X = rand(50, 5000); % This is my train/test sample matrix
  Y = randi([0 1], 50, 1); % This is my train/test label vector
  
  CrossValSet = cvpartition(Y,'KFold',3); % 3-fold Cross Validation
  
  % Training set
  Train_sample = X(training(CrossValSet,1),:);
  Train_label = Y(training(CrossValSet,1));
  
  % Test set
  % Test_sample = X(training(CrossValSet,1),:);
  Test_sample = X(test(CrossValSet,1),:);
  % Test_label = Y(training(CrossValSet,1));
  Test_label = Y(test(CrossValSet,1),:); 
  
  Class_num = length(unique(Train_label)); % Classes Pool - 1 and 0
  Feature_num = size(Train_sample,2); 
  Para_mean =   cell(1,Class_num);%Mean for each feature and class 
  Para_dev = cell(1,Class_num);%Dev for each feature and class 
  Sample_byclass = cell(1,Class_num);%Reorder the data set by class 
  Prior_prob = zeros(1,Class_num);%Prior probability of each class 
  
  %% Algorithm Processing
  % Prior
  for i=1:1:size(Train_sample, 1)  
      Sample_byclass{1,Train_label(i,1)+1} = [Sample_byclass{1,Train_label(i,1)+1}; Train_sample(i,:)]; 
      Prior_prob(1,Train_label(i,1)+1) = Prior_prob(1,Train_label(i,1)+1) + 1; 
  end 
  Prior_prob = Prior_prob/size(Train_sample,1); % Prior probability 
  
  % Parameters from training set
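  for i=1:1:Class_num 
       % Work along dimension 1 so a class with a single sample still yields one value per feature
       mu = mean(Sample_byclass{1,i}, 1); 
       sigma = std(Sample_byclass{1,i}, 0, 1);    
       Para_mean{1,i} = mu; 
       Para_dev{1,i} = sigma; 
  end 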
  
  % Get predicted output for test set
  predict = zeros(size(Test_sample,1), 1); 
  for i = 1:size(Test_sample,1) 
       prob = log(Prior_prob); 
       for j = 1:Class_num 
           likelihood = 0;  % Reset for every class so one class's score does not leak into the next
           for k = 1:1:Feature_num 
               % Adjust sigma if it's zero
               if Para_dev{1,j}(1,k) == 0 
                   Para_dev{1,j}(1,k) = 0.1667; 
               end 
               % Log - Gaussian
               likelihood = likelihood - ( Test_sample(i,k) - Para_mean{1,j}(1,k))^2 / ( 2 * Para_dev{1,j}(1,k)^2 )   - log(Para_dev{1,j}(1,k)); 
           end  % For every feature
           prob(1,j) = prob(1,j)+likelihood; 
       end  % For every class
       [~, index] = max(prob); 
       predict(i) = index-1;  % Map index 1/2 back to label 0/1
  end 
  accuracy = mean(predict == Test_label);  % Fraction of correctly classified test samples
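
For reference, here is a minimal sketch of what "regularized" Gaussian Naive Bayes could look like. It is not lasso-style L1 regularization and does not select features. Instead it uses two simple shrinkage ideas that are assumptions of this sketch, not part of the code above: variance smoothing (a small floor added to every class variance) and shrinking each class mean toward the pooled mean. The names `lambda`, `alpha`, `floorVar` and the hold-out split are hypothetical, and both values would need tuning by cross-validation.

```matlab
% Sketch only: Gaussian Naive Bayes with variance smoothing and mean shrinkage.
% lambda and alpha are illustrative values, not recommendations.
rng(0);                               % Reproducible toy data
X = rand(50, 5000);
Y = randi([0 1], 50, 1);

isTrain = true(50, 1);
isTrain(1:3:end) = false;             % Simple hold-out split: 33 train, 17 test
Xtr = X(isTrain, :);   Ytr = Y(isTrain);
Xte = X(~isTrain, :);  Yte = Y(~isTrain);

lambda = 1e-2;                        % Variance smoothing strength (assumed)
alpha  = 0.5;                         % Mean shrinkage in [0,1] (assumed)

classes   = unique(Ytr);
K         = numel(classes);
pooledMu  = mean(Xtr, 1);             % Mean over all training samples
pooledVar = var(Xtr, 0, 1);           % Variance over all training samples
floorVar  = lambda * max(pooledVar);  % Same floor added to every feature

logPost = zeros(size(Xte, 1), K);
for c = 1:K
    Xc = Xtr(Ytr == classes(c), :);
    mu = (1 - alpha) * mean(Xc, 1) + alpha * pooledMu;  % Shrunken class mean
    s2 = var(Xc, 0, 1) + floorVar;                      % Smoothed variance, never zero
    logPrior = log(size(Xc, 1) / size(Xtr, 1));
    % Gaussian log-likelihood summed over features (constant term dropped)
    z  = bsxfun(@minus, Xte, mu);
    ll = -0.5 * sum(bsxfun(@rdivide, z.^2, s2), 2) - 0.5 * sum(log(s2));
    logPost(:, c) = logPrior + ll;
end

[~, idx]  = max(logPost, [], 2);
predicted = classes(idx);
accuracy  = mean(predicted == Yte);
```

With `alpha = 0` and `lambda = 0` this reduces to plain Gaussian Naive Bayes, so the two parameters can be swept from there to see whether shrinkage helps on the real data. On the random toy data above, accuracy should stay near chance whatever the settings.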
