Feature Selection QUBO (Quadratic Unconstrained Binary Optimization)
Note
Installation Required: This functionality requires MATLAB Support Package for Quantum Computing.
Feature Selection Overview
Feature selection is a dimensionality reduction technique used to select a subset of features (predictor variables) that provide the best predictive power in modeling a set of data for classification or regression.
Feature selection can be used to:
Prevent overfitting — Avoid modeling with an excessive number of features that are more susceptible to rote-learning-specific training examples.
Reduce model size — Increase computational performance with high-dimensional data or prepare a model for embedded deployment when memory might be limited.
Improve interpretability — Use fewer features, which might help identify those that affect model behavior.
Based on Mücke, Heese, Müller, Wolter, and Piatkowski [1], you can perform feature selection by using a QUBO model to select relevant predictors. Specifically, suppose you have a set of N scalar observations y(i), where each observation is associated with p variables (the predictors) x(i,j), for i = 1 through N and j = 1 through p. The problem is to find a subset of the p predictors such that the resulting model gives accurate predictions of the observations.
QUBO Data Model
As noted in [1], the suggested process for creating a QUBO data model is as follows:
1. Specify the input data as an N-by-p matrix X, where each row is one data point.
2. Specify the response data as an N-by-1 vector Y, where Y(i) is the response to the row X(i,:).
3. Calculate the matrix R(i,j) for i and j from 1 through p as the mutual information between columns i and j of X. R(i,j) represents the redundancy between pairs of features. The mutual information is a nonnegative quantity that can be estimated by binning the data and counting entries in each bin to estimate the associated probabilities. For more information, see https://en.wikipedia.org/wiki/Mutual_information.
4. Similarly, calculate the vector J(i) for i from 1 through p as the mutual information between column i of X and the response variable Y. The quantity J(i) is nonnegative and represents the strength of the coupling between column i and the response Y.
5. For each value a from 0 through 1, define the QUBO as Qa = (1 – a)R – diag(aJ).
Because each R and J entry is nonnegative, the minimum of the QUBO problem is at the point [1,1,…,1] for a = 1 and at the point [0,0,…,0] for a = 0. In [1], Mücke and coauthors prove that in nontrivial problems, for each k from 0 through p, there is a value a such that the number of nonzero entries in the solution to Qa is equal to k. In other words, for each number of predictors k, there is a QUBO problem Qa whose solution has exactly k nonzero entries. These entries represent the data columns in X that best match the Y data while not matching the other data columns.
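To see this behavior on a small scale, you can construct Qa directly and count the selected features for several values of a. The following is a minimal sketch, separate from the data example that follows: the R matrix and J vector here are illustrative hand-made values, not estimated from data. The sketch uses the qubo and solve functions from the support package.
% Illustrative (hand-made) redundancy matrix and relevance vector
R = [0 0.1 0.3; 0.1 0 0.2; 0.3 0.2 0];   % symmetric, nonnegative, zero diagonal
J = [0.6; 0.4; 0.2];                     % nonnegative relevance values
for a = [0.25 0.75 1]
    Qa = qubo((1-a)*R - a*diag(J));      % Qa = (1 - a)*R - diag(a*J)
    result = solve(Qa);
    fprintf("a = %.2f selects %d feature(s)\n",a,sum(result.BestX));
end
For this tiny problem, the relevance term increasingly dominates the redundancy term as a grows, so the number of selected features increases with a, from one feature at a = 0.25 to all three at a = 1.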
Create Feature Selection Model with Synthetic Data
Create synthetic data for feature selection and then create a QUBO model to select the features. To train the resulting regression model, you need a Statistics and Machine Learning Toolbox™ license.
Create a synthetic data set using the iSyntheticData1 helper function.
[N,p,X,Y] = iSyntheticData1;
View the number of data points and predictors.
N
N = 10000
p
p = 30
Select five features, the number of useful features in this example. In general, you can try to find how many useful features exist by cross-validating models with different numbers of features, as sketched after the model comparison at the end of this example.
K = 5;
Create the R matrix and J vector from the data by using 20 bins for the data, and estimating the mutual information by counting the number of entries in each bin. Use the iBinXY, iComputeMIXX, and iComputeMIXY helper functions to create the matrix and vector.
nbins = 20;
[binnedXInfo,binnedYInfo] = iBinXY(X,Y,nbins);
binnedX = binnedXInfo.binned;
nbinsX = binnedXInfo.nbins;
binnedY = binnedYInfo.binned;
nbinsY = binnedYInfo.nbins;
R0 = iComputeMIXX(binnedX,nbinsX);
J = iComputeMIXY(binnedX,binnedY,nbinsX,nbinsY);
% Scale R to make alpha = 0.5 correspond to equal redundancy and
% relevance terms.
R = R0/(K-1);
Find the appropriate value of a for the QUBO Qa = (1 – a)R – diag(aJ). To do so, solve for the number of nonzero elements in the solution by using the howmany helper function, and then use fzero to set that number equal to K.
fun = @(alpha)howmany(alpha,R,J) - K;
alphasol = fzero(fun,[0 1]);
[~,xsol] = howmany(alphasol,R,J);
View the selected predictors.
find(xsol.BestX)
ans = 6 8 9 11 13
The selected predictors match the five useful features that the iSyntheticData1 helper function uses to generate the response. Train a regression tree, first using the full set of 30 predictors and then using the selected set of five predictors. Specify a common holdout value of 0.2, and compare the cross-validation losses. First train the regression tree using all the predictors. Training a regression tree requires a Statistics and Machine Learning Toolbox license.
rng default
mdl = fitrtree(X,Y,CrossVal="on",Holdout=0.2);
kfoldLoss(mdl)
ans = 0.0041
Now train the regression tree using only the selected predictors.
rng default
X2 = X(:,find(xsol.BestX));
mdl2 = fitrtree(X2,Y,CrossVal="on",Holdout=0.2);
kfoldLoss(mdl2)
ans = 0.0032
Although both regression trees have a low loss value, the regression tree that uses only five of the 30 predictors has a lower loss value.
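If you do not know how many useful features to select, one possible approach, sketched here, is to repeat the selection and cross-validation for a range of candidate values of K and compare the losses. This sketch reuses the X and Y data, the R matrix, the J vector, and the howmany helper function from this example, along with fitrtree and kfoldLoss from Statistics and Machine Learning Toolbox; the candidate range 2:2:10 is an arbitrary choice for illustration.
% Sketch: compare cross-validated losses for several candidate numbers of
% selected features. Assumes X, Y, R, J, and the howmany helper function
% from this example are available.
candidateK = 2:2:10;
cvLoss = zeros(size(candidateK));
rng default
for idx = 1:numel(candidateK)
    fun = @(alpha)howmany(alpha,R,J) - candidateK(idx);
    alphaK = fzero(fun,[0 1]);
    [~,resultK] = howmany(alphaK,R,J);
    XK = X(:,find(resultK.BestX));
    mdlK = fitrtree(XK,Y,CrossVal="on",Holdout=0.2);
    cvLoss(idx) = kfoldLoss(mdlK);
end
table(candidateK(:),cvLoss(:),VariableNames=["K" "CVLoss"])
A value of K at which the loss stops improving is a reasonable choice for the number of features.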
Helper Functions
This code creates the iSyntheticData1 helper function.
function [N,p,X,y] = iSyntheticData1
rng default
N = 10000;
p = 30;
useful = [6,8,9,11,13];
C = randn(p,p);
R = corrcov(C'*C);
X = mvnrnd(zeros(p,1),R,N);
% Make features 15 to 19 highly correlated with useful features:
% 15 -> 6
% 16 -> 8
% 17 -> 9
% 18 -> 11
% 19 -> 13
corrStd = 0.1;
X(:,15:19) = X(:,useful) + corrStd*randn(N,5);
noiseStd = 0.1;
t = 0.5*cos(X(:,11)) + sin(X(:,9).*X(:,8)) + 0.5*X(:,13).*X(:,6) + noiseStd*randn(N,1);
y = rescale(t,0,1);
X = zscore(X);
end
This code creates the iBinXY helper function. Note that this helper function uses the iBinPredictors helper function.
function [binnedXInfo,binnedYInfo] = iBinXY(X,Y,nbins)
binnedXInfo = iBinPredictors(X,nbins);
binnedYInfo = iBinPredictors(Y,nbins);
end
This code creates the iComputeMIXX helper function. Note that this helper function uses the iComputeMIXIXJ helper function.
function Q = iComputeMIXX(binnedX,nbinsX)
p = size(binnedX,2);
Q = zeros(p,p);
for i = 1:p
    for j = i+1:p
        Xi = binnedX(:,i);
        Xj = binnedX(:,j);
        nbinsXi = nbinsX(i);
        nbinsXj = nbinsX(j);
        Q(i,j) = iComputeMIXIXJ(Xi,Xj,nbinsXi,nbinsXj);
    end
end
Q = Q + Q';
end
This code creates the iComputeMIXIXJ helper function.
function mi = iComputeMIXIXJ(Xi,Xj,nbinsXi,nbinsXj)
% Estimate the mutual information between two binned columns by counting
% joint occurrences and comparing them with the product of the marginals.
N = size(Xi,1);
NXY = zeros(nbinsXi,nbinsXj);
for n = 1:N
    a = Xi(n);
    b = Xj(n);
    NXY(a,b) = NXY(a,b) + 1;
end
NX = sum(NXY,2);
NY = sum(NXY,1);
NXNYDivN = NX.*NY/N;
nonzeroID = NXY ~= 0;
nonzeroNXY = NXY(nonzeroID);
nonzeroNXNYDivN = NXNYDivN(nonzeroID);
mi = sum((nonzeroNXY./N).*log(nonzeroNXY./nonzeroNXNYDivN),'all');
end
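As a quick sanity check of this estimate, you can apply it to two identical binned columns, where the mutual information is approximately the entropy of the column, and to two independent binned columns, where it is near zero. This is an illustrative sketch; the random data here is not part of the example.
% Illustrative check of the mutual information estimate
rng(1)
xa = randi(5,1000,1);       % 1000 observations binned into 5 bins
xb = randi(5,1000,1);       % independent of xa
iComputeMIXIXJ(xa,xa,5,5)   % large: approximately the entropy of xa
iComputeMIXIXJ(xa,xb,5,5)   % near zero: the columns are independent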
This code creates the iComputeMIXY helper function. Note that this helper function uses the iComputeMIXIXJ helper function.
function f = iComputeMIXY(binnedX,binnedY,nbinsX,nbinsY)
p = size(binnedX,2);
f = zeros(p,1);
for i = 1:p
    Xi = binnedX(:,i);
    nbinsXi = nbinsX(i);
    f(i) = iComputeMIXIXJ(Xi,binnedY,nbinsXi,nbinsY);
end
end
This code creates the howmany helper function.
function [n,result] = howmany(alpha,R,J)
Q = qubo((1-alpha)*R - alpha*diag(J));
result = solve(Q);
n = sum(result.BestX);
end
This code creates the iBinPredictors helper function. Note that this helper function uses the iDiscretize helper function.
function binnedInfo = iBinPredictors(X,nbins)
[N,p] = size(X);
binnedX = zeros(N,p);
edgesX = cell(1,p);
iscatX = false(1,p);
nbinsX = zeros(1,p);
istbl = istable(X);
for i = 1:p
    if istbl
        oneX = X{:,i};
    else
        oneX = X(:,i);
    end
    nbinsi = min(nbins, numel(unique(oneX)));
    [binnedX(:,i),edgesX{i},iscatX(i),nbinsX(i)] = iDiscretize(oneX,nbinsi);
end
binnedInfo = struct;
binnedInfo.binned = binnedX;
binnedInfo.edges = edgesX;
binnedInfo.iscat = iscatX;
binnedInfo.nbins = nbinsX;
end
This code creates the iDiscretize helper function.
function [binnedx,edges,iscat,nbins] = iDiscretize(x,nbins)
if iscategorical(x)
    binnedx = double(x);
    edges = [];
    iscat = true;
    nbins = numel(unique(x));
else
    probs = [0,(1:(nbins-1))/nbins,1];
    edges = quantile(x,probs);
    binnedx = discretize(x,edges,IncludedEdge="right");
    iscat = false;
end
end
References
[1] Mücke, S., R. Heese, S. Müller, M. Wolter, and N. Piatkowski. Quantum Feature Selection. arXiv:2203.13261v1, March 2022. Available at https://arxiv.org/pdf/2203.13261.
See Also
fitrtree (Statistics and Machine Learning Toolbox) | solve | qubo | tabuSearch