Feature Selection QUBO (Quadratic Unconstrained Binary Optimization)
Note
Installation Required: This functionality requires MATLAB Support Package for Quantum Computing.
Feature Selection Overview
Feature selection is a dimensionality reduction technique used to select a subset of features (predictor variables) that provide the best predictive power in modeling a set of data for classification or regression.
Feature selection can be used to:
Prevent overfitting — Avoid modeling with an excessive number of features, which makes a model more susceptible to memorizing specific training examples.
Reduce model size — Increase computational performance with high-dimensional data or prepare a model for embedded deployment when memory might be limited.
Improve interpretability — Use fewer features, which might help identify those that affect model behavior.
Based on Mücke, Heese, Müller, Wolter, and Piatkowski [1], you can perform feature selection by using a QUBO model to select relevant predictors. Specifically, suppose you have a set of N scalar observations y(i), where each observation is associated with p variables (the predictors) x(i,j) for i = 1 through N and j = 1 through p. The problem is to find a subset of the p predictors such that the resulting model gives accurate predictions of the observations.
QUBO Data Model
As noted in [1], the suggested process for creating a QUBO data model is as follows:

1. Specify the input data as an N-by-p matrix X, where each row is one data point.
2. Specify the response data as an N-by-1 vector Y, where Y(i) is the response to the row X(i,:).
3. Calculate the matrix R(i,j) for i and j from 1 through p as the mutual information between columns i and j of X. R(i,j) represents the redundancy between pairs of features. The mutual information is a nonnegative quantity that you can estimate by binning the data and counting entries in each bin to estimate the associated probabilities; see the sketch after this list. For more information, see https://en.wikipedia.org/wiki/Mutual_information.
4. Similarly, calculate the vector J(i) for i from 1 through p as the mutual information between column i of X and the response variable Y. The quantity J(i) is nonnegative and represents the strength of the coupling between column i and the response Y.
5. For each value a from 0 through 1, define the QUBO as

Qa = (1 – a)R – diag(aJ).
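For illustration, this minimal sketch estimates the binned mutual information between two vectors that have already been discretized into integer bin indices. The variable names here are hypothetical; the iComputeMIXIXJ helper function later in this example performs the same computation in loop form.

% Minimal sketch: estimate mutual information between two binned vectors.
% a and b are hypothetical vectors of bin indices in 1:nb.
nb = 4;
a = randi(nb,1000,1);
b = randi(nb,1000,1);
N = numel(a);
counts = accumarray([a b],1,[nb nb]);       % joint bin counts
pxy = counts/N;                             % joint probability estimate
px = sum(pxy,2);                            % marginal of a
py = sum(pxy,1);                            % marginal of b
outer = px*py;                              % product of marginals
nz = pxy > 0;                               % sum only nonzero counts to avoid log(0)
mi = sum(pxy(nz).*log(pxy(nz)./outer(nz)))  % MI estimate in nats

Because a and b in this sketch are independent random vectors, the estimate is close to zero; dependent vectors give a larger value.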
Because each R and J entry is nonnegative, the minimum of the QUBO problem is at the point [1,1,…,1] for a = 1 and the point [0,0,…,0] for a = 0. In [1], Mücke and coauthors prove that in nontrivial problems, for each k from 0 through p, there is a value a such that the number of nonzero entries in the solution to Qa is equal to k. In other words, for each number of predictors k, there is a QUBO problem Qa whose solution has exactly k nonzero entries. These entries represent the data columns in X that best match the Y data while not matching the other data columns.
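To see the behavior at the extremes, consider this minimal sketch with a hypothetical 3-feature problem. It uses the qubo and solve functions from the support package, which this example also uses later.

% Hypothetical 3-feature illustration of the extreme cases
R = [0 0.2 0.1; 0.2 0 0.3; 0.1 0.3 0];  % symmetric redundancy matrix
J = [0.5; 0.4; 0.6];                    % relevance vector
a = 1;                                  % QUBO reduces to -diag(J)
sol1 = solve(qubo((1-a)*R - a*diag(J)));
sol1.BestX                              % all ones: every feature selected
a = 0;                                  % QUBO reduces to R
sol0 = solve(qubo((1-a)*R - a*diag(J)));
sol0.BestX                              % all zeros: no feature selected

Intermediate values of a trade off redundancy against relevance, producing solutions with between 0 and p nonzero entries.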
Create Feature Selection Model with Synthetic Data
Create synthetic data for feature selection and then create a QUBO model to select the features. To train the resulting regression model, you need a Statistics and Machine Learning Toolbox™ license.
Create a synthetic data set using the iSyntheticData1 helper function.

[N,p,X,Y] = iSyntheticData1;
View the number of data points and predictors.
N
N = 10000
p
p = 30
Select five features, the number of useful features in this example. In general, you can try to find how many useful features exist by cross-validating models with different numbers of features.
K = 5;
Create the R matrix and J vector from the data by using 20 bins and estimating the mutual information by counting the number of entries in each bin. Use the iBinXY, iComputeMIXX, and iComputeMIXY helper functions to create the matrix and vector.

nbins = 20;
[binnedXInfo,binnedYInfo] = iBinXY(X,Y,nbins);
binnedX = binnedXInfo.binned;
nbinsX = binnedXInfo.nbins;
binnedY = binnedYInfo.binned;
nbinsY = binnedYInfo.nbins;
R0 = iComputeMIXX(binnedX,nbinsX);
J = iComputeMIXY(binnedX,binnedY,nbinsX,nbinsY);
% Scale R to make a = 0.5 correspond to equal redundancy and
% relevance terms.
R = R0/(K-1);
Find the appropriate value of a for the QUBO Qa = (1 – a)R – diag(aJ). To do so, compute the number of nonzero elements in the solution by using the howmany helper function, and then use fzero to find a value of a at which that number equals K.

fun = @(alpha)howmany(alpha,R,J) - K;
alphasol = fzero(fun,[0 1]);
[~,xsol] = howmany(alphasol,R,J);
View the selected predictors.
find(xsol.BestX)
ans = 6 8 9 11 13

The selected predictors match the useful features defined in the iSyntheticData1 helper function. Train a regression tree, first using the full set of 30 predictors and then using the selected set of five predictors. Specify a common holdout value of 0.2, and compare the cross-validation losses. First, train the regression tree using all the predictors. Training a regression tree requires a Statistics and Machine Learning Toolbox license.
rng default
mdl = fitrtree(X,Y,CrossVal="on",Holdout=0.2);
kfoldLoss(mdl)
ans = 0.0041
Next, train the regression tree using only the selected predictors.

rng default
X2 = X(:,find(xsol.BestX));
mdl2 = fitrtree(X2,Y,CrossVal="on",Holdout=0.2);
kfoldLoss(mdl2)
ans = 0.0032
Although both regression trees have a low loss value, the regression tree that uses only five of the 30 predictors has a lower loss value.
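In practice, you can choose K by repeating these steps over a range of candidate values and comparing the cross-validation losses, as suggested earlier. The following is a sketch of one possible approach, assuming R, J, and the howmany helper function are available in the workspace; the candidate range 1:10 is arbitrary.

% Sketch: compare cross-validation loss across candidate feature counts
Kcandidates = 1:10;
cvloss = zeros(size(Kcandidates));
for idx = 1:numel(Kcandidates)
    Kc = Kcandidates(idx);
    % Find the value of a that selects Kc features
    alphac = fzero(@(a)howmany(a,R,J) - Kc,[0 1]);
    [~,res] = howmany(alphac,R,J);
    % Train on the selected columns and record the holdout loss
    rng default
    mdlc = fitrtree(X(:,find(res.BestX)),Y,CrossVal="on",Holdout=0.2);
    cvloss(idx) = kfoldLoss(mdlc);
end
[~,bestidx] = min(cvloss);
Kcandidates(bestidx)    % candidate K with the lowest holdout loss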
Helper Functions
This code creates the iSyntheticData1 helper function.
function [N,p,X,y] = iSyntheticData1
rng default
N = 10000;
p = 30;
useful = [6,8,9,11,13];
C = randn(p,p);
R = corrcov(C'*C);
X = mvnrnd(zeros(p,1),R,N);
% Make features 15 to 19 highly correlated with useful features:
% 15 -> 6
% 16 -> 8
% 17 -> 9
% 18 -> 11
% 19 -> 13
corrStd = 0.1;
X(:,15:19) = X(:,useful) + corrStd*randn(N,5);
noiseStd = 0.1;
t = 0.5*cos(X(:,11)) + sin(X(:,9).*X(:,8)) + 0.5*X(:,13).*X(:,6) + noiseStd*randn(N,1);
y = rescale(t,0,1);
X = zscore(X);
end
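In this synthetic data, the response depends only on the five useful features (6, 8, 9, 11, and 13), and features 15 through 19 are noisy copies of those features. A good feature selection method picks the useful features while rejecting their redundant copies, which is the role of the R term in the QUBO.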
This code creates the iBinXY helper function. Note that this helper
function uses the iBinPredictors helper function.
function [binnedXInfo,binnedYInfo] = iBinXY(X,Y,nbins)
binnedXInfo = iBinPredictors(X,nbins);
binnedYInfo = iBinPredictors(Y,nbins);
end
This code creates the iComputeMIXX helper function. Note that this
helper function uses the iComputeMIXIXJ helper function.
function Q = iComputeMIXX(binnedX,nbinsX)
p = size(binnedX,2);
Q = zeros(p,p);
% Compute mutual information for each pair of columns (upper triangle)
for i = 1:p
    for j = i+1:p
        Xi = binnedX(:,i);
        Xj = binnedX(:,j);
        nbinsXi = nbinsX(i);
        nbinsXj = nbinsX(j);
        Q(i,j) = iComputeMIXIXJ(Xi,Xj,nbinsXi,nbinsXj);
    end
end
Q = Q + Q'; % Fill the lower triangle to make Q symmetric
end
This code creates the iComputeMIXIXJ helper function.
function mi = iComputeMIXIXJ(Xi,Xj,nbinsXi,nbinsXj)
N = size(Xi,1);
% Count joint occurrences of each (Xi,Xj) bin pair
NXY = zeros(nbinsXi,nbinsXj);
for n = 1:N
    a = Xi(n);
    b = Xj(n);
    NXY(a,b) = NXY(a,b) + 1;
end
NX = sum(NXY,2); % marginal counts for Xi
NY = sum(NXY,1); % marginal counts for Xj
NXNYDivN = NX.*NY/N;
% Sum only over nonzero joint counts to avoid log(0)
nonzeroID = NXY ~= 0;
nonzeroNXY = NXY(nonzeroID);
nonzeroNXNYDivN = NXNYDivN(nonzeroID);
mi = sum((nonzeroNXY./N).*log(nonzeroNXY./nonzeroNXNYDivN),'all');
end
This code creates the iComputeMIXY helper function. Note that this
helper function uses the iComputeMIXIXJ helper function.
function f = iComputeMIXY(binnedX,binnedY,nbinsX,nbinsY)
p = size(binnedX,2);
f = zeros(p,1);
for i = 1:p
    Xi = binnedX(:,i);
    nbinsXi = nbinsX(i);
    f(i) = iComputeMIXIXJ(Xi,binnedY,nbinsXi,nbinsY);
end
end
This code creates the howmany helper function.
function [n,result] = howmany(alpha,R,J)
Q = qubo((1-alpha)*R - alpha*diag(J));
result = solve(Q);
n = sum(result.BestX);
end
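By default, solve uses a tabu search algorithm to solve the QUBO problem; see the tabuSearch reference page for solver options.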
This code creates the iBinPredictors helper function. Note that this
helper function uses the iDiscretize helper function.
function binnedInfo = iBinPredictors(X,nbins)
[N,p] = size(X);
binnedX = zeros(N,p);
edgesX = cell(1,p);
iscatX = false(1,p);
nbinsX = zeros(1,p);
istbl = istable(X);
for i = 1:p
    if istbl
        oneX = X{:,i};
    else
        oneX = X(:,i);
    end
    nbinsi = min(nbins, numel(unique(oneX)));
    [binnedX(:,i),edgesX{i},iscatX(i),nbinsX(i)] = iDiscretize(oneX,nbinsi);
end
binnedInfo = struct;
binnedInfo.binned = binnedX;
binnedInfo.edges = edgesX;
binnedInfo.iscat = iscatX;
binnedInfo.nbins = nbinsX;
end
This code creates the iDiscretize helper function.
function [binnedx,edges,iscat,nbins] = iDiscretize(x,nbins)
if iscategorical(x)
    binnedx = double(x);
    edges = [];
    iscat = true;
    nbins = numel(unique(x));
else
    % Use quantile-based edges so each bin has roughly equal counts
    probs = [0,(1:(nbins-1))/nbins,1];
    edges = quantile(x,probs);
    binnedx = discretize(x,edges,IncludedEdge="right");
    iscat = false;
end
end
References
[1] Mücke, S., R. Heese, S. Müller, M. Wolter, and N. Piatkowski. Quantum Feature Selection. arXiv:2203.13261v1, March 2022. Available at https://arxiv.org/pdf/2203.13261.
See Also
qubo | tabuSearch | qaoa