
Identify and Visualize Correlated Variables

When analyzing the relationships between data variables, you can identify and visualize correlated variables to gain insight into the data set. This example shows how to use the corrcoef function to calculate Pearson correlation coefficients, which measure the strength and direction of the linear relationship between each pair of variables. The correlation coefficients range from –1 to 1, where:

  • Values close to 1 indicate a positive linear relationship between the data variables.

  • Values close to –1 indicate a negative linear relationship between the data variables (anticorrelation).

  • Values close to or equal to 0 suggest no linear relationship between the data variables.

Additionally, this example shows how to determine which correlations are statistically significant and how to identify the most strongly correlated pairs. The example plots the correlations using a heatmap and a bar chart, so you can visually compare the relationships between variables.
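The coefficient behind these ranges is the Pearson correlation r = Σ(x − x̄)(y − ȳ) / sqrt(Σ(x − x̄)² · Σ(y − ȳ)²). As a minimal sketch with made-up data (not the example's data set), the following computes r for two vectors directly from that definition and compares it with corrcoef. It also shows that negating one variable flips the sign of r, and that a variable which depends exactly but nonlinearly on another can still have r = 0.

```matlab
% Illustrative data (not from the example): y is roughly 2*x plus small noise.
x = [1; 2; 3; 4; 5; 6];
y = [2.1; 3.9; 6.2; 8.1; 9.8; 12.2];

% Center each variable on its mean.
xc = x - mean(x);
yc = y - mean(y);

% Pearson r: sum of cross-products divided by the square root of the
% product of the sums of squares.
rManual = sum(xc .* yc) / sqrt(sum(xc.^2) * sum(yc.^2));

% corrcoef(x,y) returns a 2-by-2 matrix; the off-diagonal entry is r.
Rxy = corrcoef(x, y);
rBuiltin = Rxy(1,2);

% Negating y flips the sign of r (anticorrelation).
Rneg = corrcoef(x, -y);

% z.^2 depends exactly on z, but not linearly; because the data are
% symmetric about z = 0, the linear correlation is 0.
z = [-2; -1; 0; 1; 2];
Rquad = corrcoef(z, z.^2);

fprintf("manual r = %.4f, corrcoef r = %.4f\n", rManual, rBuiltin)
fprintf("r(x,-y) = %.4f, r(z,z.^2) = %.4f\n", Rneg(1,2), Rquad(1,2))
```

The last case is a reminder that a coefficient near 0 rules out only a linear relationship, not every kind of dependence.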

Compute Correlation Coefficients and P-Values

Generate random data in which some columns are built from other columns, so that those pairs of variables are correlated. Then, use the corrcoef function to calculate the correlation coefficients and the corresponding p-values, which indicate whether each correlation is statistically significant.

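rng(16)                                       % seed the generator for reproducible results
A = randn(50,6);                              % 50 observations of 6 independent variables
A(:,3) = A(:,3) + 2*A(:,1);                   % column 3 depends positively on column 1
A(:,4) = A(:,4) - 3*A(:,2);                   % column 4 depends negatively on column 2
A(:,5) = A(:,6) + 0.1*A(:,4) - 0.1*A(:,3);    % column 5 is mostly column 6, plus small terms
[R,P] = corrcoef(A)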
R = 6×6

    1.0000   -0.1248    0.8047    0.1300   -0.1726   -0.1009
   -0.1248    1.0000   -0.0744   -0.9612   -0.2314    0.0730
    0.8047   -0.0744    1.0000    0.0709   -0.3595   -0.2515
    0.1300   -0.9612    0.0709    1.0000    0.2611   -0.0549
   -0.1726   -0.2314   -0.3595    0.2611    1.0000    0.9384
   -0.1009    0.0730   -0.2515   -0.0549    0.9384    1.0000

P = 6×6

    1.0000    0.3878    0.0000    0.3683    0.2308    0.4858
    0.3878    1.0000    0.6075    0.0000    0.1060    0.6143
    0.0000    0.6075    1.0000    0.6248    0.0103    0.0781
    0.3683    0.0000    0.6248    1.0000    0.0670    0.7051
    0.2308    0.1060    0.0103    0.0670    1.0000    0.0000
    0.4858    0.6143    0.0781    0.7051    0.0000    1.0000
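Each off-diagonal p-value in P tests the hypothesis that the corresponding correlation is zero. A standard way to obtain it for a Pearson coefficient is to convert r into the statistic t = r·sqrt((n − 2)/(1 − r²)), which follows Student's t distribution with n − 2 degrees of freedom under that hypothesis. The sketch below, on made-up deterministic data rather than the example's A, computes the two-sided p-value this way with base MATLAB's betainc and checks it numerically against corrcoef.

```matlab
% Illustrative deterministic data (not the example's A): 20 observations
% of 3 variables.
n = 20;
idx = (1:n)';
X = [idx, sin(idx) + 0.05*idx, cos(3*idx)];
[Rdemo, Pdemo] = corrcoef(X);

% Zero the diagonal (self-correlations, r = 1) so that t stays finite;
% only the off-diagonal entries are compared below.
Roff = Rdemo;
Roff(logical(eye(size(Roff)))) = 0;

% t-statistic with n-2 degrees of freedom.
nu = n - 2;
T = Roff .* sqrt(nu ./ (1 - Roff.^2));

% Two-sided p-value of Student's t via the regularized incomplete beta
% function: p = I_x(nu/2, 1/2) with x = nu/(nu + t^2).
Pmanual = betainc(nu ./ (nu + T.^2), nu/2, 0.5);

offDiag = ~eye(size(Rdemo));
maxDiff = max(abs(Pmanual(offDiag) - Pdemo(offDiag)));
fprintf("largest difference from corrcoef p-values: %g\n", maxDiff)
```

The relationship also explains why the strongest correlations in this example have such tiny p-values: as |r| approaches 1, t grows without bound and p approaches 0.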

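The returned matrices R and P, which contain the correlation coefficients and the p-values respectively, are symmetric, and their diagonal elements compare each variable with itself. To keep each pair of variables only once, extract the part of R strictly below the main diagonal. The second argument of tril, -1, excludes the diagonal.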

R = tril(R,-1)
R = 6×6

         0         0         0         0         0         0
   -0.1248         0         0         0         0         0
    0.8047   -0.0744         0         0         0         0
    0.1300   -0.9612    0.0709         0         0         0
   -0.1726   -0.2314   -0.3595    0.2611         0         0
   -0.1009    0.0730   -0.2515   -0.0549    0.9384         0
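With 6 variables, the strictly lower triangle holds 6·5/2 = 15 unique pairs. The following minimal sketch builds the same position mask with tril on a logical matrix and lists the (row, column) index of every pair; it is illustrative only and does not use R itself.

```matlab
numVars = 6;                            % number of variables (columns of A)
lowerMask = tril(true(numVars), -1);    % true strictly below the main diagonal
withDiagMask = tril(true(numVars));     % the default offset 0 also keeps the diagonal

numPairs = nnz(lowerMask);              % numVars*(numVars-1)/2 unique pairs
[pairRow, pairCol] = find(lowerMask);   % one (row, column) index per variable pair

fprintf("%d unique pairs; %d entries if the diagonal is kept\n", ...
    numPairs, nnz(withDiagMask))
```

Because every listed pair has a row index greater than its column index, firstVar is always the higher-numbered variable in the results that follow.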

Visualize Correlations Using Heatmap

Create a heatmap of the correlation coefficients to visualize the strength and direction of relationships between variables.

Convert the zeros in the correlation coefficient matrix, which fill the diagonal and the redundant upper triangle, into missing values (NaN) so that the heatmap shows those cells as missing data rather than as zero correlations. Next, create a colormap in which variable pairs with negative correlations are red, pairs with positive correlations are blue, and pairs with no correlation are white. Because the color limits span the full range from –1 to 1, the strongest correlations appear in the most saturated red and blue.

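Rheatmap = standardizeMissing(R,0);   % treat the zeros (diagonal and upper triangle) as missing
map = [1 0 0;        % strong negative correlation: red
    0.9 0.3 0.3;
    0.9 0.6 0.6;
    1 1 1;           % no correlation: white
    0.6 0.6 0.9;
    0.3 0.3 0.9;
    0 0 1];          % strong positive correlation: blue
h = heatmap(Rheatmap,Colormap=map,ColorLimits=[-1 1]);   % fix the color scale to the full range of r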
h.Title = "Correlation Coefficients";
h.YLabel = "First Variable";
h.XLabel = "Second Variable";

Figure: heatmap titled Correlation Coefficients.
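Because standardizeMissing(R,0) converts every zero, a lower-triangle coefficient that happened to be exactly 0 would also be hidden. That is unlikely with continuous data like this example's, but a mask based on position avoids the issue entirely. A minimal sketch on a made-up 3-by-3 matrix, using logical indexing in place of standardizeMissing:

```matlab
% Illustrative correlation matrix (not from the example) in which
% variables 1 and 3 have a correlation of exactly 0.
Rdemo = [1    0.5   0;
         0.5  1    -0.2;
         0   -0.2   1];

% Zero-based approach: after tril, every 0 is treated as missing. This has
% the same effect as standardizeMissing(tril(Rdemo,-1),0).
Rzeros = tril(Rdemo, -1);
Rzeros(Rzeros == 0) = NaN;      % also hides the genuine 0 at (3,1)

% Mask-based alternative: hide only the diagonal and the upper triangle.
Rmasked = Rdemo;
Rmasked(triu(true(size(Rdemo)))) = NaN;

disp(Rzeros)
disp(Rmasked)
```

Applied to this example, Rheatmap = R; Rheatmap(triu(true(size(R)))) = NaN; produces the same heatmap as the zero-based version unless some lower-triangle coefficient is exactly 0.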

Determine Significant Correlations

Identify significant correlations by discarding those with a p-value greater than 0.05. Restricting the analysis to statistically significant relationships reduces the risk of interpreting random noise as a correlation.

threshold = 0.05;
R(P > threshold) = 0;                      % drop coefficients that are not significant
[firstVar,secondVar,corrCoef] = find(R)    % row, column, and value of each remaining coefficient
firstVar = 4×1

     3
     4
     5
     6

secondVar = 4×1

     1
     2
     3
     5

corrCoef = 4×1

    0.8047
   -0.9612
   -0.3595
    0.9384

Display Correlations in Table

Compile the significant correlations in a table that lists the variable indices, correlation coefficients, and p-values. Use sub2ind to convert each (row, column) pair into a linear index into P.

ind2 = sub2ind(size(P),firstVar,secondVar);   % linear indices of the significant pairs
sigP = P(ind2);                               % p-values of those pairs
TSig = table(firstVar,secondVar,corrCoef,sigP)
TSig=4×4 table
    firstVar    secondVar    corrCoef       sigP   
    ________    _________    ________    __________

       3            1         0.80471    1.8997e-12
       4            2        -0.96118    1.7065e-28
       5            3        -0.35949      0.010346
       6            5          0.9384    8.5819e-24
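sub2ind relies on MATLAB's column-major storage: for a 2-D matrix, the linear index of element (row, col) is row + (col − 1) × (number of rows). The sketch below uses small made-up matrices to check that formula and to show an alternative that skips sub2ind: because find and logical indexing both visit elements in column-major order, indexing P with the mask R ~= 0 returns the p-values in the same order as the find outputs.

```matlab
% Illustrative matrices (not from the example).
Pdemo = magic(4);          % stands in for a p-value matrix; only positions matter here
Rdemo = zeros(4);
Rdemo(3,1) = 0.8;
Rdemo(4,2) = -0.9;

[rowIdx, colIdx] = find(Rdemo);

% sub2ind converts (row, column) pairs to column-major linear indices.
ind = sub2ind(size(Pdemo), rowIdx, colIdx);
indManual = rowIdx + (colIdx - 1) * size(Pdemo, 1);

% Logical-mask alternative: same elements, same order.
sigPsub = Pdemo(ind);
sigPmask = Pdemo(Rdemo ~= 0);

disp([ind indManual sigPsub sigPmask])
```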

List the top three correlations by magnitude. They match the relationships built into the input data: column 4 includes –3 times column 2, column 5 consists mainly of column 6, and column 3 includes 2 times column 1.
k = 3;
TTopk = topkrows(TSig,k,"corrCoef","descend",ComparisonMethod="abs")
TTopk=3×4 table
    firstVar    secondVar    corrCoef       sigP   
    ________    _________    ________    __________

       4            2        -0.96118    1.7065e-28
       6            5          0.9384    8.5819e-24
       3            1         0.80471    1.8997e-12
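Setting ComparisonMethod="abs" makes topkrows rank the rows by the magnitude of corrCoef, so a strong negative correlation ranks as highly as a strong positive one. For a plain numeric vector, the same ranking can be sketched with sort; the values below are copied from the TSig table, and the names coefs and topIdx are illustrative.

```matlab
% Coefficients copied from the TSig table in this example.
coefs = [0.80471; -0.96118; -0.35949; 0.9384];
k = 3;

% Sort the magnitudes in descending order and keep the first k positions.
[~, order] = sort(abs(coefs), "descend");
topIdx = order(1:min(k, numel(order)));   % min guards against k larger than the vector
topCorr = coefs(topIdx);

disp(topCorr)
```

The sign is kept in topCorr because only the sort key uses abs, which mirrors how TTopk keeps the signed corrCoef values.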

Visualize Top Correlations Using Bar Chart

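Display the three strongest correlations using a horizontal bar chart. The length of each bar shows the magnitude of the correlation coefficient, and the label at the end of each bar shows its signed value. Represent negative correlations with red bars and positive correlations with blue bars.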

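labels = TTopk.firstVar + " & " + TTopk.secondVar;     % for example, "4 & 2"
b = barh(labels,abs(TTopk.corrCoef));                  % bar length is the magnitude |r|

negCorr = TTopk.corrCoef < 0;
b.FaceColor = "flat";                                  % color each bar from its CData row
b.CData(negCorr,:) = repmat([1 0 0],nnz(negCorr),1);   % negative correlations in red
b.Labels = TTopk.corrCoef;                             % label each bar with the signed r
b.LabelLocation = "end-inside";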

title("Top Correlations")
xlabel("Correlation Coefficient")
ylabel("Correlated Variables")

Figure: horizontal bar chart titled Top Correlations, with x-axis label Correlation Coefficient and y-axis label Correlated Variables.
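In the code above, only the negative bars are recolored; the positive bars keep their default CData color, typically the first color of the axes color order, which is a blue. If you want the positive bars to match the pure blue [0 0 1] at the end of the heatmap colormap, you can build the full per-bar color matrix explicitly. A minimal sketch, with coefficients copied from TTopk and illustrative variable names:

```matlab
% Signed coefficients of the top three pairs (TTopk.corrCoef in the example).
coefs = [-0.96118; 0.9384; 0.80471];
numBars = numel(coefs);
negCorr = coefs < 0;

red  = [1 0 0];
blue = [0 0 1];
barColors = repmat(blue, numBars, 1);                  % start with every bar blue
barColors(negCorr, :) = repmat(red, nnz(negCorr), 1);  % recolor negative correlations red

% With the bar object b from the example, you would then set:
%   b.FaceColor = "flat";
%   b.CData = barColors;
disp(barColors)
```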
