Rolling beta for multiple x and y variables simultaneously

Hello,
I have two matrices. One is an x-variable matrix (for example, 8 x variables as columns and 1,000 rows, where each row represents a day). The other is a y-variable matrix (for example, 20 y variables as columns, with the same 1,000 rows).
I would like to calculate a matrix C that contains the rolling 100-day beta of each y variable to each x variable. Thus, C would have 20 * 8 = 160 columns. Since it's a rolling beta, C would have 1,000 - 100 + 1 = 901 rows (the first 99 days aren't eligible for a 100-day beta).
I have been playing around with various functions, e.g., corr, polyfit, and regress. However, none of these appear to address my query on rolling betas. In fact, I'm not sure I even see the ability to implement a rolling beta for just one variable in each matrix.
I would appreciate any guidance on this. Thank you!

10 Comments

Adam Danz
Adam Danz on 4 Sep 2019
Edited: Adam Danz on 4 Sep 2019
What's a beta? Do you mean a regression coefficient? And by rolling, do you mean a moving window?
Yes, by beta, I am referring to the conventional statistical definition, i.e., the slope, or regression coefficient. Mathematically, the beta of y to x is equal to: (correlation between x and y) * (volatility of y) / (volatility of x), where volatility means standard deviation. It is also equal to: (covariance between x and y) / (variance of x).
And yes, exactly: by rolling, I mean a moving window.
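The two expressions for beta given above are the same quantity. A short derivation, writing \rho_{xy} for the correlation and \sigma_x, \sigma_y for the standard deviations (symbols chosen here for illustration, not taken from the thread):

```latex
\hat{\beta}_{y \mid x}
  = \frac{\operatorname{cov}(x, y)}{\operatorname{var}(x)}
  = \frac{\rho_{xy}\,\sigma_x\,\sigma_y}{\sigma_x^{2}}
  = \rho_{xy}\,\frac{\sigma_y}{\sigma_x}
```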
For each y (at one of the time points), do you want
  • the 8 coefficients of a single regression against the 8 x's together, OR
  • the 8 slopes of y vs. each x, individually?
It would be the second bullet. These would all be slopes associated with a specific regression of a given y vs. a given x (which is why there would be 20 * 8 = 160 combinations).
OK.
But note that there would be 20*8 combinations for either of the above bullets. The second bullet gives 8 slopes, from 8 regressions. The first bullet gives 8 coefficients, in one regression. (That's why it needed clarification.)
Totally makes sense. I should have clarified it by saying that the x-variables don't interact with each other in the regression.
Another clarification ...
For a single time point, and a given y and x, do you want
  • the correlation coefficient, OR
  • the regression coefficient?
These will be the same if you standardize your variables first (i.e., de-mean and divide by the standard deviation). But you'll have to standardize each window separately.
It would be the regression coefficient, i.e., the slope (or beta in my lingo).
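The point about standardizing within each window can be checked numerically. The following is a minimal sketch on simulated data for a single 100-row window (the data and variable names are assumptions for illustration): the slope on z-scored data matches the correlation coefficient, while the plain slope equals the correlation scaled by std(y)/std(x).

```matlab
% Sketch: within one window, the slope on standardized data equals the correlation.
rng(0);                                % arbitrary seed, for reproducibility
x = randn(100,1);                      % one window of simulated x (assumed data)
y = 0.5*x + randn(100,1);              % simulated y with some dependence on x

% Plain regression slope (beta) of y on x, with an intercept
b = [ones(100,1) x] \ y;
slope = b(2);

% Standardize within the window: de-mean and divide by the standard deviation
zx = (x - mean(x)) / std(x);
zy = (y - mean(y)) / std(y);
bz = [ones(100,1) zx] \ zy;
zslope = bz(2);

% Correlation coefficient for comparison
R = corrcoef(x, y);
rho = R(1,2);

fprintf('slope = %.4f, standardized slope = %.4f, corr = %.4f\n', slope, zslope, rho);
```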
Another clarification ...
As you have pointed out, the number of coefficients you'll be calculating is
(number y's) * (number x's) * (number windows) = 20 * 8 * 901.
How do you want those arranged in the output? In a 20 x 8 x 901 numeric array? Or something else?
Yes, a 20 x 8 x 901 matrix would be ideal, though it doesn't really matter much as long as we know what the dimensions represent. I can always use reshape to convert it to a format that would be desirable.
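For the reshape mentioned above, one possible conversion from a 20 x 8 x 901 array to the 901 x 160 layout described in the question is sketched below. The array B is a stand-in filled with distinct values so the mapping can be checked; the column ordering (y index varying fastest) is a choice made here, not something the thread specifies.

```matlab
% Sketch: convert a YN x XN x NW array into an NW x (YN*XN) matrix.
YN = 20; XN = 8; NW = 901;
B = reshape(1:YN*XN*NW, YN, XN, NW);      % stand-in for the beta array

% Move the window dimension first, then flatten the (y, x) pairs into columns
C = reshape(permute(B, [3 1 2]), NW, YN*XN);

% Column k of C corresponds to y index iy(k) and x index ix(k)
[iy, ix] = ind2sub([YN XN], 1:YN*XN);

% Spot check: window 300, column 47
k = 47; w = 300;
sameValue = (C(w,k) == B(iy(k), ix(k), w));
```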


Accepted Answer

I believe this does what you intend.
% Set seed for reproducibility
rng default
% Set a few convenience parameters
N = 1000;
WINSIZE = 100;
XN = 8;
YN = 20; % 20 y variables, as in the question
% Simulate some data
x = randn(N,XN);
y = randn(N,YN);
% Calculate the number of windows
numberWindows = N - WINSIZE + 1;
% Preallocate the output (y variable, x variable, window)
output = zeros(YN,XN,numberWindows);
% Loop over the windows
for nw = 1:numberWindows
    % Find the data for this window
    thisWindowIndex = nw:(nw+WINSIZE-1);
    thisWindowXData = x(thisWindowIndex,:);
    thisWindowYData = y(thisWindowIndex,:);
    for ny = 1:YN
        for nx = 1:XN
            % Solve the regression (returns intercept and slope)
            tmp = [ones(WINSIZE,1) thisWindowXData(:,nx)]\thisWindowYData(:,ny);
            % Store the slope
            output(ny,nx,nw) = tmp(2);
        end
    end
end
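Because each regression has a single x plus an intercept, the two inner loops can also be replaced by one matrix product per window, using slope = sum(xc.*yc)/sum(xc.^2) on de-meaned data. This is a sketch of that variant, not the answerer's code; it assumes implicit expansion (R2016b or later), and betaAll and checkDiff are names introduced here.

```matlab
% Sketch: one matrix product per window instead of YN*XN backslash solves.
rng default
N = 1000; WINSIZE = 100; XN = 8; YN = 20;
x = randn(N,XN);
y = randn(N,YN);
numberWindows = N - WINSIZE + 1;
betaAll = zeros(YN,XN,numberWindows);

for nw = 1:numberWindows
    idx = nw:(nw+WINSIZE-1);
    xc = x(idx,:) - mean(x(idx,:),1);     % de-meaned x, WINSIZE x XN
    yc = y(idx,:) - mean(y(idx,:),1);     % de-meaned y, WINSIZE x YN
    % Cross products (YN x XN) divided by each x's sum of squares (1 x XN)
    betaAll(:,:,nw) = (yc' * xc) ./ sum(xc.^2, 1);
end

% Spot check one (y, x, window) triple against the backslash solution
idx = 500:599;
tmp = [ones(WINSIZE,1) x(idx,3)] \ y(idx,7);
checkDiff = abs(betaAll(7,3,500) - tmp(2));
```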

1 Comment

This is fantastic. Simple and effective. I also learned something new, i.e., that the backslash operator (mldivide) can be used directly for a conventional beta calculation.
Separately, I thought I'd also provide some code for a function I created. It generates a rolling beta for a given set of x and y variables (in the form of a matrix). This function uses the built-in "mov" functions, so the general methodology and output format follow their protocol. What's neat about this function is that no loops are involved. (Also, "n" in this case refers to the length of the rolling observation window.)
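function bet = movbeta(y,x,n)
% Rolling beta of y on x, cov(x,y)/var(x), over a trailing window of n rows
w = [n-1 0]; opts = {'Endpoints','discard'};
sxy = movsum(x.*y,w,opts{:}) - n*movmean(x,w,opts{:}).*movmean(y,w,opts{:}); % (n-1)*cov(x,y)
bet = sxy./((n-1)*movvar(x,w,opts{:}));
end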
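The same moving-sum formula can, in principle, produce all 20 x 8 pairs in one pass by letting x and y expand against each other along a third dimension. The sketch below restates the formula inline rather than calling movbeta; it assumes implicit expansion (R2016b or later), uses simulated data, and builds an intermediate N x 20 x 8 array, so memory use grows with the number of pairs.

```matlab
% Sketch: apply the moving-sum beta formula to every (y, x) pair at once.
rng default
N = 1000; n = 100; XN = 8; YN = 20;
x = randn(N,XN);
y = randn(N,YN);

w = [n-1 0];                          % trailing window of n rows
opts = {'Endpoints','discard'};       % keep only full windows
x3 = reshape(x, N, 1, XN);            % N x 1 x XN, so x3.*y expands to N x YN x XN

% (n-1)*cov(x,y) for every pair and window
sxy = movsum(x3.*y, w, opts{:}) ...
    - n * movmean(x3, w, opts{:}) .* movmean(y, w, opts{:});
bet = sxy ./ ((n-1) * movvar(x3, w, opts{:}));   % (N-n+1) x YN x XN
betaAll = permute(bet, [2 3 1]);                 % YN x XN x (N-n+1)

% Spot check one pair and one window against backslash
idx = 500:599;
tmp = [ones(n,1) x(idx,3)] \ y(idx,7);
checkDiff = abs(betaAll(7,3,500) - tmp(2));
```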


More Answers (1)

John D'Errico
John D'Errico on 4 Sep 2019
Edited: John D'Errico on 4 Sep 2019
Is the x vector equally spaced? If so, then my movingslope code (found on the File Exchange) will do it trivially and efficiently.
If not, then nothing stops you from using a loop and polyfit. It will still be reasonably efficient. With carefully written code you could make it a little faster than polyfit, but why bother?
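A minimal sketch of the loop-and-polyfit approach for one x series and one y series, on simulated data (the variable names and the 0.3 dependence are assumptions for illustration):

```matlab
% Sketch: rolling slope of y on x with a loop and polyfit.
rng default
N = 1000; n = 100;
x = randn(N,1);
y = 0.3*x + randn(N,1);

numberWindows = N - n + 1;
slope = zeros(numberWindows,1);
for nw = 1:numberWindows
    idx = nw:(nw+n-1);
    p = polyfit(x(idx), y(idx), 1);   % p(1) is the slope, p(2) the intercept
    slope(nw) = p(1);
end

% Compare the first window with the backslash solution
tmp = [ones(n,1) x(1:n)] \ y(1:n);
checkDiff = abs(slope(1) - tmp(2));
```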

3 Comments

Unfortunately, the x variables are not equally spaced (though I did indeed see your nifty movingslope function).
A loop should work. I just wasn't sure if there was a more elegant approach built into MATLAB somehow.
If the points are not evenly spaced, then the regression matrix changes for each location. You could write code that would work, not using a loop. It would look more elegant. It might take more memory though.
For example, you could write it using an update and downdate of a QR decomposition: add one point at the end, then drop the first point. It would still be a loop. And the update/downdate would be slower than just throwing backslash at it, or even polyfit.
Or, given a simple regression for just a simple slope, you could do effectively the same thing. The formula for the slope is easy to write down. So, again, it would be easy to do, though still a loop.
Is this something you will be doing often? If so, then it would be worth the programmer time to do it better. But for a one shot deal, I'd not bother. CPU time is really cheap, and for a problem that is not a bottleneck in your task, a loop is easy.
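The idea of writing down the slope formula and updating it as the window moves can be sketched with running sums: add the newest point, drop the oldest, and recompute slope = (n*Sxy - Sx*Sy) / (n*Sxx - Sx^2). This is one possible reading of the suggestion, on simulated data. Running sums can accumulate rounding error over very long series, so an occasional recomputation from scratch may be worthwhile.

```matlab
% Sketch: rolling slope via running sums (add the new point, drop the old one).
rng default
N = 1000; n = 100;
x = randn(N,1);
y = 0.3*x + randn(N,1);

numberWindows = N - n + 1;
slope = zeros(numberWindows,1);

% Sums over the first window
Sx  = sum(x(1:n));
Sy  = sum(y(1:n));
Sxx = sum(x(1:n).^2);
Sxy = sum(x(1:n).*y(1:n));
slope(1) = (n*Sxy - Sx*Sy) / (n*Sxx - Sx^2);

for nw = 2:numberWindows
    iOld = nw - 1;          % row leaving the window
    iNew = nw + n - 1;      % row entering the window
    Sx  = Sx  + x(iNew)         - x(iOld);
    Sy  = Sy  + y(iNew)         - y(iOld);
    Sxx = Sxx + x(iNew)^2       - x(iOld)^2;
    Sxy = Sxy + x(iNew)*y(iNew) - x(iOld)*y(iOld);
    slope(nw) = (n*Sxy - Sx*Sy) / (n*Sxx - Sx^2);
end

% Compare the last window with the backslash solution
idx = (N-n+1):N;
tmp = [ones(n,1) x(idx)] \ y(idx);
checkDiff = abs(slope(end) - tmp(2));
```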
It is indeed something I'd be doing often. But your point is a good one, i.e., that a loop might be sufficient for this purpose.
As I mentioned to the cyclist, I also created a function that generates a rolling beta for a given set of x and y variables (in the form of a matrix). This function uses the built-in "mov" functions, so the general methodology and output format follow their protocol. What's neat about this function is that no loops are involved. (Also, "n" in this case refers to the length of the rolling observation window.)
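function bet = movbeta(y,x,n)
% Rolling beta of y on x, cov(x,y)/var(x), over a trailing window of n rows
w = [n-1 0]; opts = {'Endpoints','discard'};
sxy = movsum(x.*y,w,opts{:}) - n*movmean(x,w,opts{:}).*movmean(y,w,opts{:}); % (n-1)*cov(x,y)
bet = sxy./((n-1)*movvar(x,w,opts{:}));
end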


Release

R2019a
