Here is the solution fully vectorized (Though uses much more multiplications than needed):
mA = reshape(bsxfun(@minus, permute(mY, [1, 3, 2]), permute(mX, [3, 1, 2])), [(m * n), p]);
mD = reshape(diag(A* inv(mC) * A.'), [m, n]);
If anyone has faster way (Not necessarily fully vectorized) I'd be happy to see.