Why does 10 repeat running of rica (reconstruction ica) on the same data set return different independent components, if not using 'rng default' ?or so

6 visualizzazioni (ultimi 30 giorni)
I have a egg dataset (3000 * 120), each row is one curve of egg recording. I have 120 repeats. I know it should be 12 ICs, in theory. I applied the following code to analyze the traces for independent components, and I can get some result, as the following curves, I got 12 peaks, as expected.
q = 12;
sample = prewhiten(raw);
Mdl = rica(sample,q,'NonGaussianityIndicator',ones(q,1));
sample_rica = transform(Mdl,sample);
figure;
plot(sample_rica);
Then I change the code to following ones, and the resulting 100 repeats do not give consistent components between each run. The code and result are as following. The curves from 100 mean of each components appears as over 40 peaks, which means the results from 100 repeat run are not consistent.
q = 12;
sample = prewhiten(raw);
for i = 1:1:100
Mdl = rica(sample,q,'NonGaussianityIndicator',ones(q,1));
sample_rica(:,:,i) = transform(Mdl,sample);
end
sample_100_avg = squeeze(mean(sample_rica,2));
figure;
plot(sample_00_avg);
If I can always use rica to get the same independent components, although orders may vary, they should be the sample twelve peaks when adding or averaging, however, I got more than 40.
Does it mean rica should not be applied to this data, or there are settings I need to tweak, or something else?
Thank you for any input!

Risposte (1)

Shivansh
Shivansh il 24 Gen 2024
Modificato: Shivansh il 24 Gen 2024
Hi Tao!
The rica function is essentially Reconstruction ICA which is a variant of ICA(Independent Component Analysis). I can't reproduce the issue with the same dataset on my end as you haven't provided the "prewhiten" function and the data file "raw". From the looks of the provided plots, I can provide a few potential reasons and workarounds for this behaviour.
  1. Different plots for each iterations can be due to the different random initialization for each run. Set the random seed before the loop to eliminate this possibility.
  2. Analyze plots for a few iterations instead of 100 to check the stability of the algorithm.
  3. You can also check with the ordering of the components. Different orderings of components for different runs can give results which are consistent but can give results similar to the second plot.
I have reproduced the issue with random sample data "raw" and "prewhiten" function. The plots for both the cases were same after adding the above changes to the code provided by you.
Plot for first case with the sample data
Plot for second case with sample data
As you can observe, both the plots are same.
I have generated the "raw" data using the following code snippet.
% Number of samples and independent components
numSamples = 3000;
numComponents = 3;
% Create three independent sources with peaks
source1 = sin(linspace(0, 3*pi, numSamples))'; % Sine wave peak
source2 = exp(-((1:numSamples) - 1500).^2 / (2*300^2))'; % Gaussian peak
source3 = double(((1:numSamples) / 1000) == round((1:numSamples) / 1000))'; % Impulse peak
% Stack sources into a matrix
S = [source1, source2, source3];
% Create a random mixing matrix
A = rand(numComponents, numComponents);
% Mix the sources to create observed data
raw = S * A;
% Prewhiten the raw data
sample = prewhiten(raw);
I have used the following code snippet for the first case.
% Set a random seed for reproducibility
rng(1);
% Number of independent components to extract
q = numComponents;
% Run rica and transform the data
Mdl = rica(sample, q, 'NonGaussianityIndicator', ones(q,1));
sample_rica = transform(Mdl, sample);
% Plot the independent components
figure;
plot(sample_rica);
I have used the following code snippet for the second case. It also contains a function to align the components before averaging.
% Initialize the matrix to hold the transformed data
sample_rica = zeros(size(sample, 1), numComponents, 100);
rng(1);
% Run rica 100 times with the same random seed for reproducibility
for i = 1:100
Mdl = rica(sample, numComponents, 'NonGaussianityIndicator', ones(numComponents, 1));
sample_rica(:,:,i) = transform(Mdl, sample);
end
% Align the components before averaging
sample_rica_aligned = alignComponents(sample_rica,numComponents);
% Average the independent components across the 100 runs
sample_100_avg = squeeze(mean(sample_rica_aligned, 3));
% Plot the average independent components
figure;
plot(sample_100_avg);
% Function to align components based on maximum correlation
function aligned = alignComponents(components, numComponents)
numObservations = size(components, 1);
numRuns = size(components, 3);
aligned = zeros(numObservations, numComponents, numRuns);
reference = components(:,:,1); % Use the first run as a reference
for i = 1:numRuns
for j = 1:numComponents
% Find the component in the current run that best matches the reference
correlations = corr(reference(:,j), components(:,:,i));
[~, idx] = max(abs(correlations));
signChange = sign(correlations(idx));
aligned(:,j,i) = components(:,idx,i) * signChange;
end
end
end
The query regarding the application of "rica" on your data can't be analyzed without the actual data. It mainly depends on the problem statement and the purpose for analysis.
If you want to learn more about "rica", refer to the following documentation https://www.mathworks.com/help/stats/rica.html.

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by