Trying to make 2 data sets the same length

Question

Matthew on 28 Feb 2023

https://it.mathworks.com/matlabcentral/answers/1920570-trying-to-make-2-data-sets-the-same-length

Commented: William Rose on 28 Feb 2023

I have two datasets: one is 1x102437 and the other is 1x41716. I am trying to make them the same length so that I can perform a paired t-test on the data. How do I make the 1x41716 vector the same length as the 1x102437 vector? I have tried using interp1 but keep running into trouble with it.

Thanks!

Answer 1

Voss on 28 Feb 2023

https://it.mathworks.com/matlabcentral/answers/1920570-trying-to-make-2-data-sets-the-same-length#answer_1181955

Edited: Voss on 28 Feb 2023

If you want to "expand" the shorter dataset to match the length of the longer one using interp1, here's one way:

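dataset1 = rand(1,51); % random data (longer set)
dataset2 = rand(1,21); % random data (shorter set)

n1 = numel(dataset1);
n2 = numel(dataset2);

x1 = 1:n1;               % sample positions of dataset1
x2 = linspace(1,n1,n2);  % spread dataset2 evenly over the same range
dataset2_interp = interp1(x2,dataset2,x1); % linear interpolation at x1

% Top panel: both datasets at their original lengths
subplot(2,1,1)
hold on
plot(dataset1,'.-b')
plot(dataset2,'.-r')
legend
title('Original')

% Bottom panel: dataset2 expanded to the length of dataset1
subplot(2,1,2)
hold on
plot(dataset1,'.-b')
plot(dataset2_interp,'.-r')
legend
title('Dataset 2 Expanded')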
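
A few quick checks can confirm that the expansion behaved as intended. The following is only a sketch using the same variable names as above; the fixed seed and the 1e-12 tolerance are assumptions added for illustration, not part of the original answer.

```matlab
% Sanity checks for expanding a short vector by linear interpolation.
% Assumption: the short vector covers the same interval as the long one.
rng(0);                                  % fixed seed, for repeatability only
dataset1 = rand(1,51);
dataset2 = rand(1,21);

n1 = numel(dataset1);
n2 = numel(dataset2);
x2 = linspace(1,n1,n2);                  % positions of dataset2 on dataset1's index axis
dataset2_interp = interp1(x2,dataset2,1:n1);

tol = 1e-12;                             % illustrative tolerance
sameLength  = numel(dataset2_interp) == n1;
noNaNs      = ~any(isnan(dataset2_interp));   % all query points lie inside [1, n1]
endpointsOK = abs(dataset2_interp(1)   - dataset2(1))   < tol && ...
              abs(dataset2_interp(end) - dataset2(end)) < tol;
% Linear interpolation never leaves the range of the original data
inRange     = all(dataset2_interp >= min(dataset2) - tol & ...
                  dataset2_interp <= max(dataset2) + tol);

fprintf('same length: %d, no NaNs: %d, endpoints kept: %d, in range: %d\n', ...
        sameLength, noNaNs, endpointsOK, inRange);
```

If interp1 returns NaN values in your own data, a common cause is query points that fall outside the range of the sample points, which linear interpolation does not extrapolate by default.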

Answer 2

William Rose on 28 Feb 2023

https://it.mathworks.com/matlabcentral/answers/1920570-trying-to-make-2-data-sets-the-same-length#answer_1181970

@Matthew,

You can do it, but that does not mean you should do it.

On what basis do you justify pairing the samples from the long vector with the interpolated samples from the short vector?

As for generating an equal number of samples (which I don't recommend unless you have a good justification):

Let's call the vectors y1 (long) and y2 (short). Illustrate with example vectors that are 1000 times shorter than yours:

y1=rand(1,102); y2=rand(1,42);

To interpolate y2 to be as long as y1, you need associated x values. Create a vector x2:

x2=1:length(y2);

Create a query vector, xq:

xq=linspace(1,length(y2),length(y1));

Interpolate:

y2int=interp1(x2,y2,xq);
disp([length(y1),length(y2int)])
   102   102

y2int has the same length as y1. But this does not mean each value in y2 is paired with a certain element in y1.

William Rose on 28 Feb 2023

@Matthew, that sounds like a great project!

What did you record during the two trials from the same subject? Why do trial 1 and trial 2 differ in length by such a large amount?

With a paired test, you subtract each value from its corresponding paired value. This removes a potential source of variance. By using a paired t test, you reduce the chance of making a type II error.
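
To make the subtraction concrete, here is a small sketch showing that a paired t test gives the same p-value as a one-sample t test on the differences. The vectors a and b, the seed, and the effect size are made up for illustration and have nothing to do with your data.

```matlab
% Paired t-test vs. one-sample t-test on the differences.
% a and b are hypothetical, genuinely paired measurements.
rng(1);                              % fixed seed, for repeatability only
a = randn(1,30) + 5;                 % first measurement on 30 subjects
b = a + 0.3 + 0.5*randn(1,30);       % second measurement, paired element by element

[hPaired,pPaired] = ttest(a,b);      % paired t-test
[hDiff,pDiff]     = ttest(a-b);      % one-sample t-test of (a-b) against zero
[hUnp,pUnp]       = ttest2(a,b);     % unpaired test, ignores the pairing

fprintf('paired:     p = %.4g\n', pPaired);
fprintf('difference: p = %.4g\n', pDiff);
fprintf('unpaired:   p = %.4g\n', pUnp);
```

The first two p-values agree because ttest(a,b) tests whether a-b has zero mean. The subject-to-subject spread in a cancels in the difference, which is the variance reduction described above.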

Here is an example of how a paired test could be justified and used in your case: you recorded one cycle of activity with controller 1 and one cycle with controller 2. One cycle was sampled with 102 points and the other with 42 points. (I don't know why you would have such a different number of samples, but let's assume you did):

y1=10*sin((1:102)/(2*pi*102))+rand(1,102);
y2=10*sin((1:42)/(2*pi*42))+rand(1,42)+.1;
% Compare y1 to y2 with unpaired t test:
[h,p]=ttest2(y1,y2);
if h==1, fprintf('Unpaired: Samples have different means, p=%.3f\n',p);
else fprintf('Unpaired: Sample means do not differ, p=%.3f\n',p); end
Unpaired: Sample means do not differ, p=0.093

Interpolate y2 to have 102 values, and compare the vectors using a paired t-test:

y2int=interp1(1:42,y2,linspace(1,42,102));
[h,p]=ttest(y1,y2int);
if h==1, fprintf('Paired: Samples have different means, p=%.3f\n',p);
else fprintf('Paired: Sample means do not differ, p=%.3f\n',p); end
Paired: Samples have different means, p=0.000

I ran the code above six times. The result was the same with both tests in two cases. In the other four cases, the first test got the wrong answer ("Sample means do not differ") while the paired test got the right answer ("Samples have different means"). These results support the idea that, when properly used, the paired test reduces the chance of making a type II error.
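
Six runs is a small sample, so one way to check the pattern is to repeat the simulation many times and count how often each test rejects the null hypothesis. This is only a sketch: nRuns = 1000 is an arbitrary choice, and the counter names are made up for illustration.

```matlab
% Repeat the simulated experiment and count rejections for each test.
nRuns = 1000;                        % arbitrary number of repetitions
rejectUnpaired = 0;
rejectPaired   = 0;
for k = 1:nRuns
    % Same synthetic signals as in the example above
    y1 = 10*sin((1:102)/(2*pi*102)) + rand(1,102);
    y2 = 10*sin((1:42)/(2*pi*42))   + rand(1,42) + 0.1;
    % Expand y2 to 102 points by linear interpolation
    y2int = interp1(1:42, y2, linspace(1,42,102));
    % ttest2 and ttest return h = 1 when H0 is rejected at the 5% level
    rejectUnpaired = rejectUnpaired + ttest2(y1,y2);
    rejectPaired   = rejectPaired   + ttest(y1,y2int);
end
fprintf('Unpaired test rejected H0 in %.1f%% of runs\n', 100*rejectUnpaired/nRuns);
fprintf('Paired test rejected H0 in %.1f%% of runs\n',   100*rejectPaired/nRuns);
```

Keep in mind that the interpolated values are linear combinations of neighbouring samples, so they are not independent of one another. The paired p-values should therefore be read with some caution.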

If you prefer to continue this discussion offline, click on the envelope icon at the top right of the pop-up window that appears when you click the WR circle by my comment.

William Rose on 28 Feb 2023

@Matthew, you mentioned "The only problem with this [ttest2()] is that it assumes that the data sets come from two independent samples." The paired t test (ttest()) also assumes that the individual samples are independent of one another, and it relies on a built-in pairing of the data. That second assumption is questionable when the raw data are not paired point by point, as in this case. By default, the unpaired test (ttest2) assumes both samples are normally distributed with equal variance, while the paired test (ttest) assumes the paired differences are normally distributed. If you don't want to assume normality and equal variance, use the Wilcoxon rank-sum test, also known as the Mann-Whitney U test, without interpolation, on the unequal-size samples.

[p,h]=ranksum(y1,y2);

If you want to interpolate and do a paired test without the normality assumption of a t test, use the Wilcoxon signed-rank test (signrank), which is the paired counterpart of the rank-sum test:

[p,h]=signrank(y1,y2int);
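
For reference, here is a self-contained sketch that runs the nonparametric tests on synthetic data. The data, the seed, and the 0.1 offset are made up for illustration. Note that signrank performs the Wilcoxon signed-rank test, while the plain sign test is a separate function, signtest.

```matlab
% Nonparametric alternatives on hypothetical data of unequal length.
rng(2);                              % fixed seed, for repeatability only
y1 = rand(1,102);
y2 = rand(1,42) + 0.1;               % shorter sample with a small offset

% Unpaired: Wilcoxon rank-sum (Mann-Whitney U), lengths may differ
pRankSum = ranksum(y1,y2);

% Paired tests need equal lengths, so expand y2 first
y2int = interp1(1:42, y2, linspace(1,42,102));
pSignRank = signrank(y1,y2int);      % Wilcoxon signed-rank test
pSignTest = signtest(y1,y2int);      % sign test (uses only the signs of the differences)

fprintf('rank-sum p = %.4g, signed-rank p = %.4g, sign test p = %.4g\n', ...
        pRankSum, pSignRank, pSignTest);
```

The same caveat about pairing applies here: the paired tests are only meaningful if element k of y1 really corresponds to element k of y2int.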

Answer 3

Image Analyst on 28 Feb 2023

https://it.mathworks.com/matlabcentral/answers/1920570-trying-to-make-2-data-sets-the-same-length#answer_1181995

What about ttest2?

help ttest2
 TTEST2 Two-sample t-test with pooled or unpooled variance estimate.
    H = TTEST2(X,Y) performs a t-test of the hypothesis that two
    independent samples, in the vectors X and Y, come from distributions
    with equal means, and returns the result of the test in H.  H=0
    indicates that the null hypothesis ("means are equal") cannot be
    rejected at the 5% significance level.  H=1 indicates that the null
    hypothesis can be rejected at the 5% level.  The data are assumed to
    come from normal distributions with unknown, but equal, variances.  X
    and Y can have different lengths.
 
    This function performs an unpaired two-sample t-test. For a paired
    test, use the TTEST function.
 
    X and Y can also be matrices or N-D arrays.  For matrices, TTEST2
    performs separate t-tests along each column, and returns a vector of
    results.  X and Y must have the same number of columns.  For N-D
    arrays, TTEST2 works along the first non-singleton dimension.  X and Y
    must have the same size along all the remaining dimensions.
 
    TTEST2 treats NaNs as missing values, and ignores them.
 
    [H,P] = TTEST2(...) returns the p-value, i.e., the probability of
    observing the given result, or one more extreme, by chance if the null
    hypothesis is true.  Small values of P cast doubt on the validity of
    the null hypothesis.
 
    [H,P,CI] = TTEST2(...) returns a 100*(1-ALPHA)% confidence interval for
    the true difference of population means.
 
    [H,P,CI,STATS] = TTEST2(...) returns a structure with the following fields:
       'tstat' -- the value of the test statistic
       'df'    -- the degrees of freedom of the test
       'sd'    -- the pooled estimate of the population standard deviation
                  (for the equal variance case) or a vector containing the
                  unpooled estimates of the population standard deviations
                  (for the unequal variance case)
 
    [...] = TTEST2(X,Y,'PARAM1',val1,'PARAM2',val2,...) specifies one or
    more of the following name/value pairs:
 
        Parameter       Value
        'alpha'         A value ALPHA between 0 and 1 specifying the
                        significance level as (100*ALPHA)%. Default is
                        0.05 for 5% significance.
        'dim'           Dimension DIM to work along. For example, specifying
                        'dim' as 1 tests the column means. Default is the
                        first non-singleton dimension.
        'tail'          A string specifying the alternative hypothesis:
            'both'  "means are not equal" (two-tailed test)
            'right' "mean of X is greater than mean of Y" (right-tailed test)
            'left'  "mean of X is less than mean of Y" (left-tailed test)
        'vartype'       'equal' to perform the default test assuming equal
                        variances, or 'unequal', to perform the test
                        assuming that the two samples come from normal
                        distributions with unknown and unequal variances.
                        This is known as the Behrens-Fisher problem. TTEST2
                        uses Satterthwaite's approximation for the
                        effective degrees of freedom.
 
    See also TTEST, RANKSUM, VARTEST2, ANSARIBRADLEY.

    Documentation for ttest2
       doc ttest2
set1 = rand(1, 100);
set2 = rand(1, 50); % Second set has a different number of observations.
[h,p] = ttest2(set1, set2) 
h = 0
p = 0.8317
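
If the two sets might have different variances, the 'vartype' option described in the help text above can be used. This is a sketch with made-up data (set2 is deliberately given a larger spread), not a recommendation for any particular dataset.

```matlab
% Pooled vs. unequal-variance two-sample t-test on hypothetical data.
rng(3);                              % fixed seed, for repeatability only
set1 = randn(1,100);                 % standard deviation about 1
set2 = 3*randn(1,50);                % standard deviation about 3, fewer observations

% Default: pooled variance estimate, df = n1 + n2 - 2
[hEq,pEq,~,statsEq] = ttest2(set1,set2);

% Unequal variances: Satterthwaite approximation for the degrees of freedom
[hUn,pUn,~,statsUn] = ttest2(set1,set2,'Vartype','unequal');

fprintf('Pooled:  df = %g, p = %.4f\n', statsEq.df, pEq);
fprintf('Unequal: df = %g, p = %.4f\n', statsUn.df, pUn);
```

With 'Vartype','unequal' the stats.sd field holds two separate standard deviation estimates instead of one pooled value, as noted in the help text.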

Matthew on 28 Feb 2023

The only problem with this is that it assumes that the data sets come from two independent samples. But other than that, I can see how that might work.

William Rose on 28 Feb 2023

@Image Analyst is right (as always, it seems to me) that ttest2 is a good option, which is why I used it in my example above. It is more conservative than a paired t test, in the sense that it does not make any assumptions about pairing.
