Analysis of variance and covariance

Some literature on the subject:

Armitage, P., Berry, G. Statistical Methods in Medical Research. 2nd ed. Blackwell Scientific Publications. Oxford. 1987.

Barker, H.R., Barker, B.M. Multivariate Analysis of Variance (MANOVA): A Practical Guide to Its Use in Scientific Decision Making. The University of Alabama Press. 1984.

Kempthorne, O. The Design and Analysis of Experiments. John Wiley & Sons. New York. 1952.

Montgomery, D.C. Design and Analysis of Experiments. 2nd ed. John Wiley. New York. 1976.

Winer, B.J. Statistical Principles in Experimental Design. 2nd ed. McGraw-Hill Book Company. New York. 1971.


1. Study design principles (designing and planning trials and experiments)

Example. The simplest study design.

A pharmaceutical company has produced two sleeping pills, SLEEP1 and SLEEP2. We want to study which pill gives the longer sleeping time. The response variable is the sleeping time. We test the null hypothesis H0: µ1 = µ2 (µi = mean sleeping time for the pills SLEEP1 and SLEEP2).

Data:

Unit        SLEEP1      SLEEP2      Difference
1           7           6           1
2           3           3           0
3           6           5           1
4           4           3           1
5           8           7           1
6           3           2           1
7           6           4           2
8           9           8           1
9           5           4           1
10          6           5           1
Means       5.7         4.7         1.0

The data could come from two different study designs.

Design 1. We take 20 persons into the experiment. The persons are randomized into two groups. We give the SLEEP1 pill to group 1 and the SLEEP2 pill to group 2 (n1 = n2 = 10) and ask the length of sleep after taking the pill. If the sleeping time is approximately normally distributed and the groups are independent, we can compare the means with the independent-samples t-test, where the null hypothesis is H0: µ1 = µ2.

The test is based on the difference of the group means, which is compared to the standard error of the difference. If the variances in the groups do not differ, the standard error of the difference is

$$SE(\bar{x}_1 - \bar{x}_2) = s \sqrt{\frac{1}{n_1} + \frac{1}{n_2}},$$

where s is the standard deviation of all the observations. Here s = 1.96, so the standard error of the difference is

$$SE(\bar{x}_1 - \bar{x}_2) = 1.96 \sqrt{\frac{1}{10} + \frac{1}{10}} = 0.877.$$

(Strictly, s should be the pooled within-group standard deviation, which is 1.95 here and gives practically the same result.)

If the variances in the groups differ, the standard error of the difference is

$$SE(\bar{x}_1 - \bar{x}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}.$$

We get the test value t by dividing the difference by the standard error of the difference. Here the test value t = 1.0/0.877 = 1.14 is small, and from the t(18) distribution the probability is p = P(|t(18)| > 1.14) = 0.269. We cannot reject H0, because the risk of making an error by rejecting it is too large.
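The Design 1 calculation can be checked with a short Python sketch. It uses only the standard library; the helper names (`se_equal_var`, `se_unequal_var`, `pooled_sd`) are illustrative assumptions and not part of the original notes. The p-value is left out, because it needs a t-distribution function from a statistics package.

```python
import math
import statistics

# Sleeping times from the data table, treated as two independent groups.
sleep1 = [7, 3, 6, 4, 8, 3, 6, 9, 5, 6]
sleep2 = [6, 3, 5, 3, 7, 2, 4, 8, 4, 5]


def se_equal_var(s, n1, n2):
    """Standard error of the difference when one common s is assumed."""
    return s * math.sqrt(1 / n1 + 1 / n2)


def se_unequal_var(x, y):
    """Standard error of the difference with separate group variances."""
    return math.sqrt(statistics.variance(x) / len(x)
                     + statistics.variance(y) / len(y))


def pooled_sd(x, y):
    """Pooled within-group standard deviation."""
    ss = (statistics.variance(x) * (len(x) - 1)
          + statistics.variance(y) * (len(y) - 1))
    return math.sqrt(ss / (len(x) + len(y) - 2))


diff = statistics.mean(sleep1) - statistics.mean(sleep2)

# s as used in the text: standard deviation of all 20 observations.
s_all = statistics.stdev(sleep1 + sleep2)
se_text = se_equal_var(s_all, len(sleep1), len(sleep2))
t_text = diff / se_text

# Alternatives: pooled within-group s, and the unequal-variance form.
se_pooled = se_equal_var(pooled_sd(sleep1, sleep2), len(sleep1), len(sleep2))
se_welch = se_unequal_var(sleep1, sleep2)

print(f"s = {s_all:.2f}, SE = {se_text:.3f}, t = {t_text:.2f}")
print(f"pooled SE = {se_pooled:.3f}, unequal-variance SE = {se_welch:.3f}")
```

With equal group sizes the pooled and the unequal-variance standard errors coincide, so the choice between them matters only when n1 and n2 differ.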

Design 2. We take 10 persons, who each take both pills. Half of the persons take the SLEEP1 pill first and SLEEP2 a week later. The other half take SLEEP2 first and SLEEP1 a week later. Now the observations are dependent pairwise, and we can use the paired t-test. The idea is to calculate the difference between the sleeping times after the two pills and then to test whether the mean of the differences is zero.

The differences are shown in the table; their mean is 1.0 with standard deviation s = 0.47. The standard error of the mean is s/√n, if the differences are normally distributed. Here SE(x̄) = 0.47/√10 = 0.15. The test value t = 1.0/0.15 = 6.667 is large, and the risk of making an error by rejecting a correct H0, p = P(|t(9)| > 6.667) = 0.000092, is small enough to reject H0.
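As a check on the paired calculation, here is a minimal Python sketch using only the standard library. The variable names are illustrative. Without intermediate rounding the test value comes out as about 6.71 rather than 6.667.

```python
import math
import statistics

# Paired sleeping times: the same 10 persons took both pills.
sleep1 = [7, 3, 6, 4, 8, 3, 6, 9, 5, 6]
sleep2 = [6, 3, 5, 3, 7, 2, 4, 8, 4, 5]

differences = [a - b for a, b in zip(sleep1, sleep2)]
n = len(differences)

mean_diff = statistics.mean(differences)   # mean of the differences
sd_diff = statistics.stdev(differences)    # sample standard deviation
se_mean = sd_diff / math.sqrt(n)           # standard error s / sqrt(n)
t_value = mean_diff / se_mean              # compared to t(n - 1)

print(f"mean = {mean_diff}, s = {sd_diff:.2f}, "
      f"SE = {se_mean:.3f}, t = {t_value:.3f}, df = {n - 1}")
```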

The advantage of the first design is that the groups are independent, so one treatment cannot affect the other (for example, residues of SLEEP1 might affect the sleeping time after SLEEP2). There are also designs where the treatment can be given only once to each unit (for example, learning experiments and treatments in which an animal is harmed).

The advantage of the second design is seen in the standard error of the measured statistic (the mean difference). Here it is 0.15, whereas the independent groups give 0.877. The longitudinal (carry-over) effect of the treatment is taken care of by varying the treatment order.

Some of the criteria for good experimental designs are as follows:

1. The analyses resulting from the design should provide unambiguous information on the primary objectives of the experiment. In particular, the design should lead to unbiased estimates.

2. The model and its underlying assumptions should be appropriate for the experimental material.

3. The design should provide the maximum information with respect to the major objectives of the experiment per minimum amount of experimental effort.

4. The design should provide some information with respect to all the objectives of the experiment.

5. The design must be feasible within the working conditions that exist for the experimenter.

B.J. Winer (1971) presents the following picture:

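2. One-way analysis of variance

If the observations of the response variable can be assumed to be normally distributed, a model for an observation in the sample is given by

$$x_{ij} = \mu + e_{ij} \quad \text{or} \quad x_{ij} = \mu + \alpha_i + e_{ij},$$

where µ is the overall mean, α_i is the effect of group (treatment) i and e_ij is the normally distributed residual. The first model has a common mean for all groups, the second lets the group means differ.

We can compare the two structural models above with the likelihood ratio test (see chapter 1.3).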

If the variances in the groups (treatments) are similar, we can divide the variation of the observations into the variation between the groups (variation of the means) and the variation within the groups. The variation is measured with the sums of squares

SS(Total) = SS(Between) + SS(Within).

The idea is to measure the variation of the group means. If the variation of the means is large compared to the variation within the groups (Within), the groups differ. We must take the number of observations into account by dividing the sums of squares by their degrees of freedom:

F = MS(Between) / MS(Within) = [SS(Between) / (k - 1)] / [SS(Within) / (n - k)],

where k is the number of groups and n is the total number of observations.
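The decomposition into between-group and within-group variation can be sketched in a few lines of Python. The function name `one_way_anova` is an illustrative assumption. For lack of other raw data, the sketch reuses the sleeping pill data as two independent groups (Design 1), so the resulting F is the square of the pooled two-sample t value.

```python
def one_way_anova(groups):
    """Return the sums of squares, mean squares and F for a list of groups."""
    all_obs = [x for g in groups for x in g]
    n, k = len(all_obs), len(groups)
    grand_mean = sum(all_obs) / n

    ss_total = sum((x - grand_mean) ** 2 for x in all_obs)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups)
    ss_within = ss_total - ss_between

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return {
        "SS(Total)": ss_total,
        "SS(Between)": ss_between,
        "SS(Within)": ss_within,
        "df": (k - 1, n - k),
        "F": ms_between / ms_within,
    }


sleep1 = [7, 3, 6, 4, 8, 3, 6, 9, 5, 6]
sleep2 = [6, 3, 5, 3, 7, 2, 4, 8, 4, 5]
result = one_way_anova([sleep1, sleep2])
for name, value in result.items():
    print(name, value)
```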

3. Comparisons among treatment means

The analysis of variance tells only whether the group means differ. It does not tell how the means differ. We compared two means with the independent-samples t-test, but it does not fit here, because the t-test does not take the variation of all groups into consideration, and with several t-tests we make an error if we use unadjusted p-values. Taking the variation of all observations into consideration gives more power to the test (smaller p-values). Testing the same variable with several tests is like tossing a coin several times: with one toss the probability of a tail is 1/2, but with 3 tosses the probability of getting at least one tail is not 1/2 but 1 - (1/2)^3 = 7/8.

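If we compare the means pairwise with t-tests, the p-values should be adjusted with the Bonferroni correction:

p(adjusted) = 1 - (1 - p(test))^m,

where m is the number of comparisons. For small p-values this is approximately m × p(test), the usual form of the Bonferroni adjustment.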
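The adjustment and the coin-tossing analogy can be written as a small Python sketch. The function names are illustrative assumptions, and the exponent in the formula is read as the number of comparisons m.

```python
def at_least_one(probability, trials):
    """Probability that an event occurs at least once in independent trials."""
    return 1 - (1 - probability) ** trials


def adjust_p_value(p_test, comparisons):
    """Adjusted p-value in the form 1 - (1 - p)^m used in the text."""
    return 1 - (1 - p_test) ** comparisons


def bonferroni_approximation(p_test, comparisons):
    """The simpler approximation m * p, capped at 1."""
    return min(1.0, comparisons * p_test)


# Three coin tosses: chance of at least one tail.
print(at_least_one(0.5, 3))                  # 0.875

# Three tests at the 5 % level: chance of at least one false rejection.
print(round(at_least_one(0.05, 3), 4))       # 0.1426

# A raw p-value of 0.02 from one of three pairwise comparisons.
print(round(adjust_p_value(0.02, 3), 4))     # 0.0588
print(bonferroni_approximation(0.02, 3))     # about 0.06
```

The two forms of the adjustment are close for small p-values, and the approximation m × p is always slightly more conservative.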

The main idea of the pairwise tests is to use the Bonferroni adjustment and to replace the standard error of the difference between the means with the standard error term from all the groups. Here are two examples of formulas for the pairwise comparisons

where q is F-distributed with degrees of freedom k (the number of treatments) and n - k (from MS(Within)).

Several authors have presented comparison methods (for example Duncan, Scheffé, Tukey and Dunnett). Their methods are based on algorithms that avoid comparing means which we already know to differ or not to differ.

If the researcher can formulate a hypothesis about (and is interested in) comparing groups of groups, we can use contrasts. For example, the hypothesis can be to compare the first two groups to the last two groups.

The comparison is usually presented with coefficients (1/2, 1/2, -1/2 and -1/2). The weighted sum of the means is close to zero if the means do not differ. We can test the difference by dividing the result by its standard error, which is based on MS(Within). This comparison method is called contrasts. The researcher can also test whether a polynomial trend explains the differences between the groups. For example, the coefficients for the linear contrast over four equally spaced groups follow the scores 1, 2, 3 and 4, centred to sum to zero: -3, -1, 1 and 3.
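A contrast can be computed as in the following Python sketch. The data are hypothetical (four made-up groups of three observations), the function name `contrast_test` is an assumption, and the standard error uses the usual form sqrt(MS(Within) × Σ c_i²/n_i), which the notes do not spell out.

```python
import math


def contrast_test(groups, coefficients):
    """Return (estimate, standard error, t) for a contrast of group means."""
    if abs(sum(coefficients)) > 1e-12:
        raise ValueError("contrast coefficients must sum to zero")
    means = [sum(g) / len(g) for g in groups]
    n_total = sum(len(g) for g in groups)
    k = len(groups)

    # MS(Within) from all the groups, not only the ones being compared.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_within = ss_within / (n_total - k)

    estimate = sum(c * m for c, m in zip(coefficients, means))
    std_err = math.sqrt(ms_within * sum(c ** 2 / len(g)
                                        for c, g in zip(coefficients, groups)))
    return estimate, std_err, estimate / std_err


# Hypothetical data: four treatment groups with three observations each.
groups = [[4, 5, 6], [5, 6, 7], [7, 8, 9], [8, 9, 10]]

# First two groups against the last two groups.
est, se, t = contrast_test(groups, [0.5, 0.5, -0.5, -0.5])
print(f"groups 1-2 vs 3-4: estimate = {est}, SE = {se:.3f}, t = {t:.3f}")

# Linear trend over the four groups.
lin_est, lin_se, lin_t = contrast_test(groups, [-3, -1, 1, 3])
print(f"linear trend: estimate = {lin_est}, SE = {lin_se:.3f}, t = {lin_t:.3f}")
```

The t values are compared to the t-distribution with the degrees of freedom of MS(Within), here 12 - 4 = 8.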

4. Blocks and covariates

Often the researcher knows factors that affect the response variable. In the study design the researcher should make the treatment groups equal with respect to these factors. In the analysis of variance the effect of the confounding factors can be adjusted for, and therefore we get a test for the pure treatment effect. The confounding factors increase the variation within the groups (MS(Within)), and this can be handled using a block design or covariance analysis. If the confounding factor is non-continuous (categorical), we can use the block design, and if it is continuous, we can use covariance analysis.

The sum of squares in the block effect anova is

SS(Total) = SS(Between) + SS(Block) + SS(Within).

where SS(Block) is taken out of the SS(Within) of the simple ANOVA.

The idea of covariance analysis is to explain the response variable by the confounding covariate with regression analysis and then to do the ANOVA on the regression residuals.
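The two-step idea (regression first, then ANOVA on the residuals) can be illustrated with a Python sketch. The data are hypothetical, the function names are assumptions, and the error degrees of freedom are reduced by one for the estimated slope. Statistical packages estimate the slope and the group effects jointly, so their results can differ from this two-step version when the covariate is not balanced over the groups.

```python
def fit_line(x, y):
    """Least-squares intercept and slope of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def between_within(values, labels):
    """Between-group and within-group sums of squares."""
    grand_mean = sum(values) / len(values)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    ss_between = sum(len(v) * (sum(v) / len(v) - grand_mean) ** 2
                     for v in groups.values())
    ss_within = sum((x - sum(v) / len(v)) ** 2
                    for v in groups.values() for x in v)
    return ss_between, ss_within


# Hypothetical data: the response grows with age in both treatment groups.
age = [20, 30, 40, 50, 60] * 2
group = ["A"] * 5 + ["B"] * 5
response = [2.1, 3.0, 4.2, 4.9, 6.1, 2.6, 3.4, 4.5, 5.6, 6.4]
n, k = len(response), 2

# Step 1: remove the effect of the covariate.
intercept, slope = fit_line(age, response)
residuals = [y - (intercept + slope * x) for x, y in zip(age, response)]

# Step 2: ANOVA on the raw response and on the residuals.
raw_between, raw_within = between_within(response, group)
adj_between, adj_within = between_within(residuals, group)
f_raw = (raw_between / (k - 1)) / (raw_within / (n - k))
f_adjusted = (adj_between / (k - 1)) / (adj_within / (n - k - 1))

print(f"slope = {slope:.4f}")
print(f"unadjusted: SS(Within) = {raw_within:.3f}, F = {f_raw:.2f}")
print(f"adjusted:   SS(Within) = {adj_within:.4f}, F = {f_adjusted:.2f}")
```

In this made-up example the age effect hides the group difference: removing it shrinks SS(Within) from about 19.5 to about 0.09, and the F value for the groups grows accordingly.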

Example. The following SPSS listing is from a block-design covariance analysis, where we compare the differences between the (treatment) groups. To get the "pure" effect of the treatment, we adjust the response for age (continuous) and sex (non-continuous).

                        Sum of                  Mean                    Sig 
Source of Variation     Squares         DF      Square          F       of F 
Covariates              44.491          1       44.491          34.291 .000 
AGE                     44.491          1       44.491          34.291 .000 
Main Effects            15.718          2       7.859           6.057   .011 
SEX                     15.695          1       15.695          12.097 .003 
GROUP                   .026            1       .026            .020    .890 
Explained               75.791          3       25.264          19.472 .000 
Residual                20.759          16      1.297 
Total                   96.550          19      5.082 

Here we can see that age affects the response variable significantly (Sig of F = p = .000), and so does sex (Sig of F = p = .003). When the effects of age and sex have been adjusted for, the groups do not differ (Sig of F = p = .890). The one-way ANOVA table between the groups:

                                Sum of          Mean            F       F
Source                  D.F.    Squares         Squares         Ratio   Prob.
Between Groups          1       .0500           .0500           .0093   .9241
Within Groups           18      96.5000         5.3611
Total                   19      96.5500

5. Multi-factor analysis of variance

If in the previous example the study question also concerns testing the effect of sex (not just as a blocking variable), we can use two-way ANOVA. With two-way ANOVA we also get an interaction effect, which tells us in this example whether the treatment affects males differently from females.

The SPSS listing from the two-way anova:

                        Sum of                  Mean                    Sig 
Source of Variation     Squares         DF      Square          F       of F 
Main Effects            31.300          2       15.650          3.913   .041 
SEX                     31.250          1       31.250          7.813   .013 
GROUP                   .050            1       .050            .013    .912 
2-Way Interactions      1.250           1       1.250           .313    .584 
SEX GROUP               1.250           1       1.250           .313    .584 
Explained               32.550          3       10.850          2.713   .080 
Residual                64.000          16      4.000 
Total                   96.550          19      5.082 

Here we can see that sex affects the response variable (p = .013), the treatment (GROUP) does not (p = .912), and the treatment does not affect the sexes differently (p = .584).

The sum of squares in the two-way anova can be written

SS(Total) = SS(A) + SS(B) + SS(AB) + SS(Within),

where A and B are the factors and AB is the interaction term.
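The two-way decomposition can be verified numerically with a Python sketch. The data are hypothetical (a balanced 2 × 2 design with two observations per cell), and the function name `two_way_anova` is an assumption. The simple marginal-means formulas used here are valid only for balanced designs.

```python
def two_way_anova(cells):
    """Sums of squares for a balanced two-factor design.

    cells maps (level of A, level of B) to the list of observations.
    """
    all_obs = [x for obs in cells.values() for x in obs]
    grand_mean = sum(all_obs) / len(all_obs)

    def marginal_ss(index):
        # Pool the observations over the other factor.
        by_level = {}
        for key, obs in cells.items():
            by_level.setdefault(key[index], []).extend(obs)
        return sum(len(v) * (sum(v) / len(v) - grand_mean) ** 2
                   for v in by_level.values())

    ss_a = marginal_ss(0)
    ss_b = marginal_ss(1)
    ss_cells = sum(len(obs) * (sum(obs) / len(obs) - grand_mean) ** 2
                   for obs in cells.values())
    ss_within = sum((x - sum(obs) / len(obs)) ** 2
                    for obs in cells.values() for x in obs)
    ss_total = sum((x - grand_mean) ** 2 for x in all_obs)
    return {
        "SS(A)": ss_a,
        "SS(B)": ss_b,
        "SS(AB)": ss_cells - ss_a - ss_b,   # interaction
        "SS(Within)": ss_within,
        "SS(Total)": ss_total,
    }


# Hypothetical data: factor A = sex, factor B = treatment group.
cells = {
    ("male", "g1"): [6, 8],
    ("male", "g2"): [7, 9],
    ("female", "g1"): [3, 5],
    ("female", "g2"): [4, 4],
}
ss = two_way_anova(cells)
for name, value in ss.items():
    print(name, value)
```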

This can be generalized to designs with several factors, where we get multiway interaction terms. There are two kinds of problems in multiway ANOVA: there must be enough observations in all the groups, and the interpretation can be difficult or the multiway interaction can be meaningless for the study problem.

Example from Armitage & Berry. The response variable is the relative weight of the right adrenal in mice obtained by crossing parents of four strains. For each of the 16 combinations of parental strains, four mice were used. This is a three-factor design. The factors - mother's strain, father's strain and sex - are not, of course, experimental treatments imposed by random allocation. Nevertheless they represent potential sources of variation whose main effects and interactions may be studied. Here is the variance table for the analysis:

* * * A N A L Y S I S O F V A R I A N C E * * *

ADRENAL 
BY SEX 
FATHER 
MOTHER 
                        Sum of                  Mean                            Signif 
Source of Variation     Squares         DF      Square          F               of F 
Main Effects            15.470          7       2.210           55.915          .000 
SEX                     14.890          1       14.890          376.737         .000 
MOTHER                  .340            3       .113            2.864           .052 
FATHER                  .240            3       .080            2.025           .130 
2way Interactions       1.718           15      .115            2.897           .006 
SEX MOTHER              .394            3       .131            3.327           .032 
SEX FATHER              .024            3       .008            .206            .891 
MOTHER FATHER           1.299           9       .144            3.651           .003 
3way Interactions       .261            9       .029            .735            .675 
SEX FATHER MOTHER       .261            9       .029            .735            .675 
Explained               17.449          31      .563            14.241          .000 
Residual                1.265           32      .040 
Total                   18.713          63      .297 

Sex affects the adrenal weight (p = .000) and the effect of the mother's strain is almost significant (p = .052), but the father's strain has no effect (p = .130). From the two-way interactions we can see that the mother's strain affects the sexes differently (p = .032) and that the mother's strain has an interaction with the father's strain (p = .003), but the father's strain affects both sexes similarly (p = .891). There is no three-way interaction (p = .675).

To analyse the effects more closely, we must draw a graph of the means:

The means for the females (sex = 2, 0.5 - 1.1) are smaller than the means for the males (sex = 1, 1.2 - 2.2). The effect of the mother's strain can be seen in the high values in strain 3 and the small values in strain 1. The interaction of the parents can be seen in the crossing lines.

6. Repeated measures ANOVA

When the same units (animals, persons, patients) have been measured several times, the observations are dependent, and the difference between the means should be tested with repeated measures ANOVA.

The variation of the data can be divided into between-people and within-people variation:

SS(Total) = SS(Between people) + SS(Within people) ,

where the within-people variation can be divided into repeated and residual variation:

SS(Within people) = SS(Repeated) + SS(Residual).

The idea is to compare the repeated variation to the residual variation, after the between-people variation has been removed from the total variation.

Example. The earlier sleeping pill example for the paired t-test, analysed with repeated measures ANOVA.

The within-people sum of squares can be calculated by subtracting the between-people SS from the total SS:

$$SS(\text{Within people}) = \sum_{i}\sum_{j} x_{ij}^2 - \frac{1}{k}\sum_{i} T_i^2,$$

where x_ij is measurement j of person i, T_i is the sum of the measurements of person i and k is the number of measurements per person.

In this example it is

SS(Within people) = (7² + 6² + 3² + 3² + 6² + 5² + ... + 5²) - (13² + 6² + ... + 11²)/2 = 6.0,

which is divided into the repeated SS and the residual SS:

SS(Repeated) = (57² + 47²)/10 - 104²/20 = 5.0

SS(Residual) = 6.0 - 5.0 = 1.0

The sums of squares are divided by their degrees of freedom, 5.0/1 = 5.0 and 1.0/9 = 0.111, which gives the mean squares (MS). The ratio of the mean squares, F = 5.0/0.111 = 45, is F-distributed with 1 and 9 degrees of freedom. If the test value F is small, the means do not differ, but if F is large, the means differ. From the F-distribution we get p = P(F > 45) = 0.000087. The result is the same as in the paired t-test (F = t²), and with greater calculation accuracy the p-values would be identical. We can do the repeated measures ANOVA in SPSS by choosing SCALE (the first listing) or by choosing Anova Models Repeated measures (the second and third listings):

Source of Variation     Sum of Sq.      DF      Mean Square     F       Prob. 
Between People          67.2000         9       7.4667 
Within People           6.0000          10      .6000 
Between Measures        5.0000          1       5.0000          45.0000 .0001 
Residual                1.0000          9       .1111 
Total                   73.2000         19      3.8526 

Tests of Significance for T1 using UNIQUE sums of squares

Source of Variation     SS              DF      MS              F       Sig of F
WITHIN+RESIDUAL         67.20           9       7.47
CONSTANT                540.80          1       540.80          72.43   .000

Tests of Significance for T2 using UNIQUE sums of squares

Source of Variation     SS              DF      MS              F       Sig of F
WITHIN+RESIDUAL         1.00            9       .11
FACTOR1                 5.00            1       5.00            45.00   .000

The first listing is from SCALE, where we can see all the sums of squares. The second is an ANOVA listing with a test of whether the overall mean (CONSTANT) differs from zero. The third is also an ANOVA listing, with the test of the repeated factor (FACTOR1).
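The sums of squares in the listings can be reproduced from the sleeping pill data with a short Python sketch (standard library only; the variable names are illustrative). It follows the decomposition SS(Total) = SS(Between people) + SS(Repeated) + SS(Residual).

```python
# Sleeping times of the same 10 persons after SLEEP1 and SLEEP2.
sleep1 = [7, 3, 6, 4, 8, 3, 6, 9, 5, 6]
sleep2 = [6, 3, 5, 3, 7, 2, 4, 8, 4, 5]
n, k = len(sleep1), 2          # persons, measurements per person

grand_total = sum(sleep1) + sum(sleep2)
correction = grand_total ** 2 / (n * k)          # 104^2 / 20

ss_total = sum(x ** 2 for x in sleep1 + sleep2) - correction

# Between people: based on each person's total over both measurements.
person_totals = [a + b for a, b in zip(sleep1, sleep2)]
ss_between_people = sum(t ** 2 for t in person_totals) / k - correction

ss_within_people = ss_total - ss_between_people

# Repeated (between measures): based on the totals of each measurement.
ss_repeated = (sum(sleep1) ** 2 + sum(sleep2) ** 2) / n - correction
ss_residual = ss_within_people - ss_repeated

df_repeated = k - 1
df_residual = (n - 1) * (k - 1)
f_value = (ss_repeated / df_repeated) / (ss_residual / df_residual)

print(f"Between people {ss_between_people:.1f}")
print(f"Within people  {ss_within_people:.1f}")
print(f"Repeated       {ss_repeated:.1f}")
print(f"Residual       {ss_residual:.1f}")
print(f"Total          {ss_total:.1f}")
print(f"F({df_repeated}, {df_residual}) = {f_value:.1f}")
```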

Schematic representation of the analysis:

7. Comparisons among repeated means

The pairwise comparisons in repeated measurements have all the same problems and possibilities as in the independent-groups ANOVA. Unfortunately the "usual" statistical packages (like SPSS and SAS) do not offer pairwise procedures in repeated measures ANOVA as they do in the independent one-way ANOVA. The easiest way to compare the means of dependent measurements is to use t-tests with the Bonferroni correction. The Newman-Keuls method for pairwise comparison is to calculate the difference between the means and divide it by the adjusted residual term:

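$$q_r = \frac{\bar{x}_i - \bar{x}_j}{\sqrt{MS(\text{Residual})/n}}$$

The statistic q_r is compared to the studentized range distribution with parameters r (the number of ordered means spanned by the comparison) and the degrees of freedom of MS(Residual).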

Example. Three repeated measurements of heart rate with SPSS (repeated measures ANOVA):

Orthonormalized Transformation Matrix (Transposed)

        T1      T2      T3
RHR0    .577    .816    .000
RHR1    .577   -.408    .707
RHR2    .577   -.408   -.707

Univariate F tests with (1,39) D. F.

Variable    Hypoth. SS      Error SS        Hypoth. MS      Error MS        F           Sig. of F
T2          212.81667       1528.18333      212.81667       39.18419        5.43119     .025
T3          96.80000        2244.20000      96.80000        57.54359        1.68220     .202

AVERAGED Tests of Significance for RHR using UNIQUE sums of squares

Source of Variation     SS              DF      MS              F       Sig of F
WITHIN CELLS            3772.38         78      48.36
FACTOR1                 309.62          2       154.81          3.20    .046

Variable    Mean        Std Dev     Minimum     Maximum     N       Label
RHR0        68.55       11.62       47          97          40
RHR1        66.82       12.04       44          92          40
RHR2        64.63       10.11       50          98          40

We can see from the ANOVA table that the means differ at the significance level p = 0.046. For the contrasts the transformation matrix is printed; it shows that contrast T2 compares the 1st measurement to the others and contrast T3 compares the 2nd and 3rd measurements. Here the first measurement (mean) differs from the others (p = 0.025).

The repeated measures ANOVA assumes that the variables are normally distributed and that the variances are equal. The researcher can test the normality assumption with the Kolmogorov-Smirnov test and the equality of the variances with the Mauchly sphericity test. The SPSS repeated measures ANOVA presents the Mauchly test:

Tests involving 'FACTOR1' Within-Subject Effect.

Mauchly sphericity test, W =        .93476
Chi-square approx. =               2.56370 with 2 D. F.
Significance =                        .278

Greenhouse-Geisser Epsilon =        .93876
Huynh-Feldt Epsilon =               .98458
Lower-bound Epsilon =               .50000

AVERAGED Tests of Significance that follow multivariate tests are equivalent to univariate or split-plot or mixed-model approach to repeated measures. Epsilons may be used to adjust d.f. for the AVERAGED results.

The variances can be regarded as equal (p = 0.278). If the variances were unequal (p < 0.05), we could adjust the degrees of freedom in the ANOVA table by multiplying them by the epsilon coefficient.
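The epsilon adjustment itself is a simple multiplication, as the following Python sketch shows. The function name `adjust_df` is an assumption; the epsilon values and the degrees of freedom (2 and 78) are taken from the heart rate listings above. The adjusted degrees of freedom are then used when the p-value of the unchanged F value is looked up.

```python
def adjust_df(epsilon, df_effect, df_error):
    """Multiply both degrees of freedom of an F test by epsilon."""
    return epsilon * df_effect, epsilon * df_error


# Heart rate example: FACTOR1 has 2 d.f., WITHIN CELLS has 78 d.f.
epsilons = {
    "Greenhouse-Geisser": 0.93876,
    "Huynh-Feldt": 0.98458,
    "Lower-bound": 0.50000,
}
adjusted = {name: adjust_df(eps, 2, 78) for name, eps in epsilons.items()}

for name, (df1, df2) in adjusted.items():
    print(f"{name}: F is compared to F({df1:.3f}, {df2:.3f})")
```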

8. Mixed-model ANOVA

If the researcher has independent groups and repeated measurements, the most important hypothesis is whether the change is different in the groups. Here we can use the mixed-model ANOVA, where the model is a mixture of independent (between-subjects) factors and dependent (within-subjects) factors.

Example. Partly prepared data, where rats are in three groups (a control and two treatments) with three measurements. The response variable is the latency in a maze:

Tests of Between-Subjects Effects.

Tests of Significance for T1 using UNIQUE sums of squares

Source of Variation     SS              DF      MS              F       Sig of F
WITHIN CELLS            2332270.92      122     19116.97
CONSTANT                7818915.35      1       7818915.3       409.00  .000
GROUP                   141934.33       2       70967.16        3.71    .027

Tests involving 'FACTOR1' Within-Subject Effect.

Mauchly sphericity test, W =        .89427
Chi-square approx. =              13.52150 with 2 D. F.
Significance =                        .001

Greenhouse-Geisser Epsilon =        .90438
Huynh-Feldt Epsilon =               .93224
Lower-bound Epsilon =               .50000

AVERAGED Tests of Significance that follow multivariate tests are equivalent to univariate or split-plot or mixed-model approach to repeated measures. Epsilons may be used to adjust d.f. for the AVERAGED results.

AVERAGED Tests of Significance for LAT using UNIQUE sums of squares

Source of Variation     SS              DF      MS              F       Sig of F
WITHIN CELLS            1989646.23      244     8154.29
FACTOR1                 397285.84       2       198642.92       24.36   .000
GROUP BY FACTOR1        62465.36        4       15616.34        1.92    .109

In the first listing we can see that the groups differ (p = 0.027) over all the measurements. The variance test (Mauchly) tells us that the variances differ, so we must use the adjustments (or use MANOVA). The last is the mixed-model ANOVA listing, where we must adjust the degrees of freedom (0.90438 * 244 = 220.669, 0.90438 * 2 = 1.80876 and 0.90438 * 4 = 3.618), and we get p < 0.000001 and p = 0.115. So there has been a statistically significant change in latency, but the change has not been different in the groups.

Here is a graph of the previous means:

The graph shows that the last measurement has the smallest mean. Group 1 seems to have the largest means and group 3 the smallest. Even though the trends look different, the interaction effect is not statistically significant. We can compare the trend of group 1 to the trend of the other groups with contrasts:

Estimates for T3 (CONT.)
GROUP BY FACTOR1

Parameter   Coeff.          Std. Err.       t Value         Sig. t      Lower 95% CL    Upper 95% CL
2           -25.614178      16.85742        -1.51946        .131        -58.98513       7.75677
3           -37.564796      16.65225        -2.25584        .026        -70.52959       -4.60000

Parameter 3 (which can be seen in the transformation matrix) compares the first group (control group) to the others, and the estimate is the interaction effect. The p-value 0.026 tells us that the change in group 1 is different from the other groups.

9. About MANOVA

If the variances in the repeated measures ANOVA are not equal (Mauchly test), we can use the multivariate analysis of variance model to test the differences. MANOVA is also not as sensitive to the normality assumption as ANOVA. MANOVA is not only an alternative to ANOVA: we can explain several response variables in one MANOVA and do several kinds of analysis. The basic idea of MANOVA is the linear model. For example, a model for repeated measurements with several groups can be written

where the parameters µ, α1, α2, ß1, ß2, ... describe the differences between the means in the groups and the measurements. Then we just test the significance of each parameter (whether the parameter differs from zero). The tests are named after the persons who developed them (Pillai, Hotelling and Wilks). The assumption of the tests is the normal distribution of the residuals. Here is the MANOVA listing from the previous rat example:

* * ANALYSIS OF VARIANCE DESIGN 1 * *

EFFECT .. GROUP BY FACTOR1

Multivariate Tests of Significance (S = 2, M = 1/2, N = 59 1/2)

Test Name       Value       Approx. F       Hypoth. DF      Error DF        Sig. of F
Pillais         .07250      2.29456         4.00            244.00          .060
Hotellings      .07668      2.30054         4.00            240.00          .059
Wilks           .92816      2.29773         4.00            242.00          .060
Roys            .06175

EFFECT .. FACTOR1

Multivariate Tests of Significance (S = 1, M = 0, N = 59 1/2)

Test Name       Value       Approx. F       Hypoth. DF      Error DF        Sig. of F
Pillais         .32139      28.65220        2.00            121.00          .000
Hotellings      .47359      28.65220        2.00            121.00          .000
Wilks           .67861      28.65220        2.00            121.00          .000
Roys            .32139

The first is the test of the interaction effect and it gives p = 0.06, so the changes in the groups have been close to statistically different. This result is closer to the previous interpretation of the graph. The small difference in the results may be because the response variable (latency) is not very normally distributed and MANOVA works better here. The second is the test of the repeated measurement parameter and it gives the same result (p < 0.001) as the mixed-model ANOVA.