Some literature on the subject:
Armitage, P. and Berry, G. Statistical Methods in Medical Research. 2nd ed. Blackwell Scientific Publications. Oxford. 1987.
Barker, H.R. and Barker, B.M. Multivariate Analysis of Variance (MANOVA): A Practical Guide to Its Use in Scientific Decision Making. The University of Alabama Press. 1984.
Kempthorne, O. The Design and Analysis of Experiments. John Wiley & Sons. New York. 1952.
Montgomery, D.C. Design and Analysis of Experiments. 2nd ed. John Wiley & Sons. New York. 1976.
Winer, B.J. Statistical Principles in Experimental Design. 2nd ed. McGraw-Hill Book Company. New York. 1971.
Example. The simplest study design.
A pharmaceutical company has produced two sleeping pills, SLEEP1 and SLEEP2. We want to study which pill gives the longer sleeping time. The response variable is the sleeping time. We test the null hypothesis H0: µ1 = µ2 (µi = mean sleeping time for pill SLEEPi, i = 1, 2).
Data:
Unit    SLEEP1  SLEEP2  Difference
1       7       6       1
2       3       3       0
3       6       5       1
4       4       3       1
5       8       7       1
6       3       2       1
7       6       4       2
8       9       8       1
9       5       4       1
10      6       5       1
Means   5.7     4.7     1.0
The data can come from two different study designs.
Design 1. We take 20 persons into the experiment. The persons are randomized into two groups (n1 = n2 = 10). We give the SLEEP1 pill to group 1 and the SLEEP2 pill to group 2, and ask each person how long they slept after taking the pill. If the sleeping time is approximately normally distributed and the groups are independent, we can compare the means with the independent-samples t-test, where the null hypothesis is H0: µ1 = µ2.
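The randomization step of Design 1 can be sketched as follows. This is only an illustration: the participant labels and the fixed seed are assumptions made so that the example is reproducible; they are not part of the original study.

```python
import random

# Hypothetical participant labels; the study itself gives no identifiers.
persons = [f"P{i:02d}" for i in range(1, 21)]

rng = random.Random(1)          # fixed seed only so the sketch is reproducible
shuffled = persons[:]           # copy, so the original list is left untouched
rng.shuffle(shuffled)

group1 = sorted(shuffled[:10])  # these persons would receive SLEEP1
group2 = sorted(shuffled[10:])  # these persons would receive SLEEP2

print("SLEEP1:", group1)
print("SLEEP2:", group2)
```

Shuffling the whole list and cutting it in half guarantees equal group sizes, which a coin toss per person would not.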
The test is based on the difference between the group means, which is compared to the standard error of the difference. If the variances in the groups do not differ, the standard error of the difference is
$$SE(\bar{x}_1 - \bar{x}_2) = s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}},$$
where s is the standard deviation of all the observations. Here s = 1.96, so the standard error of the difference is
$$SE(\bar{x}_1 - \bar{x}_2) = 1.96\sqrt{\frac{1}{10} + \frac{1}{10}} = 0.877.$$
If the variances in the groups differ, the standard error of the difference is
$$SE(\bar{x}_1 - \bar{x}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}},$$
where s1 and s2 are the standard deviations of the two groups.
We get the test value t by dividing the difference by the standard error of the difference. Here the test value t = 1.0/0.877 = 1.14 is small, and from the t(18) distribution the probability is p = P(|t(18)| > 1.14) = 0.269. We cannot reject H0, because the risk of making an error when rejecting H0 is too large.
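The arithmetic of the independent-groups test can be retraced with a few lines of Python. The sketch follows the text in taking s as the standard deviation of all 20 observations. The unequal-variance form, sqrt(s1²/n1 + s2²/n2), is the usual textbook formula and is an assumption here. The p-value is not computed, because the Python standard library has no t distribution.

```python
import math
import statistics

sleep1 = [7, 3, 6, 4, 8, 3, 6, 9, 5, 6]
sleep2 = [6, 3, 5, 3, 7, 2, 4, 8, 4, 5]
n1, n2 = len(sleep1), len(sleep2)

diff = statistics.mean(sleep1) - statistics.mean(sleep2)

# Equal-variance case, as in the text: s is the standard deviation
# of all 20 observations taken together.
s = statistics.stdev(sleep1 + sleep2)
se_equal = s * math.sqrt(1 / n1 + 1 / n2)
t_equal = diff / se_equal

# Unequal-variance case: each group contributes its own variance.
se_unequal = math.sqrt(statistics.variance(sleep1) / n1
                       + statistics.variance(sleep2) / n2)

print(f"difference of means    = {diff:.2f}")
print(f"s = {s:.2f}, SE = {se_equal:.3f}, t = {t_equal:.2f}")
print(f"SE (unequal variances) = {se_unequal:.3f}")
```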
Design 2. We take 10 persons, who each take both pills. Half of the persons take SLEEP1 first and SLEEP2 a week later; the other half take SLEEP2 first and SLEEP1 a week later. Now the observations are dependent in pairs, and we can use the paired t-test. The idea is to calculate the difference between the sleeping times after the two pills and then test whether the mean of the differences is zero.
The differences can be seen in the table; their mean is 1.0 with standard deviation s = 0.47. If the differences are normally distributed, the standard error of the mean is $s/\sqrt{n}$. Here $SE(\bar{x}) = 0.47/\sqrt{10} = 0.15$. The test value t = 1.0/0.15 = 6.667 is large, and the risk of making an error by rejecting a correct H0, p = P(|t(9)| > 6.667) = 0.000092, is small enough to reject H0.
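The paired calculation can be checked in the same way. The sketch below works at full precision, so its t value is slightly larger than the 6.667 obtained above from the rounded standard error 0.15.

```python
import math
import statistics

sleep1 = [7, 3, 6, 4, 8, 3, 6, 9, 5, 6]
sleep2 = [6, 3, 5, 3, 7, 2, 4, 8, 4, 5]

# One difference per person: SLEEP1 minus SLEEP2.
differences = [a - b for a, b in zip(sleep1, sleep2)]
n = len(differences)

mean_diff = statistics.mean(differences)
sd_diff = statistics.stdev(differences)   # sample standard deviation (n - 1)
se_diff = sd_diff / math.sqrt(n)          # standard error of the mean difference
t_paired = mean_diff / se_diff            # compared to t with n - 1 = 9 d.f.

print(f"mean = {mean_diff:.2f}, sd = {sd_diff:.2f}, SE = {se_diff:.3f}")
print(f"t = {t_paired:.3f}")
```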
The advantage of the first design is that the groups are independent, so one treatment cannot affect the other (for example, a residual effect of SLEEP1 might affect the sleeping time after SLEEP2). There are also designs where the treatment can be given only once to each unit (for example learning experiments and treatments where an animal is hurt).
The advantage of the second design is seen in the standard error of the measured statistic (the mean). Here it is 0.15, whereas the independent groups give 0.877. The carry-over effect of the treatment is handled by varying the treatment order.
Some of the criteria for good experimental designs are as follows:
1. The analyses resulting from the design should provide unambiguous information on the primary objectives of the experiment. In particular the design should lead to unbiased estimates.
2. The model and its underlying assumptions should be appropriate for the experimental material.
3. The design should provide the maximum information with respect to the major objectives of the experiment per minimum amount of experimental effort.
4. The design should provide some information with respect to all the objectives of the experiment.
5. The design must be feasible within the working conditions that exist for the experimenter.
B.J. Winer (1971) presents the following picture:
If the observations of the response variable can be assumed to be normally distributed, a model for an observation in the sample is given by
We can compare the two previous structural models with the likelihood ratio test (see chapter 1.3).
If the variances in the groups (treatments) are similar, we can divide the variation of the observations into the variation between the groups (variation of the means) and the variation within the groups. The variation is measured with sums of squares:
SS(Total) = SS(Between) + SS(Within).
The idea is to measure the variation of the group means. If the variation of the means is large compared to the variation within the groups (Within), the groups differ. We must take the number of observations into account by dividing the sums of squares by their degrees of freedom:
MS(Between) = SS(Between)/(k - 1), MS(Within) = SS(Within)/(n - k), F = MS(Between)/MS(Within),
where k is the number of groups and n is the number of observations.
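As a minimal sketch of this decomposition, the sleeping pill data can be treated as two independent groups (Design 1). The symbols k (number of groups) and n (number of observations) follow the usual one-way anova notation and are assumptions of this example.

```python
groups = {
    "SLEEP1": [7, 3, 6, 4, 8, 3, 6, 9, 5, 6],
    "SLEEP2": [6, 3, 5, 3, 7, 2, 4, 8, 4, 5],
}

def mean(values):
    return sum(values) / len(values)

all_obs = [x for obs in groups.values() for x in obs]
n, k = len(all_obs), len(groups)
grand_mean = mean(all_obs)

# Variation of all observations around the grand mean.
ss_total = sum((x - grand_mean) ** 2 for x in all_obs)
# Variation of the group means around the grand mean.
ss_between = sum(len(obs) * (mean(obs) - grand_mean) ** 2
                 for obs in groups.values())
# Variation of the observations around their own group mean.
ss_within = sum((x - mean(obs)) ** 2
                for obs in groups.values() for x in obs)

ms_between = ss_between / (k - 1)
ms_within = ss_within / (n - k)
f_ratio = ms_between / ms_within   # compared to F with (k - 1, n - k) d.f.

print(f"SS(Total) = {ss_total:.1f} = {ss_between:.1f} + {ss_within:.1f}")
print(f"F = {f_ratio:.3f}")
```

With only two groups the F ratio is the square of the t value of a pooled-variance t-test, so the anova adds nothing here; the decomposition becomes useful with three or more groups.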
The analysis of variance tells only whether the group means differ. It does not tell how they differ. We compared two means with the independent-samples t-test, but it does not fit here: the t-test does not take the variation of all the groups into consideration, and running several t-tests with unadjusted p-values inflates the error risk. Taking the variation of all observations into consideration gives the test more power (smaller p-values). Testing the same variable with several tests is like tossing a coin several times: with one toss the probability of a tail is 1/2, but with 3 tosses the probability of getting at least one tail is 1 - (1/2)^3 = 7/8, not 1/2.
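If we use t-tests to compare the means pairwise, the p-values should be adjusted with the Bonferroni correction:
$$p_{\text{adjusted}} = 1 - (1 - p_{\text{test}})^{m},$$
where m is the number of comparisons. (Strictly, this form is known as the Šidák correction; the Bonferroni correction proper is p_adjusted = min(1, m · p_test), which gives almost the same value when p_test is small.)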
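The adjustment is easy to try out. The sketch below implements the formula above and, for comparison, the simple multiplicative bound min(1, m · p) that is commonly called the Bonferroni correction. The function names and the example p-value 0.02 are illustrative choices, not part of the original text.

```python
def adjust_power_form(p, m):
    """Adjusted p-value from the formula in the text: 1 - (1 - p)^m."""
    return 1.0 - (1.0 - p) ** m

def adjust_multiplicative(p, m):
    """Multiplicative bound: m * p, capped at 1."""
    return min(1.0, m * p)

# Coin analogy: probability of at least one tail in 3 tosses.
at_least_one_tail = adjust_power_form(0.5, 3)

# A hypothetical test p-value of 0.02 with 3 pairwise comparisons.
p_power = adjust_power_form(0.02, 3)
p_mult = adjust_multiplicative(0.02, 3)

print(at_least_one_tail)
print(round(p_power, 4), round(p_mult, 4))
```

For small p-values the two adjustments are nearly equal; the multiplicative form is always at least as large, which makes it slightly more conservative.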
The main idea of the pairwise tests is to use the Bonferroni adjustment and to replace the standard error of the difference between two means with the standard error term from all the groups. Here are two examples of formulas for pairwise comparisons:
where q is F-distributed with degrees of freedom given by the number of treatments (k) and by MS(Within) (n - k).
Several authors have presented methods for the comparisons (for example Duncan, Scheffé, Tukey and Dunnett). Their methods are based on algorithms that avoid comparing means which we already know to differ or not to differ.
If the researcher can formulate a hypothesis about (and is interested in) comparing groups of groups, we can use contrasts. For example, the hypothesis can be to compare the first two groups to the last two groups.
The comparison is usually presented with coefficients (1/2, 1/2, -1/2 and -1/2), which are used to weight the group means. The weighted sum is close to zero if the means do not differ. We can test the difference by dividing the result by its standard error, which is based on MS(Within). This comparison method is called contrasts. The researcher can also test whether a polynomial explains the differences between the groups. For example, the coefficients for the linear contrast in four groups can be -3, -1, 1 and 3 (the scores 1, 2, 3 and 4 centred so that they sum to zero).
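A contrast can be sketched as a weighted sum of the group means, divided by its standard error. All numbers below (group means, group sizes and MS(Within)) are hypothetical and serve only to show the arithmetic. The standard error formula sqrt(MS(Within) · Σ c²/n) is the usual textbook form and is an assumption here, as is the linear contrast with centred scores -3, -1, 1, 3.

```python
import math

# Hypothetical summary statistics for four groups.
means = [10.0, 12.0, 15.0, 17.0]
sizes = [8, 8, 8, 8]
ms_within = 16.0

def contrast(coeffs, means, sizes, ms_within):
    """Return (estimate, standard error, t) for one contrast."""
    if abs(sum(coeffs)) > 1e-12:
        raise ValueError("contrast coefficients must sum to zero")
    estimate = sum(c * m for c, m in zip(coeffs, means))
    se = math.sqrt(ms_within * sum(c * c / n for c, n in zip(coeffs, sizes)))
    return estimate, se, estimate / se

# First two groups against the last two groups.
est_pair, se_pair, t_pair = contrast([0.5, 0.5, -0.5, -0.5],
                                     means, sizes, ms_within)
# Linear trend over the four groups.
est_lin, se_lin, t_lin = contrast([-3, -1, 1, 3], means, sizes, ms_within)

print(f"pairs : estimate = {est_pair:.1f}, t = {t_pair:.3f}")
print(f"linear: estimate = {est_lin:.1f}, t = {t_lin:.3f}")
```

The t value is compared to the t distribution with the degrees of freedom of MS(Within).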
Often the researcher knows factors that affect the response variable. In the study design the researcher should make the treatment groups equal with respect to these factors. In the analysis of variance the effect of the confounding factors can be adjusted for, and we therefore get a test for the pure treatment effect. The confounding factors increase the variation within the groups (MS(Within)), which can be handled with a block design or covariance analysis. If the confounding factor is noncontinuous, we can use the block design, and if it is continuous, we can use covariance analysis.
The sum of squares in the block effect anova is
SS(Total) = SS(Between) + SS(Block) + SS(Within),
where SS(Block) is taken out of the SS(Within) of the simple anova.
The idea of covariance analysis is to explain the response variable by the confounding covariate with regression analysis and then to do the anova on the regression residuals.
Example. The following SPSS listing is from a block design covariance analysis, where we compare the differences between the (treatment) groups. To get the "pure" effect of the treatment, we adjust the response for age (continuous) and sex (noncontinuous).
                        Sum of           Mean               Sig
Source of Variation    Squares   DF    Square         F    of F
Covariates              44.491    1    44.491    34.291    .000
   AGE                  44.491    1    44.491    34.291    .000
Main Effects            15.718    2     7.859     6.057    .011
   SEX                  15.695    1    15.695    12.097    .003
   GROUP                  .026    1      .026      .020    .890
Explained               75.791    3    25.264    19.472    .000
Residual                20.759   16     1.297
Total                   96.550   19     5.082
Here we can see that age affects the response variable significantly (Sig of F = p = .000) and so does sex (Sig of F = p = .003). When the effects of age and sex have been adjusted for, the groups do not differ (Sig of F = p = .890). The one-way anova table between the groups:
                             Sum of       Mean         F        F
Source            D.F.      Squares    Squares     Ratio    Prob.
Between Groups       1        .0500      .0500     .0093    .9241
Within Groups       18      96.5000     5.3611
Total               19      96.5500
If in the previous example the study question also concerns testing the effect of sex (not just as a blocking variable), we can use the two-way anova. With the two-way anova we also get an interaction effect, which tells us in this example whether the treatment affects males differently from females.
The SPSS listing from the two-way anova:
                        Sum of           Mean               Sig
Source of Variation    Squares   DF    Square         F    of F
Main Effects            31.300    2    15.650     3.913    .041
   SEX                  31.250    1    31.250     7.813    .013
   GROUP                  .050    1      .050      .013    .912
2-Way Interactions       1.250    1     1.250      .313    .584
   SEX  GROUP            1.250    1     1.250      .313    .584
Explained               32.550    3    10.850     2.713    .080
Residual                64.000   16     4.000
Total                   96.550   19     5.082
Here we can see that sex affects the response variable (p = .013), the treatment (GROUP) does not (p = .912), and the treatment does not affect the sexes differently (p = .584).
The sum of squares in the two-way anova can be written
SS(Total) = SS(A) + SS(B) + SS(AB) + SS(Within),
where A and B are the factors and AB is the interaction term.
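The two-way decomposition can be verified numerically on a small balanced data set. The data below are hypothetical (three observations in each of the four cells) and were made up only to show how the four sums of squares add up. The formulas are the standard ones for a balanced design, which is an assumption of this sketch.

```python
from itertools import product

# Hypothetical balanced data: cells[(sex, group)] -> observations.
cells = {
    ("male", "treatment"): [8.0, 9.0, 7.0],
    ("male", "control"): [6.0, 7.0, 5.0],
    ("female", "treatment"): [5.0, 6.0, 4.0],
    ("female", "control"): [5.0, 4.0, 6.0],
}

def mean(values):
    return sum(values) / len(values)

levels_a = sorted({a for a, _ in cells})   # factor A: sex
levels_b = sorted({b for _, b in cells})   # factor B: group
n_cell = 3                                 # observations per cell (balanced)

all_obs = [x for obs in cells.values() for x in obs]
grand = mean(all_obs)
mean_a = {a: mean([x for b in levels_b for x in cells[(a, b)]]) for a in levels_a}
mean_b = {b: mean([x for a in levels_a for x in cells[(a, b)]]) for b in levels_b}

ss_total = sum((x - grand) ** 2 for x in all_obs)
ss_a = n_cell * len(levels_b) * sum((mean_a[a] - grand) ** 2 for a in levels_a)
ss_b = n_cell * len(levels_a) * sum((mean_b[b] - grand) ** 2 for b in levels_b)
# Interaction: what is left of each cell mean after both main effects.
ss_ab = n_cell * sum(
    (mean(cells[(a, b)]) - mean_a[a] - mean_b[b] + grand) ** 2
    for a, b in product(levels_a, levels_b)
)
ss_within = sum((x - mean(obs)) ** 2 for obs in cells.values() for x in obs)

print(f"SS(Total) = {ss_total:.1f}")
print(f"SS(A) + SS(B) + SS(AB) + SS(Within) = "
      f"{ss_a:.1f} + {ss_b:.1f} + {ss_ab:.1f} + {ss_within:.1f}")
```

The identity holds exactly only for balanced designs; with unequal cell sizes the sums of squares depend on the order or type of adjustment.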
This can be generalized to designs with several factors, where we get multiway interaction terms. There are two kinds of problems in the multiway anova: there must be enough observations in all the groups, and the interpretation can be difficult or a multiway interaction can be meaningless for the study problem.
Example from Armitage & Berry. The response variable is the relative weight of the right adrenal in mice obtained by crossing parents of four strains. For each of the 16 combinations of parental strains, four mice were used. This is a three-factor design. The factors - mother's strain, father's strain and sex - are not, of course, experimental treatments imposed by random allocation. Nevertheless they represent potential sources of variation whose main effects and interactions may be studied. Here is the variance table for the analysis:
* * * A N A L Y S I S   O F   V A R I A N C E * * *
        ADRENAL
     BY SEX
        FATHER
        MOTHER
                           Sum of            Mean               Signif
Source of Variation       Squares    DF    Square          F      of F
Main Effects               15.470     7     2.210     55.915      .000
   SEX                     14.890     1    14.890    376.737      .000
   MOTHER                    .340     3      .113      2.864      .052
   FATHER                    .240     3      .080      2.025      .130
2-way Interactions          1.718    15      .115      2.897      .006
   SEX     MOTHER            .394     3      .131      3.327      .032
   SEX     FATHER            .024     3      .008       .206      .891
   MOTHER  FATHER           1.299     9      .144      3.651      .003
3-way Interactions           .261     9      .029       .735      .675
   SEX  FATHER  MOTHER       .261     9      .029       .735      .675
Explained                  17.449    31      .563     14.241      .000
Residual                    1.265    32      .040
Total                      18.713    63      .297
Sex affects the adrenal weight (p = .000) and the effect of the mother's strain is almost significant (p = .052), but the father's strain has no effect (p = .130). From the two-way interactions we can see that the mother's strain affects the sexes differently (p = .032) and that the mother's strain interacts with the father's strain (p = .003), but the father's strain affects both sexes similarly (p = .891). There is no three-way interaction (p = .675).
To analyse the effects more closely, we must draw a graph of the means:
The means for the females (sex = 2, 0.5 - 1.1) are smaller than the means for the males (sex = 1, 1.2 - 2.2). The effect of the mother's strain can be seen in the high values in strain 3 and the small values in strain 1. The interaction of the parents can be seen in the crossing lines.
When the same units (animals, persons, patients) have been measured several times, the observations are dependent and the difference between the means should be tested with the repeated measures anova.
The variation of the data can be divided into between people and within people variation:
SS(Total) = SS(Between people) + SS(Within people),
where the within people variation can be divided into repeated and residual variation:
SS(Within people) = SS(Repeated) + SS(Residual).
The idea is to compare the repeated variation to the residual variation once the between people variation has been removed from the total variation.
Example. The earlier paired t-test example about the sleeping pills, analysed with the repeated measures anova.
The within people sum of squares can be calculated by subtracting the between people SS from the total SS:
SS(Within people) = SS(Total) - SS(Between people).
In this example it is
$$SS(\text{Within people}) = (7^2 + 6^2 + 3^2 + 3^2 + 6^2 + 5^2 + \dots + 5^2) - (13^2 + 6^2 + \dots + 11^2)/2 = 6.0,$$
which is divided into the repeated SS and the residual SS:
$$SS(\text{Repeated}) = (57^2 + 47^2)/10 - 104^2/20 = 5.0,$$
SS(Residual) = 6.0 - 5.0 = 1.0
The sums of squares are divided by their degrees of freedom, 5.0/1 = 5.0 and 1.0/9 = 0.111, which gives the mean squares (MS). The ratio of the mean squares, F = 5.0/0.111 = 45, is F-distributed with 1 and 9 degrees of freedom. If the test value F is small, the means do not differ, but if F is large, the means differ. From the F distribution we get p = P(F > 45) = 0.000087. The result is the same as in the paired t-test (with two measurements F = t²), and with greater calculation accuracy the p-values would be identical. We can do the repeated measures anova in SPSS by choosing SCALE (the first listing) or by choosing Anova Models Repeated measures (the second and third listings):
Source of Variation    Sum of Sq.   DF   Mean Square         F    Prob.
Between People            67.2000    9        7.4667
Within People              6.0000   10         .6000
   Between Measures        5.0000    1        5.0000   45.0000    .0001
   Residual                1.0000    9         .1111
Total                     73.2000   19        3.8526

Tests of Significance for T1 using UNIQUE sums of squares
Source of Variation        SS   DF       MS       F   Sig of F
WITHIN+RESIDUAL         67.20    9     7.47
CONSTANT               540.80    1   540.80   72.43       .000

Tests of Significance for T2 using UNIQUE sums of squares
Source of Variation        SS   DF       MS       F   Sig of F
WITHIN+RESIDUAL          1.00    9      .11
FACTOR1                  5.00    1     5.00   45.00       .000

The first listing is from SCALE, where we can see all the sums of squares. The second is an ANOVA listing that tests whether the overall mean (CONSTANT) differs from zero. The third is also an ANOVA listing; it tests the difference between the measurements (FACTOR1).
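The sums of squares in the SCALE listing can be retraced from the raw sleeping pill data. The sketch uses the computational (totals-based) formulas; the variable names are illustrative.

```python
sleep1 = [7, 3, 6, 4, 8, 3, 6, 9, 5, 6]
sleep2 = [6, 3, 5, 3, 7, 2, 4, 8, 4, 5]
n, k = len(sleep1), 2                    # 10 persons, 2 measurements each

all_obs = sleep1 + sleep2
correction = sum(all_obs) ** 2 / (n * k)           # (grand total)^2 / N

ss_total = sum(x * x for x in all_obs) - correction
# Between people: based on each person's total over both pills.
person_totals = [a + b for a, b in zip(sleep1, sleep2)]
ss_between_people = sum(t * t for t in person_totals) / k - correction
ss_within_people = ss_total - ss_between_people
# Repeated (between measures): based on the total for each pill.
ss_repeated = (sum(sleep1) ** 2 + sum(sleep2) ** 2) / n - correction
ss_residual = ss_within_people - ss_repeated

ms_repeated = ss_repeated / (k - 1)
ms_residual = ss_residual / ((n - 1) * (k - 1))
f_ratio = ms_repeated / ms_residual      # compared to F with (1, 9) d.f.

print(f"Between people {ss_between_people:.1f}, within people {ss_within_people:.1f}")
print(f"Repeated {ss_repeated:.1f}, residual {ss_residual:.1f}, F = {f_ratio:.1f}")
```

The square root of this F value equals the full-precision paired t value, which illustrates why the two analyses agree.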
Schematic representation of the analysis:
The repeated measurements pairwise comparisons have all the same problems and possibilities as the independent groups anova. Unfortunately the "usual" statistical packages (like SPSS and SAS) do not offer pairwise procedures for the repeated measures anova as they do for the independent one-way anova. The easiest way to compare the means of dependent measurements is to use t-tests with the Bonferroni correction. The Newman-Keuls method for pairwise comparison is to calculate the difference between the means and divide it by the adjusted residual SS term:
The statistic qr follows the studentized range distribution, whose parameters are the number of measurements and the degrees of freedom of MS(Residual).
Example. Three repeated measurements of the heart rate with SPSS (repeated measures anova):
Orthonormalized Transformation Matrix (Transposed)
           T1       T2       T3
RHR0     .577     .816     .000
RHR1     .577    -.408     .707
RHR2     .577    -.408    -.707

Univariate F tests with (1,39) D. F.
Variable   Hypoth. SS     Error SS   Hypoth. MS   Error MS         F   Sig. of F
T2          212.81667   1528.18333    212.81667   39.18419   5.43119        .025
T3           96.80000   2244.20000     96.80000   57.54359   1.68220        .202

AVERAGED Tests of Significance for RHR using UNIQUE sums of squares
Source of Variation        SS   DF       MS      F   Sig of F
WITHIN CELLS          3772.38   78    48.36
FACTOR1                309.62    2   154.81   3.20       .046

Variable    Mean   Std Dev   Minimum   Maximum    N   Label
RHR0       68.55     11.62        47        97   40
RHR1       66.82     12.04        44        92   40
RHR2       64.63     10.11        50        98   40
We can see from the anova table that the means differ at the significance level p = 0.046. For the contrasts the transformation matrix is printed; it shows that the contrast T2 compares the 1st measurement to the others and the contrast T3 compares the 2nd and 3rd measurements. Here the first measurement (mean) differs from the others (p = 0.025).
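The link between the transformation matrix and the univariate F tests can be sketched numerically. For an orthonormal contrast the hypothesis SS is n times the squared contrast of the means. The sketch assumes the usual sign pattern of such a matrix (T2 = .816, -.408, -.408 and T3 = .000, .707, -.707) and uses the rounded means from the listing, so the results agree with the listing only up to rounding.

```python
import math

n = 40
# Rounded means from the listing, in the order RHR0, RHR1, RHR2.
means = [68.55, 66.82, 64.63]

# Orthonormal contrasts (assumed signs; exact values of .816, .408, .707).
t2 = [math.sqrt(2 / 3), -1 / math.sqrt(6), -1 / math.sqrt(6)]
t3 = [0.0, 1 / math.sqrt(2), -1 / math.sqrt(2)]

def hypothesis_ss(coeffs):
    """n times the squared contrast of the means."""
    value = sum(c * m for c, m in zip(coeffs, means))
    return n * value ** 2

ss_t2 = hypothesis_ss(t2)   # 1st measurement against the two later ones
ss_t3 = hypothesis_ss(t3)   # 2nd measurement against the 3rd

# F for T2, using the error SS and its 39 d.f. from the listing.
f_t2 = ss_t2 / (1528.18333 / 39)

print(f"Hypoth. SS: T2 = {ss_t2:.2f}, T3 = {ss_t3:.2f}")
print(f"F for T2 = {f_t2:.3f}")
```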
The repeated measurements anova assumes that the variables are normally distributed and that the variances are equal. The researcher can test the normality assumption with, for example, the Kolmogorov-Smirnov test and the equality of the variances with the Mauchly sphericity test. The SPSS repeated measurements anova presents the Mauchly test:
Tests involving 'FACTOR1' Within-Subject Effect.
Mauchly sphericity test, W =      .93476
Chi-square approx. =             2.56370 with 2 D. F.
Significance =                      .278
Greenhouse-Geisser Epsilon =      .93876
Huynh-Feldt Epsilon =             .98458
Lower-bound Epsilon =             .50000
AVERAGED Tests of Significance that follow multivariate tests are equivalent to univariate or split-plot or mixed-model approach to repeated measures. Epsilons may be used to adjust d.f. for the AVERAGED results.
The variances are equal (p = 0.278). If the variances were unequal (p < 0.05), we could adjust the degrees of freedom in the anova table by multiplying them by the epsilon coefficient.
If the researcher has both independent groups and repeated measurements, the most important hypothesis is usually whether the change is different in the groups. Here we can use the mixed model anova, where the model combines independent (between-subjects) factors and dependent (within-subjects) factors.
Example. Partly prepared data, where rats are in three groups (control and two treatments) and are measured three times. The response variable is the latency in a labyrinth:
Tests of Between-Subjects Effects.
Tests of Significance for T1 using UNIQUE sums of squares
Source of Variation            SS    DF          MS        F   Sig of F
WITHIN CELLS           2332270.92   122    19116.97
CONSTANT               7818915.35     1   7818915.3   409.00       .000
GROUP                   141934.33     2    70967.16     3.71       .027

Tests involving 'FACTOR1' Within-Subject Effect.
Mauchly sphericity test, W =      .89427
Chi-square approx. =            13.52150 with 2 D. F.
Significance =                      .001
Greenhouse-Geisser Epsilon =      .90438
Huynh-Feldt Epsilon =             .93224
Lower-bound Epsilon =             .50000
AVERAGED Tests of Significance that follow multivariate tests are equivalent to univariate or split-plot or mixed-model approach to repeated measures. Epsilons may be used to adjust d.f. for the AVERAGED results.

AVERAGED Tests of Significance for LAT using UNIQUE sums of squares
Source of Variation            SS    DF          MS       F   Sig of F
WITHIN CELLS           1989646.23   244     8154.29
FACTOR1                 397285.84     2   198642.92   24.36       .000
GROUP BY FACTOR1         62465.36     4    15616.34    1.92       .109
In the first listing we can see that the groups differ (p = 0.027) over all the measurements. The variance test (Mauchly) tells us that the variances differ (p = .001), so we must use the adjustments (or use MANOVA). The last listing is the mixed model anova, where we must adjust the degrees of freedom (0.90438 * 244 = 220.669, 0.90438 * 2 = 1.80876 and 0.90438 * 4 = 3.618); we then get p < 0.000001 for the measurements and p = 0.115 for the interaction. So there has been a statistically significant change in latency, but the change has not been different in the groups.
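The epsilon adjustment itself is a plain multiplication, as the following sketch shows for the Greenhouse-Geisser epsilon of this example. The F values stay unchanged; only the degrees of freedom used to look up the p-value shrink. The dictionary keys simply repeat the row names of the listing.

```python
epsilon = 0.90438   # Greenhouse-Geisser epsilon from the listing

# Unadjusted degrees of freedom from the AVERAGED anova table.
degrees_of_freedom = {
    "FACTOR1": 2,
    "GROUP BY FACTOR1": 4,
    "WITHIN CELLS": 244,
}

# Every d.f. that involves the repeated factor is multiplied by epsilon.
adjusted = {name: epsilon * df for name, df in degrees_of_freedom.items()}

for name, df in adjusted.items():
    print(f"{name:18s} {df:9.3f}")
```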
Here is a graph of the previous means:
The graph shows that the last measurement has the smallest mean. Group 1 seems to have the largest means and group 3 the smallest. Even though the trends look different, the interaction effect is not statistically significant. We can compare the trend of group 1 to the trends of the other groups with contrasts:
Estimates for T3 (CONT.)
GROUP BY FACTOR1
Parameter       Coeff.   Std. Err.    t Value   Sig. t   Lower 95% CL   Upper 95% CL
2           -25.614178    16.85742   -1.51946     .131      -58.98513        7.75677
3           -37.564796    16.65225   -2.25584     .026      -70.52959       -4.60000
Parameter 3 (as can be seen in the transformation matrix) compares the first group (control group) to the others, and the estimate is the interaction effect. The p-value 0.026 tells us that the change in group 1 is different from the change in the other groups.
If the variances in the repeated measurements anova are not equal (Mauchly test), we can use the multivariate analysis of variance (MANOVA) model to test the differences. MANOVA is also less sensitive to the normality assumption than anova. MANOVA is not only an alternative to anova: we can explain several response variables in one MANOVA and do several kinds of analyses. The basic idea of MANOVA is the linear model. For example, a model for repeated measurements with several groups can be written
where the parameters (µ, ß, ...) describe the differences between the means of the groups and of the measurements. Then we just test the significance of each parameter (whether the parameter differs from zero). The tests are named after the persons who devised them (Pillai, Hotelling and Wilks). The tests assume that the residuals are normally distributed. Here is the MANOVA listing for the previous rat example:
* * ANALYSIS OF VARIANCE DESIGN 1 * *

EFFECT .. GROUP BY FACTOR1
Multivariate Tests of Significance (S = 2, M = 1/2, N = 59 1/2)
Test Name       Value   Approx. F   Hypoth. DF   Error DF   Sig. of F
Pillais        .07250     2.29456         4.00     244.00        .060
Hotellings     .07668     2.30054         4.00     240.00        .059
Wilks          .92816     2.29773         4.00     242.00        .060
Roys           .06175

EFFECT .. FACTOR1
Multivariate Tests of Significance (S = 1, M = 0, N = 59 1/2)
Test Name       Value   Approx. F   Hypoth. DF   Error DF   Sig. of F
Pillais        .32139    28.65220         2.00     121.00        .000
Hotellings     .47359    28.65220         2.00     121.00        .000
Wilks          .67861    28.65220         2.00     121.00        .000
Roys           .32139
The first is the test of the interaction effect and it gives p = 0.06, so the difference between the changes in the groups is close to statistical significance. This result is closer to the previous analysis from the graph. The small difference in the results may arise because the response variable (latency) is not quite normally distributed, and MANOVA works better here. The second is the test of the repeated measurement parameter and it gives the same result (p < 0.001) as the mixed model anova.