
MATH 376 -- Probability and Statistics II

ANOVA for testing linearity of regression

April 26, 2004

The data XList represents a list of temperatures in degrees C. Each value appears three times because there are three different YList elements (yields from a chemical reaction) for each distinct x.

> restart;

> read "/home/fac/little/public_html/ProbStat/MaplePackage/MSP.map";


The temperatures:

> XList:=evalf([150,150,150,200,200,200,250,250,250,300,300,300]);

The yields:

> YList:=[77.4,76.7,78.2,84.1,84.5,83.7,88.9,89.2,89.7,94.8,94.7,95.9];

First we compute the equation of the regression line corresponding to this data:

> YM:=Mean(YList);

> XM:=Mean(XList);

> Sxx:=add((XList[j]-XM)^2,j=1..12);

> Sxy:=add((XList[j]-XM)*(YList[j]-YM),j=1..12);

> Syy:=add((YList[j]-YM)^2,j=1..12);

> beta[1]:=Sxy/Sxx;

> beta[0]:=YM-beta[1]*XM;

So the regression line is y = β0 + β1·x, which for this data is approximately y = 60.263 + 0.1165x.
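For readers without Maple or the course's MSP package, the same least-squares computation can be sketched in Python using only the standard library. This is an illustrative translation. The names sxx, sxy, syy, beta0, and beta1 mirror the worksheet's Sxx, Sxy, Syy, beta[0], and beta[1], but they are otherwise our own choice.

```python
# Illustrative Python translation of the Maple steps (not part of the original worksheet).
x = [150.0, 150.0, 150.0, 200.0, 200.0, 200.0,
     250.0, 250.0, 250.0, 300.0, 300.0, 300.0]  # temperatures, degrees C
y = [77.4, 76.7, 78.2, 84.1, 84.5, 83.7,
     88.9, 89.2, 89.7, 94.8, 94.7, 95.9]         # yields

n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n

# Centered sums of squares and cross-products (Sxx, Sxy, Syy in the worksheet)
sxx = sum((xi - x_mean) ** 2 for xi in x)
sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
syy = sum((yi - y_mean) ** 2 for yi in y)

# Least-squares slope and intercept
beta1 = sxy / sxx
beta0 = y_mean - beta1 * x_mean

print(f"y = {beta0:.3f} + {beta1:.4f} x")  # prints: y = 60.263 + 0.1165 x
```

Here Sxx = 37500 and Sxy = 4370. That gives β1 = 4370/37500 ≈ 0.1165 and β0 = ȳ - β1·x̄ ≈ 86.483 - 26.22 = 60.263.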

Now, the question is: how well does a linear model actually fit this data? Here are the ANOVA computations. First we find the mean of the three y's for each distinct x:

> Ybar[1]:=Mean(YList[1..3]);

> Ybar[2]:=Mean(YList[4..6]);

> Ybar[3]:=Mean(YList[7..9]);

> Ybar[4]:=Mean(YList[10..12]);

and the "pure error" sum of squares:

> SSEPure:=add(add((YList[3*(j-1)+i]-Ybar[j])^2,i=1..3),j=1..4);

(This can be used to get an unbiased estimator of σ², the variance of the error, in this case where we repeat the experiment a number of times with each x.)
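The group means and the pure-error sum of squares can be sketched in Python as follows. This assumes, as in this data set, that the observations come in consecutive blocks of three replicates per distinct x. The last line forms the variance estimate SSEPure/(n - k), where k is the number of distinct x values.

```python
# Sketch: pure-error sum of squares, assuming y is stored in consecutive
# blocks of `reps` replicates, one block per distinct x (as in YList).
y = [77.4, 76.7, 78.2, 84.1, 84.5, 83.7,
     88.9, 89.2, 89.7, 94.8, 94.7, 95.9]
reps = 3                 # replicates per distinct temperature
n = len(y)
k = n // reps            # number of distinct x values

# Mean yield at each distinct temperature (Ybar[1..4] in the worksheet)
group_means = [sum(y[reps * j:reps * (j + 1)]) / reps for j in range(k)]

# Squared deviations of each observation from its own group mean
ss_pure = sum((y[reps * j + i] - group_means[j]) ** 2
              for j in range(k) for i in range(reps))

# Unbiased estimate of the error variance sigma^2; it uses only the
# within-group scatter, so it does not depend on the straight-line model
sigma2_hat = ss_pure / (n - k)
```

For this data SSEPure = 2.66, and the variance estimate is 2.66/8 = 0.3325.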

Next, we start the ANOVA procedure. Compute the SSE first:

> SSE:=Syy-beta[1]*Sxy;

The "lack of fit" component of the error is SSE - SSEPure:

> LOF:=SSE-SSEPure;
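The following Python sketch performs these two steps. It recomputes everything from the data so that it runs on its own. As an extra check that is not in the worksheet, it also computes lack of fit directly as the replicate-weighted squared distance between each group mean and the fitted line at that x. The two numbers agree because the fitted value is constant within each group, so the cross-product term in SSE vanishes.

```python
# Sketch: SSE and lack-of-fit SS for the straight-line fit, from the raw data.
x = [150.0] * 3 + [200.0] * 3 + [250.0] * 3 + [300.0] * 3
y = [77.4, 76.7, 78.2, 84.1, 84.5, 83.7,
     88.9, 89.2, 89.7, 94.8, 94.7, 95.9]
n, reps = len(y), 3
k = n // reps

# Regression quantities, as computed earlier in the worksheet
x_mean, y_mean = sum(x) / n, sum(y) / n
sxx = sum((xi - x_mean) ** 2 for xi in x)
sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
syy = sum((yi - y_mean) ** 2 for yi in y)
beta1 = sxy / sxx
beta0 = y_mean - beta1 * x_mean

# Pure-error SS from the group means
group_means = [sum(y[reps * j:reps * (j + 1)]) / reps for j in range(k)]
ss_pure = sum((y[reps * j + i] - group_means[j]) ** 2
              for j in range(k) for i in range(reps))

# SSE := Syy - beta[1]*Sxy, then LOF := SSE - SSEPure
sse = syy - beta1 * sxy
lof = sse - ss_pure

# Extra check: LOF = sum over groups of reps * (group mean - fitted value)^2
x_levels = [x[reps * j] for j in range(k)]
lof_direct = sum(reps * (group_means[j] - (beta0 + beta1 * x_levels[j])) ** 2
                 for j in range(k))
```

For this data SSE ≈ 3.866 and LOF ≈ 1.206, so the pure error (2.66) accounts for most of SSE.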

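We have n = 12 observations and k = 4 distinct x values here. If the linear model is correct, LOF/σ² has a χ² distribution with k - 2 = 2 d.f., SSEPure/σ² has a χ² distribution with n - k = 8 d.f., and the two are independent.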

The ratio [(SSE-SSEPure)/2]/[SSEPure/8] therefore has an F-distribution with 2 and 8 d.f. (the unknown σ² cancels).
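The decomposition and the test statistic can be written in general notation, with n observations at k distinct x values. The numerical values are what the worksheet's formulas give for this data.

```latex
\[
\underbrace{\mathrm{SSE}}_{n-2\ \text{d.f.}}
  = \underbrace{\mathrm{SSEPure}}_{n-k\ \text{d.f.}}
  + \underbrace{\mathrm{LOF}}_{k-2\ \text{d.f.}},
\qquad 3.866 = 2.66 + 1.206, \qquad 10 = 8 + 2,
\]
\[
F_{\mathrm{LOF}}
  = \frac{\mathrm{LOF}/(k-2)}{\mathrm{SSEPure}/(n-k)}
  \sim F_{k-2,\,n-k} = F_{2,8}
  \quad \text{if the linear model holds},
\]
\[
F_{\mathrm{LOF}} = \frac{1.206/2}{2.66/8} = \frac{0.603}{0.3325} \approx 1.8135 .
\]
```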

> FLOF:=(LOF/2)/(SSEPure/8);
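The Maple lines above hard-code four groups of three replicates. The Python sketch below is more general. It assumes replicates are identified by exactly equal x values, and replicate counts may differ between groups. It computes the same statistic together with its degrees of freedom. The function name lack_of_fit_f is our own and is not part of MSP.

```python
from collections import defaultdict


def lack_of_fit_f(x, y):
    """Lack-of-fit F statistic for a straight-line fit y = b0 + b1*x.

    Groups observations by exactly equal x values (replicate counts may
    differ) and returns (F, df_lof, df_pure), with df_lof = k - 2 and
    df_pure = n - k, where k is the number of distinct x values.
    """
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    syy = sum((yi - y_mean) ** 2 for yi in y)
    sse = syy - (sxy / sxx) * sxy          # residual SS of the fitted line

    groups = defaultdict(list)             # distinct x -> its replicate y's
    for xi, yi in zip(x, y):
        groups[xi].append(yi)
    k = len(groups)

    ss_pure = 0.0                          # within-group (pure error) SS
    for ys in groups.values():
        m = sum(ys) / len(ys)
        ss_pure += sum((yi - m) ** 2 for yi in ys)

    df_lof, df_pure = k - 2, n - k
    if df_lof < 1 or df_pure < 1 or ss_pure == 0.0:
        raise ValueError("need at least 3 distinct x values and some replication")
    f_stat = ((sse - ss_pure) / df_lof) / (ss_pure / df_pure)
    return f_stat, df_lof, df_pure


x = [150, 150, 150, 200, 200, 200, 250, 250, 250, 300, 300, 300]
y = [77.4, 76.7, 78.2, 84.1, 84.5, 83.7, 88.9, 89.2, 89.7, 94.8, 94.7, 95.9]
F, df1, df2 = lack_of_fit_f(x, y)
```

For the temperature/yield data it returns F ≈ 1.8135 with (2, 8) degrees of freedom.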

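FLOF is the test statistic for the ANOVA test for linearity of regression (a one-tailed F-test). The null hypothesis H0 is that the linear model is adequate; the alternative is that it is not. Large values of FLOF are evidence against H0, so the p-value is the upper-tail probability P(F > FLOF) for F with 2 and 8 d.f. The p-value of the test is rather large: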

> 1-FCDF(2,8,FLOF);
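The worksheet uses the MSP package's FCDF for this step. A standard-library Python sketch can instead use the closed form P(F > f) = (1 + 2f/d2)^(-d2/2) for an F distribution with 2 numerator degrees of freedom. This shortcut works only because k - 2 = 2 here. A general numerator d.f. needs the regularized incomplete beta function, for instance through scipy.stats.f.sf. The helper name f_upper_tail_df1_2 is hypothetical.

```python
def f_upper_tail_df1_2(f: float, d2: float) -> float:
    """P(F > f) for F with 2 numerator and d2 denominator degrees of freedom.

    Uses the closed form (1 + 2f/d2) ** (-d2/2), which is valid ONLY when
    the numerator degrees of freedom equal 2 (an assumption of this sketch).
    """
    if f <= 0.0:
        return 1.0
    return (1.0 + 2.0 * f / d2) ** (-d2 / 2.0)


# SSE and SSEPure for this data set, as produced by the worksheet's formulas
sse, ss_pure = 3.866, 2.66
n, k = 12, 4

f_lof = ((sse - ss_pure) / (k - 2)) / (ss_pure / (n - k))
p_value = f_upper_tail_df1_2(f_lof, n - k)
print(f"F = {f_lof:.4f}, p-value = {p_value:.4f}")  # F = 1.8135, p-value = 0.2241
```

A p-value of about 0.22 is well above the usual 0.05 or 0.10 cutoffs. With 2 and 8 d.f., 1 + F/4 equals SSE/SSEPure, so here the p-value also simplifies to (SSEPure/SSE)^4.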

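Since the p-value is large, we do not reject the hypothesis that the linear model is adequate. There is no significant evidence of lack of fit, and we conclude that the linear model is reasonably appropriate for this data.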