MATH 376 -- Probability and Statistics II

ANOVA for testing linearity of regression

April 26, 2004

XList holds the reaction temperatures in degrees C. Each temperature
appears three times because three yields from a chemical reaction
(the YList entries) were measured at each distinct x.

>    restart;

>    read "/home/fac/little/public_html/ProbStat/MaplePackage/MSP.map";

Warning, the name changecoords has been redefined

285282147

The temperatures:

>    XList:=evalf([150,150,150,200,200,200,250,250,250,300,300,300]);

XList := [150., 150., 150., 200., 200., 200., 250., 250., 250., 300., 300., 300.]

The yields:

>    YList:=[77.4,76.7,78.2,84.1,84.5,83.7,88.9,89.2,89.7,94.8,94.7,95.9];

YList := [77.4, 76.7, 78.2, 84.1, 84.5, 83.7, 88.9, 89.2, 89.7, 94.8, 94.7, 95.9]

First we compute the equation of the regression line corresponding to this data:

>    YM:=Mean(YList);

YM := 86.48333333

>    XM:=Mean(XList);

XM := 225.0000000

>    Sxx:=add((XList[j]-XM)^2,j=1..12);

Sxx := 37500.00000

>    Sxy:=add((XList[j]-XM)*(YList[j]-YM),j=1..12);

Sxy := 4370.000000

>    Syy:=add((YList[j]-YM)^2,j=1..12);

Syy := 513.1166668

>    beta[1]:=Sxy/Sxx;

beta[1] := .1165333333

>    beta[0]:=YM-beta[1]*XM;

beta[0] := 60.26333334

So the regression line is y = 60.26333334 + .1165333333 x
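As an independent cross-check outside the Maple session, the same least-squares quantities can be recomputed with a minimal Python sketch. The function name `fit_line` and the variable names are illustrative choices, not part of the MSP package; only the data and the formulas for Sxx, Sxy, Syy, beta[1] and beta[0] come from the worksheet.

```python
# Least-squares line for the worksheet's data (standard library only).
# fit_line is a hypothetical helper name, not part of the MSP package.
xs = [150.0] * 3 + [200.0] * 3 + [250.0] * 3 + [300.0] * 3
ys = [77.4, 76.7, 78.2, 84.1, 84.5, 83.7,
      88.9, 89.2, 89.7, 94.8, 94.7, 95.9]


def fit_line(x, y):
    """Return (intercept, slope, Sxx, Sxy, Syy) for the least-squares line."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    syy = sum((yi - y_mean) ** 2 for yi in y)
    slope = sxy / sxx                      # beta[1] = Sxy / Sxx
    intercept = y_mean - slope * x_mean    # beta[0] = YM - beta[1] * XM
    return intercept, slope, sxx, sxy, syy


b0, b1, sxx, sxy, syy = fit_line(xs, ys)
print(f"y = {b0:.6f} + {b1:.6f} x")
```

Up to floating-point rounding, the values of `sxx`, `sxy`, `syy`, `b1` and `b0` should agree with the Maple output above.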

Now the question is: how well does a linear model actually fit this
data? Here are the ANOVA computations. First we find the mean
of the three y's for each distinct x:

>    Ybar[1]:=Mean(YList[1..3]);

Ybar[1] := 77.43333333

>    Ybar[2]:=Mean(YList[4..6]);

Ybar[2] := 84.10000000

>    Ybar[3]:=Mean(YList[7..9]);

Ybar[3] := 89.26666667

>    Ybar[4]:=Mean(YList[10..12]);

Ybar[4] := 95.13333333

and the "pure error" sum of squares:

>    SSEPure:=add(add((YList[3*(j-1)+i]-Ybar[j])^2,i=1..3),j=1..4);

SSEPure := 2.660000001

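(Divided by its 8 degrees of freedom, that is, 12 observations minus 4
distinct temperatures, this gives an unbiased estimator of sigma^2, the
error variance. Such an estimator is available here because the
experiment is repeated several times at each x[i].)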
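The group means and the pure-error sum of squares can also be sketched in Python. This is only an illustration under the assumption, taken from the data layout above, that the yields are stored in consecutive blocks of three per temperature; `pure_error_ss` and `REPLICATES` are hypothetical names introduced here.

```python
# Pure-error sum of squares from replicated observations (standard library only).
# Assumes the yields are ordered in consecutive blocks of REPLICATES per x value.
ys = [77.4, 76.7, 78.2, 84.1, 84.5, 83.7,
      88.9, 89.2, 89.7, 94.8, 94.7, 95.9]
REPLICATES = 3


def pure_error_ss(y, reps):
    """Return (group means, sum of squared deviations from each group's mean)."""
    groups = [y[i:i + reps] for i in range(0, len(y), reps)]
    means = [sum(g) / len(g) for g in groups]
    ss = sum((yi - m) ** 2 for g, m in zip(groups, means) for yi in g)
    return means, ss


group_means, sse_pure = pure_error_ss(ys, REPLICATES)

# Unbiased estimate of sigma^2: divide by n - k, where k is the number of groups.
sigma2_hat = sse_pure / (len(ys) - len(group_means))
print(group_means, sse_pure, sigma2_hat)
```

The indexing `y[i:i + reps]` plays the same role as `YList[3*(j-1)+i]` in the Maple double sum.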

Next, we start the ANOVA procedure.  Compute the SSE first:

>    SSE:=Syy-beta[1]*Sxy;

SSE := 3.8660003

The "lack of fit" component of the error is SSE - SSEPure:

>    LOF:=SSE-SSEPure;

LOF := 1.206000299
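The subtraction above relies on the identity SSE = SSEPure + LOF. A short Python sketch can check this numerically by computing the lack-of-fit term directly, as the replicate-weighted squared distance between each group mean and the fitted line, rather than by subtraction. All variable names here are illustrative; only the data and the formulas come from the worksheet.

```python
# Numerical check that SSE splits into pure error plus lack of fit.
xs = [150.0] * 3 + [200.0] * 3 + [250.0] * 3 + [300.0] * 3
ys = [77.4, 76.7, 78.2, 84.1, 84.5, 83.7,
      88.9, 89.2, 89.7, 94.8, 94.7, 95.9]

n = len(xs)
x_mean = sum(xs) / n
y_mean = sum(ys) / n
sxx = sum((x - x_mean) ** 2 for x in xs)
sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
syy = sum((y - y_mean) ** 2 for y in ys)
b1 = sxy / sxx
b0 = y_mean - b1 * x_mean

# SSE two ways: the shortcut used in the worksheet and the raw residual sum.
sse_shortcut = syy - b1 * sxy
sse_direct = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Group the yields by temperature.
groups = {}
for x, y in zip(xs, ys):
    groups.setdefault(x, []).append(y)

sse_pure = 0.0
lof = 0.0
for x, g in groups.items():
    g_mean = sum(g) / len(g)
    sse_pure += sum((y - g_mean) ** 2 for y in g)       # within-group scatter
    lof += len(g) * (g_mean - (b0 + b1 * x)) ** 2       # group mean vs. fitted line

print(sse_direct, sse_pure, lof)
```

If the linear model were badly wrong, the group means would sit far from the line and `lof` would dominate `sse_pure`.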

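We have n = 12 observations and k = 4 distinct temperatures here. If the
linear model is correct (and the errors are normal), LOF/sigma^2 has a
chi^2 distribution with k - 2 = 2 d.f., SSEPure/sigma^2 has a chi^2
distribution with n - k = 8 d.f., and the two are independent.
Hence the ratio [(SSE-SSEPure)/2]/[SSEPure/8] has an F-distribution
with 2 and 8 d.f.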

>    FLOF:=(SSE-SSEPure)/(2*SSEPure/8);

FLOF := 1.813534284

This is the test statistic for the ANOVA test for linearity of
regression (a one-tailed F-test). H[0] is the hypothesis that the linear
model is adequate; the alternative is that it is not. Large values of the
statistic count against H[0], so the p-value is the upper-tail
probability. Here the p-value of the test is rather large:

>    1-FCDF(2,8,FLOF);

.2241191741
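The test statistic and p-value can be reproduced without a statistics library, as a rough sketch. When the numerator has exactly 2 d.f., the upper tail of the F-distribution has the closed form P(F > f) = (1 + 2f/d)^(-d/2), where d is the denominator d.f.; this shortcut does not apply to other numerator d.f. The function names below are hypothetical, and the inputs are the SSE and SSEPure values printed in the worksheet.

```python
# Lack-of-fit F statistic and its p-value (standard library only).

def lack_of_fit_f(sse, sse_pure, n, k):
    """F = [(SSE - SSEPure)/(k - 2)] / [SSEPure/(n - k)]."""
    lof = sse - sse_pure
    return (lof / (k - 2)) / (sse_pure / (n - k))


def f_upper_tail_num_df_2(f, den_df):
    """P(F > f) for F with 2 and den_df d.f.

    Closed form valid ONLY when the numerator d.f. equals 2.
    """
    return (1.0 + 2.0 * f / den_df) ** (-den_df / 2.0)


SSE = 3.8660003         # value printed in the worksheet
SSE_PURE = 2.660000001  # value printed in the worksheet

f_stat = lack_of_fit_f(SSE, SSE_PURE, n=12, k=4)
p_value = f_upper_tail_num_df_2(f_stat, den_df=8)
print(f_stat, p_value)
```

For other degrees of freedom a general F CDF is needed, which is what the `FCDF` call above provides in the Maple session.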

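Since the p-value (about .224) exceeds any of the usual significance levels, we do not reject H[0]: the data show no significant lack of fit, so we conclude that the linear model is reasonably appropriate for this data.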