MATH 376 -- Probability and Statistics II
ANOVA for testing linearity of regression
April 26, 2004
The list XList contains temperatures in degrees C.
Each value appears three times because
we have three different YList elements (yields from a
chemical reaction) for each distinct x.
> | restart; |
> | read "/home/fac/little/public_html/ProbStat/MaplePackage/MSP.map"; |
Warning, the name changecoords has been redefined
The temperatures:
> | XList:=evalf([150,150,150,200,200,200,250,250,250,300,300,300]); |
The yields:
> | YList:=[77.4,76.7,78.2,84.1,84.5,83.7,88.9,89.2,89.7,94.8,94.7,95.9]; |
First we compute the equation of the regression line corresponding to this data:
> | YM:=Mean(YList); |
> | XM:=Mean(XList); |
> | Sxx:=add((XList[j]-XM)^2,j=1..12); |
> | Sxy:=add((XList[j]-XM)*(YList[j]-YM),j=1..12); |
> | Syy:=add((YList[j]-YM)^2,j=1..12); |
> | beta[1]:=Sxy/Sxx; |
> | beta[0]:=YM-beta[1]*XM; |
So the regression line is y = beta[0] + beta[1]*x, with beta[0] and beta[1] as computed above.
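As an independent cross-check of the Maple computation above, here is a minimal Python sketch of the same least-squares formulas. It uses only the standard library; the variable names (x, y, slope, intercept) are illustrative and are not part of the MSP package.

```python
# Least-squares line for the temperature/yield data in this worksheet.
x = [150.0] * 3 + [200.0] * 3 + [250.0] * 3 + [300.0] * 3
y = [77.4, 76.7, 78.2, 84.1, 84.5, 83.7, 88.9, 89.2, 89.7, 94.8, 94.7, 95.9]

n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n

# Same quantities as Sxx and Sxy in the worksheet.
sxx = sum((xi - x_mean) ** 2 for xi in x)
sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))

slope = sxy / sxx                      # plays the role of beta[1]
intercept = y_mean - slope * x_mean    # plays the role of beta[0]

print(f"y = {intercept:.4f} + {slope:.6f} x")
```

For this data the fitted line is roughly y = 60.26 + 0.1165 x.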
Now, the question is: How well does a linear model actually fit this
data? Here are the ANOVA computations. First we find the means
of the three y's for each distinct x:
> | Ybar[1]:=Mean(YList[1..3]); |
> | Ybar[2]:=Mean(YList[4..6]); |
> | Ybar[3]:=Mean(YList[7..9]); |
> | Ybar[4]:=Mean(YList[10..12]); |
and the ``pure error'' sum of squares:
> | SSEPure:=add(add((YList[3*(j-1)+i]-Ybar[j])^2,i=1..3),j=1..4); |
(This can be used to get an unbiased estimator of sigma^2, the variance of
the error, in this case where we repeat the experiment a number of times
with each x.)
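The pure-error computation can also be sketched in Python as a cross-check. The sketch assumes the yields are stored in groups of three consecutive entries, as in YList; the names groups, sse_pure and s2_pure are illustrative. Dividing the pure-error sum of squares by its degrees of freedom (number of observations minus number of distinct x values) gives the variance estimate.

```python
# Pure-error sum of squares and the variance estimate it yields.
y = [77.4, 76.7, 78.2, 84.1, 84.5, 83.7, 88.9, 89.2, 89.7, 94.8, 94.7, 95.9]
reps = 3  # replicates per distinct temperature (assumed equal for every group)

# Split the yields into consecutive groups of `reps` values.
groups = [y[i:i + reps] for i in range(0, len(y), reps)]
group_means = [sum(g) / len(g) for g in groups]

# Squared deviations of each yield from its own group mean.
sse_pure = sum((yi - m) ** 2 for g, m in zip(groups, group_means) for yi in g)

df_pure = len(y) - len(groups)   # 12 - 4 = 8 degrees of freedom
s2_pure = sse_pure / df_pure     # estimate of the error variance

print(sse_pure, s2_pure)
```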
Next, we start the ANOVA procedure. Compute the SSE first:
> | SSE:=Syy-beta[1]*Sxy; |
The ``lack of fit'' component of the error is SSE - SSEPure:
> | LOF:=SSE-SSEPure; |
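As a sanity check on this decomposition, the lack-of-fit sum of squares can also be computed directly, as the number of replicates times the squared gap between each group mean and the fitted value at that x, summed over the groups. The Python sketch below (standard library only, illustrative names, equal group sizes assumed) computes it both ways so the two values can be compared.

```python
# SSE, pure error, and lack of fit for the worksheet data.
x = [150.0] * 3 + [200.0] * 3 + [250.0] * 3 + [300.0] * 3
y = [77.4, 76.7, 78.2, 84.1, 84.5, 83.7, 88.9, 89.2, 89.7, 94.8, 94.7, 95.9]
reps = 3  # replicates per distinct temperature

n = len(x)
x_mean, y_mean = sum(x) / n, sum(y) / n
sxx = sum((xi - x_mean) ** 2 for xi in x)
sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = y_mean - slope * x_mean

# Residual sum of squares about the fitted line (SSE in the worksheet).
sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

# Pure-error sum of squares about the group means (SSEPure).
starts = range(0, n, reps)
group_means = [sum(y[s:s + reps]) / reps for s in starts]
sse_pure = sum((y[s + i] - m) ** 2
               for s, m in zip(starts, group_means) for i in range(reps))

# Lack of fit two ways: by subtraction and directly from the group means.
lof_by_subtraction = sse - sse_pure
lof_direct = sum(reps * (m - (intercept + slope * x[s])) ** 2
                 for s, m in zip(starts, group_means))

print(sse, sse_pure, lof_by_subtraction, lof_direct)
```

The two lack-of-fit values should agree up to floating-point rounding.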
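We have n = 12 observations and k = 4 distinct x values here. If the linear
model is adequate and the errors are normal, LOF/sigma^2 has a chi-square
distribution with k - 2 = 2 d.f., SSEPure/sigma^2 has a chi-square
distribution with n - k = 8 d.f., and the two are independent.
The ratio [(SSE-SSEPure)/2]/[SSEPure/8] then has an F-distribution with 2 and 8 d.f.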
> | FLOF:=(SSE-SSEPure)/(2*SSEPure/8); |
This is the test statistic for the ANOVA test for linearity of
regression (a one-tailed F-test). H0 is the hypothesis that the linear model is
adequate; the alternative is that it is not. The p-value of the
test is rather large:
> | 1-FCDF(2,8,FLOF); |
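We do not reject the hypothesis that the linear model is adequate: there is no
significant evidence of lack of fit, so the linear model is reasonably appropriate for this data.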
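The whole test can be cross-checked outside Maple with a short Python sketch. It does not use FCDF from the MSP package. Instead it relies on the fact that, when the numerator has exactly 2 d.f., the upper tail of the F distribution has the closed form P(F > f) = (1 + 2f/d2)^(-d2/2). The function names below are hypothetical, and the shortcut does not apply for other numerator degrees of freedom.

```python
def lack_of_fit_f(x, y, reps):
    """Return (F statistic, numerator d.f., denominator d.f.) for the
    lack-of-fit test, assuming `reps` consecutive replicates per distinct x."""
    n = len(x)
    k = n // reps
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    syy = sum((yi - y_mean) ** 2 for yi in y)
    sse = syy - (sxy / sxx) * sxy          # same formula as SSE in the worksheet
    sse_pure = 0.0
    for s in range(0, n, reps):
        group = y[s:s + reps]
        mean = sum(group) / reps
        sse_pure += sum((yi - mean) ** 2 for yi in group)
    lof = sse - sse_pure
    return (lof / (k - 2)) / (sse_pure / (n - k)), k - 2, n - k


def f_upper_tail_2df(f, d2):
    """P(F > f) for an F distribution with 2 and d2 degrees of freedom.
    This closed form is valid only when the numerator d.f. equals 2."""
    return (1.0 + 2.0 * f / d2) ** (-d2 / 2.0)


x = [150.0] * 3 + [200.0] * 3 + [250.0] * 3 + [300.0] * 3
y = [77.4, 76.7, 78.2, 84.1, 84.5, 83.7, 88.9, 89.2, 89.7, 94.8, 94.7, 95.9]

f_stat, df1, df2 = lack_of_fit_f(x, y, reps=3)
assert df1 == 2  # the closed-form tail below requires 2 numerator d.f.
p_value = f_upper_tail_2df(f_stat, df2)
print(f"F = {f_stat:.4f}, p-value = {p_value:.4f}")
```

For this data the sketch gives F of about 1.81 and a p-value of about 0.22, which is well above the usual significance levels.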