MATH 376 -- Probability and Statistics II
Lab Project 4 Solutions
> | read "/home/fac/little/public_html/ProbStat/MaplePackage/MSP.map"; |
Warning, the name changecoords has been redefined
A) This is a procedure to compute the regression line for a collection
of data points in the plane and plot the points with the line:
> | RegLine:=proc(Xlist,Ylist) |
> | local beta,i,Xbar,Ybar,Sxy,Sxx,PP,LP,CP,points; |
> | Xbar:=Mean(Xlist); |
> | Ybar:=Mean(Ylist); |
> | Sxy:=add((Xlist[i]-Xbar)*(Ylist[i]-Ybar),i=1..nops(Xlist)); |
> | Sxx:=add((Xlist[i]-Xbar)^2,i=1..nops(Xlist)); |
> | beta[1]:=Sxy/Sxx; |
> | beta[0]:=Ybar-beta[1]*Xbar; points:=[seq([Xlist[i],Ylist[i]],i=1..nops(Xlist))]; PP:=plot(points,style=point,color=blue): LP:=plot(beta[0]+beta[1]*x,x=min(op(Xlist))-1..max(op(Xlist))+1,color=red): CP:=display(PP,LP): return beta[0] + beta[1]*x,CP; |
> | end proc: |
Here's the result on the carbon layer thickness versus soak time data from
the lab problem: The equation of the line,
> | RL:=RegLine([.58,.66,.66,.66,.66,.66,1.0,1.17,1.17,1.17,1.17,1.17,1.17,1.20,2.00,2.00, |
> | 2.20,2.20,2.20],[.013,.016,.015,.016,.015,.016,.014,.021,.018,.019,.021,.019,.021,.025,.025,.026,.024,.025,.024]): |
> | RL[1]; |
and the plot:
> | display(RL[2]); |
> | XL:=[.58,.66,.66,.66,.66,.66,1.0,1.17,1.17,1.17,1.17,1.17,1.17,1.20,2.00,2.00,2.20,2.20,2.20];YL:=[.013,.016,.015,.016,.015,.016,.014,.021,.018,.019,.021,.019,.021,.025,.025,.026,.024,.025,.024]; |
> |
The step-by-step calculations:
> | Xbar:=Mean(XL); |
> | Ybar:=Mean(YL); |
> | Sxy:=add((XL[i]-Xbar)*(YL[i]-Ybar),i=1..nops(XL)); |
> | Sxx:=add((XL[i]-Xbar)^2,i=1..nops(XL)); |
> | beta[1]:=Sxy/Sxx; |
> | beta[0]:=Ybar-beta[1]*Xbar; |
> | Syy:=add((YL[i]-Ybar)^2,i=1..nops(YL)); |
> | S2:=(Syy-beta[1]*Sxy)/(nops(XL)-2); |
t -test for : versus alternative
> | t:=beta[1]/(sqrt(S2/Sxx)); |
> | pval:=2*(1-TCDF(nops(XL)-2,t)); |
The fact that the p -value is so small is strong evidence for rejecting the null hypothesis.
In other words, the soak time definitely seems to be affecting the carbon layer
thickness measurements.
C) You summary to the boss should say something like: The results of our
tests give strong evidence that the length of time that the gears are soaked
increases the carbon layer thickness measurement. To get reliable results,
it would be good to use just one soak time.
D) ANOVA for linearity of regression:
> | SSE:=S2*(nops(XL)-2); |
> | XL,YL; |
There are k = 7 distinct values of x here, but different numbers of
y for each one. To compute the SSEPure, we need to
compute the mean of the y's for each x by picking out
the corresponding sublists:
> | k:=7; |
> | n:=nops(XL); |
> | y[1]:=YL[1]; |
> | y[2]:=Mean(YL[2..6]); |
> | y[3]:=YL[7]; |
> | y[4]:=Mean(YL[8..13]); |
> | y[5]:=YL[14]; |
> | y[6]:=Mean(YL[15..16]); |
> | y[7]:=Mean(YL[17..19]); |
Then adding the squared difference of each y from the appropriate
mean:
> | s2:=((YL[1]-y[1])^2+add((YL[i]-y[2])^2,i=2..6)+(YL[7]-y[3])^2+add((YL[i]-y[4])^2,i=8..13)+(YL[14]-y[5])^2+add((YL[i]-y[6])^2,i=15..16)+add((YL[i]-y[7])^2,i=17..19))/(n-k); |
> | SSEP:=s2*(n-k); |
> | LOFF:=(SSE-SSEP)/(s2*(k-2)); |
> | 1-FCDF(k-2,n-k,LOFF); |
> | k-2,n-k; |
Again, strong evidence to reject null hypothesis (here is that the
data is well-fit by a linear function).
E) What about a quadratic?
> | with(linalg): |
Warning, the protected names norm and trace have been redefined and unprotected
> | YM:=transpose(matrix([[seq(YL[i],i=1..17)]])): |
> | XM:=transpose(matrix([[seq(1,i=1..17)],[seq(XL[i],i=1..17)],[seq(XL[i]^2,i=1..17)]])): |
> | XtX:=multiply(transpose(XM),XM); |
> | XtXI:=inverse(XtX); |
> | XtY:=multiply(transpose(XM),YM); |
The vector of coefficients of quadratic is the solution of the linear
equations
> | b:=linsolve(XtX,XtY); |
> | QP:=plot(b[1,1]+b[2,1]*x+b[3,1]*x^2,x=0..3,color=cyan): |
> | display(RL[2],QP); |
F) See Example 11.14 for method of computing SSE here. This is not the same as
SSE above because that used the linear model. Here we want
which can be computed as follows:
Test : against alternative . In general formulas from
page 587, we take a = ( 0 0 1 ) (transposed to make a column vector).
Then
> | SSE:=matadd(multiply(transpose(YM),YM),scalarmul(multiply(transpose(b),XtY),-1))[1,1]; |
n - 3 degrees of freedom:
> | S2:=SSE/16; |
> | t2:=b[3,1]/(sqrt(S2*XtXI[3,3])); |
> | pval:=2*(1-TCDF(n-3,abs(t2))); |
Rather weak evidence for rejecting null here. The quadratic is not obviously
a lot better fit than the regression line here. This is confirmed on an intuitive
level by looking at the plot above.