MATH 376 -- Probability and Statistics II

Lab Project 4 Solutions

>    read "/home/fac/little/public_html/ProbStat/MaplePackage/MSP.map";

Warning, the name changecoords has been redefined

A)  This is a procedure to compute the regression line for a collection

of data points in the plane and plot the points with the line:

>    RegLine:=proc(Xlist,Ylist)

>      local beta,i,Xbar,Ybar,Sxy,Sxx,PP,LP,CP,points;

>      Xbar:=Mean(Xlist);

>      Ybar:=Mean(Ylist);

>      Sxy:=add((Xlist[i]-Xbar)*(Ylist[i]-Ybar),i=1..nops(Xlist));

>      Sxx:=add((Xlist[i]-Xbar)^2,i=1..nops(Xlist));

>      beta[1]:=Sxy/Sxx;

>      beta[0]:=Ybar-beta[1]*Xbar;
>      points:=[seq([Xlist[i],Ylist[i]],i=1..nops(Xlist))];

>      PP:=plot(points,style=point,color=blue):

>      LP:=plot(beta[0]+beta[1]*x,x=min(op(Xlist))-1..max(op(Xlist))+1,color=red):

>      CP:=display(PP,LP):

>      return beta[0]+beta[1]*x,CP;

>      end proc:

Here is the result on the carbon layer thickness versus soak time data from

the lab problem.  The equation of the line:

>    RL:=RegLine([.58,.66,.66,.66,.66,.66,1.0,1.17,1.17,1.17,1.17,1.17,1.17,1.20,2.00,2.00,

>    2.20,2.20,2.20],[.013,.016,.015,.016,.015,.016,.014,.021,.018,.019,.021,.019,.021,.025,.025,.026,.024,.025,.024]):

>    RL[1];

.1157093340e-1+.6462120909e-2*x

and the plot:

>    display(RL[2]);

[Maple Plot]

>    XL:=[.58,.66,.66,.66,.66,.66,1.0,1.17,1.17,1.17,1.17,1.17,1.17,1.20,2.00,2.00,2.20,2.20,2.20];YL:=[.013,.016,.015,.016,.015,.016,.014,.021,.018,.019,.021,.019,.021,.025,.025,.026,.024,.025,.024];

>   

XL := [.58, .66, .66, .66, .66, .66, 1.0, 1.17, 1.17, 1.17, 1.17, 1.17, 1.17, 1.20, 2.00, 2.00, 2.20, 2.20, 2.20]

YL := [.13e-1, .16e-1, .15e-1, .16e-1, .15e-1, .16e-1, .14e-1, .21e-1, .18e-1, .19e-1, .21e-1, .19e-1, .21e-1, .25e-1, .25e-1, .26e-1, .24e-1, .25e-1, .24e-1]

The step-by-step calculations:

>    Xbar:=Mean(XL);

>    Ybar:=Mean(YL);

>    Sxy:=add((XL[i]-Xbar)*(YL[i]-Ybar),i=1..nops(XL));

>    Sxx:=add((XL[i]-Xbar)^2,i=1..nops(XL));

>    beta[1]:=Sxy/Sxx;

>    beta[0]:=Ybar-beta[1]*Xbar;

Xbar := 1.247368421

Ybar := .1963157895e-1

Sxy := .3958157894e-1

Sxx := 6.125168423

beta[1] := .6462120909e-2

beta[0] := .1157093340e-1

>    Syy:=add((YL[i]-Ybar)^2,i=1..nops(YL));

Syy := .3324210530e-3

>    S2:=(Syy-beta[1]*Sxy)/(nops(XL)-2);

S2 := .4508241418e-5
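For readers without Maple, the same quantities can be recomputed in plain Python (a cross-check sketch, not part of the original worksheet; the variable names mirror the Maple ones):

```python
# Cross-check of the regression computation in plain Python.
XL = [.58, .66, .66, .66, .66, .66, 1.0, 1.17, 1.17, 1.17, 1.17, 1.17, 1.17,
      1.20, 2.00, 2.00, 2.20, 2.20, 2.20]
YL = [.013, .016, .015, .016, .015, .016, .014, .021, .018, .019, .021, .019,
      .021, .025, .025, .026, .024, .025, .024]

n = len(XL)
Xbar = sum(XL) / n
Ybar = sum(YL) / n
Sxy = sum((x - Xbar) * (y - Ybar) for x, y in zip(XL, YL))
Sxx = sum((x - Xbar) ** 2 for x in XL)
Syy = sum((y - Ybar) ** 2 for y in YL)
beta1 = Sxy / Sxx                    # slope of the least-squares line
beta0 = Ybar - beta1 * Xbar          # intercept
S2 = (Syy - beta1 * Sxy) / (n - 2)   # error-variance estimate

print(beta0, beta1, S2)
```

The printed values agree with the Maple output above to the displayed digits.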

t-test for H[0]: beta[1] = 0 versus alternative beta[1] <> 0

>    t:=beta[1]/(sqrt(S2/Sxx));

t := 7.532350397

>    pval:=2*(1-TCDF(nops(XL)-2,t));

pval := .821e-6

The fact that the p-value is so small is strong evidence for rejecting the null hypothesis.

In other words, the soak time definitely seems to be affecting the carbon layer

thickness measurements.
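The test statistic can be cross-checked in plain Python (a sketch; the p-value step needs a t-distribution CDF like Maple's TCDF, which the Python standard library does not provide, so only t is recomputed here):

```python
import math

# Recompute the t statistic for H0: beta1 = 0.
XL = [.58, .66, .66, .66, .66, .66, 1.0, 1.17, 1.17, 1.17, 1.17, 1.17, 1.17,
      1.20, 2.00, 2.00, 2.20, 2.20, 2.20]
YL = [.013, .016, .015, .016, .015, .016, .014, .021, .018, .019, .021, .019,
      .021, .025, .025, .026, .024, .025, .024]

n = len(XL)
Xbar, Ybar = sum(XL) / n, sum(YL) / n
Sxy = sum((x - Xbar) * (y - Ybar) for x, y in zip(XL, YL))
Sxx = sum((x - Xbar) ** 2 for x in XL)
Syy = sum((y - Ybar) ** 2 for y in YL)
beta1 = Sxy / Sxx
S2 = (Syy - beta1 * Sxy) / (n - 2)

t = beta1 / math.sqrt(S2 / Sxx)      # t statistic with n - 2 = 17 df
print(t)
```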

C)  Your summary to the boss should say something like:  The results of our

tests give strong evidence that the length of time that the gears are soaked

increases the carbon layer thickness measurement.  To get reliable results,

it would be good to use just one soak time.

D)  ANOVA for linearity of regression:

>    SSE:=S2*(nops(XL)-2);

SSE := .7664010411e-4

>    XL,YL;

[.58, .66, .66, .66, .66, .66, 1.0, 1.17, 1.17, 1.17, 1.17, 1.17, 1.17, 1.20, 2.00, 2.00, 2.20, 2.20, 2.20], [.13e-1, .16e-1, .15e-1, .16e-1, .15e-1, .16e-1, .14e-1, .21e-1, .18e-1, .19e-1, .21e-1, .19...

There are k = 7 distinct values of x here, but different numbers of

y-values for each one.  To compute SSEPure, we need to

compute the mean of the y-values for each x by picking out

the corresponding sublists:

>    k:=7;

k := 7

>    n:=nops(XL);

n := 19

>    y[1]:=YL[1];

y[1] := .13e-1

>    y[2]:=Mean(YL[2..6]);

y[2] := .1560000000e-1

>    y[3]:=YL[7];

y[3] := .14e-1

>    y[4]:=Mean(YL[8..13]);

y[4] := .1983333333e-1

>    y[5]:=YL[14];

y[5] := .25e-1

>    y[6]:=Mean(YL[15..16]);

y[6] := .2550000000e-1

>    y[7]:=Mean(YL[17..19]);

y[7] := .2433333333e-1

Then we add the squared difference of each y from the appropriate

mean:

>    s2:=((YL[1]-y[1])^2+add((YL[i]-y[2])^2,i=2..6)+(YL[7]-y[3])^2+add((YL[i]-y[4])^2,i=8..13)+(YL[14]-y[5])^2+add((YL[i]-y[6])^2,i=15..16)+add((YL[i]-y[7])^2,i=17..19))/(n-k);

s2 := .9333333333e-6

>    SSEP:=s2*(n-k);

SSEP := .1120000000e-4

>    LOFF:=(SSE-SSEP)/(s2*(k-2));

LOFF := 14.02287945

>    1-FCDF(k-2,n-k,LOFF);

.1168649e-3

>    k-2,n-k;

5, 12

Again, strong evidence to reject the null hypothesis (here H[0] is that the

data are well fit by a linear function).
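The lack-of-fit computation can be cross-checked in plain Python by grouping the y-values on the distinct x-values (a sketch; the dictionary grouping replaces the hand-picked sublists above):

```python
# Cross-check of the ANOVA for linearity: pure-error SS and lack-of-fit F.
XL = [.58, .66, .66, .66, .66, .66, 1.0, 1.17, 1.17, 1.17, 1.17, 1.17, 1.17,
      1.20, 2.00, 2.00, 2.20, 2.20, 2.20]
YL = [.013, .016, .015, .016, .015, .016, .014, .021, .018, .019, .021, .019,
      .021, .025, .025, .026, .024, .025, .024]
n = len(XL)

# Group the responses by distinct soak time.
groups = {}
for x, y in zip(XL, YL):
    groups.setdefault(x, []).append(y)
k = len(groups)                              # 7 distinct x-values

# Pure-error sum of squares: squared deviations from each group mean.
SSEP = sum(sum((y - sum(ys) / len(ys)) ** 2 for y in ys)
           for ys in groups.values())
s2 = SSEP / (n - k)

# SSE from the linear fit, as in part B.
Xbar, Ybar = sum(XL) / n, sum(YL) / n
Sxy = sum((x - Xbar) * (y - Ybar) for x, y in zip(XL, YL))
Sxx = sum((x - Xbar) ** 2 for x in XL)
Syy = sum((y - Ybar) ** 2 for y in YL)
SSE = Syy - (Sxy / Sxx) * Sxy

LOFF = (SSE - SSEP) / (s2 * (k - 2))         # F statistic on (k-2, n-k) = (5, 12) df
print(k, SSEP, LOFF)
```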

E)  What about a quadratic?

>    with(linalg):

Warning, the protected names norm and trace have been redefined and unprotected

>    YM:=transpose(matrix([[seq(YL[i],i=1..17)]])):

>    XM:=transpose(matrix([[seq(1,i=1..17)],[seq(XL[i],i=1..17)],[seq(XL[i]^2,i=1..17)]])):

>    XtX:=multiply(transpose(XM),XM);

XtX := matrix([[17, 19.30, 26.0078], [19.30, 26.0078, 40.618270], [26.0078, 40.618270, 70.80442502]])

>    XtXI:=inverse(XtX);

XtXI := matrix([[2.794948388, -4.523492384, 1.568347973], [-4.523492384, 7.690557119, -2.750266530], [1.568347973, -2.750266530, 1.015780977]])

>    XtY:=multiply(transpose(XM),YM);

XtY := matrix([[.324], [.39705], [.5714091]])

The vector of coefficients of the quadratic is the solution of the linear

system X^t*X*b = X^t*Y:

>    b:=linsolve(XtX,XtY);

b := matrix([[.5678929515e-2], [.1639684985e-1], [-.3422088907e-2]])

>    QP:=plot(b[1,1]+b[2,1]*x+b[3,1]*x^2,x=0..3,color=cyan):

>    display(RL[2],QP);

[Maple Plot]
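The normal-equations solve can be cross-checked in plain Python. Note that, as in the worksheet's seq(...,i=1..17) calls, only the first 17 data points enter the matrices; the small Gaussian-elimination routine below stands in for Maple's linsolve:

```python
# Cross-check of the quadratic fit via the normal equations XtX * b = XtY.
XL = [.58, .66, .66, .66, .66, .66, 1.0, 1.17, 1.17, 1.17, 1.17, 1.17, 1.17,
      1.20, 2.00, 2.00, 2.20, 2.20, 2.20][:17]   # first 17 points, as in the worksheet
YL = [.013, .016, .015, .016, .015, .016, .014, .021, .018, .019, .021, .019,
      .021, .025, .025, .026, .024, .025, .024][:17]

# Entries of X^t X are sums of powers of x; entries of X^t Y are sums of y*x^i.
XtX = [[sum(x ** (i + j) for x in XL) for j in range(3)] for i in range(3)]
XtY = [sum(y * x ** i for x, y in zip(XL, YL)) for i in range(3)]

def solve(A, v):
    """Solve A*b = v by Gaussian elimination with partial pivoting."""
    m = len(A)
    A = [row[:] + [val] for row, val in zip(A, v)]   # augmented matrix
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, m):
            f = A[r][col] / A[col][col]
            for c in range(col, m + 1):
                A[r][c] -= f * A[col][c]
    b = [0.0] * m
    for r in range(m - 1, -1, -1):
        b[r] = (A[r][m] - sum(A[r][c] * b[c] for c in range(r + 1, m))) / A[r][r]
    return b

b = solve(XtX, XtY)   # [beta0, beta1, beta2] of the quadratic
print(b)
```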

F)  See Example 11.14 for the method of computing SSE here.  This is not the same as

the SSE above, because that used the linear model.  Here we want

sum((y[i]-beta[0]-beta[1]*x[i]-beta[2]*x[i]^2)^2, i = 1 .. n), which can be computed as follows.

We test H[0]: beta[2] = 0 against the alternative beta[2] <> 0.  In the general formulas from

page 587, we take a = ( 0 0 1 ) (transposed to make a column vector).

Then

>    SSE:=matadd(multiply(transpose(YM),YM),scalarmul(multiply(transpose(b),XtY),-1))[1,1];

SSE := .59070346e-4

n - 3 degrees of freedom:

>    S2:=SSE/16;

S2 := .3691896625e-5

>    t2:=b[3,1]/(sqrt(S2*XtXI[3,3]));

t2 := -1.767121801

>    pval:=2*(1-TCDF(n-3,abs(t2)));

pval := .96272927e-1

Rather weak evidence for rejecting the null hypothesis here.  The quadratic is not obviously

a much better fit than the regression line.  This is confirmed on an intuitive

level by looking at the plot above.
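Finally, the t statistic for beta[2] can be cross-checked in plain Python (a sketch, again using the first 17 points for the matrices and the worksheet's divisor of 16 for S2; the (3,3) entry of (X^t*X)^(-1) is obtained by solving for the third column of the inverse rather than inverting the whole matrix):

```python
import math

# Cross-check of the t test for H0: beta2 = 0 in the quadratic model.
XL = [.58, .66, .66, .66, .66, .66, 1.0, 1.17, 1.17, 1.17, 1.17, 1.17, 1.17,
      1.20, 2.00, 2.00, 2.20, 2.20, 2.20][:17]
YL = [.013, .016, .015, .016, .015, .016, .014, .021, .018, .019, .021, .019,
      .021, .025, .025, .026, .024, .025, .024][:17]

XtX = [[sum(x ** (i + j) for x in XL) for j in range(3)] for i in range(3)]
XtY = [sum(y * x ** i for x, y in zip(XL, YL)) for i in range(3)]

def solve(A, v):
    """Solve A*b = v by Gaussian elimination with partial pivoting."""
    m = len(A)
    A = [row[:] + [val] for row, val in zip(A, v)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, m):
            f = A[r][col] / A[col][col]
            for c in range(col, m + 1):
                A[r][c] -= f * A[col][c]
    b = [0.0] * m
    for r in range(m - 1, -1, -1):
        b[r] = (A[r][m] - sum(A[r][c] * b[c] for c in range(r + 1, m))) / A[r][r]
    return b

b = solve(XtX, XtY)

# (XtX)^{-1}[3,3] is the last entry of the solution of XtX * c = e3.
c33 = solve(XtX, [0.0, 0.0, 1.0])[2]

SSE = sum(y * y for y in YL) - sum(bi * v for bi, v in zip(b, XtY))
S2 = SSE / 16                        # the worksheet's n - 3 = 16 degrees of freedom
t2 = b[2] / math.sqrt(S2 * c33)      # t statistic for beta2
print(t2)
```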