MATH 376 -- Probability and Statistics II

Lab Project 4 Solutions

>    read "/home/fac/little/public_html/ProbStat/MaplePackage/MSP.map";

Warning, the name changecoords has been redefined

A)  This is a procedure to compute the regression line for a collection

of data points in the plane and plot the points with the line:

>    RegLine:=proc(Xlist,Ylist)

>      local beta,i,Xbar,Ybar,Sxy,Sxx,PP,LP,CP,points;

>      Xbar:=Mean(Xlist);

>      Ybar:=Mean(Ylist);

>      Sxy:=add((Xlist[i]-Xbar)*(Ylist[i]-Ybar),i=1..nops(Xlist));

>      Sxx:=add((Xlist[i]-Xbar)^2,i=1..nops(Xlist));

>      beta[1]:=Sxy/Sxx;

>      beta[0]:=Ybar-beta[1]*Xbar;
>      points:=[seq([Xlist[i],Ylist[i]],i=1..nops(Xlist))];

>      PP:=plot(points,style=point,color=blue):

>      LP:=plot(beta[0]+beta[1]*x,x=min(op(Xlist))-1..max(op(Xlist))+1,color=red):

>      CP:=display(PP,LP):

>      return beta[0]+beta[1]*x,CP;

>      end proc:

Here is the result on the carbon layer thickness versus soak time data from

the lab problem.  The equation of the line:

>    RL:=RegLine([.58,.66,.66,.66,.66,.66,1.0,1.17,1.17,1.17,1.17,1.17,1.17,1.20,2.00,2.00,

>    2.20,2.20,2.20],[.013,.016,.015,.016,.015,.016,.014,.021,.018,.019,.021,.019,.021,.025,.025,.026,.024,.025,.024]):

>    RL[1];

.1157093340e-1+.6462120909e-2*x

and the plot:

>    display(RL[2]);

[Maple Plot]

>    XL:=[.58,.66,.66,.66,.66,.66,1.0,1.17,1.17,1.17,1.17,1.17,1.17,1.20,2.00,2.00,2.20,2.20,2.20];YL:=[.013,.016,.015,.016,.015,.016,.014,.021,.018,.019,.021,.019,.021,.025,.025,.026,.024,.025,.024];

>   

XL := [.58, .66, .66, .66, .66, .66, 1.0, 1.17, 1.17, 1.17, 1.17, 1.17, 1.17, 1.20, 2.00, 2.00, 2.20, 2.20, 2.20]

YL := [.13e-1, .16e-1, .15e-1, .16e-1, .15e-1, .16e-1, .14e-1, .21e-1, .18e-1, .19e-1, .21e-1, .19e-1, .21e-1, .25e-1, .25e-1, .26e-1, .24e-1, .25e-1, .24e-1]

The step-by-step calculations:

>    Xbar:=Mean(XL);

>    Ybar:=Mean(YL);

>    Sxy:=add((XL[i]-Xbar)*(YL[i]-Ybar),i=1..nops(XL));

>    Sxx:=add((XL[i]-Xbar)^2,i=1..nops(XL));

>    beta[1]:=Sxy/Sxx;

>    beta[0]:=Ybar-beta[1]*Xbar;

Xbar := 1.247368421

Ybar := .1963157895e-1

Sxy := .3958157894e-1

Sxx := 6.125168423

beta[1] := .6462120909e-2

beta[0] := .1157093340e-1

>    Syy:=add((YL[i]-Ybar)^2,i=1..nops(YL));

Syy := .3324210530e-3

>    S2:=(Syy-beta[1]*Sxy)/(nops(XL)-2);

S2 := .4508241418e-5
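For readers without Maple, the same quantities can be recomputed in plain Python (a cross-check sketch, not part of the original worksheet; the variable names mirror the Maple ones):

```python
# Cross-check of the regression computation in plain Python.
XL = [.58, .66, .66, .66, .66, .66, 1.0, 1.17, 1.17, 1.17, 1.17, 1.17, 1.17,
      1.20, 2.00, 2.00, 2.20, 2.20, 2.20]
YL = [.013, .016, .015, .016, .015, .016, .014, .021, .018, .019, .021, .019,
      .021, .025, .025, .026, .024, .025, .024]

n = len(XL)
Xbar = sum(XL) / n
Ybar = sum(YL) / n
Sxy = sum((x - Xbar) * (y - Ybar) for x, y in zip(XL, YL))
Sxx = sum((x - Xbar) ** 2 for x in XL)
Syy = sum((y - Ybar) ** 2 for y in YL)
beta1 = Sxy / Sxx                    # slope of the least-squares line
beta0 = Ybar - beta1 * Xbar          # intercept
S2 = (Syy - beta1 * Sxy) / (n - 2)   # error-variance estimate

print(beta0, beta1, S2)
```

The printed values agree with the Maple output above to the displayed digits.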

t-test for H[0]: beta[1] = 0 versus alternative beta[1] <> 0

>    t:=beta[1]/(sqrt(S2/Sxx));

t := 7.532350397

>    pval:=2*(1-TCDF(nops(XL)-2,t));

pval := .821e-6

The fact that the p-value is so small is strong evidence for rejecting the null hypothesis.

In other words, the soak time definitely seems to be affecting the carbon layer

thickness measurements.
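The test statistic can be cross-checked in plain Python (a sketch; the p-value step needs a t-distribution CDF like Maple's TCDF, which the Python standard library does not provide, so only t is recomputed here):

```python
import math

# Recompute the t statistic for H0: beta1 = 0.
XL = [.58, .66, .66, .66, .66, .66, 1.0, 1.17, 1.17, 1.17, 1.17, 1.17, 1.17,
      1.20, 2.00, 2.00, 2.20, 2.20, 2.20]
YL = [.013, .016, .015, .016, .015, .016, .014, .021, .018, .019, .021, .019,
      .021, .025, .025, .026, .024, .025, .024]

n = len(XL)
Xbar, Ybar = sum(XL) / n, sum(YL) / n
Sxy = sum((x - Xbar) * (y - Ybar) for x, y in zip(XL, YL))
Sxx = sum((x - Xbar) ** 2 for x in XL)
Syy = sum((y - Ybar) ** 2 for y in YL)
beta1 = Sxy / Sxx
S2 = (Syy - beta1 * Sxy) / (n - 2)

t = beta1 / math.sqrt(S2 / Sxx)      # t statistic with n - 2 = 17 df
print(t)
```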

C)  Your summary to the boss should say something like:  The results of our

tests give strong evidence that the length of time that the gears are soaked

increases the carbon layer thickness measurement.  To get reliable results,

it would be good to use just one soak time.

D)  ANOVA for linearity of regression:

>    SSE:=S2*(nops(XL)-2);

SSE := .7664010411e-4

>    XL,YL;

[.58, .66, .66, .66, .66, .66, 1.0, 1.17, 1.17, 1.17, 1.17, 1.17, 1.17, 1.20, 2.00, 2.00, 2.20, 2.20, 2.20], [.13e-1, .16e-1, .15e-1, .16e-1, .15e-1, .16e-1, .14e-1, .21e-1, .18e-1, .19e-1, .21e-1, .19...

There are k = 7 distinct values of x here, but different numbers of

y-values for each one.  To compute SSEPure, we need to

compute the mean of the y-values for each x by picking out

the corresponding sublists:

>    k:=7;

k := 7

>    n:=nops(XL);

n := 19

>    y[1]:=YL[1];

y[1] := .13e-1

>    y[2]:=Mean(YL[2..6]);

y[2] := .1560000000e-1

>    y[3]:=YL[7];

y[3] := .14e-1

>    y[4]:=Mean(YL[8..13]);

y[4] := .1983333333e-1

>    y[5]:=YL[14];

y[5] := .25e-1

>    y[6]:=Mean(YL[15..16]);

y[6] := .2550000000e-1

>    y[7]:=Mean(YL[17..19]);

y[7] := .2433333333e-1

Then we add the squared difference of each y from the appropriate

mean:

>    s2:=((YL[1]-y[1])^2+add((YL[i]-y[2])^2,i=2..6)+(YL[7]-y[3])^2+add((YL[i]-y[4])^2,i=8..13)+(YL[14]-y[5])^2+add((YL[i]-y[6])^2,i=15..16)+add((YL[i]-y[7])^2,i=17..19))/(n-k);

s2 := .9333333333e-6

>    SSEP:=s2*(n-k);

SSEP := .1120000000e-4

>    LOFF:=(SSE-SSEP)/(s2*(k-2));

LOFF := 14.02287945

>    1-FCDF(k-2,n-k,LOFF);

.1168649e-3

>    k-2,n-k;

5, 12

Again, strong evidence to reject the null hypothesis (here H[0] is that the

data are well fit by a linear function).
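The lack-of-fit computation can be cross-checked in plain Python by grouping the y-values on the distinct x-values (a sketch; the dictionary grouping replaces the hand-picked sublists above):

```python
# Cross-check of the ANOVA for linearity: pure-error SS and lack-of-fit F.
XL = [.58, .66, .66, .66, .66, .66, 1.0, 1.17, 1.17, 1.17, 1.17, 1.17, 1.17,
      1.20, 2.00, 2.00, 2.20, 2.20, 2.20]
YL = [.013, .016, .015, .016, .015, .016, .014, .021, .018, .019, .021, .019,
      .021, .025, .025, .026, .024, .025, .024]
n = len(XL)

# Group the responses by distinct soak time.
groups = {}
for x, y in zip(XL, YL):
    groups.setdefault(x, []).append(y)
k = len(groups)                              # 7 distinct x-values

# Pure-error sum of squares: squared deviations from each group mean.
SSEP = sum(sum((y - sum(ys) / len(ys)) ** 2 for y in ys)
           for ys in groups.values())
s2 = SSEP / (n - k)

# SSE from the linear fit, as in part B.
Xbar, Ybar = sum(XL) / n, sum(YL) / n
Sxy = sum((x - Xbar) * (y - Ybar) for x, y in zip(XL, YL))
Sxx = sum((x - Xbar) ** 2 for x in XL)
Syy = sum((y - Ybar) ** 2 for y in YL)
SSE = Syy - (Sxy / Sxx) * Sxy

LOFF = (SSE - SSEP) / (s2 * (k - 2))         # F statistic on (k-2, n-k) = (5, 12) df
print(k, SSEP, LOFF)
```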

E)  What about a quadratic?

>    with(linalg):

Warning, the protected names norm and trace have been redefined and unprotected

>    YM:=transpose(matrix([[seq(YL[i],i=1..17)]])):

>    XM:=transpose(matrix([[seq(1,i=1..17)],[seq(XL[i],i=1..17)],[seq(XL[i]^2,i=1..17)]])):

>    XtX:=multiply(transpose(XM),XM);

XtX := matrix([[17, 19.30, 26.0078], [19.30, 26.0078, 40.618270], [26.0078, 40.618270, 70.80442502]])

>    XtXI:=inverse(XtX);

XtXI := matrix([[2.794948388, -4.523492384, 1.568347973], [-4.523492384, 7.690557119, -2.750266530], [1.568347973, -2.750266530, 1.015780977]])

>    XtY:=multiply(transpose(XM),YM);

XtY := matrix([[.324], [.39705], [.5714091]])

The vector of coefficients of the quadratic is the solution of the linear

system X^t*X*b = X^t*Y:

>    b:=linsolve(XtX,XtY);

b := matrix([[.5678929515e-2], [.1639684985e-1], [-.3422088907e-2]])

>    QP:=plot(b[1,1]+b[2,1]*x+b[3,1]*x^2,x=0..3,color=cyan):

>    display(RL[2],QP);

[Maple Plot]
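The normal-equations solve can be cross-checked in plain Python. Note that, as in the worksheet's seq(...,i=1..17) calls, only the first 17 data points enter the matrices; the small Gaussian-elimination routine below stands in for Maple's linsolve:

```python
# Cross-check of the quadratic fit via the normal equations XtX * b = XtY.
XL = [.58, .66, .66, .66, .66, .66, 1.0, 1.17, 1.17, 1.17, 1.17, 1.17, 1.17,
      1.20, 2.00, 2.00, 2.20, 2.20, 2.20][:17]   # first 17 points, as in the worksheet
YL = [.013, .016, .015, .016, .015, .016, .014, .021, .018, .019, .021, .019,
      .021, .025, .025, .026, .024, .025, .024][:17]

# Entries of X^t X are sums of powers of x; entries of X^t Y are sums of y*x^i.
XtX = [[sum(x ** (i + j) for x in XL) for j in range(3)] for i in range(3)]
XtY = [sum(y * x ** i for x, y in zip(XL, YL)) for i in range(3)]

def solve(A, v):
    """Solve A*b = v by Gaussian elimination with partial pivoting."""
    m = len(A)
    A = [row[:] + [val] for row, val in zip(A, v)]   # augmented matrix
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, m):
            f = A[r][col] / A[col][col]
            for c in range(col, m + 1):
                A[r][c] -= f * A[col][c]
    b = [0.0] * m
    for r in range(m - 1, -1, -1):
        b[r] = (A[r][m] - sum(A[r][c] * b[c] for c in range(r + 1, m))) / A[r][r]
    return b

b = solve(XtX, XtY)   # [beta0, beta1, beta2] of the quadratic
print(b)
```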

F)  See Example 11.14 for the method of computing SSE here.  This is not the same as

the SSE above, because that used the linear model.  Here we want

sum((y[i]-beta[0]-beta[1]*x[i]-beta[2]*x[i]^2)^2, i = 1 .. n), which can be computed as follows.

We test H[0]: beta[2] = 0 against the alternative beta[2] <> 0.  In the general formulas from

page 587, we take a = ( 0 0 1 ) (transposed to make a column vector).

Then

>    SSE:=matadd(multiply(transpose(YM),YM),scalarmul(multiply(transpose(b),XtY),-1))[1,1];

SSE := .59070346e-4

n - 3 degrees of freedom:

>    S2:=SSE/16;

S2 := .3691896625e-5

>    t2:=b[3,1]/(sqrt(S2*XtXI[3,3]));

t2 := -1.767121801

>    pval:=2*(1-TCDF(n-3,abs(t2)));

pval := .96272927e-1

Rather weak evidence for rejecting the null hypothesis here.  The quadratic is not obviously

a much better fit than the regression line.  This is confirmed on an intuitive

level by looking at the plot above.
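Finally, the t statistic for beta[2] can be cross-checked in plain Python (a sketch, again using the first 17 points for the matrices and the worksheet's divisor of 16 for S2; the (3,3) entry of (X^t*X)^(-1) is obtained by solving for the third column of the inverse rather than inverting the whole matrix):

```python
import math

# Cross-check of the t test for H0: beta2 = 0 in the quadratic model.
XL = [.58, .66, .66, .66, .66, .66, 1.0, 1.17, 1.17, 1.17, 1.17, 1.17, 1.17,
      1.20, 2.00, 2.00, 2.20, 2.20, 2.20][:17]
YL = [.013, .016, .015, .016, .015, .016, .014, .021, .018, .019, .021, .019,
      .021, .025, .025, .026, .024, .025, .024][:17]

XtX = [[sum(x ** (i + j) for x in XL) for j in range(3)] for i in range(3)]
XtY = [sum(y * x ** i for x, y in zip(XL, YL)) for i in range(3)]

def solve(A, v):
    """Solve A*b = v by Gaussian elimination with partial pivoting."""
    m = len(A)
    A = [row[:] + [val] for row, val in zip(A, v)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, m):
            f = A[r][col] / A[col][col]
            for c in range(col, m + 1):
                A[r][c] -= f * A[col][c]
    b = [0.0] * m
    for r in range(m - 1, -1, -1):
        b[r] = (A[r][m] - sum(A[r][c] * b[c] for c in range(r + 1, m))) / A[r][r]
    return b

b = solve(XtX, XtY)

# (XtX)^{-1}[3,3] is the last entry of the solution of XtX * c = e3.
c33 = solve(XtX, [0.0, 0.0, 1.0])[2]

SSE = sum(y * y for y in YL) - sum(bi * v for bi, v in zip(b, XtY))
S2 = SSE / 16                        # the worksheet's n - 3 = 16 degrees of freedom
t2 = b[2] / math.sqrt(S2 * c33)      # t statistic for beta2
print(t2)
```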