MultReg.html

MATH 376 -- Probability and Statistics 2

Multiple Regression and Hypothesis testing

April 12, 2006

The following comes from problem 11.79 in our text.

We have two strains of bacteria and want to compare

their growth rates. If we do simple regression with

each data set as follows:

> restart;

> read "/home/fac/little/public_html/ProbStat0506/MSP.map";

Warning, the name changecoords has been redefined

> times:=[-2,-1,0,1,2];

> A:=[8.0,9.0,9.1,10.2,10.4];

> B:=[10.0,10.3,12.2,12.6,13.9];

> Tbar:=Mean(times);

> Abar:=Mean(A);

> Bbar:=Mean(B);

> SXYA:=add((times[i]-Tbar)*(A[i]-Abar),i=1..5);

> SXXA:=add((times[i]-Tbar)^2,i=1..5);

> Ahatbeta[1]:=SXYA/SXXA;

> Ahatbeta[0]:=Abar-Ahatbeta[1]*Tbar;

> SXYB:=add((times[i]-Tbar)*(B[i]-Bbar),i=1..5);

> SXXB:=SXXA;

> Bhatbeta[1]:=SXYB/SXXB;

> Bhatbeta[0]:=Bbar-Bhatbeta[1]*Tbar;

> APoints:=[seq([times[i],A[i]],i=1..5)];

> BPoints:=[seq([times[i],B[i]],i=1..5)];

> with(plots):

> AP:=plot(APoints,style=point,symbol=circle,color=blue):

> BP:=plot(BPoints,style=point,symbol=circle,color=red):

> AL:=plot(Ahatbeta[0]+Ahatbeta[1]*t,t=-3..3,color=blue):

> BL:=plot(Bhatbeta[0]+Bhatbeta[1]*t,t=-3..3,color=red):

> display(AP,BP,AL,BL);

Note: From this plot, it seems that the lines have different slopes , indicating different growth rates .

How can we test whether this is statistically significant though ? Here is one method, where we

incorporate both data sets into a single set, but include a "category variable" indicating

which data set the point came from. Let if the measurement is from type A, and

if the measurement is from type B. Then will indicate the time, and Y will

be the number of bacteria. The following type of linear model will

allow us to fit the straight lines for both data types, and compare their slopes:

Y = +

Why do we do it this way? Note: if (type A), we get

On the other hand if (type B), then we get

To compare the two slopes we just want to look at = . We know

how to set up a test of : versus : based on our general

discussions(!) Now we go to the matrix formulation for the multiple regression

model. We will set up X putting the data points for type A first, then type B,

but any order would give equivalent results:

> with(linalg):

Warning, the protected names norm and trace have been redefined and unprotected

> X:=matrix([[1,0,-2,0],[1,0,-1,0],[1,0,0,0],[1,0,1,0],[1,0,2,0],[1,1,-2,-2],[1,1,-1,-1],[1,1,0,0],[1,1,1,1],[1,1,2,2]]):

> Y:=matrix([[8.0],[9.0],[9.1],[10.2],[10.4],[10.0],[10.3],[12.2],[12.6],[13.9]]):

The normal equations:

> XtX:=multiply(transpose(X),X);

> XtY:=multiply(transpose(X),Y);

> beta:=linsolve(XtX,XtY);

Note that, contained here are the values for the two slopes we saw before(!)

The estimate for

Note this is (0,0,0,1) , so we take in our general formulas:

> a:=matrix([[0],[0],[0],[1]]):

> S2:=1/(10-(3+1))*(multiply(transpose(Y),Y)[1,1] - multiply(multiply(transpose(beta),transpose(X)),Y)[1,1]);

> t:=beta[4,1]/sqrt(S2*multiply(transpose(a),multiply(inverse(XtX),a))[1,1]);

With 10 - (4+1) = 5 d.f. this gives a p - value less than 0.05:

> 2*(1 - TCDF(5,t));

>

So at = .05 level, for instance, this would be sufficient evidence to reject

and conclude . As above, this indicates that the rates of growth

of the two strains of bacteria are different.