MATH 376 -- Probability and Statistics 2

Multiple Regression and Hypothesis testing

April 12, 2006

The following comes from problem 11.79 in our text.

We have two strains of bacteria and want to compare

their growth rates.  If we do simple regression with

each data set as follows:

>    restart;

>    read "/home/fac/little/public_html/ProbStat0506/MSP.map";

Warning, the name changecoords has been redefined

158078419

>    times:=[-2,-1,0,1,2];

times := [-2, -1, 0, 1, 2]

>    A:=[8.0,9.0,9.1,10.2,10.4];

A := [8.0, 9.0, 9.1, 10.2, 10.4]

>    B:=[10.0,10.3,12.2,12.6,13.9];

B := [10.0, 10.3, 12.2, 12.6, 13.9]

>    Tbar:=Mean(times);

Tbar := 0

>    Abar:=Mean(A);

Abar := 9.340000000

>    Bbar:=Mean(B);

Bbar := 11.80000000

>    SXYA:=add((times[i]-Tbar)*(A[i]-Abar),i=1..5);

SXYA := 6.000000000

>    SXXA:=add((times[i]-Tbar)^2,i=1..5);

SXXA := 10

>    Ahatbeta[1]:=SXYA/SXXA;

Ahatbeta[1] := .6000000000

>    Ahatbeta[0]:=Abar-Ahatbeta[1]*Tbar;

Ahatbeta[0] := 9.340000000

>    SXYB:=add((times[i]-Tbar)*(B[i]-Bbar),i=1..5);

SXYB := 10.10000000

>    SXXB:=SXXA;

SXXB := 10

>    Bhatbeta[1]:=SXYB/SXXB;

Bhatbeta[1] := 1.010000000

>    Bhatbeta[0]:=Bbar-Bhatbeta[1]*Tbar;

Bhatbeta[0] := 11.80000000

>    APoints:=[seq([times[i],A[i]],i=1..5)];

APoints := [[-2, 8.0], [-1, 9.0], [0, 9.1], [1, 10.2], [2, 10.4]]

>    BPoints:=[seq([times[i],B[i]],i=1..5)];

BPoints := [[-2, 10.0], [-1, 10.3], [0, 12.2], [1, 12.6], [2, 13.9]]

>    with(plots):

>    AP:=plot(APoints,style=point,symbol=circle,color=blue):

>    BP:=plot(BPoints,style=point,symbol=circle,color=red):

>    AL:=plot(Ahatbeta[0]+Ahatbeta[1]*t,t=-3..3,color=blue):

>    BL:=plot(Bhatbeta[0]+Bhatbeta[1]*t,t=-3..3,color=red):

>    display(AP,BP,AL,BL);

[Maple Plot]

Note:  From this plot, it seems that the lines have different slopes , indicating different growth rates .

How can we test whether this is statistically significant though ?    Here is one method, where we

incorporate both data sets into a single set, but include a "category variable" indicating

which data set the point came from.  Let x[1] = 0   if the measurement is from type A, and

x[1] = 1  if the measurement is from type B. Then   x[2]   will indicate the time, and  Y will

be the number of bacteria.   The following type of linear model will

allow us to fit the straight lines for both data types, and compare their slopes:

      Y = beta[0]+beta[1]*x[1]+beta[2]*x[2]+beta[3]*x[1]*x[2]  + epsilon

Why do we do it this way?  Note:  if x[1] = 0   (type A), we get Y = beta[0]+beta[2]*x[2]

On the other hand if x[1] = 1   (type B), then we get   Y = beta[0]+beta[1]+(beta[2]+beta[3])*x[2]

To compare the two slopes we just want to look at beta[3]+beta[2]-beta[2]  = beta[3] .  We know

how to set up a test of H[0]  : beta[3] = 0   versus   H[a]  : beta[3] <> 0   based on our general

discussions(!)  Now we go to the matrix formulation for the multiple regression

model.  We will set up   X   putting the data points for type A first, then type B,

but any order would give equivalent results:

>    with(linalg):

Warning, the protected names norm and trace have been redefined and unprotected

>    X:=matrix([[1,0,-2,0],[1,0,-1,0],[1,0,0,0],[1,0,1,0],[1,0,2,0],[1,1,-2,-2],[1,1,-1,-1],[1,1,0,0],[1,1,1,1],[1,1,2,2]]):

>    Y:=matrix([[8.0],[9.0],[9.1],[10.2],[10.4],[10.0],[10.3],[12.2],[12.6],[13.9]]):

The normal equations:

>    XtX:=multiply(transpose(X),X);

XtX := matrix([[10, 5, 0, 0], [5, 5, 0, 0], [0, 0, 20, 10], [0, 0, 10, 10]])

>    XtY:=multiply(transpose(X),Y);

XtY := matrix([[105.7], [59.0], [16.1], [10.1]])

>    beta:=linsolve(XtX,XtY);

beta := matrix([[9.340000000], [2.460000000], [.6000000000], [.4100000000]])

Note that, contained here are the values for the two slopes we saw before(!)

The estimate for beta[3]

Note this is   (0,0,0,1) beta ,  so we take   a^t = (0, 0, 0, 1)   in our general formulas:

>    a:=matrix([[0],[0],[0],[1]]):

>    S2:=1/(10-(3+1))*(multiply(transpose(Y),Y)[1,1] - multiply(multiply(transpose(beta),transpose(X)),Y)[1,1]);

S2 := .1218334

>    t:=beta[4,1]/sqrt(S2*multiply(transpose(a),multiply(inverse(XtX),a))[1,1]);

t := 2.626550025

With  10 - (4+1) = 5  d.f.  this gives a  p - value  less than  0.05:

>    2*(1 - TCDF(5,t));

.46727170e-1

>   

So at alpha  = .05 level, for instance, this would be sufficient evidence to reject

H[0]   and conclude   beta[3] <> 0 .  As above, this indicates that the rates of growth

of the two strains of bacteria are different.