MATH 376 -- Probability and Statistics 2 

Multiple Regression and Hypothesis testing  

April 26, 2010 

 

We have two strains of bacteria and want to compare their growth rates.  If we do  

simple regression with each data set as follows: 

> restart;
 

> with(Statistics):
 

53104469 (1)
 

> times:=[-2,-1,0,1,2];
 

[-2, -1, 0, 1, 2] (2)
 

> A:=[8.0,9.0,9.1,10.2,10.4];
 

[8.0, 9.0, 9.1, 10.2, 10.4] (3)
 

> B:=[10.0,10.3,12.2,12.6,13.9];
 

[10.0, 10.3, 12.2, 12.6, 13.9] (4)
 

> Tbar:=Mean(times);
 

0. (5)
 

> Abar:=Mean(A);
 

9.340000000 (6)
 

> Bbar:=Mean(B);
 

11.80000000 (7)
 

> SXYA:=add((times[i]-Tbar)*(A[i]-Abar),i=1..5);
 

6.000000000 (8)
 

> SXXA:=add((times[i]-Tbar)^2,i=1..5);
 

10. (9)
 

> Ahatbeta[1]:=SXYA/SXXA;
 

.6000000000 (10)
 

> Ahatbeta[0]:=Abar-Ahatbeta[1]*Tbar;
 

9.340000000 (11)
 

> SXYB:=add((times[i]-Tbar)*(B[i]-Bbar),i=1..5);
 

10.10000000 (12)
 

> SXXB:=SXXA;
 

10. (13)
 

> Bhatbeta[1]:=SXYB/SXXB;
 

1.010000000 (14)
 

> Bhatbeta[0]:=Bbar-Bhatbeta[1]*Tbar;
 

11.80000000 (15)
 

> APoints:=[seq([times[i],A[i]],i=1..5)];
 

[[-2, 8.0], [-1, 9.0], [0, 9.1], [1, 10.2], [2, 10.4]] (16)
 

> BPoints:=[seq([times[i],B[i]],i=1..5)];
 

[[-2, 10.0], [-1, 10.3], [0, 12.2], [1, 12.6], [2, 13.9]] (17)
 

> with(plots):
 

> AP:=plot(APoints,style=point,symbol=circle,color=blue):
 

> BP:=plot(BPoints,style=point,symbol=circle,color=red):
 

> AL:=plot(Ahatbeta[0]+Ahatbeta[1]*t,t=-3..3,color=blue):
 

> BL:=plot(Bhatbeta[0]+Bhatbeta[1]*t,t=-3..3,color=red):
 

> display(AP,BP,AL,BL);
 

Plot_2d
 

Note:  From this plot, it seems that the lines have different slopes, indicating different growth rates. 

 

How can we test whether this is statistically significant though?    Here is one method, where we 

incorporate both data sets into a single set, but include a "category variable" indicating 

which data set the point came from.  Let x[1] = 0  if the measurement is from type A, and  

x[1] = 1 if the measurement is from type B. Then  x[2]  will indicate the time, and  Y will 

be the number of bacteria.   The following type of linear model will  

allow us to fit the straight lines for both data types, and compare their slopes: 

 

     Y = `+`(beta[0], Typesetting:-delayDotProduct(beta[1], x[1], true), Typesetting:-delayDotProduct(beta[2], x[2], true), `*`(Typesetting:-delayDotProduct(beta[3], x[1], true), `*`(x[2]))) + epsilon 

 

Why do we do it this way? Note:  if x[1] = 0  (type A), we get Y = `+`(beta[0], Typesetting:-delayDotProduct(beta[2], x[2], true)) 

On the other hand if x[1] = 1  (type B), then we get  Y = `+`(beta[0], beta[1], `*`(`+`(beta[2], beta[3]), `*`(x[2]))) 

To compare the two slopes we just want to look at `+`(`+`(beta[2], beta[3]), `+`(`-`(beta[2]))) = beta[3].  We know 

how to set up a test of H[0] : beta[3] = 0  versus  H[a] : `<>`(beta[3], 0)  based on our general  

discussions(!)  Now we go to the matrix formulation for the multiple regression 

model.  We will set up  X  putting the data points for type A first, then type B, 

but any order would give equivalent results: 

> with(LinearAlgebra):
 

> X:=Matrix([[1,0,-2,0],[1,0,-1,0],[1,0,0,0],[1,0,1,0],[1,0,2,0],[1,1,-2,-2],[1,1,-1,-1],[1,1,0,0],[1,1,1,1],[1,1,2,2]]):
 

> Y:=Matrix([[8.0],[9.0],[9.1],[10.2],[10.4],[10.0],[10.3],[12.2],[12.6],[13.9]]):
 

The normal equations: 

> XtX:=Multiply(Transpose(X),X);
 

Matrix(%id = 164463160) (18)
 

> XtY:=Multiply(Transpose(X),Y);
 

Matrix(%id = 164463288) (19)
 

> beta:=LinearSolve(XtX,XtY);
 

Matrix(%id = 164464056) (20)
 

Note that, contained here are the values for the two slopes we saw before(!) 

The estimate for beta[3] 

 

Note this is   (0,0,0,1) beta,  so we take  `^`(a, t) = (0, 0, 0, 1)  in our general formulas: 

> a:=Matrix([[0],[0],[0],[1]]):
 

> S2:=1/(10-(3+1))*(Multiply(Transpose(Y),Y)[1,1] - Multiply(Multiply(Transpose(beta),Transpose(X)),Y)[1,1]);
 

.1218334 (21)
 

> t:=beta[4,1]/sqrt(S2*Multiply(Transpose(a),LinearSolve(XtX,a))[1,1]);
 

2.626550025 (22)
 

With  10 - (3+1) = 6  d.f.  this gives a p - value  less than  0.05: 

> T6:=RandomVariable(StudentT(6));
 

_R (23)
 

> 2*(1 - CDF(T6,t));
 

0.39240448e-1 (24)
 

So at alpha = .05 level, for instance, this would be sufficient evidence to reject  

H[0]  and conclude  `<>`(beta[3], 0).  As above, this indicates that the rates of growth 

of the two strains of bacteria are different. 

>