MATH 376 -- Probability and Statistics 2
Multiple Regression and Hypothesis Testing
April 26, 2010
We have two strains of bacteria and want to compare their growth rates. We first fit a
simple regression to each data set separately, as follows:
![53104469](images/HypTestMultReg_1.gif) |
(1) |
> |
times:=[-2,-1,0,1,2]; |
![[-2, -1, 0, 1, 2]](images/HypTestMultReg_2.gif) |
(2) |
> |
A:=[8.0,9.0,9.1,10.2,10.4]; |
![[8.0, 9.0, 9.1, 10.2, 10.4]](images/HypTestMultReg_3.gif) |
(3) |
> |
B:=[10.0,10.3,12.2,12.6,13.9]; |
![[10.0, 10.3, 12.2, 12.6, 13.9]](images/HypTestMultReg_4.gif) |
(4) |
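The commands that produce outputs (5) through (7) are not shown. Judging from the names used in the later commands, they compute the sample means of `times`, `A`, and `B`. A minimal sketch, assuming the names `Tbar`, `Abar`, `Bbar` and a plain average (the exact expressions used in the worksheet are not known):

```maple
# Data as shown in outputs (2)-(4)
times := [-2, -1, 0, 1, 2]:
A := [8.0, 9.0, 9.1, 10.2, 10.4]:
B := [10.0, 10.3, 12.2, 12.6, 13.9]:

# Sample means (assumed form; dividing by 5.0 forces floating-point results)
Tbar := add(times[i], i = 1 .. 5) / 5.0;
Abar := add(A[i], i = 1 .. 5) / 5.0;
Bbar := add(B[i], i = 1 .. 5) / 5.0;
```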
![0.](images/HypTestMultReg_5.gif) |
(5) |
![9.340000000](images/HypTestMultReg_6.gif) |
(6) |
![11.80000000](images/HypTestMultReg_7.gif) |
(7) |
> |
SXYA:=add((times[i]-Tbar)*(A[i]-Abar),i=1..5); |
![6.000000000](images/HypTestMultReg_8.gif) |
(8) |
> |
SXXA:=add((times[i]-Tbar)^2,i=1..5); |
![10.](images/HypTestMultReg_9.gif) |
(9) |
> |
Ahatbeta[1]:=SXYA/SXXA; |
![.6000000000](images/HypTestMultReg_10.gif) |
(10) |
> |
Ahatbeta[0]:=Abar-Ahatbeta[1]*Tbar; |
![9.340000000](images/HypTestMultReg_11.gif) |
(11) |
> |
SXYB:=add((times[i]-Tbar)*(B[i]-Bbar),i=1..5); |
![10.10000000](images/HypTestMultReg_12.gif) |
(12) |
> |
SXXB:=add((times[i]-Tbar)^2,i=1..5); |
![10.](images/HypTestMultReg_13.gif) |
(13) |
> |
Bhatbeta[1]:=SXYB/SXXB; |
![1.010000000](images/HypTestMultReg_14.gif) |
(14) |
> |
Bhatbeta[0]:=Bbar-Bhatbeta[1]*Tbar; |
![11.80000000](images/HypTestMultReg_15.gif) |
(15) |
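For reference, the quantities computed above are the usual least squares estimates for a simple regression of growth on time $t$. Written out with the values from outputs (8) through (15), and using $\bar{t} = 0$ (the symbols $S_{xy}$, $S_{xx}$, $\hat{y}$ are notation chosen here, not taken from the worksheet):

```latex
\begin{align*}
\hat\beta_1 &= \frac{S_{xy}}{S_{xx}}, &
\hat\beta_0 &= \bar{y} - \hat\beta_1 \bar{t} \\
\text{Type A:}\quad \hat\beta_1 &= \frac{6.0}{10} = 0.60, &
\hat\beta_0 &= 9.34, \qquad \hat{y} = 9.34 + 0.60\,t \\
\text{Type B:}\quad \hat\beta_1 &= \frac{10.1}{10} = 1.01, &
\hat\beta_0 &= 11.80, \qquad \hat{y} = 11.80 + 1.01\,t
\end{align*}
```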
> |
APoints:=[seq([times[i],A[i]],i=1..5)]; |
![[[-2, 8.0], [-1, 9.0], [0, 9.1], [1, 10.2], [2, 10.4]]](images/HypTestMultReg_16.gif) |
(16) |
> |
BPoints:=[seq([times[i],B[i]],i=1..5)]; |
![[[-2, 10.0], [-1, 10.3], [0, 12.2], [1, 12.6], [2, 13.9]]](images/HypTestMultReg_17.gif) |
(17) |
> |
AP:=plot(APoints,style=point,symbol=circle,color=blue): |
> |
BP:=plot(BPoints,style=point,symbol=circle,color=red): |
> |
AL:=plot(Ahatbeta[0]+Ahatbeta[1]*t,t=-3..3,color=blue): |
> |
BL:=plot(Bhatbeta[0]+Bhatbeta[1]*t,t=-3..3,color=red): |
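The command that displays the plot is not shown. One way to overlay the four plot structures is `display` from the `plots` package. The sketch below is self-contained, so it rebuilds the points and uses the fitted coefficients from outputs (10), (11), (14), and (15) as literals; the plotting variable `s` is an arbitrary choice.

```maple
with(plots):

# Data from outputs (2)-(4)
times := [-2, -1, 0, 1, 2]:
A := [8.0, 9.0, 9.1, 10.2, 10.4]:
B := [10.0, 10.3, 12.2, 12.6, 13.9]:

# Scatter plots of the observations
AP := plot([seq([times[i], A[i]], i = 1 .. 5)], style = point, symbol = circle, color = blue):
BP := plot([seq([times[i], B[i]], i = 1 .. 5)], style = point, symbol = circle, color = red):

# Fitted lines, coefficients copied from outputs (10), (11), (14), (15)
AL := plot(9.34 + 0.60*s, s = -3 .. 3, color = blue):
BL := plot(11.80 + 1.01*s, s = -3 .. 3, color = red):

# Overlay all four in one figure
display([AP, BP, AL, BL]);
```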
Note: From this plot, it seems that the lines have different slopes, indicating different growth rates.
How can we test whether this is statistically significant though? Here is one method, where we
incorporate both data sets into a single set, but include a "category variable" indicating
which data set the point came from. Let $x_1 = 0$ if the measurement is from type A, and
$x_1 = 1$ if the measurement is from type B. Then $x_2$ will indicate the time, and $Y$ will
be the number of bacteria. The following type of linear model will
allow us to fit the straight lines for both data types, and compare their slopes:
$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \varepsilon$$
Why do we do it this way? Note: if $x_1 = 0$ (type A), we get
$$Y = \beta_0 + \beta_2 x_2 + \varepsilon$$
On the other hand if $x_1 = 1$ (type B), then we get
$$Y = (\beta_0 + \beta_1) + (\beta_2 + \beta_3) x_2 + \varepsilon$$
To compare the two slopes we just want to look at $(\beta_2 + \beta_3) - \beta_2 = \beta_3$. We know
how to set up a test of $H_0: \beta_3 = 0$ versus $H_a: \beta_3 \ne 0$ based on our general
discussions(!) Now we go to the matrix formulation for the multiple regression
model. The columns of $X$ are $1$, $x_1$, $x_2$, and $x_1 x_2$. We will set up $X$ putting the data points for type A first, then type B,
but any order would give equivalent results:
> |
X:=Matrix([[1,0,-2,0],[1,0,-1,0],[1,0,0,0],[1,0,1,0],[1,0,2,0],[1,1,-2,-2],[1,1,-1,-1],[1,1,0,0],[1,1,1,1],[1,1,2,2]]): |
> |
Y:=Matrix([[8.0],[9.0],[9.1],[10.2],[10.4],[10.0],[10.3],[12.2],[12.6],[13.9]]): |
The normal equations:
> |
XtX:=Multiply(Transpose(X),X); |
![Matrix(%id = 164463160)](images/HypTestMultReg_34.gif) |
(18) |
> |
XtY:=Multiply(Transpose(X),Y); |
![Matrix(%id = 164463288)](images/HypTestMultReg_35.gif) |
(19) |
> |
beta:=LinearSolve(XtX,XtY); |
![Matrix(%id = 164464056)](images/HypTestMultReg_36.gif) |
(20) |
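Outputs (18) through (20) appear only as matrix identifiers here. Working the products out by hand from the `X` and `Y` defined above (a check, not a transcript of the Maple output), the normal equations $X^t X \hat\beta = X^t Y$ and their solution are:

```latex
\[
\begin{pmatrix}
10 & 5 & 0 & 0 \\
5 & 5 & 0 & 0 \\
0 & 0 & 20 & 10 \\
0 & 0 & 10 & 10
\end{pmatrix}
\begin{pmatrix} \hat\beta_0 \\ \hat\beta_1 \\ \hat\beta_2 \\ \hat\beta_3 \end{pmatrix}
=
\begin{pmatrix} 105.7 \\ 59.0 \\ 16.1 \\ 10.1 \end{pmatrix},
\qquad
\hat\beta = (9.34,\ 2.46,\ 0.60,\ 0.41)^t
\]
```

The block structure shows why the combined fit reproduces the two separate fits: the first two equations involve only the intercept terms, and the last two involve only the slope terms.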
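Note that the two slopes we saw before are contained here(!): $\hat\beta_2 = 0.60$ is the slope for type A, and $\hat\beta_2 + \hat\beta_3 = 1.01$ is the slope for type B.
The estimate for $\beta_3$ is $\hat\beta_3 = 0.41$. Note that $\beta_3 = (0,0,0,1)\,\beta$, so we take $a = (0,0,0,1)^t$ in our general formulas: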
> |
a:=Matrix([[0],[0],[0],[1]]): |
> |
S2:=1/(10-(3+1))*(Multiply(Transpose(Y),Y)[1,1] - Multiply(Multiply(Transpose(beta),Transpose(X)),Y)[1,1]); |
![.1218334](images/HypTestMultReg_40.gif) |
(21) |
> |
t:=beta[4,1]/sqrt(S2*Multiply(Transpose(a),LinearSolve(XtX,a))[1,1]); |
![2.626550025](images/HypTestMultReg_41.gif) |
(22) |
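Read off from the two commands above, the quantities being computed are the usual variance estimate and $t$ statistic for a linear combination $a^t \beta$, here with $n = 10$ observations and $k = 3$ predictors. The numerical check in the last line uses $a^t (X^t X)^{-1} a = 0.2$ and $a^t \hat\beta = 0.41$, which are worked out by hand rather than taken from the worksheet:

```latex
\begin{align*}
S^2 &= \frac{Y^t Y - \hat\beta^t X^t Y}{n - (k+1)} = 0.1218334 \\
T &= \frac{a^t \hat\beta}{\sqrt{S^2 \, a^t (X^t X)^{-1} a}}
   = \frac{0.41}{\sqrt{0.1218334 \times 0.2}} \approx 2.6266
\end{align*}
```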
With 10 - (3+1) = 6 d.f. this gives a two-sided p-value less than 0.05:
> |
T6:=RandomVariable(StudentT(6)); |
![_R](images/HypTestMultReg_42.gif) |
(23) |
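The command that produces the next output is not shown. A two-sided p-value for the statistic in (22) could be computed along the following lines; this is a sketch that assumes the `Statistics` package is used and copies the value of the statistic from (22) as a literal (the names `tobs` and `pvalue` are illustrative):

```maple
with(Statistics):

# Student t distribution with 10 - (3+1) = 6 degrees of freedom
T6 := RandomVariable(StudentT(6)):

# Observed test statistic, copied from output (22)
tobs := 2.626550025:

# Two-sided p-value: P(|T| > tobs) = 2*(1 - F(tobs))
pvalue := 2*(1 - CDF(T6, tobs));
```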
![0.39240448e-1](images/HypTestMultReg_43.gif) |
(24) |
So at the $\alpha = .05$ level, for instance, this would be sufficient evidence to reject $H_0$
and conclude $\beta_3 \ne 0$. As above, this indicates that the rates of growth
of the two strains of bacteria are different.