MONT 106N -- Identifying Patterns Seminar 

Regression for exploratory data analysis 

November 9, 2009 

 

We want to work through an example illustrating one way 

that regression is often used to try to identify a functional 

relation between  x  and  y.   

 

> restart:
 

How the data was generated 

> with(Statistics):


> ScatterPlot(X, Y);
 

Plot_2d
 

> SP := ScatterPlot(X, Y):


> RLine := Fit(a + b*x, X, Y, x);


4.57581962066867121 - 1.59386005694449850*x (1)


> LP := plot(RLine, x = 0 .. 2.5):


> with(plots):


> display(LP, SP);
 

Plot_2d
 

> residuals1 := <seq(Y[i] - subs(x = X[i], RLine), i = 1 .. 200)>:


> ScatterPlot(X, residuals1);
 

Plot_2d
 

 

The residuals indicate that a linear model relation  y = mx + b  does not
fit very well.  Any case like this, where the residuals tend to be
positive, then negative, then positive again on different ranges of
x-values, is a tip-off that  y  probably does not depend linearly on  x.
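
For readers who want to try this diagnostic outside Maple, here is a minimal Python sketch of the same check. The data are a hypothetical stand-in (a noisy exponential decay with made-up parameters), not the worksheet's X and Y, and `fit_line` is an illustrative helper rather than a library function. It fits a least-squares line and averages the residuals over the left, middle, and right thirds of the x-range.

```python
import math
import random

def fit_line(xs, ys):
    """Least-squares line ys ~ a + b*xs; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b

# Assumed stand-in data: 200 points on a curved (exponential) trend plus noise.
rng = random.Random(0)
xs = [0.1 + 2.4 * i / 199 for i in range(200)]
ys = [5.0 * math.exp(-0.6 * x) + rng.gauss(0, 0.05) for x in xs]

a, b = fit_line(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

def mean(values):
    return sum(values) / len(values)

# xs is sorted, so index ranges correspond to ranges of x-values.
left = mean(residuals[:67])
middle = mean(residuals[67:133])
right = mean(residuals[133:])
print(f"mean residual: left {left:.3f}, middle {middle:.3f}, right {right:.3f}")
```

For a decaying, convex curve like this one, the averages should come out positive, negative, positive, which is the same kind of pattern the residual plot shows.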

 

Let's see if things look different for some different functional forms.
What about a "power law" relation:  y = c x^a ?  If we had an exact
relation of this form and we took logarithms of both sides, then we
would have  ln(y) = a ln(x) + ln(c),  so the points  (ln(x), ln(y))
would lie on a straight line with slope  a  and intercept  ln(c).
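
The logarithm trick can be checked numerically. The following Python sketch builds an exact power law y = c x^a with made-up values of c and a (assumptions for illustration only), fits a line to the points (ln(x), ln(y)), and reads a off the slope and c off the intercept. `fit_line` is an illustrative helper, not a library call.

```python
import math

def fit_line(xs, ys):
    """Least-squares line ys ~ a + b*xs; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope

# Hypothetical exact power law (values chosen only for the demonstration).
c_true, a_true = 3.0, -0.5
xs = [0.1 + 2.4 * i / 199 for i in range(200)]   # x > 0 so ln(x) is defined
ys = [c_true * x ** a_true for x in xs]

ln_x = [math.log(x) for x in xs]
ln_y = [math.log(y) for y in ys]

intercept, slope = fit_line(ln_x, ln_y)
a_est = slope                 # slope of the log-log line estimates a
c_est = math.exp(intercept)   # intercept estimates ln(c)
print(a_est, c_est)
```

With exact data the estimates agree with the chosen values up to rounding error; with real data, a curved log-log scatter plot suggests the power law is the wrong form.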

 

> r1 := Correlation(X, Y);


-.9871097488 (2)


> lnX := <seq(ln(X[i]), i = 1 .. 200)>:


> lnY := <seq(ln(Y[i]), i = 1 .. 200)>:


> SP2 := ScatterPlot(lnX, lnY):


> RL2 := Fit(a + b*x, lnX, lnY, x);


.935774292106958505 - .445060827560002370*x (3)


> LP2 := plot(RL2, x = -2 .. 1):


> display(LP2, SP2);


Plot_2d


> r2 := Correlation(lnX, lnY);
 

-.9425827575 (4)
 

The residuals from this fit also show a pattern along the same lines as
the previous one (most residuals negative to the left, then positive in
the middle, and negative again to the right).

 

Hence this model relation  y = c x^a  is probably not that good either.
Next, let's try a relation of the form  y = c e^(kx).  Taking logarithms
again gives  ln(y) = kx + ln(c),  so the points  (x, ln(y))  would lie on
a straight line with slope  k  and intercept  ln(c).

> SP3 := ScatterPlot(X, lnY):


> RL3 := Fit(a + b*x, X, lnY, x);


1.61819993088653247 - .578001613441206041*x (5)


> LP3 := plot(RL3, x = 0 .. 2.5):


> display(SP3, LP3);


Plot_2d


> r3 := Correlation(X, lnY);
 

-.9983749641 (6)
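
The three correlations computed so far, (2), (4) and (6), are all large in magnitude, with the semilog value closest to -1. The Python sketch below runs the same three-way comparison on hypothetical stand-in data (an exponential decay with multiplicative noise and made-up parameters, not the worksheet's X and Y). `correlation` is an illustrative helper written out by hand.

```python
import math
import random

def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Assumed stand-in data; multiplicative noise keeps every y positive.
rng = random.Random(1)
xs = [0.1 + 2.4 * i / 199 for i in range(200)]
ys = [5.0 * math.exp(-0.6 * x) * math.exp(rng.gauss(0, 0.03)) for x in xs]

ln_x = [math.log(x) for x in xs]
ln_y = [math.log(y) for y in ys]

r_linear = correlation(xs, ys)      # model y = mx + b
r_power = correlation(ln_x, ln_y)   # model y = c x^a
r_exp = correlation(xs, ln_y)       # model y = c e^(kx)
print(r_linear, r_power, r_exp)
```

Because even a poorly fitting form can give a correlation close to -1 or 1, the correlation is best read together with the residual plot rather than on its own.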
 

 

The following is more like what we want to see for the residuals -- a  

cloud around the horizontal axis! 

 

> residuals3 := <seq(lnY[i] - subs(x = X[i], RL3), i = 1 .. 200)>:


> ScatterPlot(X, residuals3);
 

Plot_2d
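
One rough way to put a number on "a cloud around the horizontal axis" is to check that the average residual in each part of the x-range is small compared with the overall spread. The Python sketch below does this for a semilog fit on hypothetical stand-in data (made-up parameters, not the worksheet's X and Y). The 0.5 threshold used in the comparison is an arbitrary choice for illustration, not a standard test.

```python
import math
import random

def fit_line(xs, ys):
    """Least-squares line ys ~ a + b*xs; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b

# Assumed stand-in data: exponential decay with multiplicative noise.
rng = random.Random(2)
xs = [0.1 + 2.4 * i / 199 for i in range(200)]
ys = [5.0 * math.exp(-0.6 * x) * math.exp(rng.gauss(0, 0.03)) for x in xs]
ln_y = [math.log(y) for y in ys]

a, b = fit_line(xs, ln_y)   # fit ln(y) = a + b*x
residuals = [ly - (a + b * x) for x, ly in zip(xs, ln_y)]

# Root-mean-square size of the residuals, used as the yardstick.
spread = math.sqrt(sum(r * r for r in residuals) / len(residuals))

# Mean residual over the left, middle, and right thirds of the x-range.
thirds = [residuals[:67], residuals[67:133], residuals[133:]]
third_means = [sum(t) / len(t) for t in thirds]
looks_like_cloud = all(abs(m) < 0.5 * spread for m in third_means)
print(third_means, spread, looks_like_cloud)
```

When the model form is right, the three averages stay near zero; a systematic positive, negative, positive run would show up as averages comparable to the spread.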
 

 

Finally, we plot the best-fitting model relation,  y = exp(1.618) exp(-0.578 x),  together with the original data.

 

> MP := plot(exp(1.618)*exp(-.578*x), x = 0 .. 2.5):


> display(MP, SP);
 

Plot_2d
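
The step from the fitted line (5) back to a model for y is just exponentiation: the intercept estimates ln(c) and the slope estimates k. The short Python sketch below carries this out with the coefficients reported in (5). The function name `model` and the sample x-values are illustrative choices.

```python
import math

# Coefficients of the semilog fit, copied from output (5) of the worksheet.
intercept = 1.61819993088653247    # estimates ln(c)
slope = -0.578001613441206041      # estimates k

c = math.exp(intercept)            # undo the logarithm to recover c

def model(x):
    """Fitted exponential relation y = c * e^(k x)."""
    return c * math.exp(slope * x)

print(f"c = {c:.4f}, k = {slope:.4f}")
for x in (0.0, 1.0, 2.5):
    print(x, model(x))
```

So the plotted curve is roughly y = 5.04 e^(-0.578 x), which is what the rounded expression exp(1.618)*exp(-.578*x) in the plot command amounts to.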
 
