MONT 106N -- Identifying Patterns Seminar
Regression for exploratory data analysis
November 9, 2009
We want to work through an example illustrating one way
that regression is often used to try to identify a functional
relation between x and y.
> restart:
How the data was generated
> with(Statistics):
> ScatterPlot(X, Y);
> SP := ScatterPlot(X, Y):
> RLine := Fit(a + b*x, X, Y, x);
4.57581962066867121 - 1.59386005694449850*x   (1)
> LP := plot(RLine, x = 0 .. 2.5):
> with(plots):
> display(LP, SP);
> residuals1 := <seq(Y[i] - subs(x = X[i], RLine), i = 1 .. 200)>:
> ScatterPlot(X, residuals1);
The residuals indicate that a linear model relation $y = a + bx$
does not fit the data very well. The fact that the
residuals tend to be positive, then negative,
then positive again on successive ranges of x-values is a tip-off that
y probably does not depend linearly on x.
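The sign pattern just described can be reproduced on synthetic data. The sketch below is in Python rather than Maple, and because the worksheet's X and Y are not shown, it assumes noise-free data from an illustrative convex curve $y = 5e^{-0.6x}$. The helper `fit_line` is a hypothetical name for a plain ordinary-least-squares routine, not a function from the worksheet.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return a, b

# Assumed illustrative data: a convex, decaying curve sampled on [0, 2.5].
xs = [2.5 * i / 199 for i in range(200)]
ys = [5.0 * math.exp(-0.6 * x) for x in xs]

a, b = fit_line(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

# Collapse the residual signs into consecutive runs.
signs = ['+' if r > 0 else '-' for r in residuals]
runs = [s for i, s in enumerate(signs) if i == 0 or s != signs[i - 1]]
print(runs)
```

For a convex curve like this one, the least-squares line cuts across the curve twice, so the residuals fall into three runs: positive, negative, positive. Real data with noise will show the same tendency less cleanly.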
Let's see if things look different for some other functional forms.
What about a "power law" relation: $y = c\,x^{b}$? If we
had an exact relation of this form and we took logarithms of both
sides, then we would have $\ln y = \ln c + b \ln x$,
so the points $(\ln x_i, \ln y_i)$ would lie on a straight line with slope $b$
and intercept $\ln c$.
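As a quick check of this log-log idea, here is a small Python sketch (not the worksheet's Maple) that generates exact power-law data and recovers the exponent and constant from a straight-line fit. The values $c = 3$ and $b = -0.5$ and the helper name `fit_line` are assumptions chosen only for illustration.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return a, b

# Assumed illustrative power law y = c * x**b (x must be positive for ln).
c_true, b_true = 3.0, -0.5
xs = [0.1 + 2.4 * i / 49 for i in range(50)]
ys = [c_true * x ** b_true for x in xs]

# Regress ln(y) on ln(x): the slope estimates b, the intercept estimates ln(c).
ln_x = [math.log(x) for x in xs]
ln_y = [math.log(y) for y in ys]
a, b = fit_line(ln_x, ln_y)
c = math.exp(a)
print(b, c)
```

With exact data the fit returns the assumed exponent and constant up to rounding error. With real data the interesting question is whether the log-log points scatter evenly about the line.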
> r1 := Correlation(X, Y);
-0.9871097488   (2)
> lnX := <seq(ln(X[i]), i = 1 .. 200)>:
> lnY := <seq(ln(Y[i]), i = 1 .. 200)>:
> SP2 := ScatterPlot(lnX, lnY):
> RL2 := Fit(a + b*x, lnX, lnY, x);
0.935774292106958505 - 0.445060827560002370*x   (3)
> LP2 := plot(RL2, x = -2 .. 1):
> display(LP2, SP2);
> r2 := Correlation(lnX, lnY);
-0.9425827575   (4)
The residuals from this fit also show a pattern along the same lines as the previous one
(most residuals negative to the left, then positive in the middle, and
negative again to the right).
Hence this model relation, $y = c\,x^{b}$,
is probably not that good either. Next, let's try
a relation of the form $y = c\,e^{bx}$.
Taking logarithms again gives $\ln y = \ln c + bx$, so now the points $(x_i, \ln y_i)$ should lie close to a straight line.
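The semi-log fit can be sketched the same way. The Python below (again a stand-in for the Maple commands) assumes an illustrative curve $y = 5e^{-0.6x}$ with a little multiplicative noise from a seeded random generator; none of these values come from the worksheet's data, and `fit_line` is a hypothetical helper.

```python
import math
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return a, b

# Assumed illustrative exponential data with multiplicative noise.
rng = random.Random(0)
c_true, b_true = 5.0, -0.6
xs = [2.5 * i / 199 for i in range(200)]
ys = [c_true * math.exp(b_true * x) * math.exp(rng.gauss(0.0, 0.05)) for x in xs]

# Regress ln(y) on x itself: the slope estimates b, the intercept estimates ln(c).
ln_y = [math.log(y) for y in ys]
a, b = fit_line(xs, ln_y)
c = math.exp(a)
print(b, c)
```

Only y is transformed here, whereas the power-law check transforms both variables. The estimates should land close to the assumed values, though not exactly on them because of the noise.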
> SP3 := ScatterPlot(X, lnY):
> RL3 := Fit(a + b*x, X, lnY, x);
1.61819993088653247 - 0.578001613441206041*x   (5)
> LP3 := plot(RL3, x = 0 .. 2.5):
> display(SP3, LP3);
> r3 := Correlation(X, lnY);
-0.9983749641   (6)
The following is more like what we want to see for the residuals -- a
cloud around the horizontal axis!
> residuals3 := <seq(lnY[i] - subs(x = X[i], RL3), i = 1 .. 200)>:
> ScatterPlot(X, residuals3);
Finally, we plot the best-fitting model relation, $y = e^{1.618}\,e^{-0.578x}$, together with the original data.
> MP := plot(exp(1.618)*exp(-0.578*x), x = 0 .. 2.5):
> display(MP, SP);
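To make the back-transformation explicit, the following Python sketch (not part of the Maple worksheet) converts the rounded coefficients of the fitted line for ln y into the constants of $y = c\,e^{bx}$. The function name `model` is a hypothetical label for this illustration.

```python
import math

# Rounded coefficients of the fitted line ln(y) = a + b*x, as used in the final plot.
a, b = 1.618, -0.578

# Undo the logarithm: y = exp(a) * exp(b*x), so the constant is c = exp(a).
c = math.exp(a)

def model(x):
    """Fitted exponential model on the original (untransformed) scale."""
    return c * math.exp(b * x)

for x in (0.0, 1.0, 2.5):
    print(x, round(model(x), 3))
```

The constant comes out a little above 5, so the fitted relation is roughly $y \approx 5.04\,e^{-0.578x}$.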