MATH 376 -- Mathematical Statistics
Confidence and Prediction Bands in Simple Linear Regression
April 30, 2012

# The following data set represents peak loads at an electric power plant,
# measured on a random sample of days with different maximum temperatures
# (in degrees F):
#
> temp <- c(95,82,90,81,99,100,93,95,93,87)
> load <- c(214,152,156,129,254,266,210,204,213,150)
#
# We fit a simple linear regression model:
#
> model <- lm(load~temp)
> summary(model)

Call:
lm(formula = load ~ temp)

Residuals:
    Min      1Q  Median      3Q     Max
-28.724 -11.811   4.929   8.645  21.016

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -419.8491    76.0578   -5.52  0.00056 ***
temp           6.7175     0.8294    8.10 3.99e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 16.18 on 8 degrees of freedom
Multiple R-squared: 0.8913,     Adjusted R-squared: 0.8777
F-statistic: 65.6 on 1 and 8 DF,  p-value: 3.994e-05

# The goal of fitting this kind of model is usually to draw conclusions
# about the response (y = peak load) at other values of x = temperature.
# Two kinds of interval estimates are available: a "confidence" interval
# for the mean response at a given x, and a "prediction" interval for a
# single new observation of y at that x.
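# For reference, the standard textbook formulas behind these two intervals
# are sketched below. The notation is an assumption, not taken from this
# handout: x_0 is the temperature of interest, \hat{y}_0 the fitted value
# there, s the residual standard error, n the sample size, and
# S_{xx} = \sum (x_i - \bar{x})^2.

```latex
\begin{align*}
\hat{y}_0 &= \hat{\beta}_0 + \hat{\beta}_1 x_0 \\[4pt]
\text{confidence interval for the mean response:}\quad
  & \hat{y}_0 \pm t_{\alpha/2,\,n-2}\; s\,
    \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}} \\[4pt]
\text{prediction interval for a new observation:}\quad
  & \hat{y}_0 \pm t_{\alpha/2,\,n-2}\; s\,
    \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}
\end{align*}
```

# The extra 1 under the square root accounts for the variability of a single
# new observation about its mean, so the prediction interval is always the
# wider of the two. Both intervals are narrowest at x_0 = \bar{x}.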
# For instance, say we want to understand the possible values of the peak
# load on a day with maximum temperature 96:
#
> predict(model,newdata=data.frame(temp=96),interval="conf")
       fit     lwr      upr
1 225.0286 210.426 239.6313
> predict(model,newdata=data.frame(temp=96),interval="pred")
       fit      lwr      upr
1 225.0286 184.9668 265.0905
#
# We can also construct confidence and prediction "bands" about the
# regression line by computing both intervals over a grid of temperatures:
#
> temp.frame <- data.frame(temp=seq(80,110,1))
> pc <- predict(model,newdata=temp.frame,interval="conf")
> pp <- predict(model,newdata=temp.frame,interval="pred")
#
# Next, we generate a plot showing everything computed so far: the data,
# the fitted line, the confidence band (dotted) and the prediction band
# (dashed). The x-range matches the grid 80..110 so that all ten
# observations are visible.
#
> plot(temp,load,xlim=c(80,110),ylim=c(0,350))
> abline(model)
> matlines(temp.frame$temp,pc,lty=c(1,3,3),col="black")
> matlines(temp.frame$temp,pp,lty=c(1,2,2),col="black")
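# As a check on the intervals reported for temp = 96, they can be
# reproduced by hand from the standard simple-linear-regression formulas.
# This is only a sketch: the helper names (x0, Sxx, tcrit, se.mean, se.pred,
# conf.int, pred.int) are illustrative and do not appear in the original
# session, and the 95% level is assumed because it is the default of predict().

```r
# Data and model, repeated so the block runs on its own
temp <- c(95,82,90,81,99,100,93,95,93,87)
load <- c(214,152,156,129,254,266,210,204,213,150)
model <- lm(load ~ temp)

x0    <- 96                              # temperature of interest
n     <- length(temp)                    # sample size
s     <- summary(model)$sigma            # residual standard error
Sxx   <- sum((temp - mean(temp))^2)      # spread of the x values
fit   <- sum(coef(model) * c(1, x0))     # fitted value at x0
tcrit <- qt(0.975, df = n - 2)           # t quantile for a 95% interval

# Standard error of the estimated mean response at x0
se.mean <- s * sqrt(1/n + (x0 - mean(temp))^2 / Sxx)
# Standard error for a single new observation at x0 (note the extra 1)
se.pred <- s * sqrt(1 + 1/n + (x0 - mean(temp))^2 / Sxx)

conf.int <- c(fit - tcrit * se.mean, fit + tcrit * se.mean)
pred.int <- c(fit - tcrit * se.pred, fit + tcrit * se.pred)

print(conf.int)   # should match lwr/upr from interval="conf"
print(pred.int)   # should match lwr/upr from interval="pred"
```

# Both intervals are centered at the same fitted value; the prediction
# interval is wider only because se.pred includes the variance of a new
# observation in addition to the uncertainty in the fitted line.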