In my last blog we created two variables and used the `lm()` command to perform a least squares regression on them, treating one of them as the dependent variable and the other as the independent variable. Here they are again.

`height = c(176, 154, 138, 196, 132, 176, 181, 169, 150, 175)`

`bodymass = c(82, 49, 53, 112, 47, 69, 77, 71, 62, 78)`

Today we learn how to obtain useful diagnostic information about a regression model and then how to draw residuals on a plot. As before, we perform the regression.

`lm(height ~ bodymass)`

Now let’s find out more about the regression. First, let’s store the regression model as an object called `mod` and then use the `summary()` command to learn about the regression.

`mod <- lm(height ~ bodymass)`

`summary(mod)`

Here is what R gives you.

R has given you a great deal of diagnostic information about the regression. The most useful pieces of this information are the coefficients themselves, the adjusted *R*-squared, the *F*-statistic and the *p*-value for the model.
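If you want these diagnostics programmatically rather than just printed on screen, they are stored as components of the model and summary objects. A minimal sketch, assuming `mod` has been created as above:

```r
smod <- summary(mod)

coef(mod)           # the estimated intercept and slope
smod$adj.r.squared  # adjusted R-squared
smod$fstatistic     # F-statistic with its degrees of freedom
```

This is handy when you want to report or compare diagnostics across several models without reading them off the printed summary.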

Now let’s use R’s `predict()` command to create a vector of fitted values.

`regmodel <- predict(lm(height ~ bodymass))`

`regmodel`

Here are the fitted values:
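As an aside, base R offers direct extractor functions as well: `fitted()` returns the same fitted values, and `resid()` returns the residuals we are about to draw. A quick sketch, assuming `mod` from above:

```r
fitted(mod)  # same values as predict(lm(height ~ bodymass))
resid(mod)   # the residuals: height - fitted(mod)
```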

Now let’s plot the data and regression line again.

`plot(bodymass, height, pch = 16, cex = 1.3, col = "blue", main = "HEIGHT PLOTTED AGAINST BODY MASS", xlab = "BODY MASS (kg)", ylab = "HEIGHT (cm)")`

`abline(lm(height ~ bodymass))`

We can plot the residuals using R’s `for` loop and a subscript `k` that runs from 1 to the number of data points. We know that there are 10 data points, but if we do not know the number of points we can find it using the `length()` command on either the height or body mass variable.

`npoints <- length(height)`

`npoints`

Now let’s implement the loop and draw the residuals (the differences between the observed data and the corresponding fitted values) using the `lines()` command. Note the syntax we use to draw in the residuals.

`for (k in 1:npoints) lines(c(bodymass[k], bodymass[k]), c(height[k], regmodel[k]))`
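For what it’s worth, the loop can also be replaced by a single vectorised call. A sketch using base R’s `segments()` function, which takes vectors of start and end coordinates and draws one line segment per data point:

```r
# Equivalent to the loop above: a vertical segment from each
# observed point (bodymass[k], height[k]) down (or up) to the
# corresponding fitted value (bodymass[k], regmodel[k]).
segments(bodymass, height, bodymass, regmodel)
```

The result is identical; `segments()` simply recycles its coordinate vectors instead of looping explicitly.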

Here is our plot, including the residuals.

None of this was so difficult!

Next time we will look at more advanced aspects of regression models and see what R has to offer. See you then!

David

#### Annex: R codes used

```r
# Create two variables.
height = c(176, 154, 138, 196, 132, 176, 181, 169, 150, 175)
bodymass = c(82, 49, 53, 112, 47, 69, 77, 71, 62, 78)

# Estimate the regression model.
lm(height ~ bodymass)

# Store the regression model as an object.
mod <- lm(height ~ bodymass)
summary(mod)

# Create a vector of fitted values.
regmodel <- predict(lm(height ~ bodymass))
regmodel

# Plot the data and regression line.
plot(bodymass, height, pch = 16, cex = 1.3, col = "blue",
     main = "HEIGHT PLOTTED AGAINST BODY MASS",
     xlab = "BODY MASS (kg)", ylab = "HEIGHT (cm)")
abline(lm(height ~ bodymass))

# Find the number of data points.
npoints <- length(height)
npoints

# Draw in the residuals.
for (k in 1:npoints) lines(c(bodymass[k], bodymass[k]), c(height[k], regmodel[k]))
```

Senior Academic Manager at the *New Zealand Institute of Sport* and Director of *Sigma Statistics and Research Ltd*. Author of the book *R Graph Essentials*.