In my last blog we created two variables and used the `lm()` command to perform a least squares regression on them, treating one of them as the dependent variable and the other as the independent variable. Here they are again.

`height = c(176, 154, 138, 196, 132, 176, 181, 169, 150, 175)`

`bodymass = c(82, 49, 53, 112, 47, 69, 77, 71, 62, 78)`

Today we learn how to obtain useful diagnostic information about a regression model and then how to draw residuals on a plot. As before, we perform the regression.

`lm(height ~ bodymass)`

Now let’s find out more about the regression. First, let’s store the regression model as an object called `mod` and then use the `summary()` command to learn about the regression.

`mod <- lm(height ~ bodymass)`

`summary(mod)`

Here is what R gives you.

R has given you a great deal of diagnostic information about the regression. The most useful items are the coefficients themselves, the adjusted *R*-squared, the *F*-statistic and the *p*-value for the model.
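If you want these quantities as numbers rather than as printed text, they can be pulled out of the summary object. The following is a minimal sketch; the names `s`, `coefs`, `adj_r2`, `fstat` and `p_model` are my own choices for illustration. Note that `summary()` prints the model *p*-value but does not store it, so the sketch recomputes it from the *F*-statistic with `pf()`.

```r
# Same data and model as in the post.
height <- c(176, 154, 138, 196, 132, 176, 181, 169, 150, 175)
bodymass <- c(82, 49, 53, 112, 47, 69, 77, 71, 62, 78)
mod <- lm(height ~ bodymass)

s <- summary(mod)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|).
coefs <- coef(s)

# Adjusted R-squared as a single number.
adj_r2 <- s$adj.r.squared

# F-statistic as a named vector: value, numdf, dendf.
fstat <- s$fstatistic

# Model p-value: upper tail of the F distribution.
p_model <- unname(pf(fstat["value"], fstat["numdf"], fstat["dendf"],
                     lower.tail = FALSE))

coefs
adj_r2
p_model
```

With a single predictor, the model *p*-value should agree with the *p*-value reported for the `bodymass` slope, which gives a quick sanity check.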

Now let’s use R’s `predict()` command to create a vector of fitted values.

`regmodel <- predict(lm(height ~ bodymass))`

`regmodel`

Here are the fitted values:
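To see where these numbers come from, the fitted values can be rebuilt by hand from the intercept and slope. This is only a sketch: the names `b` and `by_hand` are mine, and it relies on the fact that `predict()` called without new data returns the same values as `fitted()`.

```r
# Same data and model as in the post.
height <- c(176, 154, 138, 196, 132, 176, 181, 169, 150, 175)
bodymass <- c(82, 49, 53, 112, 47, 69, 77, 71, 62, 78)
mod <- lm(height ~ bodymass)

# Fitted values from the stored model.
regmodel <- predict(mod)

# Rebuild them from the coefficients: intercept + slope * bodymass.
b <- coef(mod)
by_hand <- unname(b[1] + b[2] * bodymass)

# Both comparisons should report TRUE.
all.equal(unname(regmodel), by_hand)
all.equal(regmodel, fitted(mod))
```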

Now let’s plot the data and regression line again.

`plot(bodymass, height, pch = 16, cex = 1.3, col = "blue", main = "HEIGHT PLOTTED AGAINST BODY MASS", xlab = "BODY MASS (kg)", ylab = "HEIGHT (cm)")`

`abline(lm(height ~ bodymass))`

We can plot the residuals using R’s `for` loop and a subscript `k` that runs from 1 to the number of data points. We know that there are 10 data points, but if we did not know how many there were, we could find out by applying the `length()` command to either the height or the body mass variable.

`npoints <- length(height)`

`npoints`

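Now let’s implement the loop and draw the residuals (the differences between the observed data and the corresponding fitted values) using the `lines()` command. Note the syntax we use to draw in the residuals: each pass through the loop draws one vertical line at `bodymass[k]`, running from the observed `height[k]` to the fitted value `regmodel[k]`.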

`for (k in 1:npoints) lines(c(bodymass[k], bodymass[k]), c(height[k], regmodel[k]))`

Here is our plot, including the residuals.
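The loop is not the only way to do this. As an alternative sketch (not the method used above), base R's `segments()` accepts whole vectors of start and end coordinates, so all the residual lines can be drawn in one call. The variable `res` is a name I have added to hold the residuals, computed as observed minus fitted.

```r
# Same data and model as in the post.
height <- c(176, 154, 138, 196, 132, 176, 181, 169, 150, 175)
bodymass <- c(82, 49, 53, 112, 47, 69, 77, 71, 62, 78)
mod <- lm(height ~ bodymass)
regmodel <- predict(mod)

# Scatterplot and regression line.
plot(bodymass, height, pch = 16, cex = 1.3, col = "blue",
     xlab = "BODY MASS (kg)", ylab = "HEIGHT (cm)")
abline(mod)

# One vertical segment per point, from the observed to the fitted value.
segments(x0 = bodymass, y0 = height, x1 = bodymass, y1 = regmodel)

# The residuals themselves: observed minus fitted.
res <- height - regmodel
res
```

The lengths of the segments are the absolute values of `res`, and the values should match what `residuals(mod)` returns.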

None of this was so difficult!

Next time we will look at more advanced aspects of regression models and see what R has to offer. See you then!

David

#### Annex: R code used

[code lang="r"]
# Create two variables.
height = c(176, 154, 138, 196, 132, 176, 181, 169, 150, 175)
bodymass = c(82, 49, 53, 112, 47, 69, 77, 71, 62, 78)

# Estimate the regression model.
lm(height ~ bodymass)

# Store the regression model as an object.
mod <- lm(height ~ bodymass)
summary(mod)

# Create a vector of fitted values.
regmodel <- predict(lm(height ~ bodymass))
regmodel

# Plot the data and regression line.
plot(bodymass, height, pch = 16, cex = 1.3, col = "blue", main = "HEIGHT PLOTTED AGAINST BODY MASS", xlab = "BODY MASS (kg)", ylab = "HEIGHT (cm)")
abline(lm(height ~ bodymass))

# Find the number of data points.
npoints <- length(height)
npoints

# Draw in the residuals.
for (k in 1:npoints) lines(c(bodymass[k], bodymass[k]), c(height[k], regmodel[k]))
[/code]