In Part 15, we saw how to create a simple Generalised Linear Model on binary data using the `glm()` command. We continue with the same glm (modelling the `vs` variable on weight and engine displacement) on the same data set – the `mtcars` dataset – which is built into R.

```
model <- glm(formula = vs ~ wt + disp, data = mtcars, family = binomial)
summary(model)
```

We see that `weight` influences `vs` positively, while `displacement` has a slightly negative effect. We also see that the coefficient of `weight` is non-significant (*p*-value > 0.05) while the coefficient of `displacement` is significant. Later we will see how to investigate ways of improving our model.

However, the estimates (coefficients of the predictors `weight` and `displacement`) are now in units called logits. We will define the logit in a later blog.
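As a brief preview (the full definition will come later), the logit of a probability *p* is log(*p* / (1 − *p*)), and base R already provides the logit and inverse-logit functions as `qlogis()` and `plogis()`. A minimal sketch:

```r
# qlogis() converts a probability to log-odds (logits);
# plogis() converts logits back to a probability.
p <- 0.8
qlogis(p)           # log(0.8 / 0.2) = log(4), about 1.386
plogis(qlogis(p))   # recovers 0.8
```

So a coefficient in logits tells us how the log-odds of `vs = 1` change per unit increase of the predictor.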

We see the word deviance twice in the model output. Deviance is a measure of goodness of fit. R reports two forms of deviance – the null deviance and the residual deviance. The null deviance shows how well the response variable is predicted by a model that includes only the intercept (grand mean). For our example, we have a value of 43.9 on 31 degrees of freedom, which indicates a poor fit (a significant difference between fitted values and observed values). Including the independent variables (`disp` and `weight`) decreased the deviance to 21.4 on 29 degrees of freedom – a reduction of 22.5 with a loss of two degrees of freedom. Comparing this reduction against a Chi-square distribution with 2 degrees of freedom gives a very small (and therefore highly significant) *p*-value for the model.
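Rather than typing the numbers in by hand, we can pull both deviances and their degrees of freedom straight from the fitted model object – a minimal sketch using the `null.deviance`, `deviance`, `df.null` and `df.residual` components that `glm()` returns:

```r
model <- glm(formula = vs ~ wt + disp, data = mtcars, family = binomial)

# Drop in deviance when moving from the intercept-only model to ours.
dev_drop <- model$null.deviance - model$deviance   # about 22.5
df_drop  <- model$df.null - model$df.residual      # 2 (two predictors added)
1 - pchisq(dev_drop, df_drop)                      # very small p-value
```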

`1 - pchisq(43.9 - 21.4, 31 - 29)`

Let us apply the Hosmer and Lemeshow goodness of fit (GOF) test.

```
library(ResourceSelection)
hoslem.test(mtcars$vs, fitted(model))
```

Unfortunately, we must reject the null hypothesis (i.e. that our model fits well), because the fitted values are significantly different from the observed data.

```
plot(sort(model$fitted), pch = 16, cex.lab = 1.3, ylim = c(0, 1))
points(sort(mtcars$vs), pch = 19, col = "red", cex = 1.3)
```

Let’s try another plot, this time joining the fitted values with straight lines.

```
plot(sort(mtcars$vs), pch = 19, col = "red", cex = 1.3, ylab = "Proportion")
points(sort(model$fitted), type = "o", pch = 16, cex.lab = 1.3, ylim = c(0, 1))
```

The `glm()` function has produced an S-shaped curve of fitted probabilities that relates the observations for `vs = 0` to those for `vs = 1`.

We may wish to plot each predictor separately, so first we fit a glm for `weight` alone.

```
model_weight <- glm(formula = vs ~ wt, data = mtcars, family = binomial)
summary(model_weight)
```

We check the range of `wt` so we know what interval to plot over.

`range(mtcars$wt)`

```
xweight <- seq(0, 6, 0.01)
yweight <- predict(model_weight, list(wt = xweight),type="response")
plot(mtcars$wt, mtcars$vs, pch = 16, xlab = "WEIGHT (1000 lbs)", ylab = "VS")
lines(xweight, yweight)
```

```
model_disp <- glm(formula = vs ~ disp, data = mtcars, family = binomial)
summary(model_disp)
```

We check the range of `disp` so we know what interval to plot over.

`range(mtcars$disp)`

```
xdisp <- seq(0, 500, 0.01)
ydisp <- predict(model_disp, list(disp = xdisp),type="response")
plot(mtcars$disp, mtcars$vs, pch = 16, xlab = "DISPLACEMENT (cubic inches)", ylab = "VS")
lines(xdisp, ydisp)
```
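Having plotted the single-predictor curves, we can also use the full two-predictor model to estimate the probability of `vs = 1` for a hypothetical car (the `wt` and `disp` values below are arbitrary, chosen only for illustration):

```r
model <- glm(formula = vs ~ wt + disp, data = mtcars, family = binomial)

# Hypothetical car: weight 2.5 (1000 lbs), displacement 120 cubic inches.
new_car <- data.frame(wt = 2.5, disp = 120)
predict(model, newdata = new_car, type = "response")  # fitted probability of vs = 1
```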

That wasn’t so hard! In Blog 17 I will explain more about the output we got from the `glm()` function.

See you later!

David

#### Annex: R codes used

[code lang="r"]

# Estimate Generalised Linear Model and display results.

model <- glm(formula = vs ~ wt + disp, data = mtcars, family = binomial)

summary(model)

# Test the reduction in deviance with a Chi-square test.

1 - pchisq(43.9 - 21.4, 31 - 29)

# Load ResourceSelection package.

library(ResourceSelection)

# Apply Hosmer and Lemeshow goodness of fit (GOF) test.

hoslem.test(mtcars$vs, fitted(model))

# Graph estimated model.

plot(sort(model$fitted), pch = 16, cex.lab = 1.3, ylim = c(0, 1))

points(sort(mtcars$vs), pch = 19, col = "red", cex = 1.3)

# Alternative graph of the estimated model.

plot(sort(mtcars$vs), pch = 19, col = "red", cex = 1.3, ylab = "Proportion")

points(sort(model$fitted), type = "o", pch = 16, cex.lab = 1.3, ylim = c(0, 1))

# Estimate Generalised Linear Model and display results.

model_weight <- glm(formula = vs ~ wt, data = mtcars, family = binomial)

summary(model_weight)

# Calculate range of values.

range(mtcars$wt)

# Graph estimated model.

xweight <- seq(0, 6, 0.01)

yweight <- predict(model_weight, list(wt = xweight), type = "response")

plot(mtcars$wt, mtcars$vs, pch = 16, xlab = "WEIGHT (1000 lbs)", ylab = "VS")

lines(xweight, yweight)

# Estimate Generalised Linear Model and display results.

model_disp <- glm(formula = vs ~ disp, data = mtcars, family = binomial)

summary(model_disp)

# Calculate range of values.

range(mtcars$disp)

# Graph estimated model.

xdisp <- seq(0, 500, 0.01)

ydisp <- predict(model_disp, list(disp = xdisp), type = "response")

plot(mtcars$disp, mtcars$vs, pch = 16, xlab = "DISPLACEMENT (cubic inches)", ylab = "VS")

lines(xdisp, ydisp)

[/code]