Quick start with R: Symbol sizes in qplot (Part 24)

In Blog 24, let’s see how to use qplot to map symbol size to a categorical variable. With ggplot2 loaded (library(ggplot2)), copy in the following dataset (a medical dataset relating to patients in a randomised controlled trial):
M <- structure(list(PATIENT = structure(c(32L, 15L, 41L, 42L, 44L, 17L, 31L, 10L, 38L, 18L, 22L, 30L), .Label = c("Adrienne", "Alan", "Andy", "Ann ", "Anne ", "Anton", "Audrey", "Ben", "Bernie", "Beth", "Bob", "Bobby", "Bruce", "Charles", "Dave", "Dianne", "Frida", "Guy", "Henry", "Hugh", "Ian", "Irina", "James", "Jim", "Jo ", "John", "Jonah", "Joseph", "Lesley", "Liz", "Magnus", "Mary", "Max", "Merril", "Mike", "Mikhail", "Nick", "Peter", "Robert", "Robin", "Simon", "Steve", "Stuart", "Sue", "Telu"), class = "factor"), GENDER = structure(c(1L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 1L), .Label = c("F", "M"), class = "factor"), TREATMENT = structure(c(1L, 2L, 3L, 1L, 1L, 2L, 1L, 3L, 1L, 3L, 2L, 3L), .Label = c("A", "B", "C"), class = "factor"), AGE = structure(c(3L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 2L), .Label = c("E", "M", "Y"), class = "factor"), WEIGHT_1 = c(79.2, 58.8, 72, 59.7, 79.6, 83.1, 68.7, 67.6, 79.1, 39.9, 64.7, 65.6), WEIGHT_2 = c(76.6, 59.3, 70.1, 57.3, 79.8, 82.3, 66.8, 67.4, 76.8, 41.4, 65.3, 63.2), HEIGHT = c(169L, 161L, 175L, 149L, 179L, 177L, 175L, 170L, 177L, 138L, 170L, 165L), SMOKE = structure(c(2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L), .Label = c("N", "Y"), class = "factor"), EXERCISE = c(TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE), RECOVER = c(1L, 0L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 0L, 1L)), .Names = c("PATIENT", "GENDER", "TREATMENT", "AGE", "WEIGHT_1", "WEIGHT_2", "HEIGHT", "SMOKE", "EXERCISE", "RECOVER"), class = "data.frame", row.names = c(1L, 4L, 5L, 13L, 15L, 17L, 22L, 29L, 33L, 41L, 42L, 43L))
M

Now we create a scatterplot of patient weight before treatment against height, and we map both symbol size and colour to GENDER using factor(). Enter the following syntax:
qplot(HEIGHT, WEIGHT_1, data = M,
      xlab = "HEIGHT (cm)", ylab = "WEIGHT BEFORE TREATMENT (kg)",
      size = factor(GENDER), color = factor(GENDER)) +
  scale_size_manual(values = c(5, 7))

Note how we mapped symbol size and colour to GENDER using the syntax:
size = factor(GENDER) and color = factor(GENDER)
Also note how we controlled symbol size by adding the scale:
+ scale_size_manual(values = c(5, 7))
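In recent ggplot2 releases qplot() is marked as deprecated, so readers on a newer version may prefer the ggplot() form. The following is a rough equivalent sketched under assumptions, not the syntax used in this series: the data frame `M_small` is an illustrative stand-in holding only the three columns the plot needs, with values copied from M.

```r
library(ggplot2)

# Illustrative stand-in for M: only the columns used in the plot.
M_small <- data.frame(
  GENDER   = factor(c("F", "M", "M", "M", "F", "F", "M", "F", "M", "M", "F", "F")),
  HEIGHT   = c(169, 161, 175, 149, 179, 177, 175, 170, 177, 138, 170, 165),
  WEIGHT_1 = c(79.2, 58.8, 72, 59.7, 79.6, 83.1, 68.7, 67.6, 79.1, 39.9, 64.7, 65.6)
)

# GENDER is already a factor here, so it is mapped directly.
p <- ggplot(M_small, aes(x = HEIGHT, y = WEIGHT_1,
                         size = GENDER, colour = GENDER)) +
  geom_point() +
  labs(x = "HEIGHT (cm)", y = "WEIGHT BEFORE TREATMENT (kg)") +
  scale_size_manual(values = c(5, 7))

# Draw the plot.
p
```

The scale_size_manual() call is unchanged, so everything said here about choosing size values applies to either form.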
In this example I have chosen symbol sizes of 5 and 7. You may select different sizes to suit your graphs; with a little experience you will quickly find the values that work best. Experiment with the above syntax yourself, changing the symbol size values each time. For example:
qplot(HEIGHT, WEIGHT_1, data = M,
      xlab = "HEIGHT (cm)", ylab = "WEIGHT BEFORE TREATMENT (kg)",
      size = factor(GENDER), color = factor(GENDER)) +
  scale_size_manual(values = c(2, 9))
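
An unnamed vector of sizes is matched to the factor levels in order, so with the levels F and M the first value applies to F and the second to M. One way to make that pairing explicit is to pass a named vector, as in this minimal sketch. The names `M_small` and `gender_sizes` are illustrative only; `M_small` holds just the three columns needed, with values copied from M.

```r
library(ggplot2)

# Illustrative stand-in for M: only the columns used in the plot.
M_small <- data.frame(
  GENDER   = factor(c("F", "M", "M", "M", "F", "F", "M", "F", "M", "M", "F", "F")),
  HEIGHT   = c(169, 161, 175, 149, 179, 177, 175, 170, 177, 138, 170, 165),
  WEIGHT_1 = c(79.2, 58.8, 72, 59.7, 79.6, 83.1, 68.7, 67.6, 79.1, 39.9, 64.7, 65.6)
)

# Check the level order that an unnamed vector would follow.
print(levels(M_small$GENDER))

# Named sizes: each value is matched to the level of the same name.
gender_sizes <- c("F" = 2, "M" = 9)

p_named <- qplot(HEIGHT, WEIGHT_1, data = M_small,
                 xlab = "HEIGHT (cm)", ylab = "WEIGHT BEFORE TREATMENT (kg)",
                 size = GENDER, color = GENDER) +
  scale_size_manual(values = gender_sizes)

# Draw the plot.
p_named
```

With names in place, reordering the vector (for example c("M" = 9, "F" = 2)) should leave the plot unchanged, which is handy when a factor has more than two levels.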

The difference in point sizes is now rather extreme, but it shows how to control symbol size. Soon we will learn how to control symbol colour too. See you later!
David

Annex: R code used

[code lang="r"]
# Load ggplot2, which provides qplot().
library(ggplot2)

# Create and display the dataset.
M <- structure(list(PATIENT = structure(c(32L, 15L, 41L, 42L, 44L, 17L, 31L, 10L, 38L, 18L, 22L, 30L), .Label = c("Adrienne", "Alan", "Andy", "Ann ", "Anne ", "Anton", "Audrey", "Ben", "Bernie", "Beth", "Bob", "Bobby", "Bruce", "Charles", "Dave", "Dianne", "Frida", "Guy", "Henry", "Hugh", "Ian", "Irina", "James", "Jim", "Jo ", "John", "Jonah", "Joseph", "Lesley", "Liz", "Magnus", "Mary", "Max", "Merril", "Mike", "Mikhail", "Nick", "Peter", "Robert", "Robin", "Simon", "Steve", "Stuart", "Sue", "Telu"), class = "factor"), GENDER = structure(c(1L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 1L), .Label = c("F", "M"), class = "factor"), TREATMENT = structure(c(1L, 2L, 3L, 1L, 1L, 2L, 1L, 3L, 1L, 3L, 2L, 3L), .Label = c("A", "B", "C"), class = "factor"), AGE = structure(c(3L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 2L), .Label = c("E", "M", "Y"), class = "factor"), WEIGHT_1 = c(79.2, 58.8, 72, 59.7, 79.6, 83.1, 68.7, 67.6, 79.1, 39.9, 64.7, 65.6), WEIGHT_2 = c(76.6, 59.3, 70.1, 57.3, 79.8, 82.3, 66.8, 67.4, 76.8, 41.4, 65.3, 63.2), HEIGHT = c(169L, 161L, 175L, 149L, 179L, 177L, 175L, 170L, 177L, 138L, 170L, 165L), SMOKE = structure(c(2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L), .Label = c("N", "Y"), class = "factor"), EXERCISE = c(TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE), RECOVER = c(1L, 0L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 0L, 1L)), .Names = c("PATIENT", "GENDER", "TREATMENT", "AGE", "WEIGHT_1", "WEIGHT_2", "HEIGHT", "SMOKE", "EXERCISE", "RECOVER"), class = "data.frame", row.names = c(1L, 4L, 5L, 13L, 15L, 17L, 22L, 29L, 33L, 41L, 42L, 43L))
M

# Create a scatterplot of patient weight before treatment against height,
# with symbol size and colour mapped to GENDER.
qplot(HEIGHT, WEIGHT_1, data = M,
      xlab = "HEIGHT (cm)", ylab = "WEIGHT BEFORE TREATMENT (kg)",
      size = factor(GENDER), color = factor(GENDER)) +
  scale_size_manual(values = c(5, 7))

# Change the symbol size values.
qplot(HEIGHT, WEIGHT_1, data = M,
      xlab = "HEIGHT (cm)", ylab = "WEIGHT BEFORE TREATMENT (kg)",
      size = factor(GENDER), color = factor(GENDER)) +
  scale_size_manual(values = c(2, 9))
[/code]