In Part 9, let’s look at subsetting in R. We will extract subsets from the following data set of tourists from different countries, the number of children they have, and the amount of money they spent while on vacation. Copy and paste the following data frame into R.
A <- structure(list(NATION = structure(c(3L, 3L, 3L, 3L, 1L, 3L, 2L, 3L, 1L, 3L, 3L, 1L, 2L, 2L, 3L, 3L, 3L, 2L, 3L, 1L, 1L, 3L, 1L, 2L), .Label = c("CHINA", "GERMANY", "FRANCE"), class = "factor"),GENDER = structure(c(2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 2L), .Label = c("F", "M"), class = "factor"), CHILDREN = c(2L, 1L, 3L, 2L, 2L, 3L, 1L, 0L, 1L, 0L, 1L, 2L, 2L, 1L, 1L, 1L, 0L, 2L, 1L, 2L, 4L, 2L, 5L, 1L), SPEND = c(8500L, 23000L, 4000L, 9800L, 2200L, 4800L, 12300L, 8000L, 7100L, 10000L, 7800L, 7100L, 7900L, 7000L, 14200L, 11000L, 7900L, 2300L, 7000L, 8800L, 7500L, 15300L, 8000L, 7900L)), .Names = c("NATION", "GENDER", "CHILDREN", "MONEY"), class = "data.frame", row.names = c(NA, -24L))
A
The generic form of the syntax we will use is as follows:
Z <- A[ A[ , colnum ] == val, ]
Note that we have two sets of square brackets and a comma just before the second closing bracket. Z gives all rows for which an indicator in column colnum has the value val. We can say it like this: “Z is the set of rows of A such that the elements of column colnum have the value val”. OK. Let’s subset for females.
FE <- A[ A[, 2] == "F", ]
FE
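To see why the bracket form works, it can help to split it into its two steps. The following is a minimal sketch on a toy data frame (the name toy and the helper variable is_female are illustrative only; the values are the first six rows of A, retyped so the example runs on its own).

```r
# Toy data frame: the first six rows of A, retyped for a self-contained example.
toy <- data.frame(
  NATION   = c("FRANCE", "FRANCE", "FRANCE", "FRANCE", "CHINA", "FRANCE"),
  GENDER   = c("M", "F", "M", "M", "F", "M"),
  CHILDREN = c(2L, 1L, 3L, 2L, 2L, 3L),
  MONEY    = c(8500L, 23000L, 4000L, 9800L, 2200L, 4800L)
)

# Step 1: the inner expression compares column 2 with "F" and
# yields one TRUE/FALSE value per row.
is_female <- toy[, 2] == "F"
is_female

# Step 2: the outer brackets keep the rows flagged TRUE. The empty slot
# after the comma means "keep every column".
toy_fe <- toy[is_female, ]
toy_fe
```

Writing the two steps on one line gives exactly the generic form shown earlier.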
However, the subset() function offers an easier syntax:
subset(A, GENDER == "F")
Now isolate all rows for which the third column (number of children) is less than 2.
C1 <- A[ A[, 3] < 2, ]
C1
Again, the subset() function offers an easier syntax:
C1 <- subset(A, CHILDREN < 2)
C1
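One caveat: the two forms give the same result on A because A has no missing values, but they can differ when the tested column contains NA. The sketch below uses a hypothetical toy data frame with one missing CHILDREN value (an assumption made purely for illustration; it is not part of the tourist data).

```r
# Hypothetical toy data with one missing value in CHILDREN.
toy <- data.frame(
  GENDER   = c("M", "F", "M", "F"),
  CHILDREN = c(2L, NA, 0L, 1L)
)

# Bracket form: the comparison is NA for the second row, and indexing
# with NA produces a row filled with NA in the result.
by_bracket <- toy[toy[, 2] < 2, ]
by_bracket

# subset() treats NA in the condition as FALSE, so that row is dropped.
by_subset <- subset(toy, CHILDREN < 2)
by_subset

# Wrapping the condition in which() makes the bracket form drop NA rows too.
by_which <- toy[which(toy[, 2] < 2), ]
by_which
```

If your own data may contain missing values, it is worth checking which of these behaviours you want.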
Finally, we isolate all rows for females with fewer than two children.
F1 <- A[ A[, 2] == "F" & A[, 3] < 2, ]
F1
Again, the subset() function offers an easier syntax:
F1 <- subset(A, GENDER == "F" & CHILDREN < 2)
F1
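The bracket form does not have to use column numbers. As a minimal sketch, the same combined condition can be written with column names via the $ operator, which stays correct if the columns are later reordered. The toy data frame below (again the first six rows of A, retyped) and the variable names are illustrative only.

```r
# Toy data frame: the first six rows of A, retyped for a self-contained example.
toy <- data.frame(
  NATION   = c("FRANCE", "FRANCE", "FRANCE", "FRANCE", "CHINA", "FRANCE"),
  GENDER   = c("M", "F", "M", "M", "F", "M"),
  CHILDREN = c(2L, 1L, 3L, 2L, 2L, 3L),
  MONEY    = c(8500L, 23000L, 4000L, 9800L, 2200L, 4800L)
)

# Combined condition using column numbers, as in the text.
by_number <- toy[toy[, 2] == "F" & toy[, 3] < 2, ]

# The same condition using column names. The single & compares the two
# logical vectors element by element, giving one TRUE/FALSE per row.
by_name <- toy[toy$GENDER == "F" & toy$CHILDREN < 2, ]

# The subset() version for comparison.
by_subset <- subset(toy, GENDER == "F" & CHILDREN < 2)

by_name
all.equal(by_number, by_name)
all.equal(by_name, by_subset)
```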
That wasn’t so hard! In Part 10 we will look at further analytic techniques in R.
See you soon!
David
Annex: R code used
[code lang="r"]
# Create and display the following data frame.
A <- structure(list(NATION = structure(c(3L, 3L, 3L, 3L, 1L, 3L, 2L, 3L, 1L, 3L, 3L, 1L, 2L, 2L, 3L, 3L, 3L, 2L, 3L, 1L, 1L, 3L, 1L, 2L), .Label = c("CHINA", "GERMANY", "FRANCE"), class = "factor"),GENDER = structure(c(2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 2L), .Label = c("F", "M"), class = "factor"), CHILDREN = c(2L, 1L, 3L, 2L, 2L, 3L, 1L, 0L, 1L, 0L, 1L, 2L, 2L, 1L, 1L, 1L, 0L, 2L, 1L, 2L, 4L, 2L, 5L, 1L), SPEND = c(8500L, 23000L, 4000L, 9800L, 2200L, 4800L, 12300L, 8000L, 7100L, 10000L, 7800L, 7100L, 7900L, 7000L, 14200L, 11000L, 7900L, 2300L, 7000L, 8800L, 7500L, 15300L, 8000L, 7900L)), .Names = c("NATION", "GENDER", "CHILDREN", "MONEY"), class = "data.frame", row.names = c(NA, -24L))
A
# The generic form of the syntax to be used. This is a template only:
# colnum and val are placeholders, so the line is commented out.
# Z <- A[ A[ , colnum ] == val, ]
# Subset and display the array for females.
FE <- A[ A[, 2] == "F", ]
FE
# Alternatively, the same result can be achieved using the subset() function.
subset(A, GENDER == "F")
# Isolate and display all rows for which the third column (number of children) is less than 2.
C1 <- A[ A[, 3] < 2, ]
C1
# Alternatively, the same result can be achieved using the subset() function.
C1 <- subset(A, CHILDREN < 2)
C1
# Isolate and display all rows for females with fewer than two children.
F1 <- A[ A[, 2] == "F" & A[, 3] < 2, ]
F1
# Alternatively, the same result can be achieved using the subset() function.
F1 <- subset(A, GENDER == "F" & CHILDREN < 2)
F1
[/code]