## Exercises - Factors and Tables

1. The heights and eye colors of 7 people are stored in two vectors, as shown below. Find the mean height for each eye color.

height = c(5.8,5.7,5.7,5.9,6.2,4.9,5.2)
eye.color = c("brown","blue","brown","brown","brown","green","brown")


> height = c(5.8,5.7,5.7,5.9,6.2,4.9,5.2)
> eye.color = c("brown","blue","brown","brown","brown","green","brown")
> tapply(height,factor(eye.color),mean)
 blue brown green
 5.70  5.76  4.90

Note that the call to factor() above can be omitted, with eye.color passed directly as the second argument to tapply(). This works because tapply() coerces its second argument to a factor (via as.factor()) whenever it is not one already.
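As a quick check of this claim, the sketch below computes the group means both ways and compares them. The names with.factor and without.factor are introduced here only for illustration.

```r
height = c(5.8,5.7,5.7,5.9,6.2,4.9,5.2)
eye.color = c("brown","blue","brown","brown","brown","green","brown")

# Explicit conversion to a factor, as in the solution above
with.factor = tapply(height, factor(eye.color), mean)

# Passing the character vector directly; tapply() coerces it with as.factor()
without.factor = tapply(height, eye.color, mean)

# Compare the two results (all.equal() returns TRUE when they agree)
all.equal(with.factor, without.factor)
```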

2. Instructors $A$ and $B$ each collect grades and genders for 10 students, storing them in the following vectors:

A_grades = c("A","B","B","D","A","A","C","A","B","B")
A_genders = c(1,1,1,0,1,0,0,0,1,1)  # here, 1 represents a male and 0 a female
B_grades = c(97,93,92,57,75,90,72,88,82,60)
B_genders = c("M","F","F","M","M","M","F","F","F","M")

(a) Convert A_grades into a factor a.grades.fac with ordered levels $A \gt B \gt C \gt D \gt F$.

(b) Convert B_grades into a factor b.grades.fac with ordered levels $A \gt B \gt C \gt D \gt F$. Assume letter grades are associated with the following ranges:

$$\begin{array}{cc} A & 90-100\\ B & 80-89\\ C & 70-79\\ D & 60-69\\ F & 0-59 \end{array}$$

(c) Convert A_genders into a factor a.genders.fac with levels M and F.

(d) Convert B_genders into a factor b.genders.fac with levels M and F.

(e) Combine the two grade factors into a single factor grade.fac.

(f) Combine the two gender factors into a single factor gender.fac.

(g) Make a table showing how many students of each gender earned each possible grade, with marginal totals.

> A_grades = c("A","B","B","D","A","A","C","A","B","B")
> A_genders = c(1,1,1,0,1,0,0,0,1,1)
> B_grades = c(97,93,92,57,75,90,72,88,82,60)
> B_genders = c("M","F","F","M","M","M","F","F","F","M")

# (a)
> a.grades.fac = factor(A_grades,ordered=TRUE,levels=c("F","D","C","B","A"))
> a.grades.fac
[1] A B B D A A C A B B
Levels: F < D < C < B < A

# (b)
> b.grades.fac = cut(B_grades,breaks=c(-0.5,59.5,69.5,79.5,89.5,100.5),
labels=c("F","D","C","B","A"), ordered_result=TRUE)
> b.grades.fac
[1] A A A F C A C B B D
Levels: F < D < C < B < A

# (c)
> a.genders.fac = factor(A_genders)
> a.genders.fac
[1] 1 1 1 0 1 0 0 0 1 1
Levels: 0 1
> levels(a.genders.fac) = c("F","M")
> a.genders.fac
[1] M M M F M F F F M M
Levels: F M

# (d)
> b.genders.fac = factor(B_genders)
> b.genders.fac
[1] M F F M M M F F F M
Levels: F M

# (e)
# Pass the levels explicitly; otherwise factor() sorts them alphabetically,
# which would give the reversed ordering A < B < C < D < F.
> grade.fac = factor(c(as.character(a.grades.fac),
as.character(b.grades.fac)),
levels=c("F","D","C","B","A"),ordered=TRUE)
> grade.fac
[1] A B B D A A C A B B A A A F C A C B B D
Levels: F < D < C < B < A

# (f)
> gender.fac = factor(c(as.character(a.genders.fac),
as.character(b.genders.fac)))
> gender.fac
[1] M M M F M F F F M M M F F M M M F F F M
Levels: F M

# (g)
> t = table(gender.fac,grade.fac)
> addmargins(t)
          grade.fac
gender.fac  F  D  C  B  A Sum
       F    0  1  2  2  4   9
       M    1  1  1  4  4  11
       Sum  1  2  3  6  8  20
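
Parts (b) and (c) can also be written a little more compactly. The sketch below is one possible alternative, not the only correct answer: it uses whole-number breaks with right=FALSE in cut(), and the labels argument of factor(). The names b.grades.alt and a.genders.alt are hypothetical. Note the assumption that a fractional score such as 89.7 should count as a B; the half-point breaks used in (b) would instead round it up to an A.

```r
B_grades = c(97,93,92,57,75,90,72,88,82,60)
A_genders = c(1,1,1,0,1,0,0,0,1,1)

# right=FALSE makes each interval closed on the left:
# [0,60), [60,70), [70,80), [80,90), [90,101)
b.grades.alt = cut(B_grades, breaks=c(0,60,70,80,90,101), right=FALSE,
                   labels=c("F","D","C","B","A"), ordered_result=TRUE)

# levels lists the codes found in the data; labels renames them in one step
a.genders.alt = factor(A_genders, levels=c(0,1), labels=c("F","M"))

b.grades.alt
a.genders.alt
```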

3. The following vectors correspond to: 1) the alphabet; and 2) the letters in a famous quote by Ray Bradbury.

alphabet = c("a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p",
"q","r","s","t","u","v","w","x","y","z")

quote = c("l", "i", "f", "e", "i", "s", "t", "r", "y", "i", "n", "g", "t", "h",
"i", "n", "g", "s", "t", "o", "s", "e", "e", "i", "f", "t", "h", "e",
"y", "w", "o", "r", "k")

Construct a table in R that gives the frequency of occurrence (as a percentage) of each letter of the alphabet in Ray Bradbury's quote.

> quote.fac = factor(quote,levels=alphabet)
> t = table(quote.fac)
# relative frequencies rounded to two decimals; read 0.12 as 12%
> round(t/sum(t),digits=2)
quote.fac
   a    b    c    d    e    f    g    h    i    j    k    l
0.00 0.00 0.00 0.00 0.12 0.06 0.06 0.06 0.15 0.00 0.03 0.03
   m    n    o    p    q    r    s    t    u    v    w    x
0.00 0.06 0.06 0.00 0.00 0.06 0.09 0.12 0.00 0.00 0.03 0.00
   y    z
0.06 0.00
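
The output above reports proportions between 0 and 1. To show the values on a 0 to 100 scale instead, one option is sketched below using prop.table() and R's built-in constant letters, which holds the same 26 lowercase letters as alphabet. The names quote.letters, counts, and percent are hypothetical; quote.letters is used in place of quote only to avoid masking the base function quote().

```r
quote.letters = c("l", "i", "f", "e", "i", "s", "t", "r", "y", "i", "n", "g",
                  "t", "h", "i", "n", "g", "s", "t", "o", "s", "e", "e", "i",
                  "f", "t", "h", "e", "y", "w", "o", "r", "k")

# Using all 26 letters as levels keeps the letters that never occur (count 0)
counts = table(factor(quote.letters, levels=letters))

# prop.table() divides each count by the total; multiply by 100 for percentages
percent = round(100 * prop.table(counts), digits=1)
percent
```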

4. The responses to a survey question are broken down by gender, and the results are shown below. Build this table in R in such a way that the last column is computed by R (instead of by you).

$$\begin{array}{l|c|c|c|c|} & \textrm{agree} & \textrm{no opinion} & \textrm{disagree} & \textrm{Sum}\\\hline \textrm{males} & 75 & 10 & 85 & 170\\\hline \textrm{females} & 121 & 8 & 51 & 180\\\hline \end{array}$$

> t = as.table(matrix(c(75,10,85,121,8,51),ncol=3,byrow=TRUE))
> colnames(t) = c("agree","no opinion","disagree")
> rownames(t) = c("males","females")
> t.with.margins = addmargins(t)
> t.with.margins.and.last.row.removed = t.with.margins[1:2,]
> t.with.margins.and.last.row.removed
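
Rather than adding both margins and then dropping the Sum row, addmargins() can be asked for the row totals only through its margin argument. The following is a minimal sketch of that alternative; the names survey and survey.with.sum are hypothetical, and the row and column names are supplied through the dimnames argument of matrix() instead of being assigned afterwards.

```r
survey = as.table(matrix(c(75,10,85,121,8,51), ncol=3, byrow=TRUE,
                         dimnames=list(c("males","females"),
                                       c("agree","no opinion","disagree"))))

# margin=2 extends only the second dimension (the columns), so a Sum column
# of row totals is appended and no Sum row is created
survey.with.sum = addmargins(survey, margin=2)
survey.with.sum
```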