R and Objects

A full discussion of object-oriented programming -- while admittedly one of the most powerful paradigms in modern computing -- is probably better suited to an environment where the primary focus is on learning about computer science, as opposed to statistics. As such, we will not attempt to describe working with objects in R in full. However, having at least a minimal understanding of objects can help explain some of the behavior we see in R.

R has more than one system for working with objects. The two most commonly encountered are referred to as S3 and S4. The S3 system is simpler than S4, and is at the heart of many familiar R actions. In these notes, we will focus solely on S3.

Before discussing exactly what an object is, let us first talk about generic functions. You may have noticed that many of the functions you use in R do different things when presented with different types of input. For example, consider the summary() function:

> v = c(1,2,6,5,7,3,4,5)
> summary(v)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.000   2.750   4.500   4.125   5.250   7.000 

> t = as.table(matrix(c(51,43,22,92,28,21,68,22,9),ncol=3,byrow=TRUE))
> summary(t)
Number of cases in table: 356 
Number of factors: 2 
Test for independence of all factors:
    Chisq = 18.51, df = 4, p-value = 0.0009808

Here, we see summary() does one thing for vectors and something different for tables.

The print() function behaves in a similar way -- doing different things for different types of input:

> print(v)
[1] 1 2 6 5 7 3 4 5

> print(t)
   A  B  C
A 51 43 22
B 92 28 21
C 68 22  9

Functions in R like these, whose behavior depends on the types of their inputs, are called generic functions.
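
One way to see this for yourself is to ask R how it labels each input, using class(), and to look inside a generic function with body(). The following is only a small sketch using functions that ship with R; the variable name tbl is our own choice, used in place of t so that R's transpose function t() is not masked.

```r
v = c(1,2,6,5,7,3,4,5)
tbl = as.table(matrix(c(51,43,22,92,28,21,68,22,9),ncol=3,byrow=TRUE))

class(v)       # "numeric"
class(tbl)     # "table"

# A generic function typically does little more than hand off the work
body(print)    # UseMethod("print")
```

The label reported by class() is what a generic function consults when deciding what to do with its input.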

Now suppose that you wanted to write an R program to run simulations on a number of card games. Given that each card has a suit and rank, it would be nice if we could consolidate both pieces of information into something that could be stored in a single variable. Of course, R provides the list data type that serves that purpose well.

Assuming that the ranks of cards, in order, are given by {Ace, 2, 3, ..., 10, Jack, Queen, King} and the suits, in order, are given by {Hearts, Clubs, Diamonds, Spades}, you might start with something like this:

> my.card = list(rank=11,suit=3)  # a possible way to represent the Jack of Diamonds

However, when we print my.card, the result is similar to what one would see when printing any other list:

> print(my.card)
$rank
[1] 11

$suit
[1] 3

Wouldn't it be better if -- when dealing with cards, anyways -- R could show us something more descriptive, like "Jack of Diamonds"?

If we register this list as belonging to a particular class of objects, which we will name "card", then we can supply a custom implementation of print() that is used only for objects of the "card" class. The registration is shown below:

> my.card = list(rank=11,suit=3)
> class(my.card) = "card"          # <-- this "registers" my.card as 
                                   #     an object of class "card"
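
As a quick check (a sketch using only base R functions), we can confirm that the registration took effect and that my.card is still an ordinary list underneath:

```r
my.card = list(rank=11,suit=3)
class(my.card) = "card"

class(my.card)               # "card"
inherits(my.card,"card")     # TRUE
is.list(my.card)             # TRUE: the data are still stored in a list
my.card$rank                 # 11
```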

Now all we need to do is provide a custom implementation for printing cards. As mentioned before, the print() function is an existing generic function. It knows to check the class name of whatever object it is given and then take an appropriate action based on the class it sees. The function print() takes this action by essentially dispatching its work to another function.

For example, if executing print(x) and x is a table, then the print() function would ask the function named print.table() to do the work. If x is a card, print() will instead look for a function named print.card() to do its work.
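
The lookup described above can be sketched with a throwaway class. The class name "demo" and the function print.demo() below are hypothetical, made up only for this illustration. The base R function getS3method() reports which method, if any, print() would find.

```r
x = list(rank=11,suit=3)
class(x) = "demo"                  # hypothetical class, for illustration only

# No print.demo() exists yet, so this lookup comes back empty (NULL)
is.null(getS3method("print","demo",optional=TRUE))    # TRUE

# Once a function with the right name exists, print() dispatches to it
print.demo = function(x, ...) {
  cat("<a demo object>\n")
  invisible(x)
}
print(x)                           # <a demo object>
```

When no matching method is found, print() falls back on print.default(), which is why a list with a class but no print method of its own still prints like a list.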

The print.table() function is one of the many functions already built into R, but the print.card() function we will need to provide:

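print.card = function(x, ...) {
  # x is the card to print; the arguments (x, ...) match those of print() itself
  suit.names = c("hearts","clubs","diamonds","spades")
  rank.names = c("ace",paste(2:10),"jack","queen","king")
  cat(rank.names[x$rank],"of",suit.names[x$suit],"\n")
  invisible(x)   # by convention, print methods return their argument invisibly
}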

Now, look what happens when we try to print my.card:

> print(my.card)
jack of diamonds 

That looks better!

Additional card objects we might create will be printed in the same way -- just remember that each such creation involves populating a list with the information associated with the card and registering the list with the class "card":

> another.card = list(rank=1,suit=4)
> class(another.card) = "card"
> print(another.card)
ace of spades 

To simplify the creation of objects, a common approach is to write another function (called a constructor) to attend to these two tasks, such as the one below for cards:

card = function(r,s) {
  c = list(rank=r,suit=s)
  class(c) = "card"
  return(c)
}

# now we can use the constructor above 
# to create multiple card objects...

> c1 = card(1,4)
> print(c1)
ace of spades 
 
> c2 = card(2,1)
> print(c2)
2 of hearts 
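
A constructor is also a natural place to catch mistakes early. The variation below is only a sketch: the name validated_card() and the particular checks are our own assumptions, not something R requires.

```r
validated_card = function(r,s) {
  # refuse ranks outside 1..13 and suits outside 1..4
  stopifnot(r %in% 1:13, s %in% 1:4)
  c = list(rank=r,suit=s)
  class(c) = "card"
  return(c)
}

c3 = validated_card(12,4)      # a valid card: the queen of spades
class(c3)                      # "card"

bad = try(validated_card(14,1), silent=TRUE)   # rank 14 does not exist
inherits(bad,"try-error")      # TRUE
```

With a check like this in place, an impossible card is rejected when it is created, rather than causing confusing output later.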

Now would be a good time to note that print() is special in that R calls it automatically whenever an expression, such as a variable name by itself, is evaluated at the console. To see this, consider the following, which produces output identical to the last two outputs above:

> c1 = card(1,4)
> c1
ace of spades 

> c2 = card(2,1)
> c2
2 of hearts
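
This automatic printing happens only for expressions evaluated at the top level of the console. Inside a loop or a function, a variable sitting on a line by itself prints nothing, and print() must be called explicitly. The sketch below repeats compact versions of card() and print.card() so that it can be run on its own; the variable names hand and h are our own.

```r
card = function(r,s) {
  c = list(rank=r,suit=s)
  class(c) = "card"
  return(c)
}
print.card = function(x, ...) {
  suit.names = c("hearts","clubs","diamonds","spades")
  rank.names = c("ace",paste(2:10),"jack","queen","king")
  cat(rank.names[x$rank],"of",suit.names[x$suit],"\n")
  invisible(x)
}

hand = list(card(1,4), card(2,1))

for (h in hand) h            # prints nothing: no automatic printing inside a loop
for (h in hand) print(h)     # prints "ace of spades" and then "2 of hearts"
```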

Beyond just print(), there are other built-in generic functions for which one can provide custom implementations associated with different classes. We've already mentioned the summary() function. The plot() function too is generic.

Certainly though, the makers of R were not able to predict all of the function names that people would ever want to be generic. This suggests a natural question: "How does one make a function generic?"

Let us make things more concrete. Suppose in creating code to run simulations of a number of card games you not only create a "card" class, but you also create a "winnings" class. Perhaps objects associated with this latter class include the various amounts won from a variety of different games.

In some games, different cards have different "values". For example, suppose in the game Crazy Face, face cards are worth 10 points, while other cards are worth points equal to their rank. That said, there is a total "value" associated with all of one's winnings too.

Wouldn't it be great if we had a generic function value() that we could associate with both the card and winnings classes, so that value(x) would produce the appropriate output for the nature of the input it was given? In other words, we desire value() to be a generic function.

We can make value() generic with the following:

value = function(obj) {
  UseMethod("value")
}
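
Until methods are supplied, the generic has nothing to dispatch to. Here is a small sketch of what happens in that case (try() is used only so that the error does not halt a script):

```r
value = function(obj) {
  UseMethod("value")
}

# there is no method for numbers yet, so the call fails
attempt = try(value(42), silent=TRUE)
inherits(attempt,"try-error")    # TRUE
cat(attempt)                     # displays the error message R produced
```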

After executing the above, we can supply the associated custom implementations. For example, we could add the following implementation for objects associated with the card class:

value.card = function(obj) {
  # face cards (ranks 11 to 13) are worth 10; other cards are worth their rank
  return(ifelse(obj$rank <= 10, obj$rank, 10))
}

Here's an example of its use:

> drawn.card = card(13,2)
> drawn.card
king of clubs 

> value(drawn.card)   # note, a king is a face card
                      # and our value.card() function
                      # assigns face cards the value 10  
[1] 10
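
The same generic can serve the "winnings" class as well. The text above does not say how winnings objects are stored, so the sketch below assumes a hypothetical layout: a list holding a single numeric vector named amounts, with one entry per game. The functions winnings(), value.winnings(), and value.default() are likewise our own illustrations. The definitions of value() and value.card() are repeated in compact form so the block runs on its own.

```r
value = function(obj) {
  UseMethod("value")
}
value.card = function(obj) {
  return(min(obj$rank, 10))          # face cards count as 10
}

# hypothetical constructor: a winnings object holds the amounts won
winnings = function(amounts) {
  w = list(amounts=amounts)
  class(w) = "winnings"
  return(w)
}
value.winnings = function(obj) {
  return(sum(obj$amounts))           # total value of everything won
}
# fallback used when no class-specific method exists
value.default = function(obj) {
  return(NA)
}

k = list(rank=13,suit=2)
class(k) = "card"
value(k)                             # 10
value(winnings(c(5,20,12.5)))        # 37.5
value("not a card")                  # NA
```

A method named value.default() is what UseMethod() falls back on when the object's class has no method of its own.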

If one is curious about which classes are associated with a given generic function, one can use the methods() function.

As an example, the print() function, as you might expect, is associated with many other classes, as suggested by what follows:

> methods(print)
  [1] print.acf*                                          
  [2] print.anova*                                        
  [3] print.aov*                                          
  [4] print.aovlist*                                      
  [5] print.ar*                                           
  [6] print.Arima*                                        
  [7] print.arima0*                                       
  [8] print.AsIs                                          
  [9] print.aspell*                                       
 [10] print.aspell_inspect_context*                       
 [11] print.bibentry*                                     
 [12] print.Bibtex*                                       
 [13] print.browseVignettes*                              
 [14] print.by                                            
 [15] print.card                                          
 [16] print.changedFiles*                                 
 [17] print.check_code_usage_in_package*                  
 [18] print.check_compiled_code*                          
 [19] print.check_demo_index*                             
 [20] print.check_depdef* 
 ...

Although not all are shown above, there were a total of 184 built-in classes associated with print() in the R session used here (the exact count varies with the R version and the packages loaded). Of course, we have just added the card class, so it shows up in this list as well (see [15] above).

(If you are wondering why some of these are marked with an asterisk: the asterisk indicates a method that is not exported from the namespace of the package that defines it. We won't talk about namespaces here, but you can always google "R namespaces" if you are curious! 😊)
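
The methods() function can also be asked the question the other way around: given a class, which generic functions have a method for it? The sketch below uses only functions that ship with R. The base R function getS3method() retrieves a method even when it is one of the asterisked ones, such as print.acf (see [1] above).

```r
methods(class="table")            # lists the methods available for class "table"

f = getS3method("print","acf")    # fetch print.acf even though it is asterisked
is.function(f)                    # TRUE
```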

Now that you have some minimal understanding of objects in R, think back on the outputs you have seen from various functions in R, especially those whose output seemed a bit verbose (e.g. any of the hypothesis test functions).

Take the function t.test() for example. Here is an example of its application:

> men = c(102,87,101,96,107,101,91,85,108,67,85,82)
> women = c(73,81,111,109,143,95,92,120,93,89,119,79,90,126,62,92,77,106,105,111)
> t.test(x=men, y=women, alternative="two.sided", conf.level=0.95, var.equal=TRUE)

    Two Sample t-test

data:  men and women
t = -0.93758, df = 30, p-value = 0.3559
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -19.016393   7.049727
sample estimates:
mean of x mean of y 
 92.66667  98.65000

What's really happening here is that t.test() returns an object of class "htest".
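
This is easy to confirm. The short sketch below repeats the data from above so that it can be run on its own, and uses only base R functions:

```r
men = c(102,87,101,96,107,101,91,85,108,67,85,82)
women = c(73,81,111,109,143,95,92,120,93,89,119,79,90,126,62,92,77,106,105,111)
results = t.test(x=men, y=women, alternative="two.sided", conf.level=0.95, var.equal=TRUE)

class(results)               # "htest"
inherits(results,"htest")    # TRUE
is.list(results)             # TRUE: underneath, the object is a list
```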

We can see the list sitting behind the scenes in an object like this by using the unclass() function, which strips away the class. Below, we apply this function to the htest object resulting from an application of the t.test() function.

> results = t.test(x=men, y=women, alternative="two.sided", conf.level=0.95, var.equal=TRUE)
> unclass(results)
$statistic
         t 
-0.9375846 

$parameter
df 
30 

$p.value
[1] 0.3559453

$conf.int
[1] -19.016393   7.049727
attr(,"conf.level")
[1] 0.95

$estimate
mean of x mean of y 
 92.66667  98.65000 

$null.value
difference in means 
                  0 

$stderr
[1] 6.381646

$alternative
[1] "two.sided"

$method
[1] " Two Sample t-test"

$data.name
[1] "men and women"

Indeed, the elements in the list above are exactly those discussed in the Value section of the help file for t.test(). (Type ?t.test to see this in RStudio.)

However, simply printing results produces the same text we saw earlier:

> results

    Two Sample t-test

data:  men and women
t = -0.93758, df = 30, p-value = 0.3559
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -19.016393   7.049727
sample estimates:
mean of x mean of y 
 92.66667  98.65000 

We get a more consolidated version of the same information because of a (built-in) custom print.htest() function, which the implicit (untyped) call to print() uses to accomplish the printing.
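
Because an htest object is still a list underneath, individual pieces can be pulled out by name with $, which is handy when a simulation needs only, say, the p-value. Here is a brief sketch, repeating the data so the block runs on its own (the values in the comments are the ones shown in the unclass() output above):

```r
men = c(102,87,101,96,107,101,91,85,108,67,85,82)
women = c(73,81,111,109,143,95,92,120,93,89,119,79,90,126,62,92,77,106,105,111)
results = t.test(x=men, y=women, alternative="two.sided", conf.level=0.95, var.equal=TRUE)

results$p.value              # 0.3559453
results$statistic            # t = -0.9375846
results$conf.int[1:2]        # -19.016393   7.049727

# calling print() explicitly gives the same display as typing results by itself
print(results)
```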