Random Variables / Discrete Random Variables

The idea of a random variable starts with a numerical value determined by some chance process (i.e., a random experiment). As some examples, consider the sum of the values showing when two fair dice are rolled, the number of heads seen in three flips of a coin, or the net pay-out from a game of chance played at a carnival. Note how in each case, there is a numerical value in which we are interested. Things like the "prize won at the fair" or the "type of fish caught by a fisherman" didn't make this list. There are many things we can do with numerical values that we would have trouble doing with other types of information.

As an example -- and hopefully this won't spoil what is to come -- sometimes in statistics we will want to talk about the average value (i.e., the mean) of the thing investigated. Finding averages should not be new to anyone -- they consist of adding up things and then dividing by how many things one has. However, how does one average different prizes won at the fair, or different types of fish?

Being more precise in our wording, random variables are methods for turning (possibly non-numerical) outcomes of a random experiment into numbers. Indeed, we define a random variable to be a function $X$ that assigns to each outcome $x$ in the sample space $S$ one and only one number.

Let us consider an example:

We've seen before the sample space for rolling two fair dice:

$$\begin{array}{c|c|c|c|c|c|c} & 1 & 2 & 3 & 4 & 5 & 6\\\hline 1 & (1,1) & (1,2) & (1,3) & (1,4) & (1,5) & (1,6)\\\hline 2 & (2,1) & (2,2) & (2,3) & (2,4) & (2,5) & (2,6)\\\hline 3 & (3,1) & (3,2) & (3,3) & (3,4) & (3,5) & (3,6)\\\hline 4 & (4,1) & (4,2) & (4,3) & (4,4) & (4,5) & (4,6)\\\hline 5 & (5,1) & (5,2) & (5,3) & (5,4) & (5,5) & (5,6)\\\hline 6 & (6,1) & (6,2) & (6,3) & (6,4) & (6,5) & (6,6)\\\hline \end{array}$$ Note that each element in the sample space is an ordered pair -- not a number.

However, one can turn each ordered pair into a number by summing the two coordinates. The random variable in this case is then the function $X$ that does the summing: i.e., $X(i,j) = i+j$.

In this way, $X$ is associated with a new sample space, call it $\mathscr{D} = \{2,3,4,\ldots,12\}$, as under $X$ the above table turns into: $$\begin{array}{c|c|c|c|c|c|c} & 1 & 2 & 3 & 4 & 5 & 6\\\hline 1 & 2 & 3 & 4 & 5 & 6 & 7\\\hline 2 & 3 & 4 & 5 & 6 & 7 & 8\\\hline 3 & 4 & 5 & 6 & 7 & 8 & 9\\\hline 4 & 5 & 6 & 7 & 8 & 9 & 10\\\hline 5 & 6 & 7 & 8 & 9 & 10 & 11\\\hline 6 & 7 & 8 & 9 & 10 & 11 & 12\\\hline \end{array}$$ ...and a new probability set function, $P_X$ (where, for example, $P_X(7 \textrm{ or } 11) = 8/36$).
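If you'd like to verify this with a computer, the following minimal Python sketch (the names `sample_space`, `X`, and `P_X` are ours, invented just for this illustration) enumerates all 36 rolls and recovers the probability just mentioned:

```python
from fractions import Fraction

# All 36 equally likely ordered pairs (i, j) for two fair dice
sample_space = [(i, j) for i in range(1, 7) for j in range(1, 7)]

# The random variable X sums the two coordinates of a roll
def X(roll):
    i, j = roll
    return i + j

# P_X of an event: the share of the 36 rolls whose X-value lands in the event
def P_X(event):
    hits = sum(1 for roll in sample_space if X(roll) in event)
    return Fraction(hits, len(sample_space))

print(P_X({7, 11}))   # 2/9 (i.e., 8/36)
```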

It is important to note that summing the two coordinates is not the only way to create a number in this context. Another very flexible way to do this is to think of the number associated with each roll as a "net pay-out" (i.e., profit minus cost) for that roll when playing some game at a carnival. As an example, suppose you were rolling the two dice in the context of a game that costs $\$7$ to play and awards $\$100$ for a roll of "box cars" (i.e., two 6's), $\$10$ for each 5 rolled, and nothing for the rest. In this case, the function $X$ is defined by

$$X(i,j) = \left\{ \begin{array}{ll} 93 & \textrm{ if } i = 6 \textrm{ and } j = 6\\ 3 & \textrm{ if } i = 5 \textrm{ or } j = 5, \textrm{ but not both}\\ 13 & \textrm{ if both } i = 5 \textrm{ and } j = 5\\ -7 & \textrm{ otherwise } \end{array} \right.$$ with a new sample space of $\{93,3,13,-7\}$, as suggested by the corresponding table: $$\begin{array}{c|c|c|c|c|c|c} & 1 & 2 & 3 & 4 & 5 & 6\\\hline 1 & -7 & -7 & -7 & -7 & 3 & -7\\\hline 2 & -7 & -7 & -7 & -7 & 3 & -7\\\hline 3 & -7 & -7 & -7 & -7 & 3 & -7\\\hline 4 & -7 & -7 & -7 & -7 & 3 & -7\\\hline 5 & 3 & 3 & 3 & 3 & 13 & 3\\\hline 6 & -7 & -7 & -7 & -7 & 3 & 93\\\hline \end{array}$$ ...and a new probability set function, $P_X$ (where, for example, $P_X(3) = 10/36$).
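A similar sketch handles the carnival game -- again, the names are ours, and the case analysis simply transcribes the piecewise definition of $X$ above:

```python
from fractions import Fraction

sample_space = [(i, j) for i in range(1, 7) for j in range(1, 7)]

# Net pay-out for the carnival game: $7 to play, $100 for box cars,
# $10 for each 5 rolled
def X(roll):
    i, j = roll
    if i == 6 and j == 6:
        return 93    # $100 prize minus the $7 cost
    if i == 5 and j == 5:
        return 13    # two 5's pay $20; minus the $7 cost
    if i == 5 or j == 5:
        return 3     # exactly one 5 pays $10; minus the $7 cost
    return -7        # no prize; out the $7 cost

hits = sum(1 for roll in sample_space if X(roll) == 3)
print(Fraction(hits, len(sample_space)))   # 5/18 (i.e., 10/36)
```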

Note that by adjusting the pay-out values, we can assign pretty much whatever numbers we wish -- which is consistent with the lack of any restrictions on the function in the definition of a random variable, other than that it assigns some number to every outcome in the sample space.

In the examples above, the new sample spaces that resulted all had a finite number of outcomes (11 and 4, respectively). In other contexts (i.e., those not involving the rolling of two dice) one might have more or fewer. Indeed, one can even consider scenarios where there are infinitely many outcomes -- consider the number of times one must flip a coin before seeing a head. It's incredibly unlikely for this number to correspond to more than a handful of flips -- but we can't say with certainty that it will take fewer than any given number of tosses of the coin. In all of these cases, however, we say that the set of outcomes is countable, which is just a fancy way of saying we could put all of the outcomes in a (possibly infinite) list.
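To get a feel for the "incredibly unlikely" claim, here is a small Python sketch (assuming a fair coin, with helper names of our own choosing) that computes the probability the first head takes more than $n$ flips:

```python
from fractions import Fraction

# Chance the first head appears on flip k of a fair coin: (1/2)^k
def p(k):
    return Fraction(1, 2) ** k

# Chance the first head takes MORE than n flips: whatever the first n flips miss
def tail(n):
    return 1 - sum(p(k) for k in range(1, n + 1))

for n in (5, 10, 20):
    print(n, tail(n))   # 1/32, 1/1024, 1/1048576 -- tiny, but never zero
```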

There are uncountable sets too. The set of real numbers is one that will be important to us later. (Wait -- did we just suggest the real numbers can't be put into a list? Yes we did! Further, you can prove this amazing fact -- look up "Cantor's Diagonal Argument" if you are curious.)

Just as we were able to calculate the probabilities of rolling different sums with two dice (recall our earlier conclusion that $P_X(7 \textrm{ or } 11) = 8/36$), we seek to calculate the probabilities associated with different values corresponding to whatever random variable $X$ we might need to investigate. But how do we do that? As it turns out, we need to do different things depending on whether the sample space associated with $X$ is countable or uncountable.

As such, we classify random variables based on this countability (or uncountability). When the sample space is countable, we say the random variable is a discrete random variable. When the sample space is uncountable, we say the random variable is a continuous random variable.

Let's first consider how to find $P_X$ when $X$ is a discrete random variable.

Recall the new sample space $\mathscr{D}$ that resulted from considering the random variable $X$ equal to the sum of the values showing on two rolled dice: $$\begin{array}{c|c|c|c|c|c|c} & 1 & 2 & 3 & 4 & 5 & 6\\\hline 1 & 2 & 3 & 4 & 5 & 6 & 7\\\hline 2 & 3 & 4 & 5 & 6 & 7 & 8\\\hline 3 & 4 & 5 & 6 & 7 & 8 & 9\\\hline 4 & 5 & 6 & 7 & 8 & 9 & 10\\\hline 5 & 6 & 7 & 8 & 9 & 10 & 11\\\hline 6 & 7 & 8 & 9 & 10 & 11 & 12\\\hline \end{array}$$

To find $P_X(7 \textrm{ or } 11)$, it is helpful to partition the sample space into a mutually exclusive and exhaustive collection of sets by value, and then consider the probabilities that each such set occurs (which correspond to outputs of the related probability set function $P$). That is to say, we make a table like the one below:

$$\begin{array}{|c|c|c|c|c|c|c|c|c|c|c|c|}\hline \textrm{Value} & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12\\\hline \textrm{Probability} & \frac{1}{36} & \frac{2}{36} & \frac{3}{36} & \frac{4}{36} & \frac{5}{36} & \frac{6}{36} & \frac{5}{36} & \frac{4}{36} & \frac{3}{36} & \frac{2}{36} & \frac{1}{36}\\\hline \end{array}$$
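This table can be produced mechanically. The short Python sketch below (with names like `counts` and `table` of our own choosing) partitions the 36 rolls by their sum, just as described above:

```python
from collections import Counter
from fractions import Fraction

sample_space = [(i, j) for i in range(1, 7) for j in range(1, 7)]

# Partition the 36 rolls by their sum; Counter tallies the size of each class
counts = Counter(i + j for i, j in sample_space)

# Each value's probability is its class's share of the 36 equally likely rolls
table = {value: Fraction(n, 36) for value, n in sorted(counts.items())}
for value, prob in table.items():
    print(value, prob)   # 2 gets 1/36, 3 gets 1/18 (i.e., 2/36), and so on
```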

To make talking about the probabilities associated with different values easier, we view this table as a new function, denoted $p_X$, whose inputs are the values and whose outputs are the probabilities those values occur.

This leads to an important definition. Let our original sample space be $S$ and our new sample space for $X$ be $\mathscr{D} = \{d_1,d_2,d_3,\ldots\}$. Further, suppose $P$ is the probability set function corresponding to the original random experiment, whose domain consists of subsets of $S$. Now, for each $d_i$, let the event $C_i$ be the subset of all $c$ in $S$ where $X(c) = d_i$. Finally, we define the probability mass function, $p_X$, with domain $\mathscr{D}$, so that $p_X(d_i) = P(C_i)$.

One should be aware, given how $p_X$ is defined in terms of a probability set function $P$, not just any table can represent a probability mass function. Specifically, for the new sample space $\mathscr{D}$ in question, the following two properties must hold: $$ 0 \le p_X(d_i) \le 1 \textrm{ for all } d_i \in \mathscr{D} \quad \quad \textrm{and} \quad \quad \sum_{d_i \in \mathscr{D}} p_X(d_i) = 1$$
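These two properties are easy to check by machine. The sketch below uses a helper name of our own (`is_valid_pmf`), and encodes the two-dice table compactly as $p_X(v) = (6 - |7 - v|)/36$, which one can verify reproduces the table entry by entry:

```python
from fractions import Fraction

# A table is a legitimate pmf only if every entry lies in [0, 1]
# and all of the entries sum to exactly 1
def is_valid_pmf(pmf):
    return all(0 <= p <= 1 for p in pmf.values()) and sum(pmf.values()) == 1

# The two-dice pmf from the table above: p_X(v) = (6 - |7 - v|)/36
two_dice = {v: Fraction(6 - abs(7 - v), 36) for v in range(2, 13)}
print(is_valid_pmf(two_dice))             # True
print(is_valid_pmf({0: Fraction(1, 2)}))  # False -- the entries sum to only 1/2
```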

Again turning our attention to how we previously calculated $P_X(7 \textrm{ or } 11)$, note that the set of rolls corresponding to a $7$ and the set of rolls corresponding to an $11$ are mutually exclusive. Consequently, we can find $P_X(7 \textrm{ or } 11)$ by finding a sum: $$P_X(7 \textrm{ or } 11) = p_X(7) + p_X(11) = \frac{6}{36} + \frac{2}{36} = \frac{8}{36}$$

More generally, we can find the probability of any event $D$ in our new sample space $\mathscr{D} = \{d_1,d_2,d_3,\ldots\}$ by summing similar probabilities coming from the probability mass function: $$P_X(D) = \sum_{d_i \in D} p_X(d_i)$$
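In code, this general rule is just a sum over the event. The sketch below (again with names of our own choosing) recovers $P_X(7 \textrm{ or } 11)$ from the probability mass function alone, without revisiting the 36 rolls:

```python
from fractions import Fraction

# The two-dice pmf, encoded compactly: p_X(v) = (6 - |7 - v|)/36
p_X = {v: Fraction(6 - abs(7 - v), 36) for v in range(2, 13)}

# P_X(D): sum the pmf over the values making up the event D
def prob_of_event(D):
    return sum(p_X[d] for d in D)

print(prob_of_event({7, 11}))   # 2/9 (i.e., 8/36), as computed before
```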

Simplifying Notation

Finally, a few remarks about how one can simplify the notation used above are in order. When the random variable involved is clear from context, one often writes $P(x)$ in place of $p_X(x)$ for the probability mass function. Similarly, events are often described directly in terms of $X$, writing $P(X > 1)$, for example, for the probability that $X$ takes a value greater than $1$ (i.e., $P_X(\{d_i \in \mathscr{D} : d_i > 1\})$). Both simplifications appear in the example that follows.

Putting Things Together

Let's try to solidify all of these ideas (notational and otherwise) with one final example:

Suppose we are interested in the random experiment where one flips a coin three times.

The sample space for this random experiment consists of 8 equally likely possibilities:

HHH, HHT, HTH, HTT, THH, THT, TTH, TTT

Now let the random variable $X$ be the number of heads seen.

Applying $X$ to each of these outcomes, in the same order, gives the values

3, 2, 2, 1, 2, 1, 1, 0

so the new sample space is $\{0, 1, 2, 3\}$.

The probability mass function is given by $$\begin{array}{|c|c|c|c|c|}\hline x & 0 & 1 & 2 & 3 \\\hline P(x) & \frac{1}{8} & \frac{3}{8} & \frac{3}{8} & \frac{1}{8}\\\hline \end{array}$$

We can do a partial check on our calculations by recognizing we must end up with a legitimate probability mass function, where the following two properties are satisfied, relative to its domain $\mathscr{D}$: $$0 \le P(x) \le 1 \textrm{ for every } x \in \mathscr{D} \quad \textrm{ and } \quad \sum_{x \in \mathscr{D}} P(x) = 1$$ That is to say, the probability of any particular number of heads can't be negative or greater than 100%, and the sum of the probabilities should equal one (as 100% of the time, there is an outcome).

We are now armed to answer probability questions like "What's the probability that one sees more than one head in 3 flips of a coin?"

Answer: $$P(X > 1) = P(2) + P(3) = \frac{3}{8} + \frac{1}{8} = \frac{1}{2}$$
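The whole example -- enumerating the 8 outcomes, building the probability mass function, and computing $P(X > 1)$ -- fits in a few lines of Python. This is only an illustrative sketch, with names of our own choosing:

```python
from itertools import product
from collections import Counter
from fractions import Fraction

# The 8 equally likely outcomes of flipping a coin three times
outcomes = [''.join(flips) for flips in product('HT', repeat=3)]

# X counts the heads in each outcome; group the outcomes by that count
counts = Counter(outcome.count('H') for outcome in outcomes)
pmf = {x: Fraction(n, 8) for x, n in sorted(counts.items())}
print(pmf)   # {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

# P(X > 1): sum the pmf over the values greater than 1
print(sum(p for x, p in pmf.items() if x > 1))   # 1/2
```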


One last observation -- instead of using a table, one can also specify the probability mass function associated with a random variable using a formula -- as in the case of the binomial distribution, which gives the probability of observing $x$ successes in $n$ Bernoulli trials, where $p$ is the probability of success on any single trial and $q = 1 - p$ is the probability of failure (more on what these words mean later...):

$$P(x) = {}_nC_x p^x q^{n-x} \quad \textrm{where } x = 0,1,2,\ldots, n$$

The fact that all of the probabilities so produced are between zero and one, with a sum of exactly one, is less obvious here -- but still present. (For the sum, note that the Binomial Theorem gives $\sum_{x=0}^{n} {}_nC_x\, p^x q^{n-x} = (p+q)^n = 1^n = 1$.) Indeed, if we consider the case when $n=3$ and $p=1/2$, we produce precisely the same probability mass function as that just seen when counting the number of heads in 3 flips of a coin!
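To see this agreement concretely, the following Python sketch (using the standard-library `math.comb` for ${}_nC_x$, with `binomial_pmf` a name of our own) builds the binomial probability mass function and confirms both the match and the sum:

```python
from fractions import Fraction
from math import comb

# Binomial pmf: P(x) = nCx * p^x * q^(n-x) for x = 0, 1, ..., n
def binomial_pmf(n, p):
    q = 1 - p
    return {x: comb(n, x) * p**x * q**(n - x) for x in range(n + 1)}

pmf = binomial_pmf(3, Fraction(1, 2))
print(pmf)                 # x = 0,1,2,3 get 1/8, 3/8, 3/8, 1/8 -- matching the coin-flip table
print(sum(pmf.values()))   # 1, as the Binomial Theorem guarantees
```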