Probability Set Functions

The relative frequency of an event (also called the empirical probability) is the number of times an event $E$ occurs divided by the total number of trials conducted of a particular probability experiment. As an example, suppose we roll a die 5 times, yielding rolls of 2, 6, 3, 6, 4. The relative frequency of rolling an even value in this experiment is thus $$P_e(E) = \frac{\textrm{# of rolls of 2's, 4's, and 6's observed}}{\textrm{total # of rolls}} = \frac{4}{5} = 0.80$$ where $P_e(E)$ here denotes the empirical probability of event $E$.
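
To make this computation concrete, here is a minimal Python sketch; the function name `relative_frequency` and its predicate argument are illustrative choices of ours, not notation from the text:

```python
def relative_frequency(outcomes, event):
    """Empirical probability: (# of outcomes where `event` holds) / (total # of trials)."""
    return sum(1 for x in outcomes if event(x)) / len(outcomes)

# The five die rolls from the example above
rolls = [2, 6, 3, 6, 4]

print(relative_frequency(rolls, lambda x: x % 2 == 0))  # 0.8, i.e., P_e(E) = 4/5
```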

First, some observations are in order...

  1. Note that as one runs more and more trials, the relative frequencies tied to an event will of course change. That said, our intuition tells us that over the long haul the relative frequency should settle down (in a limiting sense). This is often referred to as the Law of Large Numbers, which says that as the number of trials increases, $P_e(E)$ generally approaches the true probability of event $E$. As an example, the more times a fair die is rolled, the closer we can expect the relative frequency of even rolls to get to $0.50$. (A short simulation illustrating this convergence appears after this list.)

    Importantly, the Law of Large Numbers does NOT say that if a particular event has occurred less often than its true probability would lead us to expect, then we are somehow "due" for that event to occur. Being "due" would suggest the probability of the event has somehow increased for these later trials, which is not the case.

  2. Additionally, since the fractions that define empirical probabilities involve only counts of things, these probabilities can never be negative. Using an inequality to say the same thing more efficiently: if $C$ is any event, then $P_e(C) \ge 0$.

  3. Third, recall that an event was defined as a subset of the sample space. Notably absent from this definition is any requirement that the event be a proper subset of the sample space (i.e., a set with a non-empty complement). Thus, one may consider the event $S$ that consists of the entire sample space and thereby contains all possible outcomes. The occurrence of $S$ in any trial is clearly a certainty, and so should have probability 100%. Writing probabilities in decimal form (which will be a common practice going forward), we can say this more efficiently as $P_e(S) = 1$.

  4. Lastly, consider the relationship between the empirical probabilities observed for disjoint events $A$ and $B$ and the empirical probability of the event that corresponds to their union. An example might help us see this relationship more clearly: Suppose for a single roll of a die, event $A$ is associated with seeing an even value (i.e., 2, 4, or 6), while event $B$ is seeing the value 3. A die is rolled 100 times. In those 100 rolls, suppose $A$ occurs 47 times and $B$ occurs 19 times. As $A$ and $B$ are disjoint events, they can't both occur at the same time. Thus, either $A$ or $B$ occurs $47 + 19 = 66$ times. Consequently, $$P_e(A) = \frac{47}{100}, \quad \quad P_e(B) = \frac{19}{100}, \quad \quad \textrm{and} \quad \quad P_e(A \textrm{ or } B) = \frac{47+19}{100} = \frac{66}{100}$$

    It should be clear from this example that, more generally, if $A$ and $B$ are disjoint events, then $$P_e(A \textrm{ or } B) = P_e(A) + P_e(B)$$

    We are not limited to only considering two events $A$ and $B$, however. Suppose $\mathscr{C} = \{C_1, C_2, \ldots, C_m\}$ is a collection of mutually exclusive events. This means that the events in the collection are pairwise disjoint (i.e., $C_i \cap C_j = \varnothing$ for every possible pair $(C_i,C_j)$ of distinct sets in $\mathscr{C}$). Note that we can apply the above result repeatedly to find that $$P_e( C_1 \textrm{ or } C_2 \textrm{ or } \ldots \textrm{ or } C_m) = P_e(C_1)+P_e(C_2)+\cdots+P_e(C_m)$$

    Remembering that when we consider the event $A \textrm{ or } B$, we are really considering their union (i.e., $P_e(A \textrm{ or } B) = P_e(A \cup B)$), we can both simplify and generalize the equations above by adopting the following notation:

    Similar to how $\sum_{C \in \mathscr{C}} C$ describes the sum of all elements $C$ in a set $\mathscr{C}$, let the union of all events $C$ in the collection $\mathscr{C}$ (even if the collection is infinite) be denoted by $\bigcup_{C \in \mathscr{C}} C$. Thus, if $\mathscr{C} = \{C_1, C_2, C_3, \ldots\}$, $$\bigcup_{C \in \mathscr{C}} C = C_1 \cup C_2 \cup C_3 \cup \cdots$$

    Then, when considering any collection $\mathscr{C} = \{C_1, C_2, C_3, \ldots\}$ of mutually exclusive events (even an infinite collection), we expect the following to hold: $$P_e \left(\bigcup_{C \in \mathscr{C}} C \right) = \sum_{C \in \mathscr{C}} P_e(C)$$ (A numerical check of this additivity for the die example appears just after this list.)
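
As promised in observation 1 above, the following Python sketch simulates die rolls and tracks the relative frequency of an even value as the number of trials grows. The checkpoints and seed are illustrative choices of ours:

```python
import random

random.seed(0)  # fixed seed so this illustration is reproducible

rolls, evens = 0, 0
for n in [10, 100, 1_000, 10_000, 100_000]:
    while rolls < n:
        if random.randint(1, 6) % 2 == 0:  # simulate one fair-die roll
            evens += 1
        rolls += 1
    print(f"{n:>7} rolls: P_e(even) = {evens / rolls:.4f}")

# The printed relative frequencies drift toward 0.50 as n grows,
# consistent with the Law of Large Numbers -- though any single
# run can still fluctuate along the way.
```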
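
Similarly, the additivity in observation 4 can be checked numerically. The sketch below (our own setup, using the same disjoint events $A$ and $B$ as in the die example) confirms that the relative frequency of the union of two disjoint events equals the sum of their relative frequencies:

```python
import random

random.seed(1)
rolls = [random.randint(1, 6) for _ in range(100)]
n = len(rolls)

count_A = sum(1 for r in rolls if r % 2 == 0)                 # A: an even value
count_B = sum(1 for r in rolls if r == 3)                     # B: the value 3
count_A_or_B = sum(1 for r in rolls if r % 2 == 0 or r == 3)  # A or B

# Because A and B are disjoint, no roll is counted twice:
print(count_A_or_B == count_A + count_B)  # True
print(f"P_e(A) = {count_A/n}, P_e(B) = {count_B/n}, P_e(A or B) = {count_A_or_B/n}")
```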


Recall again that for a given random experiment and its sample space $S$, empirical probabilities over the long haul seem to get close to the "true" probabilities associated with the events in question. These "true" probabilities should, of course, share the same properties that are enjoyed by their corresponding empirical probabilities.

With this in mind, we are ready to make a definition that will make our meaning of the word probability more precise...

Probability Functions

Let us define a function $P$ to be a probability set function (or when the context is clear, simply a probability function) relative to a sample space $S$ and say $P(C)$ is the probability of event $C$, when the following three properties hold:
  1. $P(C) \ge 0$, for all events $C$
  2. $P(S) = 1$
  3. For any collection of mutually exclusive events $\mathscr{C} =\{C_1, C_2, C_3, \ldots\}$, $$P \left(\bigcup_{C \in \mathscr{C}} C \right) = \sum_{C \in \mathscr{C}} P(C)$$
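
For a finite sample space, these three properties can be verified directly, since an event's probability is then just the sum of the probabilities of its outcomes (so additivity over disjoint events amounts to regrouping a sum). Below is a minimal Python sketch along those lines; the helper name `is_probability_function` is our own:

```python
def is_probability_function(p, sample_space, tol=1e-9):
    """Check the defining properties for a *finite* sample space, where
    p maps each outcome to its probability and an event C's probability
    is the sum of p over the outcomes in C."""
    # Property 1: P(C) >= 0 for every event C. It suffices that every
    # individual outcome has nonnegative probability.
    if any(p[s] < 0 for s in sample_space):
        return False
    # Property 2: P(S) = 1 (up to floating-point tolerance).
    if abs(sum(p[s] for s in sample_space) - 1) > tol:
        return False
    # Property 3 (additivity over mutually exclusive events) holds
    # automatically in this finite setting: summing over a disjoint
    # union of events just regroups the sum over outcomes.
    return True

# A fair die defines a probability set function:
fair_die = {s: 1/6 for s in range(1, 7)}
print(is_probability_function(fair_die, range(1, 7)))  # True
```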

There is a host of useful properties that result directly from these three defining properties. The following details some of the more critical ones for our purposes:

Unusual Events

Much of the study of statistics involves looking at certain situations and deciding whether or not, under certain assumptions, what is observed is unlikely. If the observations are "unlikely enough", one can feel fairly confident in rejecting the assumptions initially made. To set up a (somewhat arbitrary) standard for what we mean by "unlikely enough", let us call an event unusual if its probability is less than or equal to $0.05$.
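
As a quick illustration of this convention (the $0.05$ threshold comes from the definition above; the scenario is our own): seeing $5$ heads in $5$ flips of a fair coin has probability $(1/2)^5 = 1/32 \approx 0.031 \le 0.05$, so we would call that event unusual.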