R Calculations, Variables, and Simple Data Sets

The Text-Based Nature of R

R is a program designed to run in a terminal or console environment. RStudio provides a nice "front end" for R, adding to the console environment other windows to keep track of files, images, variable values, etc... However, this front end is not a necessity. For those that wish to do so, you can launch and use R with just a simple "Terminal" application under OS X, or the "Windows Subsystem for Linux" under Windows. Below is a picture of what working with R inside OS X's "Terminal" program looks like:

While most all of our efforts will be confined to using R through RStudio, the point of bringing up these other ways of working with R is to emphasize that R, by itself, is entirely text-based. Without an additional integrated development environment (IDE) like R Studio, it doesn't even recognize the mouse!

An entirely text-based application may seem like a throw-back to the days of old when MS-DOS ruled the world, and Apple and Windows operating systems were still finding their footings. Indeed, the great masses of computer users rarely run text-based programs anymore. Instead, they use programs that provide intuitive and easy-to-use graphical user interfaces (GUIs), full of windows, colorful icons, buttons, etc.

Text-based programs, on the other hand, generally have a steeper learning curve -- but they also offer tremendous flexibility, application, and speed. As such, even today, they are frequently the programs of choice for the power users and computer gurus of the world.

Using R For Basic Calculations

You can think of working with R as a conversation. When you start R (or R Studio), you are presented with a prompt -- an invitation to say something to R. Upon typing some command or expression, R produces some response. Then you get another prompt, and the process starts all over again.

While R is well more powerful than a scientific calculator, it can certainly function as one, when desired. Suppose you want R to perform a simple addition problem, like $1 + 1$. Simply type 1+1 at the prompt, and hit enter to see the answer:

> 1+1
[1] 2

For the moment, you can ignore the "[1]" that precedes the answer of "2". Some input expressions we will deal with later can cause R to produce many, many values in response. The "[1]" in this case simply means that the "2" seen to its immediate right is the first value output in response to "1+1".

Of course, we can do more than add. R supports all of the regular arithmetic operators (addition, subtraction, multiplication, and division).

> 1+1
[1] 2
> 2+3
[1] 5
> 2+3
[1] 5
> 2-3
[1] -1
> 2*3
[1] 6
> 2/3
[1] 0.6666667
> 2^3
[1] 8

R also supports some arithmetic operations you may not have known about.

> 17 %/% 5
[1] 3
> 17 %% 5
[1] 2

The %/% above performs the operation of integer division, which is identical to regular division, except it throws away any remainder.

The %%, on the other hand, outputs just the remainder of the related quotient.

We can combine multiple operations into a single mathematical expression, of course. When doing so, order of operations are preserved -- that is to say, things inside parentheses are evaluated first, multiplication and division happen before addition or subtraction, exponentiation happens before multiplication or division, and so on...

> 2 + 3*(1 + 5)^2
[1] 110

R is aware of some important mathematical constants. To use $\pi$ in an expression, for example, one just types "pi":

> 2*pi
[1] 6.283185

In addition to the simple operators and constants discussed above, R has many, many functions that one can use to build more complicates expressions. For example, one may wish to create an expression involving square roots, trigonometric functions, or logarithms. The following example calculates the value of $$\sin 3\pi/2 + \ln e^5 - \sqrt{4}$$

> sin(3*pi/2) + log(exp(5)) - sqrt(4)
[1] 2

More information on any of these functions can be found by typing the function name preceded by a "?" character. So for example, suppose we were interested in finding out how to change the base for the log function from the default $e$ to some other value. As such, we might type "?log" at a prompt, whereby we will be presented with an entire web page detailing everything we might want to know about the log function and all its variants and related functions. Try it!

Working with Variables and Simple Data

The value of an expression can be assigned to a variable for later use. You can think of a variable as a box with a particular name. When we assign a value to a variable, we are essentially storing that value in the corresponding box.

In R, there are some limitations to the variable names one can use. Valid names consist of letters, numbers, and the dot (period) or underline characters. They must start with a letter or a dot not followed by a number. As examples, a name such as ".2way" would not be valid, while way_2 would be just fine.

There are also some reserved words that can not be used:

TRUE    NA             if        in
FALSE   NA_integer_    else      next
NULL    NA_real_       repeat    break
Inf     NA_complex_    while
NaN     NA_character_  function

To assign a value to a variable, one can use either the "=" or "<-" symbol, in a manner consistent with the following examples:

> x <- 3 + 4
> x
[1] 7

In this example, $7$ is stored as the value associated with variable $x$. Note that the assignment doesn't produce any visible output. To verify that $x$ has the value $7$, we type $x$ at a separate prompt.

Here's a slightly different example:

> y = 14 / 2
> z = y + 3
> z
[1] 10

Here, we use the alternate assignment operator ("="). We first assign the value of $7$ to $y$, and then $z$ is assigned the sum of $y$'s value and $3$. Neither of the assignments result in any visible output, so we type $z$ as a third input to verify things went as planned.

Notice that "=" in R does not mean "equals" in the mathematical sense. For this reason alone, many people prefer the "<-" form of the assignment operator.

Just as one can take something out of a box and replace it with something else -- one can re-assign a variable to a different value, as shown below.

> radius = 3
> 3.14 * radius^2
[1] 28.26
> radius = 5
> 3.14 * radius^2
[1] 78.5

Importantly, R is not limited to only associating a single value with a variable. It may associate a variable with a data set of many, many values. Let's make a simple data set (called a vector in R) consisting of the first five primes: 2, 3, 5, 7, and 11 and store it in a variable named $x$.

> x <- c(2,3,5,7,11)
> x
[1] 2 3 5 7 11

Here again, notice the first line only stores the values in the variable $x$, so to see the "value" of the vector $x$, we type just x on a separate later line.

We will talk more about vectors later, but let's take a look at a few basic ideas concerning them.

If we want to access to an individual element of a vector, we use square brackets. As an example, suppose one wishes to know the second element of the vector $x$ defined above. The following will do the trick:

> x[2]
[1] 3

We can find multiple elements of a vector in a similar way. Suppose one wishes to find the values of $x$ in the 3rd, 5th, and 4th positions (note: a more popular word for "position" in this context is index) in that order. The following code accomplishes this:

> x[c(3,5,4]
[1] 5 11  7

In the same way that the previously mentioned functions like sin(), log(), and sqrt() could be applied to individual values, other functions can be applied to entire data sets (e.g., vectors).

As an example -- in statistics, one is often interested in the average value of a data set, which is called its mean. In R, we can apply the mean() function to our data set $x$, and store the value calculated in another variable $m$:

> m <- mean(x)
> m
[1] 5.6

R comes "pre-loaded" with a bunch of data sets for demonstration purposes. You can see a list of these data sets by typing the following:

> data()

Let's take a peek at one of these data sets -- one called simply "rivers" -- that contains the lengths of major rivers in North America. If you want to see all 141 river lengths (in miles), simply type "rivers" at the prompt:

> rivers
  [1]  735  320  325  392  524  450 1459  135  465  600  330
 [12]  336  280  315  870  906  202  329  290 1000  600  505
 [23] 1450  840 1243  890  350  407  286  280  525  720  390
 [34]  250  327  230  265  850  210  630  260  230  360  730
 [45]  600  306  390  420  291  710  340  217  281  352  259
 [56]  250  470  680  570  350  300  560  900  625  332 2348
 [67] 1171 3710 2315 2533  780  280  410  460  260  255  431
 [78]  350  760  618  338  981 1306  500  696  605  250  411
 [89] 1054  735  233  435  490  310  460  383  375 1270  545
[100]  445 1885  380  300  380  377  425  276  210  800  420
[111]  350  360  538 1100 1205  314  237  610  360  540 1038
[122]  424  310  300  444  301  268  620  215  652  900  525
[133]  246  360  529  500  720  270  430  671 1770

At the risk of getting ahead of ourselves, we can probably get a better feel for this data, however, by calculating both its mean and its standard deviation (another useful concept from statistics) and plotting a histogram (a nice way to visualize some data sets).

R makes doing these three things (and so many other things) exceedingly easy, as seen below.

Note, in the commands below, a # indicates a comment meant for humans and all text to the right of this symbol is ignored by R.

> mean(rivers)
[1] 591.1844
> sd(rivers)    # sd() is a function which finds standard deviations
[1] 493.8708
> hist(rivers)  # hist() is a function which plots histograms

Data From Other Sources

I know what you are thinking -- what if you want to examine some data that's not automatically provided with R?

We'll answer this question more fully later -- but for now, if you just want to store a list of numerical values you found on the web somewhere into a vector so you can do all the things we just did to the rivers data set, don't worry -- there's a really easy way to do it!

We can use the scan() function, as shown in the example below:

Suppose you come across the following data set on the web, and wish to use it in R:

230, 146, 54, 463, 75, 377, 346, 77, 430, 30, 247, 251, 225, 488, 87, 424, 356, 80, 45, 171, 232, 466, 89, 209, 259

To store this data in a variable $x$, simply do the following:

  1. Select all of the values and use "Ctrl-C" or "⌘-C" as appropriate, to copy the data from the webpage.

  2. Then, noting that commas separate the values, type the following in R:

    x = scan(sep=",")
    

    Note: if there were no commas, you can just use x = scan().

  3. A new prompt of "1:" will be shown.

    > x = scan(sep=",")
    1: 
    

  4. Use "Ctrl-V" or "⌘-V", as appropriate to then paste the data in R, and hit "Enter".

  5. At this point R will display some number other than 1, followed by a colon.

    > x = scan(sep=",")
    1: 230, 146, 54, 463, 75, 377, 346, 77, 430, 30, 247, 251, 225, 488, 87, 424, 356,
    80, 45, 171, 232, 466, 89, 209, 259
    26: 
    

    The "26" above tells you that R has scanned 25 values and is waiting on you to type more if you wish, or to indicate that you are done by hitting "Enter" a second time. We have no more data, so hit "Enter" a second time.

  6. That's it! Your data is now stored in the variable $x$.

Quitting the Program

We've just scratched the surface of R, but that will do for now.

If you are working in R-Studio, you can quit the program in the normal way (i.e., select "Quit RStudio" from the menu). Alternatively (especially if you are working with R in a console or other environment) -- you can simply type q() at the prompt.

In both situations, however, you will be asked if you wish to save your workspace image. If you answer in the affirmative, all of the variables with which you have been working will be automatically loaded, so you can pick up right where you left off...