A vector in R is an ordered collection of values, and is a data type that is fundamental to how R functions.
The term "values" in R is broader in scope than what we normally take the term to mean. A value in R can be a numerical value (e.g., 1.2, 5, -79.843), but it can also be a character or string of characters (e.g., "a", "alice", "trial_4"). Values can also be logical (i.e., TRUE or FALSE). As a convenience, a special value, NA, can also be used in any vector as a placeholder for a value that is "not available" for one reason or another.
An important characteristic of a vector, however, is that all of its values must be of the same type. Thus, a single vector cannot hold both numerical values and strings of characters, for instance.
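A brief sketch of what happens if one tries to mix types anyway: rather than raising an error, c() converts every value to a single common type. The variable names below are illustrative only and do not appear elsewhere in this text.

```r
# Mixing numbers and strings: every value is converted to a character string
mixed = c(1, "a", 2)
class(mixed)           # "character"

# Mixing numbers and logical values: TRUE becomes 1 and FALSE becomes 0
numLogical = c(3, TRUE, FALSE)
class(numLogical)      # "numeric"

# NA can sit in a vector of any type without changing that type
withMissing = c("alice", NA, "bob")
class(withMissing)     # "character"
```

This silent conversion is worth keeping in mind, since a stray string in a vector of numbers turns the whole vector into strings.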
A variable can store a vector just as easily as it stores a single numerical value (Spoiler alert: single numerical values actually are vectors in R). Indeed, variables can store any of the types of "values" mentioned above.
To create a vector and assign it to a variable, we use the c() function and the = assignment operator, as the next couple of examples illustrate. Recall that the = operator as used here does not mean what it means in mathematics. It means instead to take the value specified on its right side and store it in the variable named on its left side.
> myNumericalVector = c(1,2,3)
> myNumericalVector
[1] 1 2 3
> myStringVector = c("alpha","beta","gamma")
> myStringVector
[1] "alpha" "beta" "gamma"
> myLogicalVector = c(TRUE,FALSE,FALSE,FALSE)
> myLogicalVector
[1] TRUE FALSE FALSE FALSE
In the case where we only wish to store a single value, we may do so in a manner that requires slightly less typing. However, the result is still a vector.
> x = 5   # yields exactly the same output as: x = c(5)
> x
[1] 5
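The claim that a single value is itself a vector can be checked directly. The following is a minimal sketch using the base R functions is.vector(), length(), and identical():

```r
x = 5
is.vector(x)          # TRUE: a single number is a vector
length(x)             # 1: it has exactly one element
identical(x, c(5))    # TRUE: both forms produce the same object
```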
In the case where we wish to store a vector whose consecutive elements have a common difference of one (e.g., $(4,5,6,7,8)$), we can use a colon to more quickly define the vector, as shown below.
> a = 4:8   # yields exactly the same output as: a = c(4,5,6,7,8)
> a
[1] 4 5 6 7 8
To create a vector whose elements form an arithmetic sequence with a common difference other than one, we can use the seq() function:
> b = seq(from=6,to=18,by=2)   # here we specify the common difference with "by="
> b
[1] 6 8 10 12 14 16 18
> d = seq(from=6,to=26,length.out=11)   # the number of elements is determined by "length.out="
> d
[1] 6 8 10 12 14 16 18 20 22 24 26
> d = seq(from=5,by=5,length.out=5)   # other combinations of arguments can work as well
> d
[1] 5 10 15 20 25
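Both the colon and seq() can also count downward. The sketch below assumes nothing beyond base R; the names down1 and down2 are illustrative only.

```r
# The colon counts down when the left value is larger than the right value
down1 = 8:4                          # 8 7 6 5 4

# seq() counts down when "by" is negative
down2 = seq(from=10, to=2, by=-2)    # 10 8 6 4 2
```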
Vectors can be easily concatenated (i.e., joined together) by simply using them as arguments to the c() function.
> v1 = c(1,2,3)
> v1
[1] 1 2 3
> v2 = 4:6
> v2
[1] 4 5 6
> v3 = c(v1,v2,7)
> v3
[1] 1 2 3 4 5 6 7
To create a vector containing repeated values, we use the rep() function:
> rep(x=1,times=4)
[1] 1 1 1 1
> rep(x=c(1,2,3),each=2)
[1] 1 1 2 2 3 3
> rep(x=c(1,2,3),times=3,each=2)
[1] 1 1 2 2 3 3 1 1 2 2 3 3 1 1 2 2 3 3
> rep(x=c(1,2,3),times=c(2,3,4))
[1] 1 1 2 2 2 3 3 3 3
We can use vectors to create other vectors too. What follows gives some examples of this (but it is certainly not an exhaustive list):
Use the sort() function to put a vector's elements in increasing or decreasing order:
> data = c(3,2,9,5,4,1,6,8,7)
> sort(data, decreasing=FALSE)
[1] 1 2 3 4 5 6 7 8 9
> sort(data, decreasing=TRUE)
[1] 9 8 7 6 5 4 3 2 1
Use the sqrt() function to take square roots of the elements of a vector:
> squares = c(1,4,9,16,25)
> sqrt(squares)
[1] 1 2 3 4 5
Other mathematical functions, like log(), exp(), sin(), etc., work similarly.
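To make "work similarly" concrete, here is a small sketch applying a few base R mathematical functions to whole vectors at once. The vector name powers is illustrative only.

```r
powers = c(1, 10, 100, 1000)
log10(powers)        # 0 1 2 3 (base-10 logarithm of each element)
exp(c(0, 1))         # 1 and e (about 2.718282)
sin(c(0, pi/2))      # 0 1
abs(c(-2, 0, 3))     # 2 0 3
```

In each case the function is applied to every element separately, and the result is a vector of the same length as the input.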
Use the is.na() function to determine which values of a vector are NA or NaN. Note that the result is a vector of logical values (i.e., TRUE or FALSE).
> v = c(1,5,NA,3,NaN,7,NA,NA)
> is.na(v)
[1] FALSE FALSE TRUE FALSE TRUE FALSE TRUE TRUE
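Since is.na() flags both NA and NaN, it cannot tell the two apart. As a hedged aside, base R also provides is.nan(), which flags only NaN, and anyNA(), which reports whether a vector contains any missing value at all:

```r
v = c(1, 5, NA, 3, NaN, 7, NA, NA)
is.nan(v)      # TRUE only in the 5th position, where the value is NaN
anyNA(v)       # TRUE: at least one value is NA or NaN
```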
Use arithmetic operators to combine two vectors of the same length element by element:
> vec = c(1,2,3) + c(4,5,6)
> vec
[1] 5 7 9
Similar things happen with other arithmetic operators. However, one should be careful when asking R to divide by zero or do some other inappropriate mathematical operation. In such cases, you may see special values (i.e., Inf or NaN) appear when the result of a calculation is either "infinite" or "not a number", as the example below demonstrates:
> v1 = c(1,0,1,-1)
> v2 = c(1,0,0,0)
> v1/v2
[1] 1 NaN Inf -Inf
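When an arithmetic operator is given vectors of different lengths, R reuses ("recycles") the elements of the shorter vector as many times as needed to match the longer one. In the simplest case, a single value is applied to every element:
> vec = c(1,2,3,4)
> vec+3
[1] 4 5 6 7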
If the longer vector's length is not a multiple of the shorter vector's length, the calculation will still proceed, but R will issue a warning.
> c(1,2,3,4,5,6) + c(1,2,3)
[1] 2 4 6 5 7 9
> c(1,2,3,4,5,6,7,8) + c(1,2,3)
[1] 2 4 6 5 7 9 8 10
Warning message:
In c(1, 2, 3, 4, 5, 6, 7, 8) + c(1, 2, 3) :
  longer object length is not a multiple of shorter object length
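One way to read the first result above is that R repeats the shorter vector until it is as long as the longer one and then adds element by element. The sketch below spells that repetition out by hand with rep(); the names long, short, and recycled are illustrative only.

```r
long = c(1, 2, 3, 4, 5, 6)
short = c(1, 2, 3)

# Repeat the shorter vector by hand so both vectors have six elements
recycled = rep(short, times=2)               # 1 2 3 1 2 3

long + short                                 # 2 4 6 5 7 9
long + recycled                              # 2 4 6 5 7 9
identical(long + short, long + recycled)     # TRUE
```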
The list of functions that take a vector as input in R is ridiculously long. However, the following three functions may prove very useful in the near future:
We can find the number of elements in a given vector with the length() function:
> nums = c(5,8,3,2)
> length(nums)
[1] 4
We can find the sum and product of elements in a given vector with the sum() and prod() functions:
> nums = c(5,8,3,2)
> sum(nums)
[1] 18
> prod(nums)
[1] 240
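These functions combine naturally. As one small illustration, the average of a vector's elements is its sum divided by its length, which agrees with base R's built-in mean() function. The name average is illustrative only.

```r
nums = c(5, 8, 3, 2)
average = sum(nums) / length(nums)   # 18 / 4 = 4.5
average == mean(nums)                # TRUE
```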
Sometimes one might be interested in the value at a particular position in a vector. Extracting such an element, or a group of such elements, is called subsetting, and can be accomplished through an appropriate use of square brackets, as illustrated below.
> nums = c(5,8,3,2)
> nums[1]   # The 1st element is at position 1, which is contrary to how vectors are treated in some other programming languages
[1] 5
> nums[2]
[1] 8
> nums[length(nums)]
[1] 2
> nums[2:4]
[1] 8 3 2
> nums[c(1,3,4)]
[1] 5 3 2
One can also extract elements from a vector by using logical values (i.e., TRUE and FALSE). Essentially, for each TRUE seen, the corresponding element of the vector being subsetted will be included in the subset, and for each FALSE seen, the corresponding element will be excluded. An example is shown below:
> nums = c(5, 8, 3, 2)
> nums[c(TRUE,FALSE,FALSE,TRUE)]
[1] 5 2
The example immediately above may seem like a cumbersome way to do things -- why would one want to type all those "TRUE" and "FALSE" values when one could simply use nums[c(1,4)] instead?
If you knew the positions of the elements of the vector you want, you would be correct. However, we often want to extract elements of a vector that meet some condition instead.
For example, maybe one wishes to extract all of the even elements of the vector nums. As we will soon see when we discuss logical values in R, there is a very quick way for R to decide if the condition "this element is even" is TRUE or FALSE for each element of a vector, producing a vector of TRUE/FALSE values as a result. The same can be said of many other conditions of interest. We can then use each such generated vector of TRUE/FALSE values to subset nums or whatever other vector we might need to subset (instead of a hand-typed vector of TRUE/FALSE values).
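As a preview only, since the details belong to the later discussion of logical values, the sketch below uses R's remainder operator %% and the comparison operators == and > to build such TRUE/FALSE vectors. The name isEven is illustrative only.

```r
nums = c(5, 8, 3, 2)

# An element is even when its remainder after division by 2 is 0
isEven = nums %% 2 == 0      # FALSE TRUE FALSE TRUE

nums[isEven]                 # 8 2
nums[nums > 4]               # 5 8 (the condition can also be written inside the brackets)
```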
One can also use square brackets to remove elements from a vector. The difference between this use of square brackets and subsetting is that here we use negative values inside of the brackets. Each negative value indicates a position in the original vector with its absolute value (e.g., $-3$ corresponds to position $3$, $-5$ corresponds to position $5$, and so on).
The result is a new vector with the elements of the original vector at the indicated positions removed. Note that actually altering the original vector requires another assignment, as shown below:
> nums = c(4,5,6,7,8,9,10,11,12)
> nums[-3]   # create vector identical to nums but with 3rd element removed
[1] 4 5 7 8 9 10 11 12
> nums   # nums is left unchanged
[1] 4 5 6 7 8 9 10 11 12
> nums = nums[-c(1,6:8)]   # remove elements in 1st, 6th, 7th, and 8th positions and reassign this new vector to nums
> nums   # now nums has been altered
[1] 5 6 7 8 12