November 29, 2023
The most basic data type in R is the atomic vector, which is essentially a one-dimensional array composed of elements of a single type. Even a scalar number such as `4.17` is therefore a vector (of length one) under the hood. This might be surprising, especially if you have experience with other programming languages, but it means that array operations (which are extremely useful for data analysis) are built right into R. In this post, I will discuss how to extract one or more elements from a vector, a process commonly referred to as subsetting.
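A quick way to see this in the console (a minimal sketch; the numbers are arbitrary):

```r
# A "scalar" is really a numeric vector of length one
length(4.17)
#> [1] 1
is.vector(4.17)
#> [1] TRUE

# Arithmetic on vectors is applied element by element
c(1, 2, 3) * 2
#> [1] 2 4 6
```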
In general, R uses a pair of square brackets `[]` for selecting specific elements in a given object. Let's consider the following vector `v`:
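The vector holds four strings. Wrapping the assignment in parentheses makes R print the assigned value as well:

```r
# Create a character vector with four elements; the outer parentheses
# cause the assigned value to be printed
(v = c("A", "B", "C", "D"))
#> [1] "A" "B" "C" "D"
```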
This vector has four elements, and we can grab a specific element by specifying its position within the square brackets. For example, here's how we can extract the second element of `v`:
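Using the four-letter vector from above (redefined here so the snippet runs on its own):

```r
v = c("A", "B", "C", "D")

# Select the element at position 2
v[2]
#> [1] "B"
```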
R uses 1-based indexing, so the position of the first element is `1`, followed by `2` for the second element, and so on.
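A small sketch of what this means in practice. Note that position `0` does not refer to the first element; it selects nothing and returns an empty vector:

```r
v = c("A", "B", "C", "D")

v[1]  # the first element
#> [1] "A"
v[4]  # the fourth (and last) element
#> [1] "D"
v[0]  # no element at all
#> character(0)
```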
If we want to select two or more elements, we need to wrap the desired positions in `c()`, because for a vector R expects a single index vector within the square brackets:
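For example, to get the first and third elements (a minimal sketch reusing `v`):

```r
v = c("A", "B", "C", "D")

# Combine the positions into one index vector with c()
v[c(1, 3)]
#> [1] "A" "C"

# v[1, 3] would not work: it asks for row 1, column 3 of a
# two-dimensional object, and v has only one dimension
```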
Negative indices grab all elements except for those specified by the negative numbers. For example, to get all elements except the third one:
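Excluding one position, or several positions at once (again wrapped in `c()`):

```r
v = c("A", "B", "C", "D")

v[-3]          # everything except the third element
#> [1] "A" "B" "D"
v[-c(1, 2)]    # everything except the first two elements
#> [1] "C" "D"
```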
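There is no special syntax for selecting the last element of a vector (in particular, `v[-1]` drops the first element rather than returning the last one), so we have to compute its position with `length()`: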
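Since the length of a vector equals the position of its last element, `length()` can be used as the index. Base R's `tail()` is shown as an alternative:

```r
v = c("A", "B", "C", "D")

# length(v) is 4 here, i.e. the position of the last element
v[length(v)]
#> [1] "D"

# tail() returns the last n elements; n = 1 gives the last one
tail(v, 1)
#> [1] "D"
```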
We can also use logical vectors for subsetting. The outcome will comprise the elements at positions where the logical vector evaluates to `TRUE`. The following example illustrates this idea:
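A minimal example with a logical index vector of the same length as `v` (the particular `TRUE`/`FALSE` pattern is just for illustration):

```r
v = c("A", "B", "C", "D")

# Keep only the elements where the index is TRUE (positions 2 and 3)
v[c(FALSE, TRUE, TRUE, FALSE)]
#> [1] "B" "C"
```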
The result contains the second and third elements of `v`, because only the second and third elements of the logical index vector are `TRUE`. Subsetting with logical vectors usually involves comparisons (which evaluate to logical vectors), so we can filter elements based on the values of the original vector. The following example illustrates this idea with a numeric vector `x` and a subset containing only the positive values of `x`:
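A sketch with a made-up numeric vector `x` (the values are arbitrary; note that `0` is not positive and is therefore dropped):

```r
x = c(-3, 7, 0, 2, -5)

# The comparison returns one TRUE/FALSE value per element of x
x > 0

# Using it as an index keeps only the positive values
x[x > 0]
#> [1] 7 2
```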
Finally, we can select elements in a named vector not only by position but also by name:
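A sketch with a hypothetical named vector `w` (the names `a` to `d` are chosen only for this example):

```r
# Hypothetical named vector: each element gets a name via name = value
w = c(a = "A", b = "B", c = "C", d = "D")

# Select by name instead of position; the result keeps its name
w["b"]
w[c("a", "d")]

# Position-based subsetting still works on a named vector
w[2]
```

Selecting by name is often more robust than selecting by position, because the code keeps working if the order of the elements changes.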