A data frame is a two-dimensional data structure in R. It is a special case of a list in which every component has the same length.
Each component forms a column, and the contents of the components form the rows.
Check if a variable is a data frame or not
We can check whether a variable is a data frame using the class() function.
x <- data.frame(SN = c(1, 2), Age = c(21, 15), Name = c("John", "Dora"))
# print the data frame
print(x)
# check the type of x
print(typeof(x))
# check the class of x
print(class(x))
Output
  SN Age Name
1  1  21 John
2  2  15 Dora
[1] "list"
[1] "data.frame"
In this example, x can be considered a list of 3 components, each of which is a two-element vector. Functions such as names(), nrow(), ncol() and length() tell us more about a data frame.
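As a brief sketch of how a data frame can be inspected, the base R calls below report the column names, dimensions, and length of the example data frame. The frame is rebuilt here so the snippet runs on its own, and the variable names holding the results are illustrative.

```r
# rebuild the example data frame so this snippet is self-contained
x <- data.frame(SN = c(1, 2), Age = c(21, 15), Name = c("John", "Dora"))

# names() returns the column names
col_names <- names(x)

# nrow() and ncol() return the number of rows and columns
n_rows <- nrow(x)
n_cols <- ncol(x)

# length() counts list components, which for a data frame are its columns
n_components <- length(x)

print(col_names)
print(c(n_rows, n_cols, n_components))
```

Because a data frame is a list of columns, length() agrees with ncol() rather than with nrow().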
How to create a Data Frame in R?
We can create a data frame using the data.frame() function.
For example, the data frame shown above can be created as follows.
# create a dataframe
x <- data.frame("SN" = 1:2, "Age" = c(21, 15), "Name" = c("John", "Dora"))
# print the structure of x
str(x)
Output
'data.frame': 2 obs. of  3 variables:
 $ SN  : int  1 2
 $ Age : num  21 15
 $ Name: chr  "John" "Dora"
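Notice above that the third column, Name, is a character vector (chr), not a factor.
In versions of R before 4.0.0, the data.frame() function converted character vectors into factors by default; since R 4.0.0, the default is stringsAsFactors = FALSE.
To get character columns regardless of the R version, we can pass the argument stringsAsFactors = FALSE explicitly.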
x <- data.frame("SN" = 1:2, "Age" = c(21, 15), "Name" = c("John", "Dora"), stringsAsFactors = FALSE)
# print the structure of x
str(x)
Output
'data.frame': 2 obs. of  3 variables:
 $ SN  : int  1 2
 $ Age : num  21 15
 $ Name: chr  "John" "Dora"
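To see the effect of the argument directly, this small sketch sets stringsAsFactors explicitly both ways and compares the class of the resulting column. The variable names as_factor and as_char are illustrative.

```r
# same data, with and without conversion of strings to factors
as_factor <- data.frame(Name = c("John", "Dora"), stringsAsFactors = TRUE)
as_char <- data.frame(Name = c("John", "Dora"), stringsAsFactors = FALSE)

# the column class depends only on the argument, not on the R version
print(class(as_factor$Name))
print(class(as_char$Name))
```

Setting the argument explicitly keeps the result the same across R versions with different defaults.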
Many of R's data input functions, such as read.table(), read.csv(), read.delim() and read.fwf(), also read data into a data frame.
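As a minimal sketch of this, read.csv() can parse CSV content into a data frame. To keep the example self-contained, inline text is passed through the text argument in place of a file path; the data and the names csv_text and people are made up for illustration.

```r
# inline CSV text stands in for a file on disk (illustrative data)
csv_text <- "SN,Age,Name
1,21,John
2,15,Dora"

# read.csv() accepts the text argument in place of a file path
people <- read.csv(text = csv_text)

# the result is an ordinary data frame
print(class(people))
str(people)
```

With a real file, the first argument would be the path to that file instead.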
How to Access Components of a Data Frame?
Components of the data frame can be accessed like a list or like a matrix. Let's discuss some of the ways.
Accessing like a list
We can use the [, [[ or $ operator to access the columns of a data frame.
x <- data.frame("SN" = 1:2, "Age" = c(21, 15), "Name" = c("John", "Dora"), stringsAsFactors = FALSE)
# access the "Name" column using different methods
print(x["Name"])
print(x$Name)
print(x[["Name"]])
print(x[[3]])
Output
  Name
1 John
2 Dora
[1] "John" "Dora"
[1] "John" "Dora"
[1] "John" "Dora"
Accessing with [[ or $ is similar: both return the column itself as a vector. Indexing with [ differs in that it returns a data frame.
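The difference between the operators can be checked by inspecting the class of each result. This is a small sketch on the same example data; the variable names are illustrative.

```r
x <- data.frame(SN = 1:2, Age = c(21, 15), Name = c("John", "Dora"),
                stringsAsFactors = FALSE)

single_bracket <- x["Name"]    # stays a data frame with one column
double_bracket <- x[["Name"]]  # extracts the column itself
dollar <- x$Name               # extracts the column itself

print(class(single_bracket))
print(class(double_bracket))
print(identical(double_bracket, dollar))
```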
Accessing like a matrix
Data frames can be accessed like a matrix by providing indexes for row and column.
To illustrate this, we use a dataset already available in R. The available datasets can be listed with the command library(help = "datasets").
We will use the trees dataset, which contains Girth, Height and Volume for Black Cherry Trees.
A data frame can be examined using functions like str() and head().
trees <- data.frame(
Girth = c(8.3, 8.6, 8.8, 10.5, 10.7, 10.8, 11, 11, 11.1, 11.2),
Height = c(70, 65, 63, 72, 81, 83, 66, 75, 80, 75),
Volume = c(10.3, 10.3, 10.2, 16.4, 18.8, 19.7, 15.6, 18.2, 22.6, 19.9)
)
# print the structure of trees
str(trees)
# display the first 3 rows of trees
head(trees, n = 3)
Output
'data.frame': 10 obs. of  3 variables:
 $ Girth : num  8.3 8.6 8.8 10.5 10.7 10.8 11 11 11.1 11.2
 $ Height: num  70 65 63 72 81 83 66 75 80 75
 $ Volume: num  10.3 10.3 10.2 16.4 18.8 19.7 15.6 18.2 22.6 19.9
  Girth Height Volume
1   8.3     70   10.3
2   8.6     65   10.3
3   8.8     63   10.2
We can see that trees is a data frame with 10 rows and 3 columns; the example recreates the first 10 of the 31 rows in R's built-in trees dataset. We also display the first 3 rows of the data frame.
Now we proceed to access the data frame like a matrix.
trees <- data.frame(
Girth = c(8.3, 8.6, 8.8, 10.5, 10.7, 10.8, 11, 11, 11.1, 11.2),
Height = c(70, 65, 63, 72, 81, 83, 66, 75, 80, 75),
Volume = c(10.3, 10.3, 10.2, 16.4, 18.8, 19.7, 15.6, 18.2, 22.6, 19.9)
)
# select rows 2 and 3 of trees
trees[2:3, ]
# select rows with Height greater than 82
trees[trees$Height > 82, ]
# select the Height column of rows 10 to 12
trees[10:12, "Height"]
Output
  Girth Height Volume
2   8.6     65   10.3
3   8.8     63   10.2
  Girth Height Volume
6  10.8     83   19.7
[1] 75 NA NA
We can see in the last case that the returned type is a vector, since we extracted data from a single column. The NA values appear because rows 11 and 12 do not exist in this 10-row data frame.
The reduction to a vector can be avoided by passing the argument drop = FALSE.
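The effect of drop = FALSE can be sketched on a small frame. The name trees_small and its three rows are chosen here only to keep the example short.

```r
# a three-row stand-in for the trees data (illustrative subset)
trees_small <- data.frame(
  Girth = c(8.3, 8.6, 8.8),
  Height = c(70, 65, 63),
  Volume = c(10.3, 10.3, 10.2)
)

# default behavior: a single column is reduced to a vector
as_vector <- trees_small[2:3, "Height"]

# drop = FALSE keeps the result as a one-column data frame
as_frame <- trees_small[2:3, "Height", drop = FALSE]

print(as_vector)
print(as_frame)
```

Keeping the data frame structure is useful when later code expects column names or calls functions such as nrow().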
How to modify a Data Frame in R?
Data frames can be modified through reassignment, in the same way as matrices.
x <- data.frame(
SN = c(1, 2),
Age = c(21, 15),
Name = c("John", "Dora")
)
# print the initial data frame
print(x)
# update the Age value in the first row to 20
x[1, "Age"] <- 20
# print the updated data frame
print(x)
Output
  SN Age Name
1  1  21 John
2  2  15 Dora
  SN Age Name
1  1  20 John
2  2  15 Dora
Adding Components to Data Frame
Rows can be added to a data frame using the rbind() function.
x <- data.frame(
SN = c(1, 2),
Age = c(20, 15),
Name = c("John", "Dora")
)
# print the initial data frame
print(x)
# create a new row and bind it to the data frame
new_row <- list(SN = 1, Age = 16, Name = "Paul")
x <- rbind(x, new_row)
# print the updated data frame
print(x)
Output
  SN Age Name
1  1  20 John
2  2  15 Dora
  SN Age Name
1  1  20 John
2  2  15 Dora
3  1  16 Paul
Similarly, we can add columns using cbind().
x <- data.frame(
SN = c(1, 2),
Age = c(20, 15),
Name = c("John", "Dora")
)
# print the initial data frame
print(x)
# add a new column "State" to the data frame using cbind()
x <- cbind(x, State = c("NY", "FL"))
# print the updated data frame
print(x)
Output
  SN Age Name
1  1  20 John
2  2  15 Dora
  SN Age Name State
1  1  20 John    NY
2  2  15 Dora    FL
Since data frames are implemented as lists, we can also add new columns through simple list-like assignments.
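A minimal sketch of such list-like assignment follows. The State column mirrors the cbind() example, while the Minor column is a hypothetical derived column added only for illustration.

```r
x <- data.frame(SN = c(1, 2), Age = c(20, 15), Name = c("John", "Dora"))

# assign a new column with the $ operator
x$State <- c("NY", "FL")

# assign a new column with [[ ]]; "Minor" is a hypothetical derived column
x[["Minor"]] <- x$Age < 18

print(x)
```

The new vector must have as many elements as the data frame has rows, or a length that recycles evenly into it.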
Deleting Component of Data Frame
A data frame column can be deleted by assigning NULL to it.
x <- data.frame(
SN = c(1, 2),
Age = c(20, 15),
Name = c("John", "Dora"),
State = c("NY", "FL")
)
# print the initial data frame
print(x)
# remove the "State" column from the data frame
x$State <- NULL
# print the updated data frame
print(x)
Output
  SN Age Name State
1  1  20 John    NY
2  2  15 Dora    FL
  SN Age Name
1  1  20 John
2  2  15 Dora