To write functions properly and avoid unexpected errors, we need to understand the concepts of environment and scope in R.
R Programming Environment
An environment can be thought of as a collection of objects (functions, variables, etc.). An environment is created when we first start the R interpreter, and any variable we define at the prompt is stored in this environment.
The top-level environment available to us at the R command prompt is the global environment, called R_GlobalEnv. In R code, the global environment can also be referred to as .GlobalEnv.
We can use the ls() function to show which variables and functions are defined in the current environment. Moreover, we can use the environment() function to get the current environment.
Example of environment() function
# assign values to the variables a and b
a <- 2
b <- 5
# define a function f whose body assigns the value 0 to its parameter x
f <- function(x) x <- 0
# ls() lists the objects in the current environment
ls()
# environment() with no arguments returns the current environment
environment()
Output
[1] "a" "b" "f"
<environment: R_GlobalEnv>
In the above example, we can see that a, b and f are in the R_GlobalEnv environment.
Notice that x (the argument of the function) is not in this global environment. Each time we call a function, R creates a new environment for that call. Here, calling f() creates a new environment whose enclosing environment is the global environment.
Strictly speaking, an environment consists of a frame, which holds all the objects defined in it, and a pointer to its enclosing (parent) environment.
Hence, x is in the frame of the new environment created when f is called. This environment also has a pointer to R_GlobalEnv.
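To make the frame and the parent pointer visible, the sketch below uses a variant of f that simply returns its own call environment (a modification made for illustration; the original f returns the result of x <- 0). We can then list the frame with ls(), read x from it with get(), and inspect the parent with parent.env().

```r
# Variant of f for illustration: it returns the environment R creates for the call
f <- function(x) {
  environment()
}

call_env <- f(1)

# the frame of the call environment holds the argument x
print(ls(call_env))                  # "x"
print(get("x", envir = call_env))    # 1

# f was defined at the top level, so the enclosing (parent)
# environment of its call environment is the global environment
print(identical(parent.env(call_env), globalenv()))  # TRUE

# every call gets a fresh environment, so two calls never share one
print(identical(f(1), f(1)))         # FALSE
```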
Example: Cascading of environments
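f <- function(f_x) {
  # g is defined inside f, so g is a local variable of f
  g <- function(g_x) {
    print("Inside g")
    print(environment())
    print(ls())
  }
  g(5)
  print("Inside f")
  print(environment())
  print(ls())
}
f(6)
# back at the top level, the current environment is the global environment
environment()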
Output
[1] "Inside g"
<environment: 0x5649bd74dec8>
[1] "g_x"
[1] "Inside f"
<environment: 0x5649bd7471d0>
[1] "f_x" "g"
<environment: R_GlobalEnv>
In the above example, we have defined two nested functions: f and g.
The g() function is defined inside the f() function. When the f() function is called, it creates a local variable g, defines the g() function within its own environment, and then calls g(5).
The g() function prints "Inside g", displays its own environment using environment(), and lists the objects in its environment using ls().
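After that, the f() function prints "Inside f", displays its own environment using environment(), and lists the objects in its environment using ls(). The hexadecimal addresses in the output identify the environments created for each call, so they will differ from run to run.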
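The printed addresses do not show how the two environments are linked. The sketch below is a modified version of the example (for illustration only) in which f returns its own call environment together with the call environment of g, so that parent.env() can show the chain g, then f, then R_GlobalEnv.

```r
# Modified version of the nested example: return the call environments
f <- function(f_x) {
  g <- function(g_x) {
    environment()   # g's call environment
  }
  list(f_env = environment(), g_env = g(5))
}

envs <- f(6)

# g was defined inside f, so g's enclosing environment is f's call environment
print(identical(parent.env(envs$g_env), envs$f_env))   # TRUE
# f was defined at the top level, so f's enclosing environment is the global one
print(identical(parent.env(envs$f_env), globalenv()))  # TRUE
# ls() on each frame shows only that frame's own objects
print(ls(envs$g_env))  # "g_x"
print(ls(envs$f_env))  # "f_x" "g"
```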
R Programming Scope
In R programming, scope refers to the accessibility or visibility of objects (variables, functions, etc.) within different parts of your code.
In R, there are two main types of variables: global variables and local variables.
Let's consider an example:
outer_func <- function(){
  b <- 20
  inner_func <- function(){
    c <- 30
  }
}
a <- 10
Global Variables
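Global variables are those variables which exist throughout the execution of a program. They can be accessed from any part of the program, and changed from any part of it as well (inside a function, this requires the super assignment operator <<-, discussed below).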
However, whether a variable counts as global also depends on the perspective of a function.
For example, in the code above, from the perspective of inner_func(), both a and b are global variables.
However, from the perspective of outer_func(), b is a local variable and only a is a global variable. The variable c is completely invisible to outer_func().
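The visibility rules just described can be checked directly. The sketch below extends the outer_func() example for illustration: inner_func() now returns a + b + c, which works because a and b are visible to it, while outer_func() uses exists() with inherits = FALSE to confirm that c is not in its own frame. The inherits = FALSE argument matters here, because with the default inherits = TRUE the search would find base R's c() function.

```r
a <- 10

outer_func <- function() {
  b <- 20
  inner_func <- function() {
    c <- 30
    # a (global) and b (from outer_func) are both visible here
    a + b + c
  }
  inner_result <- inner_func()
  # c lived only in inner_func's frame; inherits = FALSE stops the search
  # from finding base R's c() function further up the chain
  c_visible <- exists("c", envir = environment(), inherits = FALSE)
  list(inner_result = inner_result, c_visible = c_visible)
}

result <- outer_func()
print(result$inner_result)  # 60
print(result$c_visible)     # FALSE
```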
Local Variables
On the other hand, local variables are those variables which exist only within a certain part of a program like a function, and are released when the function call ends.
In the outer_func() example above, the variable c is a local variable.
If we assign a value to a variable within the function inner_func(), the change will only be local and cannot be accessed outside the function.
This holds even if a local variable has the same name as a global variable.
For example, consider the following code.
outer_func <- function(){
  a <- 20
  inner_func <- function(){
    a <- 30
    print(a)
  }
  inner_func()
  print(a)
}
outer_func()
a <- 10
print(a)
Output
[1] 30
[1] 20
[1] 10
Here, the outer_func() function is defined, and within it, a local variable a is assigned the value 20.
Inside outer_func(), an inner_func() function is defined. The inner_func() function also has its own local variable a, which is assigned the value 30.
When inner_func() is called within outer_func(), it prints the value of its local variable a (30). Then outer_func() continues executing and prints the value of its own local variable a (20).
Outside the functions, a global variable a is assigned the value 10. The code then prints the value of the global variable a (10).
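Local variables also stop existing once the call that created them returns, as the definition of local variables above states. A minimal sketch, using a hypothetical function named double_five that is not part of the original tutorial:

```r
# hypothetical function, used only for illustration
double_five <- function() {
  tmp <- 5      # local variable, stored in this call's environment
  tmp * 2
}

print(double_five())  # 10

# tmp was local to the call, so it does not exist in the global environment
print(exists("tmp", envir = globalenv(), inherits = FALSE))  # FALSE
```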
Accessing global variables
Global variables can be read from inside a function, but when we try to assign to one there, a new local variable is created instead.
To make assignments to global variables, the super assignment operator, <<-, is used.
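The three cases (reading, ordinary assignment and super assignment) can be compared side by side. In the sketch below, the function names read_a, assign_local and assign_global are hypothetical and chosen only for illustration; each is defined at the top level, so the global environment is its enclosing environment.

```r
a <- 10

# reading a global variable works without any special operator
read_a <- function() a * 2

# ordinary assignment creates a new local a; the global a is untouched
assign_local <- function() {
  a <- 99
  a
}

# super assignment finds the global a and changes it
assign_global <- function() {
  a <<- 99
  a
}

print(read_a())         # 20
print(assign_local())   # 99
print(a)                # 10
print(assign_global())  # 99
print(a)                # 99
```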
When this operator is used within a function, R searches for the variable in the enclosing (parent) environment; if it is not found there, R keeps searching the next level up until it reaches the global environment.
If the variable is still not found, it is created and assigned at the global level.
outer_func <- function(){
  inner_func <- function(){
    a <<- 30
    print(a)
  }
  inner_func()
  print(a)
}
outer_func()
print(a)
Output
[1] 30
[1] 30
[1] 30
When the statement a <<- 30 is encountered within inner_func(), R looks for the variable a in the outer_func() environment. When that search fails, it searches in R_GlobalEnv.
Since a is not defined in the global environment either, it is created and assigned there. This global a is then referenced and printed from within inner_func() and outer_func(), and by the final print(a) at the top level.
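The search stops at the first environment that already contains the variable. In the variant below (a modification of the example above, for illustration), outer_func() defines its own a, so a <<- 30 inside inner_func() updates that local a and never reaches the global environment. A global a is set beforehand only to show that it stays unchanged.

```r
a <- 10   # a global a, to show that it is left untouched

outer_func <- function() {
  a <- 20
  inner_func <- function() {
    # the search finds a in outer_func's environment and stops there
    a <<- 30
  }
  inner_func()
  a
}

print(outer_func())  # 30
print(a)             # 10
```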