Tidy Evaluation

2026-02-26

Tidy Evaluation consists of

  • quasiquotation,

  • quosures,

  • and data masks

Quosures

Idea: capture an expression with its environment (for later evaluation)

  • functions enquo and enquos (and, less commonly used, quo and quos):
quo(x <- 2*exp(1))
<quosure>
expr: ^x <- 2 * exp(1)
env:  global
f <- function(x) enquo(x)

f(a+b+d/2)
<quosure>
expr: ^a + b + d / 2
env:  global
  • also: new_quosure
qu <- new_quosure(expr(x^2+y), env(x = 2, y = 1))
qu
<quosure>
expr: ^x^2 + y
env:  0x7fbda79ce780
eval_tidy(qu)
[1] 5
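To make the "for later evaluation" part concrete, here is a minimal sketch, assuming rlang is attached. The helper make_quosure() is a hypothetical name used only for this illustration. The quosure is created inside a function and evaluated afterwards from the global environment, yet it still sees the function's local x.

```r
library(rlang)

# make_quosure() is a hypothetical helper, not part of rlang.
make_quosure <- function() {
  x <- 10        # local variable, not visible from the global environment
  quo(x * 2)     # captures the expression `x * 2` together with this frame
}

x <- 1           # a global `x` that the quosure does not pick up
q <- make_quosure()

# Evaluated later, but looked up in the captured environment
result <- eval_tidy(q)
result           # 20, not 2
```

The global x is ignored because eval_tidy() resolves symbols in the environment stored in the quosure.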

Quosures

f <- function(...) {
  x <- 1
  g(..., f = x)
}
g <- function(...) {
  enquos(...)
}

x <- 0
qs <- f(global = x)
qs
<list_of<quosure>>

$global
<quosure>
expr: ^x
env:  global

$f
<quosure>
expr: ^x
env:  0x7fbd87f03bd0
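Both quosures carry the same expression ^x but different environments, so they evaluate to different values. A small sketch, repeating the definitions above so that it runs on its own (assumes rlang is attached):

```r
library(rlang)

g <- function(...) enquos(...)
f <- function(...) {
  x <- 1
  g(..., f = x)   # `f = x` is created here, so it remembers f()'s frame
}

x <- 0
qs <- f(global = x)   # `global = x` is created at top level

# Evaluate each quosure in its own captured environment
vals <- lapply(qs, eval_tidy)
vals$global   # 0, from the global `x`
vals$f        # 1, from the local `x` inside f()
```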

q4 <- new_quosure(expr(x + y + z))

q4
<quosure>
expr: ^x + y + z
env:  global
q4 |> get_expr()
x + y + z
q4 |> get_expr() |> expr_text()
[1] "x + y + z"
q4 |> get_env()
<environment: R_GlobalEnv>

Your Turn

From Advanced R:

  1. Predict what each of the following quosures will return if evaluated.
x <- 0
q1 <- new_quosure(expr(x), env(x = 1))
q2 <- new_quosure(expr(x + !!q1), env(x = 10))
q3 <- new_quosure(expr(x + !!q2), env(x = 100))
  2. Write an enenv() function that captures the environment associated with an argument. (Hint: this should only require two function calls.)

enenv <- function(x) {
  qu <- enquo(x)
  get_env(qu)
}

enenv(q3)
<environment: R_GlobalEnv>

Data Masks

Sometimes we deal with expressions that contain both variables (symbols from a data frame) and objects from an environment.

During evaluation we need to consider both the data frame and the environment.

The function eval_tidy has data as its second argument; if it is specified, the data frame is searched first when evaluating expressions, and the environment is used for anything not found there:

eval_tidy(expr(mpg), data = mtcars)
 [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2 10.4
[16] 10.4 14.7 32.4 30.4 33.9 21.5 15.5 15.2 13.3 19.2 27.3 26.0 30.4 15.8 19.7
[31] 15.0 21.4
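The lookup order can be sketched with a toy data frame (df, a, and b are made-up names for this illustration; rlang is assumed to be attached). The name a exists both as a column and as a global object, while b exists only in the environment.

```r
library(rlang)

df <- data.frame(a = 1:3)
a <- 100    # same name as the column in `df`
b <- 10     # only defined in the environment

# With a data mask: `a` comes from `df`, `b` falls back to the environment
masked <- eval_tidy(expr(a + b), data = df)
masked      # 11 12 13

# Without a data mask: both symbols come from the environment
unmasked <- eval_tidy(expr(a + b))
unmasked    # 110
```

The same expression gives two different results depending on whether a data mask is present. The pronouns are there to resolve exactly this ambiguity.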

Tidyverse programming

Data masking introduces some ambiguity about where each object in an expression lives, e.g.:

1/mpg / mile_to_100km * gallons_to_liter

Is each name a column of the data frame (.data) or an object in the calling environment (.env)?

gallons_to_liter <- 3.79
mile_to_100km <- 1.60934 / 100

mtcars |>
  dplyr::mutate(
    lper100km = 1 / mpg / mile_to_100km * gallons_to_liter
  ) |>
  dplyr::select(mpg, lper100km) |>
  dplyr::arrange(lper100km)
                     mpg lper100km
Toyota Corolla      33.9  6.946911
Fiat 128            32.4  7.268527
Honda Civic         30.4  7.746719
Lotus Europa        30.4  7.746719
Fiat X1-9           27.3  8.626383
Porsche 914-2       26.0  9.057703
Merc 240D           24.4  9.651650
Datsun 710          22.8 10.328959
Merc 230            22.8 10.328959
Toyota Corona       21.5 10.953501
Hornet 4 Drive      21.4 11.004685
Volvo 142E          21.4 11.004685
Mazda RX4           21.0 11.214298
Mazda RX4 Wag       21.0 11.214298
Ferrari Dino        19.7 11.954328
Merc 280            19.2 12.265639
Pontiac Firebird    19.2 12.265639
Hornet Sportabout   18.7 12.593597
Valiant             18.1 13.011064
Merc 280C           17.8 13.230352
Merc 450SL          17.3 13.612732
Merc 450SE          16.4 14.359772
Ford Pantera L      15.8 14.905080
Dodge Challenger    15.5 15.193566
Merc 450SLC         15.2 15.493439
AMC Javelin         15.2 15.493439
Maserati Bora       15.0 15.700018
Chrysler Imperial   14.7 16.020426
Duster 360          14.3 16.468550
Camaro Z28          13.3 17.706787
Cadillac Fleetwood  10.4 22.644256
Lincoln Continental 10.4 22.644256

Data Pronouns

.data$x is always the x variable in the data frame .data

.env$x is always the x object in the environment .env

When data masking is used in package code, R CMD check expects the ambiguity to be resolved.

e.g.:

  ghc: no visible binding for global variable ‘cluster’

is resolved by using .data$cluster in the code (this introduces a dependency on the rlang package)
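A minimal sketch of the two pronouns, using eval_tidy() directly and a made-up data frame df (rlang is assumed to be attached). The name x exists in both places, and y exists only in the environment.

```r
library(rlang)

df <- data.frame(x = 1:3)
x <- 100    # clashes with the column name
y <- 5      # only in the environment

from_data <- eval_tidy(expr(.data$x), data = df)            # 1 2 3
from_env  <- eval_tidy(expr(.env$x), data = df)             # 100
mixed     <- eval_tidy(expr(.data$x + .env$x), data = df)   # 101 102 103

# .data never falls back to the environment: `y` is not a column,
# so this is an error even though a global `y` exists.
missing_col <- tryCatch(
  eval_tidy(expr(.data$y), data = df),
  error = function(e) "error"
)
missing_col   # "error"
```

With the pronouns, a typo in a column name fails loudly instead of silently picking up an unrelated object from the environment.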

Your Turn

Can we implement a function my_select now that works the same way as dplyr::select?

my_select <- function (.data, ...) {

}

From Advanced R

select2 <- function(data, ...) {
  dots <- enquos(...)

  vars <- as.list(set_names(seq_along(data), names(data)))
  cols <- unlist(purrr::map(dots, eval_tidy, vars))

  data[, cols, drop = FALSE]
}
select2(mtcars, 1:2, "wt", dplyr::starts_with("d"))

Still a no.

my_select with a cheat

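my_select <- function(.data, ...) {
  vars <- enquos(...)

  # Select each argument separately, then bind the pieces column-wise
  lapply(vars, function(x) {
    e <- get_expr(x)  # the cheat: the quosure's environment is dropped
    if (is_symbol(e)) e <- as_string(e)  # bare name -> column name
    if (is_call(e)) e <- tidyselect::eval_select(e, data = .data)  # helper call -> positions
    .data[e]  # strings, numbers, and positions all index columns
  }) |> data.frame()
}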
mtcars |> dplyr::select("mpg", cyl, 3, tidyr::starts_with("d")) |> head()
                   mpg cyl disp drat
Mazda RX4         21.0   6  160 3.90
Mazda RX4 Wag     21.0   6  160 3.90
Datsun 710        22.8   4  108 3.85
Hornet 4 Drive    21.4   6  258 3.08
Hornet Sportabout 18.7   8  360 3.15
Valiant           18.1   6  225 2.76
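For comparison, a sketch that hands the whole selection to tidyselect in one call instead of looping over the arguments. It assumes the rlang and tidyselect packages are installed. The name my_select_ts is made up for this illustration. This is one possible approach, not necessarily how dplyr::select is implemented internally.

```r
library(rlang)

# my_select_ts() is a hypothetical name for this sketch.
my_select_ts <- function(.data, ...) {
  # Wrap the dots in c() so tidyselect evaluates all selections together;
  # the result is a named integer vector of column positions.
  pos <- tidyselect::eval_select(expr(c(...)), data = .data)
  # Subset by position and apply the (possibly renamed) column names
  set_names(.data[pos], names(pos))
}

out <- my_select_ts(mtcars, "mpg", cyl, 3, tidyselect::starts_with("d"))
names(out)
head(out)
```

Because all arguments are resolved in a single eval_select() call, a column matched more than once (here disp, via 3 and via starts_with("d")) appears only once in the result.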