I’m doing some mentoring for Posit Academy’s “Programming in R” course, and the learners in my group have been asking very clever and deep questions about how to use Non-Standard Eval in R functions.
Specifically, the thing that keeps cropping up that I haven’t been able to answer in a satisfying way is iterating through unquoted input.
This blog post from Albert Rapp is excellent pre-reading.
tl;dr
By request, I’m putting the final conclusions up front here for easy reference.
To map() over unquoted names:
The trick here is that you need quos() to keep map() from triggering the unquoted code, and then you need tunneling ({{x}}) in the anonymous function, as you would in any function:
map(quos(vs, am, gear, carb),
    \(x) some_function(mtcars, {{x}}))
To pass the dots (...) into across():
First you need enquos(...) to defuse the dots. The sneaky bit in this one is that across() wants a vector of unquoted column names to use, and enquos() returns a list. So, we splice the list into separate arguments with !!! and re-concatenate them with c().
do_stuff <- function(data, ...) {
  args <- enquos(...)

  data |>
    summarize(across(c(!!!args), some_function))
}
Read on to see an example, with the many things I tried that didn’t work, why they didn’t work, and how I fixed it.
Set the scene
For the sake of example, let’s suppose the task I want to do is count how many ones are in a particular column.
I’ve written a nice function, using tunneling ({{}}) to run on unquoted variable names.
count_ones <- function(data, var) {
  data |>
    summarize(
      n_ones = sum({{var}} == 1)
    ) |>
    pull(n_ones)
}

count_ones(mtcars, vs)
#> [1] 14
Fabulous. We could clean this output up a bit, but we won’t, because lazy.
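For reference, {{var}} is shorthand for two steps: defuse the argument with enquo(), then inject it with !!. Here is a minimal sketch of the same function written the long way, assuming dplyr and rlang are installed. The name count_ones_long is made up here for illustration.

```r
library(dplyr)
library(rlang)

# Hypothetical long-form twin of count_ones(), for illustration only
count_ones_long <- function(data, var) {
  var <- enquo(var)                 # defuse ("freeze") the argument
  data |>
    summarize(
      n_ones = sum((!!var) == 1)    # inject ("unfreeze") it inside summarize()
    ) |>
    pull(n_ones)
}

count_ones_long(mtcars, vs)
#> [1] 14
```

The parentheses around !!var are just there to make the precedence explicit.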
So, the question is, what if I want to do this to multiple columns at once?
Option 1: mapping
The challenge here lies in the fact that if we put unquoted variable names into the map() function, the code “triggers” before it “gets to” the count_ones() function.
map(c(vs, am, gear, carb),
    \(x) count_ones(mtcars, x))
#> Error: object 'vs' not found
One solution is to fall back onto strings for the map() input and then re-unquote-ify them for use in count_ones(), which is highly unsatisfying.
map(c("vs", "am", "gear", "carb"),
    \(x) count_ones(mtcars, !!sym(x)))
#> [[1]]
#> [1] 14
#>
#> [[2]]
#> [1] 13
#>
#> [[3]]
#> [1] 0
#>
#> [[4]]
#> [1] 7
It’s not terrible but the !!sym(x) is far from intuitive. I always read !! as “access the information stored in” and sym as “turn this from a string to a name”. So, it kind of makes sense - we hand a string to count_ones() but first we say “Don’t use this string, instead access the information in the name of the string.”
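To make that reading of !! and sym() concrete, here is a small sketch that does the two steps separately, assuming dplyr and rlang are installed. The variable names col_string and col_name are made up for illustration.

```r
library(dplyr)
library(rlang)

col_string <- "vs"
col_name <- sym(col_string)   # string -> symbol (a bare name)

is_symbol(col_name)
#> [1] TRUE

# !! injects the symbol, so summarize() sees sum(vs == 1)
n_ones <- mtcars |>
  summarize(n_ones = sum((!!col_name) == 1)) |>
  pull(n_ones)
n_ones
#> [1] 14
```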
I’m still convinced there’s a better way, though. Or at least, a different way.
What I want to do is find a way to “freeze” the unquoted variable names so they can be passed into count_ones().
My first thought was to use quos(). Here’s how I understand these functions:
quo() = freeze this one unquoted thing
quos() = freeze this vector of unquoted things
enquo() = freeze this unquoted function argument
enquos() = freeze this vector of unquoted function arguments
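As a quick check of that mental model, here is a sketch of all four functions side by side, assuming rlang is installed. The helpers capture_one() and capture_many() are hypothetical names used only here.

```r
library(rlang)

one  <- quo(vs)            # a single quosure
many <- quos(vs, am)       # a list of quosures, one per argument

# Hypothetical helpers: the en* versions capture what the caller typed
capture_one  <- function(var) enquo(var)
capture_many <- function(...) enquos(...)

is_quosure(one)
#> [1] TRUE
length(many)
#> [1] 2
as_label(capture_one(gear))
#> [1] "gear"
length(capture_many(vs, am, gear))
#> [1] 3
```

With that in mind, here is the first attempt with quos():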
map(quos(c(vs, am, gear, carb)),
    \(x) count_ones(mtcars, x))
#> Error in `map()`:
#> ℹ In index: 1.
#> Caused by error in `summarize()`:
#> ℹ In argument: `n_ones = sum(x == 1)`.
#> Caused by error:
#> ! Base operators are not defined for quosures. Do you need to unquote
#> the quosure?
#>
#> # Bad: myquosure == rhs
#>
#> # Good: !!myquosure == rhs
Wait, this is great! The error is being triggered in sum() inside of count_ones(), not inside of map(). So we did freeze it.
The error message suggests that I need to use !! inside of count_ones() to “unfreeze”. I’m skeptical, because I don’t want to unfreeze x; I want to access the name vs. Also my goal is not to modify that function.
Instead I think this might just be a missed tunneling, so that the frozen column names get passed through my anonymous function.
map(quos(c(vs, am, gear, carb)),
    \(x) count_ones(mtcars, {{x}}))
#> [[1]]
#> [1] 34
Dang, I really thought that would work, but it appears that by using quos(), I’ve accidentally frozen the whole vector together and counted everything in all columns. Which, honestly, is kind of cool - but not what I meant to do.
I really don’t want to have to quo() each individual column name.
Let me take a look at this output:
quos(c(vs, am, gear, carb))
#> <list_of<quosure>>
#>
#> [[1]]
#> <quosure>
#> expr: ^c(vs, am, gear, carb)
#> env: global
Okay so it froze the whole expression. Maybe we just don’t want the c(), because quos() is already concatenating?
quos(vs, am, gear, carb)
#> <list_of<quosure>>
#>
#> [[1]]
#> <quosure>
#> expr: ^vs
#> env: global
#>
#> [[2]]
#> <quosure>
#> expr: ^am
#> env: global
#>
#> [[3]]
#> <quosure>
#> expr: ^gear
#> env: global
#>
#> [[4]]
#> <quosure>
#> expr: ^carb
#> env: global
This is promising! A list of quosures is what we want!
map(quos(vs, am, gear, carb),
    \(x) count_ones(mtcars, {{x}}))
#> [[1]]
#> [1] 14
#>
#> [[2]]
#> [1] 13
#>
#> [[3]]
#> [1] 0
#>
#> [[4]]
#> [1] 7
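If the unnamed list output is bothersome, one possible variation is to ask quos() for automatic names and use map_int() so the result comes back as a named vector. This is only a sketch, assuming dplyr, purrr and rlang are installed. It repeats count_ones() so it can run on its own, and the names cols and result are made up here.

```r
library(dplyr)
library(purrr)
library(rlang)

count_ones <- function(data, var) {
  data |>
    summarize(
      n_ones = sum({{var}} == 1)
    ) |>
    pull(n_ones)
}

# .named = TRUE labels each quosure with the code that was typed
cols <- quos(vs, am, gear, carb, .named = TRUE)

# map_int() keeps those names and returns an integer vector instead of a list
result <- map_int(cols, \(x) count_ones(mtcars, {{x}}))
result
# named integer vector: vs = 14, am = 13, gear = 0, carb = 7
```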
Option 2: Pass the dots
The other clever approach one of my learners took was to rewrite the original function to accept the variable names in the dots (...).
This works great if you are just sending the variable names along to the next internal function:
select_all <- function(data, ...) {
  data |>
    select(...) |>
    head()
}

select_all(mtcars, vs, am, gear, carb)
#> vs am gear carb
#> Mazda RX4 0 1 4 4
#> Mazda RX4 Wag 0 1 4 4
#> Datsun 710 1 1 4 1
#> Hornet 4 Drive 1 0 3 1
#> Hornet Sportabout 0 0 3 2
#> Valiant 1 0 3 1
However, of course, this does not just slot in to our function:
count_ones <- function(data, ...) {
  data |>
    summarize(
      n_ones = sum(... == 1)
    ) |>
    pull(n_ones)
}

count_ones(mtcars, vs, am, gear, carb)
#> Error in `summarize()`:
#> ℹ In argument: `n_ones = sum(... == 1)`.
#> Caused by error:
#> ! object 'vs' not found
The tidy approach to doing something to many columns is to use across():
mtcars |>
  summarize(
    across(c(vs, am, gear, carb),
           ~sum(.x == 1)
    ))
#> vs am gear carb
#> 1 14 13 0 7
But inside of a function, this fails:
count_ones <- function(data, ...) {
  data |>
    summarize(
      across(...,
             ~sum(.x == 1)
      ))
}

count_ones(mtcars, vs, am, gear, carb)
#> Error in `summarize()`:
#> ℹ In argument: `across(..., ~sum(.x == 1))`.
#> Caused by error in `across()`:
#> ! Can't compute column `vs`.
#> Caused by error:
#> ! object 'gear' not found
I surmise this is an arguments problem: across() expects a single argument, which is a vector of the column names, while the dots are passing the inputs along as four separate arguments.
My first instinct was to use dots_list() to smush the dots inputs into a single list object to hand to across(). But this fails for perhaps predictable reasons:
count_ones <- function(data, ...) {
  args <- dots_list(...)

  data |>
    summarize(
      across(args,
             ~sum(.x == 1)
      ))
}

count_ones(mtcars, vs, am, gear, carb)
#> Error: object 'vs' not found
Ye Olde NSE strikes again: dots_list() is triggering the unquoted names to be evaluated, so vs is not found.
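The difference between evaluating the dots and capturing them can be seen in isolation. Below is a small sketch, assuming rlang is installed. The helpers eager() and lazy() and the object name not_a_real_object are made up for illustration.

```r
library(rlang)

# Hypothetical helpers for illustration
eager <- function(...) dots_list(...)   # evaluates each argument
lazy  <- function(...) enquos(...)      # captures each argument unevaluated

eager(1 + 1)[[1]]
#> [1] 2

# `not_a_real_object` does not exist, so evaluating it fails ...
res <- try(eager(not_a_real_object), silent = TRUE)
inherits(res, "try-error")
#> [1] TRUE

# ... but capturing it is fine, because nothing is evaluated yet
length(lazy(not_a_real_object))
#> [1] 1
```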
Well, we did just learn that quos() will get us a list of quosures, so let’s hit the dots with its function-argument counterpart, enquos():
count_ones <- function(data, ...) {
  args <- enquos(...)

  data |>
    summarize(
      across(args,
             ~sum(.x == 1)
      ))
}

count_ones(mtcars, vs, am, gear, carb)
#> Error in `summarize()`:
#> ℹ In argument: `across(args, ~sum(.x == 1))`.
#> Caused by error in `across()`:
#> ! Can't select columns with `args`.
#> ✖ `args` must be numeric or character, not a <quosures/list> object.
Alright, so across() can’t handle a list. One thing we could definitely do at this point is just move our map() approach to inside of the function:
count_ones <- function(data, ...) {
  args <- enquos(...)

  map(args,
      \(x) count_ones(mtcars, {{x}}))
}

count_ones(mtcars, vs, am, gear, carb)
Friends. I did not mean to put count_ones inside of itself. The above code fully crashed my R Session, with this delightful error.
Let’s try this again.
count_ones <- function(data, ...) {
  args <- enquos(...)

  map(args,
      \(x) data |>
        summarize(
          n_ones = sum({{x}} == 1)
        ) |>
        pull(n_ones))
}

count_ones(mtcars, vs, am, gear, carb)
#> [[1]]
#> [1] 14
#>
#> [[2]]
#> [1] 13
#>
#> [[3]]
#> [1] 0
#>
#> [[4]]
#> [1] 7
I’m tired and this is getting long… but I still really want to defeat the across() problem, because the ... + across() seems like an extremely handy construct.
There is one “free” solution, which is to just reduce our dataset to the columns we care about, and then tell across() to apply to everything():
count_ones <- function(data, ...) {
  data |>
    select(...) |>
    summarize(
      across(everything(),
             ~sum(.x == 1)
      ))
}

count_ones(mtcars, vs, am, gear, carb)
#> vs am gear carb
#> 1 14 13 0 7
This would probably be fine for every use case I can think of. But it’s not technically the same as using across() directly, because if you use across() inside mutate() it will keep all the other columns.
Exhibit A:
mtcars |>
  mutate(
    across(c(vs, am, gear, carb),
           sqrt)
  ) |>
  head()
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 2.000000 2.000000
#> Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 2.000000 2.000000
#> Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 2.000000 1.000000
#> Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 1.732051 1.000000
#> Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 1.732051 1.414214
#> Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 1.732051 1.000000
Exhibit B:
mtcars |>
  select(vs, am, gear, carb) |>
  mutate(
    across(everything(),
           sqrt
    )) |>
  head()
#> vs am gear carb
#> Mazda RX4 0 1 2.000000 2.000000
#> Mazda RX4 Wag 0 1 2.000000 2.000000
#> Datsun 710 1 1 2.000000 1.000000
#> Hornet 4 Drive 1 0 1.732051 1.000000
#> Hornet Sportabout 0 0 1.732051 1.414214
#> Valiant 1 0 1.732051 1.000000
Team, we gotta crack this so I can go to bed. Let’s take stock:
We know how to “freeze” the variable names from the dots into a list of quosures with enquos().
We need to find a way to pass that information as a vector object to across().
Since this is a post about punctuation, let’s bring in the big guns: the TRIPLE BANG!!!
This guy !!! is one of my all time favorite tricks. It lets you turn a list of things into separate function arguments, which is called splicing.
args <- quos(vs, am, gear, carb)

## won't work, because it looks for the column named 'args'
mtcars |>
  select(args) |>
  head()
#> Error in `select()`:
#> ! Can't select columns with `args`.
#> ✖ `args` must be numeric or character, not a <quosures/list> object.
## will work, because it splices the contents of the `args` vector into separate inputs to select
mtcars |>
  select(!!!args) |>
  head()
#> vs am gear carb
#> Mazda RX4 0 1 4 4
#> Mazda RX4 Wag 0 1 4 4
#> Datsun 710 1 1 4 1
#> Hornet 4 Drive 1 0 3 1
#> Hornet Sportabout 0 0 3 2
#> Valiant 1 0 3 1
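Splicing is not specific to column names. Here is a tiny sketch with plain numbers, assuming rlang is installed. inject() is used because base sum() does not understand !!! on its own, and the names nums and total are made up for illustration.

```r
library(rlang)

nums <- list(1, 2, 3)

# inject() lets us use !!! with a function that knows nothing about rlang.
# The list is spliced into separate arguments, so this runs sum(1, 2, 3).
total <- inject(sum(!!!nums))
total
#> [1] 6
```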
The bad news: What we want here is the opposite of splicing: we want our list of quosures to become a vector of quosures.
The good news: If only we had a function that takes multiple arguments and concatenates them into a vector….
count_ones <- function(data, ...) {
  args <- enquos(...)

  data |>
    summarize(
      across(c(!!!args),
             ~sum(.x == 1)
      ))
}

count_ones(mtcars, vs, am, gear, carb)
#> vs am gear carb
#> 1 14 13 0 7
Boom! It still feels a little annoying to me that we had to freeze, splice, and concatenate. That feels like too many steps, but I’ll take it. I can go to bed unfrustrated!
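To see what the splice-then-concatenate step actually builds, it can help to construct the call without running it. This is a sketch, assuming rlang is installed. It uses syms() on strings instead of enquos() purely so the printed result is easy to read, and the names cols and built are made up here.

```r
library(rlang)

# syms() stands in for enquos() here, only to keep the output readable
cols <- syms(c("vs", "am", "gear", "carb"))

# expr() builds the call without evaluating it; !!! splices the list into c()
built <- expr(c(!!!cols))
expr_text(built)
#> [1] "c(vs, am, gear, carb)"
```

That rebuilt call has the same shape as what we typed by hand in the plain across() example.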
Thus ends my stream-of-consciousness journey into NSE. If you came along with me this far, thanks for hanging out, and let me know if there is any rlang trickery that I missed!