Look at your objects

random
Author

Kelly Bodwin

Published

October 31, 2024

Random thought today: There are a lot of ways to “check in” on your intermediate objects in R.

It’s definitely good practice and something I have trouble pushing my students to do. Maybe I need to be more deliberate about how to do it.

So, there’s the classic way of just printing it out. This is fine. I tend to peek at my objects this way, except I do the peeking in the console… I can NOT get my students to adopt a workflow that pops between notebook and console though. Maybe it’s not the best.

bob <- 1:10
bob
 [1]  1  2  3  4  5  6  7  8  9 10

Semicolons

I also see this in some folks’ code:

bob <- 1:10; bob
 [1]  1  2  3  4  5  6  7  8  9 10

I especially see it in plotting with ggplot for some reason:

library(tidyverse)
library(palmerpenguins)

p <- ggplot(penguins, aes(x = body_mass_g)) +
  geom_histogram(); p

I do not like this at all. Perhaps it’s a bias against semicolons. I thought I left those behind when I gave up on Java. Mostly, though, I don’t like the print statement being hidden on a line with code.

Parentheses

Now, a student taught me this trick, and I think it’s super rad:

(bob <- 1:10)
 [1]  1  2  3  4  5  6  7  8  9 10

But it does get a bit inelegant/cumbersome with multiline code and pipelines in my opinion:

(pen_ad <- penguins %>%
   filter(species == "Adelie") %>%
   summarize(mean(body_mass_g, na.rm = TRUE)))
# A tibble: 1 × 1
  `mean(body_mass_g, na.rm = TRUE)`
                              <dbl>
1                             3701.
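
Why the parentheses trick works, as far as I understand it: assignment in R returns its value invisibly, and wrapping an expression in parentheses makes the result visible again, so it auto-prints. Here is a small sketch that checks this with base R’s withVisible(). The names plain and wrapped are just for illustration.

```r
# withVisible() evaluates an expression and reports whether
# the result would have been auto-printed at top level.
plain   <- withVisible(bob <- 1:10)    # bare assignment
wrapped <- withVisible((bob <- 1:10))  # same assignment, in parentheses

plain$visible    # FALSE: assignment returns its value invisibly
wrapped$visible  # TRUE: the parentheses make the value visible

# Either way, the value itself is the same
identical(plain$value, wrapped$value)
```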

Looking Inside Pipelines

Speaking of pipelines, I’m on the fence about the best way to “check in” on progress of a long pipeline. I tend to just highlight part of the pipeline and Cmd+Enter to run that section. But that’s kinda unreproducible and also gets annoying if I’m doing it many times.

Students tend to delete or comment out segments of pipelines, and I do NOT like this. It’s so unwieldy.

Using the “passthrough” pipe

magrittr has a cute pipe, %T>% (the “tee” pipe), that means “do this next step, but pass along the original input instead of the step’s result”, which we can use in conjunction with print() to check stuff.

It’s almost perfect but the necessity of print() and the subtlety of the %T>% pipe (it’s easy to miss) annoy me a bit.

library(magrittr)

penguins %>%
  filter(species == "Adelie") %T>%
  print() %>%  
  summarize(n_rows = n())
# A tibble: 152 × 8
   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
   <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
 1 Adelie  Torgersen           39.1          18.7               181        3750
 2 Adelie  Torgersen           39.5          17.4               186        3800
 3 Adelie  Torgersen           40.3          18                 195        3250
 4 Adelie  Torgersen           NA            NA                  NA          NA
 5 Adelie  Torgersen           36.7          19.3               193        3450
 6 Adelie  Torgersen           39.3          20.6               190        3650
 7 Adelie  Torgersen           38.9          17.8               181        3625
 8 Adelie  Torgersen           39.2          19.6               195        4675
 9 Adelie  Torgersen           34.1          18.1               193        3475
10 Adelie  Torgersen           42            20.2               190        4250
# ℹ 142 more rows
# ℹ 2 more variables: sex <fct>, year <int>
# A tibble: 1 × 1
  n_rows
   <int>
1    152

(Honestly, I wish we in the tidyverse sphere used the other magrittr pipes more. Maybe another mini-post one day…)
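
A side note on whether %T>% is strictly needed for this: as far as I can tell, print() returns its argument invisibly for data frames and tibbles, so a regular %>% should also carry the data through. A minimal sketch, assuming dplyr and palmerpenguins are installed (adelie_count is just an illustrative name):

```r
library(dplyr)
library(palmerpenguins)

# print() shows the intermediate tibble and (invisibly) returns it,
# so the next step still receives the filtered data.
adelie_count <- penguins %>%
  filter(species == "Adelie") %>%
  print() %>%
  summarize(n_rows = n())

adelie_count
```

Where the tee pipe seems to earn its keep is with functions that return something other than their input, such as str(), which returns NULL.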

Summary functions that return x

Finally, you might just use glimpse() in a pipeline, since it invisibly returns the data frame as well as printing a summary, so it can flow through the pipeline:

penguins %>%
  filter(species == "Adelie") %>%
  glimpse() %>%  
  summarize(n_rows = n())
Rows: 152
Columns: 8
$ species           <fct> Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adel…
$ island            <fct> Torgersen, Torgersen, Torgersen, Torgersen, Torgerse…
$ bill_length_mm    <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, …
$ bill_depth_mm     <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1, …
$ flipper_length_mm <int> 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 186…
$ body_mass_g       <int> 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, …
$ sex               <fct> male, female, female, NA, female, male, female, male…
$ year              <int> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007…
# A tibble: 1 × 1
  n_rows
   <int>
1    152

My problem here is simply that I don’t love glimpse()… if I’m verifying a pipeline step, I’d rather just see the raw data.

Googling around led me to textreadr::peek(), which seems to be exactly that:

# remotes::install_github("trinker/textreadr")
library(textreadr)

penguins %>%
  filter(species == "Adelie") %>%
  peek() %>%  
  summarize(n_rows = n())
Table: [152 x 8]

   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex    year
1  Adelie  Torgersen 39.1           18.7          181               3750        male   2007
2  Adelie  Torgersen 39.5           17.4          186               3800        female 2007
3  Adelie  Torgersen 40.3           18            195               3250        female 2007
4  Adelie  Torgersen <NA>           <NA>          <NA>              <NA>        <NA>   2007
5  Adelie  Torgersen 36.7           19.3          193               3450        female 2007
6  Adelie  Torgersen 39.3           20.6          190               3650        male   2007
7  Adelie  Torgersen 38.9           17.8          181               3625        female 2007
8  Adelie  Torgersen 39.2           19.6          195               4675        male   2007
9  Adelie  Torgersen 34.1           18.1          193               3475        <NA>   2007
10 Adelie  Torgersen 42             20.2          190               4250        <NA>   2007
.. ...     ...       ...            ...           ...               ...         ...    ...  
# A tibble: 1 × 1
  n_rows
   <int>
1    152

It’s not on CRAN anymore (sadface). Also, tibbles get downgraded to data.frames. But still, I like this a lot.

Conclusion

So, no perfect solution for pipelines that I know of. And all these options will also print their output in a rendered qmd/Rmd, so they have the same issue as print debugging: you have to remember to go back and remove the code when you are finished developing.

I think my personal wishlist would be, in no particular order:

  • A dplyr::peek() function.

  • A “print and pass” pipe that could be used in a pipeline without needing a function.

  • Some kind of interactive tool in Quarto that would let you flag lines to be previewed upon chunk run, without them being printed in a rendered doc.
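
To make the first and third wishlist items a bit more concrete, here is a rough sketch of what a homemade peek() could look like. To be clear, this is a hypothetical helper, not a function in dplyr or any other package. It prints the first few rows and invisibly returns its input, and it skips the printing while knitr is rendering, assuming knitr sets the knitr.in.progress option to TRUE during a render.

```r
library(dplyr)
library(palmerpenguins)

# Hypothetical helper (not part of dplyr): print a preview, pass the data on.
peek <- function(x, n = 10) {
  # Assumption: knitr sets this option to TRUE while rendering a document
  rendering <- isTRUE(getOption("knitr.in.progress"))
  if (!rendering) {
    print(utils::head(x, n))
  }
  invisible(x)  # return the input unchanged so the pipeline can continue
}

adelie_count <- penguins %>%
  filter(species == "Adelie") %>%
  peek() %>%
  summarize(n_rows = n())
```

This is not the same as a proper Quarto feature, since the peek() calls still sit in the source, but at least they would stay quiet in the rendered doc.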

Thoughts? Ideas?