Profiling Code

2026-03-26

Outline

  • Good or bad code?
  • timing code
  • profiling tools

Signs of inexperienced programmers

  • Violation of the language model

    • e.g. R is a vector-oriented language, so for loops should be used sparingly
    • e.g. growing an object with repeated calls to rbind (known to be very slow)
  • Insufficient knowledge of the language. We’re all guilty of that at different levels

  • Reading other people’s code helps improve our language knowledge!
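A minimal sketch of the rbind point above, assuming a toy data frame with two columns. The function names (grow_with_rbind, bind_once, build_vectorised) are hypothetical and only serve the illustration; timings will differ between machines.

```r
# Hypothetical example: three ways to build the same n-row data frame.

# 1. Grow the result row by row: `out` is copied on every iteration.
grow_with_rbind <- function(n) {
  out <- data.frame(id = integer(0), value = numeric(0))
  for (i in seq_len(n)) {
    out <- rbind(out, data.frame(id = i, value = i^2))
  }
  out
}

# 2. Collect the pieces in a list and call rbind only once.
bind_once <- function(n) {
  rows <- lapply(seq_len(n), function(i) data.frame(id = i, value = i^2))
  do.call(rbind, rows)
}

# 3. Stay within the language model: build whole columns at once.
build_vectorised <- function(n) {
  id <- seq_len(n)
  data.frame(id = id, value = id^2)
}

n <- 500
system.time(res_grow <- grow_with_rbind(n))
system.time(res_bind <- bind_once(n))
system.time(res_vec  <- build_vectorised(n))
```

All three functions return the same values; only the way the result is assembled differs.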

Removing for loops

x is a vector of the numbers 1, 2, 3, 4

Objective: re-code these into weather events: “Sun”, “Rain”, “Snow”, “Hail”

x <- sample.int(4, size=100, replace = TRUE)
y <- vector("character", length=length(x))
# switch() has no default for a numeric selector, so every x[i] must be in 1..4
for (i in seq_along(x)) {
  y[i] <- switch(x[i], "Sun", "Rain", "Snow", "Hail")
}

table(y)
y
Hail Rain Snow  Sun 
  24   25   28   23 

Removing for loops (cont’d)

# the switch statement on the previous slide is equivalent to 
# the following lines:
for (i in seq_along(x)) {
  if (x[i] == 1) y[i] <- "Sun"
  if (x[i] == 2) y[i] <- "Rain"
  if (x[i] == 3) y[i] <- "Snow"
  if (x[i] == 4) y[i] <- "Hail"
}

# using R's vector system we can reduce this to a single line:
y <- c("Sun", "Rain", "Snow", "Hail")[x]

Removing for loops (cont’d)

All of the previous solutions only treat the symptoms.

We are still breaking the language model: categorical data belongs in a factor :)

y <- factor(x, levels = 1:4, labels = c("Sun", "Rain", "Snow", "Hail"))

summary(y)
 Sun Rain Snow Hail 
  23   25   28   24 
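A short sketch comparing the factor with the lookup-vector solution. The seed and the names weather, y_lookup and y_factor are assumptions for illustration; they are not part of the original slides.

```r
set.seed(1)  # assumption: fixed seed, only so the sketch is reproducible
x <- sample.int(4, size = 100, replace = TRUE)
weather <- c("Sun", "Rain", "Snow", "Hail")

# Lookup vector: the result is a plain character vector.
y_lookup <- weather[x]

# Factor: the codes 1..4 are kept and the labels are attached as levels.
y_factor <- factor(x, levels = 1:4, labels = weather)

identical(as.character(y_factor), y_lookup)  # same labels either way
levels(y_factor)                             # level order follows 1..4, not the alphabet
table(y_factor)                              # counts are reported in level order
head(as.integer(y_factor))                   # the original integer codes are still there
```

Fixing levels = 1:4 keeps the mapping stable even if one of the four values happens to be missing from x.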

Spotting bad code

  • Ken Thompson:
    keep the number of indentation levels down; a high number of local variables is suspicious

  • Hadley Wickham:
    functions with more than ten lines are suspicious, as is highly repetitive code

  • Generally Good Practices:
    structure code using indentation and spacing, write comments, use a clear naming convention

Profiling

Code can be profiled at different levels:

  1. two (or more) alternative pieces of code with the same objective are timed to determine which one is fastest (system.time() and proc.time())

  2. one piece of code is analyzed to find weak/slow spots (e.g. profvis)

Profiling process

  • Collect information on code performance:

    • memory usage

    • frequency and duration of function calls

  • Goal: optimization of code

system.time() or proc.time()

  • Probably the simplest of all measures

  • Evaluates the time spent in the whole expression

  • Doesn’t give any clues about what happens at a lower level
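A minimal sketch of system.time(), which wraps an expression directly instead of taking two proc.time() readings. The workload (sorting one million random numbers) and the names x, s and timing are assumptions chosen for illustration.

```r
x <- runif(1e6)  # assumption: arbitrary workload

# system.time() evaluates the expression in the calling environment,
# so `s` is available afterwards.
timing <- system.time({
  s <- sort(x)
})

timing               # prints user, system and elapsed time
timing[["elapsed"]]  # wall-clock seconds as a plain number
```

The result covers the expression as a whole; it does not say which of the calls inside it took the time.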

(Mini) code blocks

ptm <- proc.time()
t1 <- read.csv("data/nasadata.csv")
proc.time() - ptm
   user  system elapsed 
  0.226   0.021   0.263 
ptm <- proc.time()
t2 <- readr::read_csv("data/nasadata.csv", progress = FALSE)
proc.time() - ptm
   user  system elapsed 
  0.436   0.069   0.621 

(Mini) code blocks

ptm <- proc.time()
t3 <- readRDS("data/nasadata.rds")
proc.time() - ptm
   user  system elapsed 
  0.025   0.001   0.028 

Your Turn - which option is fastest?

x <- sample.int(4, 100, replace=TRUE)
y <- vector("character", length=length(x))

# Option #1
# switch() has no default for a numeric selector, so every x[i] must be in 1..4
for (i in seq_along(x)) {
  y[i] <- switch(x[i], "Sun", "Rain", "Snow", "Hail")
}

# Option #2
for (i in seq_along(x)) {
  if (x[i] == 1) y[i] <- "Sun"
  if (x[i] == 2) y[i] <- "Rain"
  if (x[i] == 3) y[i] <- "Snow"
  if (x[i] == 4) y[i] <- "Hail"
}

# Option #3
y <- c("Sun", "Rain", "Snow", "Hail")[x]

# Option #4
y <- factor(x)
levels(y) <- c("Sun", "Rain", "Snow", "Hail")
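One possible way to time the four options, sketched with base R only. The wrapper functions (opt_switch, opt_if, opt_index, opt_factor), the helper time_option and the number of repetitions are hypothetical choices for this sketch, not the setup used for the slides.

```r
weather <- c("Sun", "Rain", "Snow", "Hail")

# Each option is wrapped in a function so that all four can be timed the same way.
opt_switch <- function(x) {
  y <- character(length(x))
  for (i in seq_along(x)) y[i] <- switch(x[i], "Sun", "Rain", "Snow", "Hail")
  y
}
opt_if <- function(x) {
  y <- character(length(x))
  for (i in seq_along(x)) {
    if (x[i] == 1) y[i] <- "Sun"
    if (x[i] == 2) y[i] <- "Rain"
    if (x[i] == 3) y[i] <- "Snow"
    if (x[i] == 4) y[i] <- "Hail"
  }
  y
}
opt_index <- function(x) weather[x]
opt_factor <- function(x) {
  y <- factor(x)
  levels(y) <- weather
  y
}

# Elapsed time for `reps` repeated evaluations of f on the same input.
time_option <- function(f, x, reps) {
  system.time(for (r in seq_len(reps)) f(x))[["elapsed"]]
}

x <- sample.int(4, 5000, replace = TRUE)
candidates <- list(switch = opt_switch, ifs = opt_if,
                   index = opt_index, factor = opt_factor)
timings <- sapply(candidates, time_option, x = x, reps = 20)
timings
```

Repeating each option several times smooths out the noise of a single run; dedicated benchmarking packages do the same more carefully.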

Based on 500 evaluations of samples of size 5000:

Profiling Code

Option 3 is fastest on average. But why?

profvis package:

  • wrapper around the function Rprof

  • the call stack is sampled at a fixed interval (Rprof defaults to 0.02 seconds, profvis to 0.01 seconds) and written to a temporary file

  • the output file is processed to produce a summary of the usage

  • time measurements vary by platform: on Unix-alikes, time is the CPU time of the R process (excluding time spent waiting for input); on Windows, it is elapsed time

  • Note: profiling takes time, too. Once the timer goes off, the information is not recorded until the next timing click (probably in the range 1–10 ms), so the interval cannot usefully be made smaller than that.
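The steps in the list above can be sketched by calling Rprof directly. This assumes an R build with profiling support; the workload (repeatedly sorting random numbers) and the names prof_file and prof are arbitrary choices for illustration.

```r
prof_file <- tempfile(fileext = ".out")  # temporary file for the raw samples

Rprof(prof_file, interval = 0.02)        # start sampling the call stack
for (r in seq_len(50)) {
  sorted <- sort(runif(3e5))             # assumption: arbitrary workload
}
Rprof(NULL)                              # stop profiling

prof <- summaryRprof(prof_file)          # process the output file into a summary
head(prof$by.self)                       # time spent in each function itself
prof$sampling.time                       # total time covered by the samples

unlink(prof_file)
```

profvis collects the same kind of samples and displays them interactively instead of as a table.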

profvis example

library(profvis)

p <- profvis({
  data(diamonds, package = "ggplot2")
  plot(price ~ carat, data = diamonds)
  m <- lm(price ~ carat, data = diamonds)
  abline(m, col = "red")
})

profvis produces an interactive display that shows the code alongside a flame graph.

Copy and paste the code into your console to run it.