Profiling Code

2026-03-26

Outline

Good or bad code?
timing code
profiling tools

Signs of inexperienced programmers

Violation of the language model
- e.g. R is a vector-oriented language, i.e. for loops should be used sparingly
- e.g. use of rbind to grow an object row by row (it is known to be very slow)
Insufficient knowledge of the language. We’re all guilty of that to varying degrees
Reading other people’s code helps improve our language knowledge!
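To make the rbind point concrete, here is a minimal sketch. The helper names grow_rbind, bind_once and build_vectorized are made up for illustration and do not come from the slides. All three build the same data frame; timings depend on the machine, so none are shown.

```r
# Hypothetical helpers for illustration only.

# Slow pattern: the data frame is copied and extended on every iteration.
grow_rbind <- function(n) {
  out <- data.frame()
  for (i in seq_len(n)) {
    out <- rbind(out, data.frame(id = i, value = i^2))
  }
  out
}

# Usually better: build all pieces first, then bind them in a single call.
bind_once <- function(n) {
  pieces <- lapply(seq_len(n), function(i) data.frame(id = i, value = i^2))
  do.call(rbind, pieces)
}

# Vector-oriented: no loop at all.
build_vectorized <- function(n) {
  data.frame(id = seq_len(n), value = seq_len(n)^2)
}

n <- 500
a <- grow_rbind(n)
b <- bind_once(n)
v <- build_vectorized(n)

# All three approaches yield the same columns.
identical(a$value, v$value) && identical(b$value, v$value)
```

Wrapping each call in system.time() shows how the gap changes as n grows.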

Removing for loops

x is a vector with the numbers 1, 2, 3, 4

Objective: re-code these into weather events: “Sun”, “Rain”, “Snow”, “Hail”

x <- sample.int(4, size=100, replace = TRUE)
y <- vector("character", length=length(x))
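for (i in 1:length(x)) {
   # with a numeric selector, switch() picks the alternative by position;
   # default=NA is simply the fifth alternative, not a fallback
   y[i] <- switch (x[i], "Sun", "Rain", "Snow", "Hail", default=NA)
}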

table(y)

y
Hail Rain Snow  Sun 
  24   25   28   23

Removing for loops (cont’d)

# the switch statement on the previous slide is equivalent to 
# the following lines:
for (i in 1:length(x)) {
  if (x[i] == 1) y[i] <- "Sun"
  if (x[i] == 2) y[i] <- "Rain"
  if (x[i] == 3) y[i] <- "Snow"
  if (x[i] == 4) y[i] <- "Hail"
}

# using R's vector system we can reduce this to a single line:
y <- c("Sun", "Rain", "Snow", "Hail")[x]
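A small sketch of how the lookup-vector idiom behaves on awkward input. The vector name weather and the sample values are assumptions for illustration. Indexing with NA or with a position beyond the lookup vector returns NA, whereas the if-based loop would stop with an error on the NA.

```r
# Lookup vector: position i holds the label for code i.
weather <- c("Sun", "Rain", "Snow", "Hail")

# Illustrative input with a missing value and an out-of-range code.
x_demo <- c(3L, 1L, 4L, NA, 2L, 5L)

# One vectorized indexing step replaces the whole loop.
y_demo <- weather[x_demo]
y_demo
# "Snow" "Sun"  "Hail" NA     "Rain" NA
```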

Removing for loops (cont’d)

All of the previous solutions only treat the symptoms

We are still breaking the language model :) Categorical data in R belongs in a factor:

# levels = 1:4 keeps the labels aligned even if a value is absent from x
y <- factor(x, levels = 1:4, labels = c("Sun", "Rain", "Snow", "Hail"))

summary(y)

 Sun Rain Snow Hail 
  23   25   28   24
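A short sketch of what the factor stores, using a small made-up input (x_small is an assumption for illustration). The integer codes are kept and the labels are attached as levels, so summaries follow the level order rather than the alphabetical order seen with table() on a character vector.

```r
# Illustrative input; explicit levels are an addition for robustness.
x_small <- c(1L, 4L, 2L, 2L, 3L)
f <- factor(x_small, levels = 1:4,
            labels = c("Sun", "Rain", "Snow", "Hail"))

levels(f)        # "Sun" "Rain" "Snow" "Hail"
as.integer(f)    # 1 4 2 2 3  (the underlying codes)
as.character(f)  # "Sun" "Hail" "Rain" "Rain" "Snow"

# Counts are reported in level order.
counts <- table(f)
names(counts)    # "Sun" "Rain" "Snow" "Hail"
```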

Spotting bad code

Ken Thompson:
keep the number of indentation levels down; a high number of local variables is suspicious
Hadley Wickham:
functions with more than ten lines are suspicious, as is highly repetitive code
Generally good practices:
structure code using indentation and spacing, write comments, use a clear naming convention

Profiling

Code can be profiled at different levels:

two (or more) alternative pieces of code with the same objective are timed to determine which one is fastest (system.time() and proc.time())
one piece of code is analyzed to find weak/slow spots (e.g. profvis)

Profiling process

Collect information on code performance:
memory usage
frequency and duration of function calls
Goal: optimization of code

`system.time()` or `proc.time()`

Probably the simplest of all measures
Evaluates the time spent in the whole expression
Doesn’t give any clues about what is done at a lower level
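A minimal sketch of system.time() wrapped around a whole expression. The workload (a sum of squares written as a loop and as a vectorized call) and the name slow_sum_sq are assumptions for illustration. Timings vary by machine, so no output is shown.

```r
# Hypothetical workload: sum of squares, loop vs. vectorized.
slow_sum_sq <- function(v) {
  total <- 0
  for (i in seq_along(v)) total <- total + v[i]^2
  total
}

v <- runif(1e6)

# system.time() evaluates the expression once and returns a proc_time object
# with user, system and elapsed times (in seconds).
t_loop <- system.time(s1 <- slow_sum_sq(v))
t_vec  <- system.time(s2 <- sum(v^2))

t_loop["elapsed"]
t_vec["elapsed"]
```

Both calls report only the total for the expression; neither says which line inside slow_sum_sq() is responsible.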

(Mini) code blocks

ptm <- proc.time()
t1 <- read.csv("data/nasadata.csv")
proc.time() - ptm

   user  system elapsed 
  0.226   0.021   0.263

ptm <- proc.time()
t2 <- readr::read_csv("data/nasadata.csv", progress = FALSE)
proc.time() - ptm

   user  system elapsed 
  0.436   0.069   0.621

(Mini) code blocks

ptm <- proc.time()
t3 <- readRDS("data/nasadata.rds")
proc.time() - ptm

   user  system elapsed 
  0.025   0.001   0.028

Your Turn - which option is fastest?

x <- sample.int(4, 100, replace=TRUE)
y <- vector("character", length=length(x))

# Option #1
for (i in 1:length(x)) {
   y[i] <- switch (x[i], "Sun", "Rain", "Snow", "Hail", default=NA)
}

# Option #2
for (i in 1:length(x)) {
  if (x[i] == 1) y[i] <- "Sun"
  if (x[i] == 2) y[i] <- "Rain"
  if (x[i] == 3) y[i] <- "Snow"
  if (x[i] == 4) y[i] <- "Hail"
}

# Option #3
y <- c("Sun", "Rain", "Snow", "Hail")[x]

# Option #4
y <- factor(x)
levels(y) <- c("Sun", "Rain", "Snow", "Hail")

Based on 500 evaluations of samples of size 5000:
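A comparison of this kind could be reproduced with a sketch like the following. The function names opt1 to opt4 and time_option are made up for illustration, and the slides do not say which timing tool was used; base system.time() is assumed here. The run below uses 50 repetitions to keep it quick; raise reps to 500 to match the setup described above.

```r
weather <- c("Sun", "Rain", "Snow", "Hail")

opt1 <- function(x) {            # Option #1: switch inside a loop
  y <- vector("character", length(x))
  for (i in seq_along(x)) y[i] <- switch(x[i], "Sun", "Rain", "Snow", "Hail")
  y
}
opt2 <- function(x) {            # Option #2: chained if statements
  y <- vector("character", length(x))
  for (i in seq_along(x)) {
    if (x[i] == 1) y[i] <- "Sun"
    if (x[i] == 2) y[i] <- "Rain"
    if (x[i] == 3) y[i] <- "Snow"
    if (x[i] == 4) y[i] <- "Hail"
  }
  y
}
opt3 <- function(x) weather[x]   # Option #3: vectorized lookup
opt4 <- function(x) {            # Option #4: factor with relabelled levels
  y <- factor(x)                 # assumes all four values occur in x
  levels(y) <- weather
  y
}

# Elapsed seconds for `reps` evaluations of f on one sample of size n.
time_option <- function(f, n = 5000, reps = 500) {
  x <- sample.int(4, n, replace = TRUE)
  system.time(for (r in seq_len(reps)) f(x))[["elapsed"]]
}

candidates <- list(opt1 = opt1, opt2 = opt2, opt3 = opt3, opt4 = opt4)
timings <- vapply(candidates, time_option, numeric(1), n = 5000, reps = 50)
sort(timings)
```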

Profiling Code

Option 3 is fastest on average. But why?

profvis package:

wrapper around the function Rprof
the call stack is written to a temporary file at a fixed sampling interval (Rprof's default is 0.02 seconds)
the output file is processed to produce a summary of the usage
time measurement varies by platform: on Unix-alikes it is the CPU time of the R process (this excludes time spent waiting for input)
Note: profiling takes time, too. The interval cannot be made arbitrarily small: once the timer goes off, the information is not recorded until the next timing click (probably in the range 1–10 msecs).
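The steps above can be sketched with base R's Rprof() and summaryRprof() directly, without profvis. The workload (refitting a linear model on simulated data) and all variable names are assumptions for illustration. It is repeated so the sampler has time to collect several call stacks. This assumes an R build with profiling support, which is the usual case.

```r
# Hypothetical workload on simulated data.
set.seed(1)
d <- data.frame(x = rnorm(2e5))
d$y <- 2 * d$x + rnorm(2e5)

prof_file <- tempfile(fileext = ".out")

Rprof(prof_file, interval = 0.02)   # start sampling the call stack
for (i in 1:20) fit <- lm(y ~ x, data = d)
Rprof(NULL)                         # stop profiling

# Process the output file into a summary of where time was spent.
prof <- summaryRprof(prof_file)
head(prof$by.self)    # time spent in each function itself
head(prof$by.total)   # time including the functions it called
prof$sampling.time    # total time covered by the samples

unlink(prof_file)
```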

`profvis` example

library(profvis)

p <- profvis({          
  data(diamonds, package = "ggplot2")           
  plot(price ~ carat, data = diamonds)          
  m <- lm(price ~ carat, data = diamonds)           
  abline(m, col = "red")            
})

profvis creates an interactive display showing the code alongside a flame graph.

Copy and paste the code into your console to run it.
