DISCLAIMER

The present benchmark was conducted on a Samsung laptop (550XCJ/550XCR, Intel i7, 8GB RAM) on Ubuntu 22.04.4 LTS x86_64, which is a pretty average machine. For similar hardware (e.g., Intel i5, 16GB RAM), the results should hold. In the other hand, for slower hardware (e.g., budget VMs like free plan on Posit Cloud, which is single core, 1GB RAM) results may vary. For faster machines (e.g., 16-core, 64GB RAM 1, Github Codespaces), differences are only marginal. That is, for fast enough machines, performance isn’t an issue and even base R code should be fast enough for input-output analysis with big matrices.

Introduction

This vignette presents a benchmarking analysis comparing the performance of functions from the fio package with equivalent base R functions. The fio package provides a set of functions for input-output analysis, a method used in economics to analyze the interdependencies between different sectors of an economy.

In this document, we will focus on two key functions: the technical coefficients matrix calculation and the Leontief inverse matrix calculation. These functions are fundamental to input-output analysis, and their performance can significantly impact the speed of larger analyses.

Our benchmarking tests, which involve running these functions repeatedely in simulated datasets, show that the fio package functions are faster than the equivalent base R functions. This improved performance can make a substantial difference in larger analyses, making the fio package a valuable tool for input-output analysis in R.

The tests were run on a simulated 2000×20002000 \times 2000 matrix, and each test was repeated 100 times to account for variability. Please note that the results of this benchmarking analysis are dependent on the specific test datasets used and the hardware on which the algorithms were run. Therefore, the results should be interpreted in the context of these specific conditions.

Technical coefficients matrix

The technical coefficients matrix calculation, a key and initial step in input-output analysis, was tested using the compute_tech_coeff() function from the fio package, equivalent functions from the leontief package, and a base R implementation. It consists on dividing each aija_{ij} element of intermediate transactions matrix by the correspondent xjx_j element of total production vector1. The results show that both fio and leontief functions execute almost instantaneously, with fio slightly faster (about 5 milliseconds apart). In contrast, the base R implementation is about 150 times slower than fio.

# set seed
set.seed(100)

# data
matrix_dim <- 2000
intermediate_transactions <- matrix(
  as.double(sample(1:1000, matrix_dim^2, replace = TRUE)),
  nrow = matrix_dim,
  ncol = matrix_dim
)
total_production <- matrix(
  as.double(sample(4000000:6000000, matrix_dim, replace = TRUE)),
  nrow = 1,
  ncol = matrix_dim
)

# Base R function
tech_coeff_r <- function(intermediate_transactions, total_production) {
  tech_coeff_matrix <- intermediate_transactions %*% diag(1 / as.vector(total_production))
  return(tech_coeff_matrix)
}

# {fio} setup
iom_fio <- fio::iom$new("iom", intermediate_transactions, total_production)

# benchmark
benchmark_a <- microbenchmark::microbenchmark(
  fio = fio:::compute_tech_coeff(intermediate_transactions, total_production),
  `Base R` = tech_coeff_r(intermediate_transactions, total_production),
  leontief = leontief::input_requirement(intermediate_transactions, total_production),
  times = 100
)
print(benchmark_a)
#> Unit: milliseconds
#>      expr        min         lq       mean     median         uq        max neval
#>       fio   19.79948   27.52306   32.46416   30.59574   35.91537   89.63771   100
#>    Base R 5397.60696 5871.49346 6400.70644 6236.49741 7004.75749 7892.10966   100
#>  leontief   25.69267   33.26089   40.73125   38.84038   44.42781  112.25745   100

# plot
ggplot2::autoplot(benchmark_a)
\label{fig:benchmark_a}Base R is about 100 times slower than {fio} and {leontief} functions.

Base R is about 100 times slower than {fio} and {leontief} functions.

Leontief inverse matrix

When we’re talking about inverting a 2000×20002000 \times 2000 there’s a lot more work involved. Leontief matrix (LL) is obtained from subtracting the technical coefficients matrix (AA) from the identity matrix (II), therefore it has no null rows or columns.

L=IAL = I - A

It allows for solving the linear system through LU decomposition, which is a more efficient method than the direct inverse matrix calculation. fio takes advantage of LU decomposition and becomes incredibly faster, while leontief is over 10 times slower, followed closed by the over 20 times slower base R implementation.

# data
iom_fio$compute_tech_coeff()
technical_coefficients_matrix <- iom_fio$technical_coefficients_matrix

# base R function
leontief_inverse_r <- function(technical_coefficients_matrix) {
  dim <- nrow(technical_coefficients_matrix)
  leontief_inverse_matrix <- solve(diag(dim) - technical_coefficients_matrix)
  return(leontief_inverse_matrix)
}

# benchmark
benchmark_b <- microbenchmark::microbenchmark(
  fio = fio:::compute_leontief_inverse(technical_coefficients_matrix),
  `Base R` = leontief_inverse_r(technical_coefficients_matrix),
  leontief = leontief::leontief_inverse(technical_coefficients_matrix),
  times = 100
)
print(benchmark_b)
#> Unit: milliseconds
#>      expr       min        lq      mean    median        uq      max neval
#>       fio  247.3609  285.6822  312.9184  301.5624  329.7859  543.802   100
#>    Base R 5177.8100 5531.5195 5700.7660 5596.1821 5722.3004 7489.067   100
#>  leontief 4671.1566 4970.6219 5143.4911 5065.7830 5216.6950 6979.434   100

# plot
ggplot2::autoplot(benchmark_b)
\label{fig:figs} {fio} is about 20 times faster than {leontief} and base R functions.

{fio} is about 20 times faster than {leontief} and base R functions.

Sensitivity of dispersion coefficients of variation

To represent linkage-based functions performance, we compute benchmark for sensitivity of dispersion coefficients of variation. All 3 implementations executes almost instantaneously, with fio being slightly slower than leontief and base R functions.

# data
iom_fio$compute_leontief_inverse()
leontief_inverse_matrix <- iom_fio$leontief_inverse_matrix

# base R function
sensitivity_r <- function(B) {
  n <- nrow(B)
  SL = rowSums(B)
  ML = SL / n
  (((1 / (n - 1)) * (colSums((B - ML) ** 2))) ** 0.5) / ML
}

# benchmark
benchmark_c <- microbenchmark::microbenchmark(
  fio = fio:::compute_sensitivity_dispersion_cv(leontief_inverse_matrix),
  `Base R` = sensitivity_r(leontief_inverse_matrix),
  leontief = leontief::sensitivity_dispersion_cv(leontief_inverse_matrix),
  times = 100
)
print(benchmark_c)
#> Unit: milliseconds
#>      expr      min       lq     mean   median       uq      max neval
#>       fio 18.84368 19.14402 20.55910 19.33092 21.95244 25.82193   100
#>    Base R 20.70250 21.81017 23.14538 22.41280 24.08331 34.13879   100
#>  leontief 14.07486 14.33243 14.60074 14.44927 14.58205 21.89314   100
ggplot2::autoplot(benchmark_c)
\label{fig:benchmark_c} {fio} is about 15 milliseconds slower than {leontief} and 2 milliseconds slower than base R functions.

{fio} is about 15 milliseconds slower than {leontief} and 2 milliseconds slower than base R functions.

Field of influence

Since field of influence involves computing Leontief inverse matrix for each element of technical coefficients matrix after an increment, it can be demanding for high dimensional matrices. Here, we evaluate benchmark for base R function and {fio}, since there’s no similiar function in {leontief}. For brevity, we cut dimensions to 100 and repetitions to 10.

# data
matrix_dim <- 100
intermediate_transactions <- matrix(
  as.double(sample(1:1000, matrix_dim^2, replace = TRUE)),
  nrow = matrix_dim,
  ncol = matrix_dim
)
total_production <- matrix(
  as.double(sample(4000000:6000000, matrix_dim, replace = TRUE)),
  nrow = 1,
  ncol = matrix_dim
)
iom_fio_reduced <- fio::iom$new(
  "iom_reduced",
  intermediate_transactions,
  total_production
)$compute_tech_coeff()$compute_leontief_inverse()

# base R function
field_influence_r <- function(A, B, ee = 0.001) {
  n = nrow(A)
  I = diag(n)
  E = matrix(0, ncol = n, nrow = n)
  SI = matrix(0, ncol = n, nrow = n)
  for (i in 1:n) {
    for (j in 1:n) {
      E[i, j] = ee
      AE = A + E
      BE = solve(I - AE)
      FE = (BE - B) / ee
      FEq = FE * FE
      S = sum(FEq)
      SI[i, j] = S
      E[i, j] = 0
    }
  }
}

# benchmark
benchmark_d <- microbenchmark::microbenchmark(
  fio = fio:::compute_field_influence(
    iom_fio_reduced$technical_coefficients_matrix,
    iom_fio_reduced$leontief_inverse_matrix,
    0.001
  ),
  `Base R` = field_influence_r(iom_fio_reduced$technical_coefficients_matrix, iom_fio_reduced$leontief_inverse_matrix),
  times = 10
)
print(benchmark_d)
#> Unit: seconds
#>    expr     min       lq     mean   median       uq      max neval
#>     fio 2.32139 2.581798 2.587570 2.630586 2.669976 2.713694    10
#>  Base R 7.78439 7.824686 7.972608 7.847324 7.934813 8.490848    10
ggplot2::autoplot(benchmark_d)
\label{fig:benchmark_d} {fio} is the best option for input-output analysis when matrix inversion is needed.

{fio} is the best option for input-output analysis when matrix inversion is needed.


  1. Or in a equivalent way, multiplying intermediate transactions matrix by a diagonal matrix constructed from total production vector.↩︎