The present benchmark was conducted on a Samsung laptop (550XCJ/550XCR, Intel i7, 8GB RAM) on Ubuntu 22.04.4 LTS x86_64, which is a pretty average machine. For similar hardware (e.g., Intel i5, 16GB RAM), the results should hold. In the other hand, for slower hardware (e.g., budget VMs like free plan on Posit Cloud, which is single core, 1GB RAM) results may vary. For faster machines (e.g., 16-core, 64GB RAM 1, Github Codespaces), differences are only marginal. That is, for fast enough machines, performance isn’t an issue and even base R code should be fast enough for input-output analysis with big matrices.
This vignette presents a benchmarking analysis comparing the
performance of functions from the fio
package with
equivalent base R functions. The fio
package provides a set
of functions for input-output analysis, a method used in economics to
analyze the interdependencies between different sectors of an
economy.
In this document, we will focus on two key functions: the technical coefficients matrix calculation and the Leontief inverse matrix calculation. These functions are fundamental to input-output analysis, and their performance can significantly impact the speed of larger analyses.
Our benchmarking tests, which involve running these functions
repeatedely in simulated datasets, show that the fio
package functions are faster than the equivalent base R functions. This
improved performance can make a substantial difference in larger
analyses, making the fio
package a valuable tool for
input-output analysis in R.
The tests were run on a simulated matrix, and each test was repeated 100 times to account for variability. Please note that the results of this benchmarking analysis are dependent on the specific test datasets used and the hardware on which the algorithms were run. Therefore, the results should be interpreted in the context of these specific conditions.
The technical coefficients matrix calculation, a key and initial step
in input-output analysis, was tested using the
compute_tech_coeff()
function from the fio
package, equivalent functions from the leontief package,
and a base R implementation. It consists on dividing each
element of intermediate transactions matrix by the correspondent
element of total production vector1. The results show that both
fio and leontief functions execute almost
instantaneously, with fio slightly faster (about 5
milliseconds apart). In contrast, the base R implementation is about 150
times slower than fio.
# set seed
set.seed(100)
# data
matrix_dim <- 2000
intermediate_transactions <- matrix(
as.double(sample(1:1000, matrix_dim^2, replace = TRUE)),
nrow = matrix_dim,
ncol = matrix_dim
)
total_production <- matrix(
as.double(sample(4000000:6000000, matrix_dim, replace = TRUE)),
nrow = 1,
ncol = matrix_dim
)
# Base R function
tech_coeff_r <- function(intermediate_transactions, total_production) {
tech_coeff_matrix <- intermediate_transactions %*% diag(1 / as.vector(total_production))
return(tech_coeff_matrix)
}
# {fio} setup
iom_fio <- fio::iom$new("iom", intermediate_transactions, total_production)
# benchmark
benchmark_a <- microbenchmark::microbenchmark(
fio = fio:::compute_tech_coeff(intermediate_transactions, total_production),
`Base R` = tech_coeff_r(intermediate_transactions, total_production),
leontief = leontief::input_requirement(intermediate_transactions, total_production),
times = 100
)
print(benchmark_a)
#> Unit: milliseconds
#> expr min lq mean median uq max neval
#> fio 19.79948 27.52306 32.46416 30.59574 35.91537 89.63771 100
#> Base R 5397.60696 5871.49346 6400.70644 6236.49741 7004.75749 7892.10966 100
#> leontief 25.69267 33.26089 40.73125 38.84038 44.42781 112.25745 100
# plot
ggplot2::autoplot(benchmark_a)
When we’re talking about inverting a there’s a lot more work involved. Leontief matrix () is obtained from subtracting the technical coefficients matrix () from the identity matrix (), therefore it has no null rows or columns.
It allows for solving the linear system through LU decomposition, which is a more efficient method than the direct inverse matrix calculation. fio takes advantage of LU decomposition and becomes incredibly faster, while leontief is over 10 times slower, followed closed by the over 20 times slower base R implementation.
# data
iom_fio$compute_tech_coeff()
technical_coefficients_matrix <- iom_fio$technical_coefficients_matrix
# base R function
leontief_inverse_r <- function(technical_coefficients_matrix) {
dim <- nrow(technical_coefficients_matrix)
leontief_inverse_matrix <- solve(diag(dim) - technical_coefficients_matrix)
return(leontief_inverse_matrix)
}
# benchmark
benchmark_b <- microbenchmark::microbenchmark(
fio = fio:::compute_leontief_inverse(technical_coefficients_matrix),
`Base R` = leontief_inverse_r(technical_coefficients_matrix),
leontief = leontief::leontief_inverse(technical_coefficients_matrix),
times = 100
)
print(benchmark_b)
#> Unit: milliseconds
#> expr min lq mean median uq max neval
#> fio 247.3609 285.6822 312.9184 301.5624 329.7859 543.802 100
#> Base R 5177.8100 5531.5195 5700.7660 5596.1821 5722.3004 7489.067 100
#> leontief 4671.1566 4970.6219 5143.4911 5065.7830 5216.6950 6979.434 100
# plot
ggplot2::autoplot(benchmark_b)
To represent linkage-based functions performance, we compute benchmark for sensitivity of dispersion coefficients of variation. All 3 implementations executes almost instantaneously, with fio being slightly slower than leontief and base R functions.
# data
iom_fio$compute_leontief_inverse()
leontief_inverse_matrix <- iom_fio$leontief_inverse_matrix
# base R function
sensitivity_r <- function(B) {
n <- nrow(B)
SL = rowSums(B)
ML = SL / n
(((1 / (n - 1)) * (colSums((B - ML) ** 2))) ** 0.5) / ML
}
# benchmark
benchmark_c <- microbenchmark::microbenchmark(
fio = fio:::compute_sensitivity_dispersion_cv(leontief_inverse_matrix),
`Base R` = sensitivity_r(leontief_inverse_matrix),
leontief = leontief::sensitivity_dispersion_cv(leontief_inverse_matrix),
times = 100
)
print(benchmark_c)
#> Unit: milliseconds
#> expr min lq mean median uq max neval
#> fio 18.84368 19.14402 20.55910 19.33092 21.95244 25.82193 100
#> Base R 20.70250 21.81017 23.14538 22.41280 24.08331 34.13879 100
#> leontief 14.07486 14.33243 14.60074 14.44927 14.58205 21.89314 100
ggplot2::autoplot(benchmark_c)
Since field of influence involves computing Leontief inverse matrix for each element of technical coefficients matrix after an increment, it can be demanding for high dimensional matrices. Here, we evaluate benchmark for base R function and {fio}, since there’s no similiar function in {leontief}. For brevity, we cut dimensions to 100 and repetitions to 10.
# data
matrix_dim <- 100
intermediate_transactions <- matrix(
as.double(sample(1:1000, matrix_dim^2, replace = TRUE)),
nrow = matrix_dim,
ncol = matrix_dim
)
total_production <- matrix(
as.double(sample(4000000:6000000, matrix_dim, replace = TRUE)),
nrow = 1,
ncol = matrix_dim
)
iom_fio_reduced <- fio::iom$new(
"iom_reduced",
intermediate_transactions,
total_production
)$compute_tech_coeff()$compute_leontief_inverse()
# base R function
field_influence_r <- function(A, B, ee = 0.001) {
n = nrow(A)
I = diag(n)
E = matrix(0, ncol = n, nrow = n)
SI = matrix(0, ncol = n, nrow = n)
for (i in 1:n) {
for (j in 1:n) {
E[i, j] = ee
AE = A + E
BE = solve(I - AE)
FE = (BE - B) / ee
FEq = FE * FE
S = sum(FEq)
SI[i, j] = S
E[i, j] = 0
}
}
}
# benchmark
benchmark_d <- microbenchmark::microbenchmark(
fio = fio:::compute_field_influence(
iom_fio_reduced$technical_coefficients_matrix,
iom_fio_reduced$leontief_inverse_matrix,
0.001
),
`Base R` = field_influence_r(iom_fio_reduced$technical_coefficients_matrix, iom_fio_reduced$leontief_inverse_matrix),
times = 10
)
print(benchmark_d)
#> Unit: seconds
#> expr min lq mean median uq max neval
#> fio 2.32139 2.581798 2.587570 2.630586 2.669976 2.713694 10
#> Base R 7.78439 7.824686 7.972608 7.847324 7.934813 8.490848 10
ggplot2::autoplot(benchmark_d)
Or in a equivalent way, multiplying intermediate transactions matrix by a diagonal matrix constructed from total production vector.↩︎