On this page, we introduce **vectorized numerical quadrature**: an interesting technique we developed for evaluating integrals numerically that, to our knowledge, has not been explored in-depth by anyone else. We'll show that this technique can achieve surprising performance gains over traditional quadrature rules.
## Simpson's rule, vectorized
To start, let's review the typical methods of evaluating integrals numerically, also called _numerical quadrature_[^2]. Numerical quadrature rules are a diverse collection of methods for approximating the value of an integral. One of the most widely-used is [Simpson's rule](https://en.wikipedia.org/wiki/Simpson%27s_rule), which is actually a family of several rules and even more variants.
Simpson's (composite) 1/3 rule is given by:
$
\int_a^b f(x)\, dx \approx \frac{h}{3}\left[f_0 + 4f_1 + 2 f_2 + 4 f_3 + 2 f_4 + \dots + 2f_{n-2} + 4 f_{n-1} + f_n\right]
$
Where $f_i \equiv f(x_i)$, $h = (b-a)/n$, and $n$ is the number of subintervals used (for the 1/3 rule, $n$ must be an **even number**, giving an odd number of sample points $f_0, \dots, f_n$). We can write this in a vectorized form as follows. Let $\mathbf{I}_S$ be the vector defined by $\mathbf{I}_S = K\mathbf{f}$, where $K$ is a matrix and $\mathbf{f}$ is a vector, defined as follows:
$
K =
\begin{pmatrix}
1 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
0 & 4 & 0 & 0 & \cdots & 0 & 0 & 0 \\
0 & 0 & 2 & 0 & \cdots & 0 & 0 & 0 \\
0 & 0 & 0 & 4 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 2 & 0 & 0 \\
0 & 0 & 0 & 0 & \cdots & 0 & 4 & 0 \\
0 & 0 & 0 & 0 & \cdots & 0 & 0 & 1 \\
\end{pmatrix}, \quad
\mathbf{f} =
\begin{pmatrix}
f_0 \\ f_1 \\ f_2 \\ f_3 \\ \vdots \\ f_{n-2} \\ f_{n-1} \\ f_n
\end{pmatrix}
$
Thus, by expanding out $\mathbf{I}_S = K \mathbf{f}$ we have:
$
\mathbf{I}_S =
\begin{pmatrix}
1 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
0 & 4 & 0 & 0 & \cdots & 0 & 0 & 0 \\
0 & 0 & 2 & 0 & \cdots & 0 & 0 & 0 \\
0 & 0 & 0 & 4 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 2 & 0 & 0 \\
0 & 0 & 0 & 0 & \cdots & 0 & 4 & 0 \\
0 & 0 & 0 & 0 & \cdots & 0 & 0 & 1 \\
\end{pmatrix}
\begin{pmatrix}
f_0 \\ f_1 \\ f_2 \\ f_3 \\ \vdots \\ f_{n-2} \\ f_{n-1} \\ f_n
\end{pmatrix}
$
The value of the integral is then given by:
$
\int_{a}^b f(x) dx \approx \alpha( \mathbf{I}_S \cdot \mathbf{1}), \quad \alpha = \frac{h}{3}
$
> **Note:** the factor of $\alpha$ is there to absorb any fractional coefficients as well as the step size $h$. By leaving it as a simple multiplication *after* the sum is computed, the $K$-matrix can be composed of purely integers, increasing computational efficiency.
In words, Simpson's 1/3 rule takes the dot product of $\mathbf{I}_S$ with a vector of all ones (which we notate as $\mathbf{1}$), or equivalently, the sum of all its elements, and then multiplies by $h/3$:
$
\int_a^b f(x) dx \approx \frac{h}{3} (\mathbf{I}_S \cdot \mathbf{1}) = \frac{h}{3} \sum_j (\mathbf{I}_S)_j
$
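As a tiny worked check of the machinery (our example, not from the original), take $f(x) = x^2$ on $[0, 1]$ with $n = 2$ subintervals, so that $h = 1/2$:

$
\mathbf{f} = \begin{pmatrix} 0 \\ 1/4 \\ 1 \end{pmatrix}, \quad
K = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 4 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad
\mathbf{I}_S = K\mathbf{f} = \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix}, \quad
\frac{h}{3}(\mathbf{I}_S \cdot \mathbf{1}) = \frac{1/2}{3} \cdot 2 = \frac{1}{3}
$

which matches $\int_0^1 x^2\, dx = 1/3$ exactly (Simpson's rule is exact for polynomials up to degree 3).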
Doing it this way, we can write far more general code that simply takes in a stencil for a particular integration rule and constructs the matrix $K$, rather than writing a separate for-loop for every individual method, since the evaluation steps are always the same:
1. Construct the vector $\mathbf{f}$ by evaluating $f(x)$ on every point in the integration domain
2. Construct the matrix $K$ and calculate $\mathbf{I}_S$ via $\mathbf{I}_S = K \mathbf{f}$
3. Compute $\alpha \cdot \operatorname{sum}(\mathbf{I}_S)$ to get the approximated value of the integral
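To sketch what that generality might look like in practice (the helper `build_K` and its interface here are our own illustrative names, not from any library), a rule can be specified by just its endpoint weight and its repeating interior stencil:

```python
import numpy as np
import scipy.sparse as sp

def build_K(n, endpoint, repeating):
    # diagonal weight matrix for n subintervals (n + 1 grid points);
    # interior weights cycle through the repeating stencil
    interior = np.resize(repeating, n - 1)
    diag = np.concatenate(([endpoint], interior, [endpoint]))
    return sp.diags(diag)

# Simpson's 1/3 rule: endpoint weight 1, interior stencil (4, 2)
n = 4
h = 1/n
K = build_K(n, 1, [4, 2])
x = np.linspace(0, 1, n + 1)
result = (h/3) * (K @ x**3).sum()  # Simpson's rule is exact for cubics
```

Here `build_K(4, 1, [4, 2])` produces the diagonal $(1, 4, 2, 4, 1)$, and `result` recovers $\int_0^1 x^3\, dx = 1/4$ to floating-point accuracy.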
For instance, Simpson's 3/8 rule can be written in a very similar form; indeed, one only needs to change the matrix and constant to the following forms:
$
\alpha = \frac{3h}{8}, \quad
K =
\begin{pmatrix}
1 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
0 & 3 & 0 & 0 & \cdots & 0 & 0 & 0 \\
0 & 0 & 3 & 0 & \cdots & 0 & 0 & 0 \\
0 & 0 & 0 & 2 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 3 & 0 & 0 \\
0 & 0 & 0 & 0 & \cdots & 0 & 3 & 0 \\
0 & 0 & 0 & 0 & \cdots & 0 & 0 & 1 \\
\end{pmatrix}
$
Where the central diagonal entries repeat in the pattern $(3, 3, 2), (3, 3, 2), (3, 3, 2), \dots$ in the region represented by the $\ddots$ symbol (and accordingly, $n$ must be a multiple of 3 for the 3/8 rule). The generalizability of the algorithm does not sacrifice simplicity; in fact, assuming that you're using a language that supports vectorized operations, the algorithm is so simple that it fits in under 10 lines (!) of pseudocode:
```python
# Note: this is pseudocode, but Python/MATLAB-inspired
# Assuming that the matrix K
# and the function f(x) are defined already
# define integration domain & number of subintervals
a = 0
b = 1
N = 100
h = (b - a)/N
alpha = h/3
# construct grid (N + 1 points) and evaluate the function on it
x = linspace(a, b, N + 1)
f = f(x)
# perform the matrix-vector multiplication
I_s = matmul(K, f) # for Python we can use the special syntax K @ f
result = alpha*sum(I_s) # integral result
```
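As a concrete check of the 3/8 stencil (a snippet of ours, building the diagonal of $K$ directly as a plain weight vector and applying it to a polynomial the rule integrates exactly):

```python
import numpy as np

def simpson_38_weights(n):
    # coefficient pattern 1, 3, 3, 2, 3, 3, 2, ..., 3, 3, 1;
    # n (the number of subintervals) must be a multiple of 3
    assert n % 3 == 0
    return np.array([1] + [3, 3, 2] * (n // 3 - 1) + [3, 3] + [1])

n = 9
a, b = 0.0, 1.0
h = (b - a) / n
x = np.linspace(a, b, n + 1)
w = simpson_38_weights(n)
# the 3/8 rule is exact for polynomials up to degree 3,
# so integrating x^2 over [0, 1] recovers 1/3
approx = (3 * h / 8) * np.sum(w * x**2)
```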
A functional (but unoptimized) example, using Python and SciPy, is given below. It evaluates the following integral, which has no elementary antiderivative:
$
\int_{2}^5 \frac{1}{\sqrt{ \ln x }} dx
$
```python
import numpy as np
import scipy.sparse as sp

N = 1000 # number of subintervals, must be even!
a = 2
b = 5
h = (b - a)/N
# ensure that the number of subintervals
# is an even number
assert N % 2 == 0

def integral_matrix(n):
    # it is assumed that n is checked
    # to be an even number prior to calling
    # this function
    # interior coefficients follow the pattern 4, 2, 4, 2, ..., 4
    central_diag = np.append(np.tile([4, 2], (n - 2)//2), 4)
    # endpoint coefficients are both 1
    diag = np.pad(central_diag, pad_width=(1, 1),
                  mode="constant", constant_values=1)
    K = sp.diags(diag)
    return K

# function to integrate
func = lambda x: np.log(x)**(-1/2)

def vectorized_simpsons(func, bounds, samples):
    """
    func - function to integrate
    bounds - bounds of integration
    samples - number of subintervals
    """
    a, b = bounds
    h = (b - a)/samples
    alpha = h/3 # specific to Simpson's 1/3 rule
    x = np.linspace(a, b, samples + 1) # samples + 1 grid points
    f = func(x)
    K = integral_matrix(samples)
    result = alpha * (K @ f).sum()
    return result

print("Approximation for integral:", vectorized_simpsons(func, (a, b), N))
```
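As a sanity check (ours, not part of the example above), the same integral can be evaluated with SciPy's adaptive quadrature routine `scipy.integrate.quad` and compared against:

```python
import numpy as np
from scipy import integrate

# adaptive quadrature gives a reference value plus an error estimate
reference, err_estimate = integrate.quad(lambda x: np.log(x)**(-1/2), 2, 5)
print(f"quad reference: {reference} (estimated error {err_estimate:.2e})")
```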
In comparison, below is a "standard" implementation of Simpson's method:
```python
def standard_simpsons(func, bounds, samples):
    a, b = bounds
    h = (b - a)/samples
    alpha = h/3
    # approximation for the integral with a sum,
    # dropping the h/3 factor until the end;
    # both endpoints have coefficient 1
    int_sum = func(a) + func(b)
    x = a + h
    for k in range(1, samples, 2):
        # we step by two every time; interior
        # coefficients alternate 4, 2, 4, 2, ...
        int_sum += 4*func(x)
        if k + 1 < samples:
            int_sum += 2*func(x + h)
        x += 2*h
    return alpha * int_sum
```
## Performance comparison
Having given our description of the vectorized and standard Simpson's rules, we can now compare their performance. A very basic benchmark code to calculate the execution time of each algorithm is given as follows:
```python
import time

def timeit(N, method, func=func, a=a, b=b):
    start_time = time.perf_counter()
    if method == "standard":
        _ = standard_simpsons(func, (a, b), N)
    elif method == "vectorized":
        _ = vectorized_simpsons(func, (a, b), N)
    else: # no other case possible
        return
    end_time = time.perf_counter()
    elapsed_time = end_time - start_time
    return elapsed_time

# to use, just call timeit(some_number, "standard")
# or timeit(some_number, "vectorized")
```
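For somewhat more trustworthy numbers, Python's standard-library `timeit` module repeats each measurement and lets you take the minimum. Here is that pattern on a stand-in workload of our choosing (a Python-level loop sum versus NumPy's vectorized sum), purely to illustrate the technique:

```python
import timeit
import numpy as np

x = np.random.default_rng(0).random(100_000)

# best-of-3, each timing 10 calls, to reduce noise from the OS scheduler
loop_time = min(timeit.repeat(lambda: sum(x), number=10, repeat=3))
vec_time = min(timeit.repeat(lambda: np.sum(x), number=10, repeat=3))
print(f"Python loop: {loop_time:.4f} s, vectorized: {vec_time:.4f} s")
```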
A full version of the code is available [on this GitHub gist](https://gist.github.com/Songtech-0912/d908d5ee513cfec6d1698d5b746fbf56). The question is: is vectorized numerical quadrature really worth it? First, note that *both* methods are $O(n)$ in computational complexity. The standard Simpson's method is trivially $O(n)$, since it is just a plain sum; meanwhile, the vectorized Simpson's method uses a sparse diagonal matrix (which is $O(n)$ to construct), so the matrix-vector multiplication reduces to an elementwise product (also $O(n)$), and the final sum is of course $O(n)$ as well. However, if we take our basic code (which, again, is not optimized), we can perform some (rough) benchmarks:
| $N$ | Standard Simpson's time (ms) | Vectorized Simpson's time (ms) |
| --------- | ---------------------------- | ------------------------------ |
| 10 | <0.1 | 75.3 |
| 100 | 0.1 | 79.3 |
| 1,000 | 1.0 | 78.9 |
| 10,000 | 7.4 | 74.0 |
| 50,000 | 37.6 | 81.7 |
| 100,000 | **84.3** | **78.7** |
| 250,000 | 191.3 | 78.6 |
| 1,000,000 | 749.1 | 74.5 |
| 5,000,000 | 3876.2 | 79.9 |
> **Note:** Bear in mind this is vanilla Python, not a high-performance language like C/C++/Rust/Fortran that you'd actually use for running ultra-high-performance code. Also, these benchmarks are definitely not done according to best practices, and the results should be taken with caution.
Notice how below $N = \text{100,000}$, the standard Simpson's method is faster, but beyond $N = \text{100,000}$ (the highlighted row in the table above, where the two methods cross over), the vectorized Simpson's method pulls far ahead, ending up nearly 50 times faster at $N = \text{5,000,000}$. While these results should be taken with a grain of salt, it is quite likely that the vectorized method is able to take advantage of compiled, SIMD-vectorized loops and avoid Python-level for-loops (which can drastically slow code down). For smaller $N$, the allocation cost of initializing the sparse matrix $K$ likely makes it slower, but for large $N$ the vectorization benefits take over.
## Concluding remarks
The verdict? Vectorized numerical quadrature is absolutely a viable option[^1], particularly when integrals need to be evaluated to very high precision (that is, for large $N$). Furthermore, vectorized numerical quadrature can potentially benefit from hardware acceleration by running the linear algebra on the GPU. Another benefit of the vectorized method is that once constructed, the matrix $K$ can be reused at negligible additional cost for subsequent integrations, which is valuable when an integral needs to be rapidly re-evaluated many times over (for instance, if you are trying to compute a function that is *defined* by an integral, like the [error function](https://en.wikipedia.org/wiki/Error_function) or [Gamma function](https://en.wikipedia.org/wiki/Gamma_function)). Perhaps the best approach is a mixed algorithm that uses the standard Simpson's method for small $N$ but switches to the vectorized version for large $N$.
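The reuse point deserves emphasis; here is a minimal sketch (ours) of amortizing the construction of $K$ across many integrands on the same grid:

```python
import numpy as np
import scipy.sparse as sp

# build the Simpson's 1/3 weight matrix once for a fixed grid...
n = 1000 # number of subintervals, even
a, b = 0.0, 1.0
h = (b - a) / n
diag = np.concatenate(([1], np.tile([4, 2], (n - 2)//2), [4], [1]))
K = sp.diags(diag)
x = np.linspace(a, b, n + 1)

# ...then reuse it for as many integrands as we like
def integrate_on_grid(f):
    return (h/3) * (K @ f(x)).sum()

approx_sin = integrate_on_grid(np.sin) # ~ 1 - cos(1)
approx_exp = integrate_on_grid(np.exp) # ~ e - 1
```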
[^1]: One big caveat is that this requires that you have a fast linear algebra library with support for **sparse matrices**. As long as you have that, though, you get all the benefits of the vectorization speed-up!
[^2]: Here, when we speak of "numerical quadrature", we mean the *numerical evaluation of integrals*. The similar term "numerical integration" is sometimes also used, but it can lead to confusion since "numerical integration" is also used to refer to solving a differential equation numerically.