On this page, we introduce **vectorized numerical quadrature**: an interesting technique we developed for evaluating integrals numerically that, to our knowledge, has not been explored in depth by anyone else. We'll show that this technique can achieve surprising performance gains over traditional quadrature rules.

## Simpson's rule, vectorized

To start, let's review the typical methods of numerical integral evaluation, also called _numerical quadrature_[^2]. Numerical quadrature rules are a diverse collection of methods for approximating the value of an integral. One of the most widely-used methods of numerical quadrature is [Simpson's rule](https://en.wikipedia.org/wiki/Simpson's_rule), which is actually a collection of several rules and even more variants. Simpson's (composite) 1/3 rule is given by:

$ \int_a^b f(x)\, dx \approx \frac{h}{3}\left[f_0 + 4f_1 + 2 f_2 + 4 f_3 + 2 f_4 + \dots + 2f_{n-2} + 4 f_{n-1} + f_n\right] $

where $f_i \equiv f(x_i)$, $h = (b-a)/N$, and $N$ subintervals are used (for the 1/3 rule, $N$ must be an **even number**). We can write this in a vectorized form as follows.
Let $\mathbf{I}_S$ be the vector defined by $\mathbf{I}_S = K\mathbf{f}$, where $K$ is a matrix and $\mathbf{f}$ is a vector, defined as follows:

$ K = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 4 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 2 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 4 & 0 & 0 & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\ 0 & 0 & 0 & 0 & 0 & 2 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 4 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \end{pmatrix}, \quad \mathbf{f} = \begin{pmatrix} f_0 \\ f_1 \\ f_2 \\ f_3 \\ \vdots \\ f_{n-2} \\ f_{n-1} \\ f_n \end{pmatrix} $

Thus, by expanding out $\mathbf{I}_S = K \mathbf{f}$, we have:

$ \mathbf{I}_S = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 4 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 2 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 4 & 0 & 0 & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\ 0 & 0 & 0 & 0 & 0 & 2 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 4 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} f_0 \\ f_1 \\ f_2 \\ f_3 \\ \vdots \\ f_{n-2} \\ f_{n-1} \\ f_n \end{pmatrix} $

The value of the integral is then given by:

$ \int_{a}^b f(x)\, dx \approx \alpha( \mathbf{I}_S \cdot \mathbf{1}), \quad \alpha = \frac{h}{3} $

> **Note:** the factor of $\alpha$ absorbs any fractional coefficients as well as the step size $h$. By leaving it as a simple multiplication *after* the sum is computed, the $K$-matrix can be composed of purely integers, increasing computational efficiency.
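As a quick sanity check (our addition, not part of the original derivation), the following minimal sketch builds $K$ for a tiny 5-point grid ($N = 4$ subintervals) and confirms that $\alpha\,(K\mathbf{f} \cdot \mathbf{1})$ reproduces Simpson's 1/3 rule, which is exact for polynomials up to degree 3:

```python
import numpy as np

# Simpson's 1/3 coefficients for 5 grid points
# (N = 4 subintervals): 1, 4, 2, 4, 1
K = np.diag([1, 4, 2, 4, 1])

# integrate f(x) = x^2 on [0, 1]; the exact value is 1/3
N = 4
a, b = 0, 1
h = (b - a)/N
x = np.linspace(a, b, N + 1)
f = x**2

# I_S = K f, then multiply the sum of entries by alpha = h/3
I_S = K @ f
approx = (h/3) * I_S.sum()
print(approx)  # Simpson's rule is exact for degree <= 3
```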
Then, Simpson's 1/3 rule is simply taking the dot product of $\mathbf{I}_S$ with a vector of all ones (which we notate as $\mathbf{1}$), or equivalently, the sum of all its elements, and then multiplying by $h/3$:

$ \int_a^b f(x)\, dx \approx \frac{h}{3} (\mathbf{I}_S \cdot \mathbf{1}) = \frac{h}{3} \sum_j (\mathbf{I}_S)_j $

Doing it this way, we can write far more general code that simply takes in a stencil for a particular integration rule and constructs the matrix $K$, rather than needing to write a for-loop for every individual method, since the evaluation steps are the same:

1. Construct the vector $\mathbf{f}$ by evaluating $f(x)$ on every point in the integration domain
2. Construct the matrix $K$ and calculate $\mathbf{I}_S$ via $\mathbf{I}_S = K \mathbf{f}$
3. Compute $\alpha \cdot \operatorname{sum}(\mathbf{I}_S)$ to get the approximated value of the integral

For instance, Simpson's 3/8 rule (for which $N$ must be a multiple of 3) can be written in a very similar form; indeed, one only needs to change the constant and matrix to the following:

$ \alpha = \frac{3h}{8}, \quad K = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 3 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 3 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 2 & 0 & 0 & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\ 0 & 0 & 0 & 0 & 0 & 3 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \end{pmatrix} $

where the central diagonal repeats in the pattern $(3, 3, 2), (3, 3, 2), (3, 3, 2), \dots$ in the region represented by the $\ddots$ symbol. The generalizability of the algorithm does not sacrifice simplicity; in fact, assuming that you're using a language that supports vectorized operations, the core of the algorithm can be implemented in under 10 lines of code(!)
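The steps above can be turned into a generic, stencil-driven constructor. Below is one possible sketch (the helper `stencil_matrix` is our own hypothetical name, not a library function), demonstrated with the $(3, 3, 2)$ stencil of Simpson's 3/8 rule:

```python
import numpy as np
import scipy.sparse as sp

def stencil_matrix(stencil, n_subintervals):
    """Build the sparse diagonal matrix K for a composite rule
    whose interior coefficients repeat `stencil` and whose two
    endpoints get coefficient 1.
    (Hypothetical helper, for illustration only.)"""
    # tile the stencil enough times to cover the interior points,
    # then truncate to exactly n_subintervals - 1 coefficients
    reps = (n_subintervals - 2)//len(stencil) + 1
    interior = np.tile(stencil, reps)[:n_subintervals - 1]
    diag = np.concatenate(([1], interior, [1]))
    return sp.diags(diag)

# Simpson's 3/8 rule: stencil (3, 3, 2), alpha = 3h/8;
# N must be a multiple of 3
N = 9
a, b = 0.0, 1.0
h = (b - a)/N
alpha = 3*h/8

x = np.linspace(a, b, N + 1)
f = x**3  # exact integral over [0, 1] is 1/4
K = stencil_matrix([3, 3, 2], N)
approx = alpha * (K @ f).sum()
print(approx)  # Simpson's 3/8 is exact for cubics
```

Swapping in a different rule is then just a matter of passing a different stencil and prefactor $\alpha$.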
In pseudocode:

```python
# Note: this is pseudocode, but Python/Matlab-inspired
# Assuming that the matrix K
# and the function f(x) are defined already

# define integration domain & subintervals
a = 0
b = 1
N = 100
h = (b - a)/N
alpha = h/3

# construct grid and evaluate function
# (N subintervals means N + 1 grid points)
x = linspace(a, b, N + 1)
f = f(x)

# perform the matrix-vector multiplication
I_s = matmul(K, f) # for Python we can use the special syntax K @ f

result = alpha*sum(I_s) # integral result
```

A functional (but un-optimized) example, using Python and SciPy, is given below. It evaluates the following integral, which has no elementary solution:

$ \int_{2}^5 \frac{1}{\sqrt{ \ln x }}\, dx $

```python
import numpy as np
import scipy.sparse as sp

N = 1000 # number of subintervals; must be even!
a = 2
b = 5

# ensure that the number of subintervals
# is an even number
assert N % 2 == 0

def integral_matrix(n):
    # n is the number of grid points, n = N + 1;
    # builds the sparse diagonal matrix of Simpson's 1/3
    # coefficients 1, 4, 2, 4, ..., 2, 4, 1
    diag = np.ones(n)
    diag[1:-1:2] = 4
    diag[2:-1:2] = 2
    K = sp.diags(diag)
    return K

# function to integrate
func = lambda x: np.log(x)**(-1/2)

def vectorized_simpsons(func, bounds, samples):
    """
    func - function to integrate
    bounds - bounds of integration
    samples - number of subintervals (must be even)
    """
    a, b = bounds
    h = (b - a)/samples
    alpha = h/3 # specific to Simpson's 1/3 rule
    x = np.linspace(a, b, samples + 1)
    f = func(x)
    K = integral_matrix(samples + 1)
    result = alpha * (K @ f).sum()
    return result

print("Approximation for integral:", vectorized_simpsons(func, (a, b), N))
```

In comparison, below is a "standard" example code for the Simpson's method:

```python
def standard_simpsons(func, bounds, samples):
    a, b = bounds
    h = (b - a)/samples
    alpha = h/3
    # approximation for integral with sum,
    # but dropping the h/3 factor until the end:
    # endpoints have coefficient 1, and the last
    # interior point has coefficient 4
    int_sum = func(a) + 4*func(b - h) + func(b)
    x = a + h
    # remaining interior points alternate coefficients 4, 2;
    # we step by two subintervals every iteration
    for k in range(1, samples - 1, 2):
        int_sum += 4*func(x) + 2*func(x + h)
        x += 2*h
    return alpha * int_sum
```
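As a sanity check (our addition, not part of the original code), both implementations can be compared against SciPy's built-in composite Simpson integrator, `scipy.integrate.simpson`, evaluated on the same grid:

```python
import numpy as np
from scipy.integrate import simpson

a, b, N = 2, 5, 1000

# evaluate the integrand on the same grid of N + 1 points
x = np.linspace(a, b, N + 1)
y = np.log(x)**(-1/2)

# SciPy's composite Simpson's rule on the sampled values
print("SciPy's approximation:", simpson(y, x=x))
```

All three should agree to many digits for this smooth integrand.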
## Performance comparison

Having given our description of the vectorized and standard Simpson's rules, we can now compare their performance. A very basic benchmark code to calculate the execution time of each algorithm is given as follows:

```python
import time

def timeit(N, method, func=func, a=a, b=b):
    start_time = time.perf_counter()
    if method == "standard":
        _ = standard_simpsons(func, (a, b), N)
    elif method == "vectorized":
        _ = vectorized_simpsons(func, (a, b), N)
    else:
        # no other case possible
        return
    end_time = time.perf_counter()
    elapsed_time = end_time - start_time
    return elapsed_time

# to use, just call timeit(some_number, "standard")
# or timeit(some_number, "vectorized")
```

A full version of the code is available [on this GitHub gist](https://gist.github.com/Songtech-0912/d908d5ee513cfec6d1698d5b746fbf56). The question is: is vectorized numerical quadrature really worth it? First, note that *both* methods are $O(n)$ in computational complexity. The standard Simpson's method is trivially $O(n)$, since it is just a pure sum; meanwhile, the vectorized Simpson's method uses a sparse diagonal matrix (which is $O(n)$ to construct), meaning the matrix-vector multiplication reduces to an elementwise product (which is also $O(n)$), and the final sum is of course also $O(n)$. However, if we take our basic code (which, again, is not optimized), we can perform some (rough) benchmarks:

| $N$ | Standard Simpson's time (ms) | Vectorized Simpson's time (ms) |
| --------- | ---------------------------- | ------------------------------ |
| 10 | <0.1 | 75.3 |
| 100 | 0.1 | 79.3 |
| 1,000 | 1.0 | 78.9 |
| 10,000 | 7.4 | 74.0 |
| 50,000 | 37.6 | 81.7 |
| 100,000 | **84.3** | **78.7** |
| 250,000 | 191.3 | 78.6 |
| 1,000,000 | 749.1 | 74.5 |
| 5,000,000 | 3876.2 | 79.9 |

> **Note:** Bear in mind this is vanilla Python, not a high-performance language like C/C++/Rust/Fortran that you'd actually use for running ultra-high-performance code.
Also, these benchmarks are definitely not done according to best practices, and the results should be taken with caution. Notice how below $N = \text{100,000}$, the standard Simpson's method is faster, but beyond $N = \text{100,000}$, the vectorized Simpson's method pulls far ahead, reaching nearly a 50× speedup at $N = \text{5,000,000}$ (see the highlighted row in the table above for the crossover point). While these results should be taken with a grain of salt, it is quite likely that the vectorized method was able to take advantage of BLAS/SIMD vectorization speedups and avoid the need for for-loops (which can drastically slow Python code down). For smaller $N$, it's likely that the allocation cost of initializing the sparse matrix $K$ makes it slower, but for large $N$ the vectorization benefits take over.

## Concluding remarks

The verdict? Vectorized numerical quadrature is absolutely a viable option[^1], particularly when integrals need to be evaluated to very high precision (that is, for large $N$). Furthermore, vectorized numerical quadrature can potentially benefit from hardware acceleration by running the linear algebra on the GPU. In addition, another benefit of the vectorized method is that, once constructed, the matrix $K$ can be reused at no additional cost for subsequent integrations, which is valuable when an integral needs to be rapidly re-evaluated many times over (for instance, if you are trying to compute a function that is *defined* by an integral, like the [error function](https://en.wikipedia.org/wiki/Error_function) or [Gamma function](https://en.wikipedia.org/wiki/Gamma_function)). Perhaps the best method is a mixed algorithm that uses the standard Simpson's method for small $N$, but switches to the vectorized version for large $N$.

[^1]: One big caveat is that this requires that you have a fast linear algebra library with support for **sparse matrices**. As long as you have that, though, you get all the benefits of the vectorization speed-up!
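To illustrate the reuse point concretely, here is a sketch (our addition; `erf_quad` is a hypothetical helper, not a library function) that builds $K$ once for a fixed even $N$ and then reuses it to approximate the error function at many arguments, with only the step size $h$ and the grid changing per call:

```python
import math
import numpy as np
import scipy.sparse as sp

N = 1000 # fixed, even number of subintervals

# build the Simpson's 1/3 coefficient matrix once...
diag = np.ones(N + 1)
diag[1:-1:2] = 4
diag[2:-1:2] = 2
K = sp.diags(diag)

def erf_quad(z):
    """Approximate erf(z) = (2/sqrt(pi)) * integral of exp(-t^2)
    from 0 to z. (Hypothetical helper, for illustration only.)"""
    # ...and reuse it on every call; only h and the grid change
    h = z/N
    t = np.linspace(0, z, N + 1)
    f = np.exp(-t**2)
    return (2/math.sqrt(math.pi)) * (h/3) * (K @ f).sum()

# compare against the standard library's erf
print(erf_quad(1.0), math.erf(1.0))
```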
[^2]: Here, when we speak of "numerical quadrature", we mean the *numerical evaluation of integrals*. The similar term "numerical integration" is sometimes also used, but it can lead to confusion since "numerical integration" is also used to refer to solving a differential equation numerically.