PathToPerformance

NB - This notebook was originally made, some years ago, by Steven G. Johnson. I took the pre 1.0 Julia code and updated it so as to teach myself and others some useful concepts about generated functions in Julia. It is based on the idea of generated functions and staged programming of chapter 5.7 of Jeff Bezanson's PhD thesis on the Julia Language.This transcription is shared with the author's permission. The original code is in Appendix B of the same thesis.

– Miguel Raz, 2020


  1. Kernel transformation
  2. Analytically known integrals:
  3. Numerically computed integrals
  4. Chebyshev interpolation
  5. First-integral generation

Kernel transformation

Input: kernel function K(r) K(r) . Output "first integral" of K K :

Kn(X)=∫01wnK(wX)dw \mathcal{K}_n(X) = \int_0^1 w^n K(wX) dw

for X∈(0,∞)X \in (0,\infty). Some small set of nn values, to be determined, will be supplied at compile time.

KK will be specified as a subtype of AbstractKernel.

The Kn\mathcal{K}_n function will be parameterized by a FirstIntegral{K,n} type parameterized by an AbstractKernel type K and an integer n.

julia> abstract type AbstractKernel end

Any kernel ~ X^P for X β‰ͺ S and ~ X^Q for X ≫ S

julia> abstract type PowerLawScaling{P,Q,S} <: AbstractKernel end
julia> immutable FirstIntegral{K<:AbstractKernel,N} end

Analytically known integrals:

immutable PowerLaw{p} <: PowerLawScaling{p,p,p} end # rα΅– power law

function (::FirstIntegral{PowerLaw{p},n})(X::Number) where {p,n}
    return p >= 0 ? X^p / (1 + n + p) : inv(X^(-p) * (1 + n + p))
end
F = FirstIntegral{PowerLaw{-1}, 3}()
F(3.7)
julia> @code_llvm F(3.7) # best to run this one yourself!

Numerically computed integrals

In a general multi-physics BEM package, one might conceivably have a user-specified kernel KK for which the first integral Kn\mathcal{K}_n is not known analytically. Performing the integral numerically at runtime would be too expensive, however, especially since the integrand may have an integrable singularity.

Instead, we will perform the integral Kn\mathcal{K}_n at compile-time (or, at least, outside the innermost BEM loops) for various XX and use these values to compute a Chebyshev interpolating polynomial C(ΞΎ)C(\xi). This polynomial will then be used to generate an efficient Kn(X)\mathcal{K}_n(X).

There are three tricky points:

In summary, the user will specify a PowerLawScaling{p,q,s} kernel type parameterized by p, q, and s. They will also define a (::PowerLawScaling{p,q,s})(r::Number) method that computes K(r)K(r).

The polynomial fit will be performed as follows, assuming p≀0p \le 0 and q≀0q \le 0. First, we let Ln(X)=Kn(X)/(sp+Xp)L_n(X) = \mathcal{K}_n(X) / (s^p + X^p), which eliminates the xβ†’0x\to 0 singularity while still having Ln∼XqL_n \sim X^q for X≫sX \gg s. Second, we let X=(1βˆ’ΞΎ)1/qβˆ’21/qX = (1-\xi)^{1/q} - 2^{1/q} [or equivalently ΞΎ=1βˆ’(x+21/q)q\xi = 1 - (x+2^{1/q})^q], which maps ξ∈(βˆ’1,1)\xi \in (-1,1) to X∈(0,∞)X \in (0,\infty), and has the property that Xqβ‰ˆ1βˆ’ΞΎX^q \approx 1-\xi as ΞΎβ†’1\xi\to 1, so the coordinate remapping produces a nice degree-1 polynomial. Finally, we fit Ln(ΞΎ(X))=C(ΞΎ)L_n(\xi(X)) = C(\xi) to a Chebyshev polynomial CC, and compute Kn(X)\mathcal{K}_n(X) via Kn(X)=(sp+Xp)Γ—Ln(ΞΎ(X))\mathcal{K}_n(X) = (s^p + X^p) \times L_n(\xi(X)).

Chebyshev interpolation

The following routines compute the coefficients cnc_n of a Chebyshev interpolating polynomial C(x)=βˆ‘n=0Nβˆ’1cnTn(x)C(x) = \sum_{n=0}^{N-1} c_n T_n(x) for a function f(x)f(x) on (βˆ’1,1)(-1,1), where Tn(x)=cos⁑(ncosβ‘βˆ’1x)T_n(x) = \cos(n \cos^{-1}x) are the first-kind Chebyshev polynomials.

We compute these coefficients cnc_n by first evaluating f(x)f(x) at the Chebyshev points cos⁑(Ο€n+1/2N)\cos\left(\pi\frac{n+1/2}{N}\right) for n=0,…,Nβˆ’1n=0,\ldots,N-1, for which the Chebyshev sum is equivalent to a type-III discrete cosine transform (DCT-III), so that the coefficients cnc_n are computed by a DCT-II. These are not quite the typical Chebyshev points, which correspond to a DCT-I: the difference is that the DCT-I corresponds to the closed interval [βˆ’1,1][-1,1], i.e. it includes the endpoints, whereas our function may involve terms that blow up at the endpoints (although the overall function should be okay) so we don't want to evaluate it there.

We also provide a function evalcheb to evaluate C(x)C(x) for any x∈(βˆ’1,1)x\in(-1,1) by a Clenshaw recurrence, and a macro version @evalcheb (analogous to Base.@horner) that generates a completely inlined version of this recurrence for the case where cc is fixed.

julia> #Pkg.add("FFTW");
julia> using FFTW

NN chebyshev points (order N) on the interval (βˆ’1,1)(-1,1)

julia> chebx(N) = [cos(Ο€*(n+0.5)/N) for n in 0:N-1]

NN chebyshev coefficients for vector of f(x)f(x) values on chebxchebx points xx

julia> function chebcoef(f::AbstractVector)
    a = FFTW.r2r(f, FFTW.REDFT10) / length(f)
    a[1] /= 2
    return a
end

Given a function ff and a tolerance, return enough Chebyshev coefficients to reconstruct ff to that tolerance on (βˆ’1,1)(-1,1)

julia> function chebcoef(f, tol=1e-13)
    N = 10
    local c
    while true
        x = chebx(N)
        c = chebcoef(float[f(y) for y in x])
        # look at last 3 coefs, since individual c's might be zero
        if maximum(abs,c[end:end-2]) < tol * maximum(abs,c)
            break
        end
        N *= 2
    end
    vβ‚€ = maximum(abs,c) * tol
    return c[1:findlast(v -> abs(v) > tol, c)] # shrink to minimum length
end

julia> function chebcoef(f, tol=1e-13)
    N = 10
    local c
    while true
        x = chebx(N)
        c = chebcoef(Float64[f(y) for y in x])
        # look at last 3 coefs, since individual c's might be zero
        if maximum(abs,c[end:end-2]) < tol * maximum(abs,c)
            break
        end
        N *= 2
    end
    vβ‚€ = maximum(abs,c) * tol
    return c[1:findlast(v -> abs(v) > tol, c)] # shrink to minimum length
end

Given chebcheb coefficients aa, evaluate them for xx in (βˆ’1,1)(-1,1) by Clenshaw recurrence

julia> function evalcheb(x, a)
    isempty(a) && throw(BoundsError())
    -1 ≀ x ≀ 1 || throw(DomainError())
    bβ‚–β‚Šβ‚ = bβ‚–β‚Šβ‚‚ = zero(x)
    for k = length(a):-1:2
        bβ‚– = a[k] + 2x*bβ‚–β‚Šβ‚ - bβ‚–β‚Šβ‚‚
        bβ‚–β‚Šβ‚‚ = bβ‚–β‚Šβ‚
        bβ‚–β‚Šβ‚ = bβ‚–
    end
    return a[1] + x*bβ‚–β‚Šβ‚ - bβ‚–β‚Šβ‚‚
end

# inlined version of evalcheb given coefficents a, and x in (-1,1)
julia> macro evalcheb(x, a...)
    isempty(a) && throw(BoundsError())
    # Clenshaw recurrence, evaluated symbolically:
    bβ‚–β‚Šβ‚ = bβ‚–β‚Šβ‚‚ = 0
    for k = length(a):-1:2
        bβ‚– = esc(a[k])
        if bβ‚–β‚Šβ‚ != 0
            bβ‚– = :(muladd(t2, $bβ‚–β‚Šβ‚, $bβ‚–))
        end
        if bβ‚–β‚Šβ‚‚ != 0
            bβ‚– = :($bβ‚– - $bβ‚–β‚Šβ‚‚)
        end
        bβ‚–β‚Šβ‚‚ = bβ‚–β‚Šβ‚
        bβ‚–β‚Šβ‚ = bβ‚–
    end
    ex = esc(a[1])
    if bβ‚–β‚Šβ‚ != 0
        ex = :(muladd(t, $bβ‚–β‚Šβ‚, $ex))
    end
    if bβ‚–β‚Šβ‚‚ != 0
        ex = :($ex - $bβ‚–β‚Šβ‚‚)
    end
    Expr(:block, :(t = $(esc(x))), :(t2 = 2t), ex)
end

Let's try a simple test case: performing Chebyshev interpolation of exp⁑(x)\exp(x):

julia> c = chebcoef(exp)
julia> x = linspace(-1,1,100)
julia> maximum(abs.(Float64[evalcheb(y,c) for y in x] - exp.(x))) # the maximum error on [-1,1]

# check that the evalcheb macro works
julia> evalcheb(0.1234, c[1:4]) - @evalcheb(0.1234, c[1],c[2],c[3],c[4])

First-integral generation

# extract parameters from PowerLawScaling type
julia> pqsPowerLawScaling{p,q,s}(::PowerLawScaling{p,q,s}) = (p,q,s)

Extract parameters from PowerLawScaling type

julia> pqsPowerLawScaling{p,q,s}(::PowerLawScaling{p,q,s}) = (p,q,s)

julia> @generated function (::FirstIntegral{P,n}, X::Real) where {P<:PowerLawScaling,n}
    # compute the Chebyshev coefficients (of the rescaled 𝒦ₙ as described above)
    K = P()
    p,q,s = pqsPowerLawScaling(K)
    
    𝒦ₙ = X -> quadgk(w -> w^n * K(w*X), 0,1, abstol=1e-12, reltol=1e-10)[1]
    Lβ‚™ = p < 0 ? X -> 𝒦ₙ(X) / (s^p + X^p) : 𝒦ₙ # scale out X β‰ͺ s singularity
    q > 0 && throw(DomainError()) # don't know how to deal with growing kernels
    qinv = 1/q
    c = chebcoef(ΞΎ -> Lβ‚™((1-ΞΎ)^qinv - 2^qinv), 1e-9)
    
    # return an expression that inlines the evaluation of 𝒦ₙ via C(ΞΎ)
    quote
        X <= 0 && throw(DomainError())
        ΞΎ = 1 - (X + $(2^qinv))^$q
        C = @evalcheb ΞΎ $(c...)
        return $p < 0 ? C * (X^$p + $(s^p)) : C
    end
end

A simple example where the result is known analytically:

julia> immutable DumbPowerLaw{p,s} <: PowerLawScaling{p,p,s}; end # rα΅– power law
julia> (::FirstIntegral{DumbPowerLaw{p,s}})(r)  where {p,s} = r^p
julia> F = FirstIntegral{DumbPowerLaw{-1,1.0},3}()
julia> F(3.7)
julia> @code_llvm F(3.7)
julia> #Pkg.add("PyPlot")
julia> using PyPlot
julia> x = [0.01:.0125:1.0;]; 
julia> plot(x, map(FirstIntegral{DumbPowerLaw{-1,1.}, 3}(),x))

As an added bonus, I should probably include the other integrals included in section 5.7. Here they are:

Notice that a Helmholtz kernel is also a power law kernel:

import GSL
exprel(n, x) = GSL. sf_exprel_n(n, x)
type Helmholtz{k} <: PowerLaw{-1, -1} end # exp(ikr) / 4 pi r
function(::FirstIntegral{Helmholtz{k}, n})(x) where {k,n}
	ikx = im * k * x
	return exp(ikx) * exprel(n, -ikx)/(4*pi*x)
end

# magnetic field integral equation
abstract type MFIE{k} <: PowerLaw{-3, -3} end # (ikr - 1) * exp(ikr) / 4pi r^3
function (::FirstIntegral{MFIE{k}, n})(x) where {k,n}
    ikx = im * k * x
    return exp(ikx) * (im * k * exprel(n - 1, -ikx)/((n - 1)*x) - exprel(n - 2, -ikx)/((n-2)*(x^2)) / (4*pi*x)
end