# Bijectors.jl

## Functions

Bijectors._link_chol_lkjMethod
function _link_chol_lkj(w)

Link function for cholesky factor.

An alternative and maybe more efficient implementation was considered:

for i=2:K, j=(i+1):K
z[i, j] = (w[i, j] / w[i-1, j]) * (z[i-1, j] / sqrt(1 - z[i-1, j]^2))
end

But this implementation will not work when w[i-1, j] = 0. Though it is a zero measure set, unit matrix initialization will not work.

For equivelence, following explanations is given by @torfjelde:

For (i, j) in the loop below, we define

z₍ᵢ₋₁, ⱼ₎ = w₍ᵢ₋₁,ⱼ₎ * ∏ₖ₌₁ⁱ⁻² (1 / √(1 - z₍ₖ,ⱼ₎²))

and so

z₍ᵢ,ⱼ₎ = w₍ᵢ,ⱼ₎ * ∏ₖ₌₁ⁱ⁻¹ (1 / √(1 - z₍ₖ,ⱼ₎²))
= (w₍ᵢ,ⱼ₎ * / √(1 - z₍ᵢ₋₁,ⱼ₎²)) * (∏ₖ₌₁ⁱ⁻² 1 / √(1 - z₍ₖ,ⱼ₎²))
= (w₍ᵢ,ⱼ₎ * / √(1 - z₍ᵢ₋₁,ⱼ₎²)) * (w₍ᵢ₋₁,ⱼ₎ * ∏ₖ₌₁ⁱ⁻² 1 / √(1 - z₍ₖ,ⱼ₎²)) / w₍ᵢ₋₁,ⱼ₎
= (w₍ᵢ,ⱼ₎ * / √(1 - z₍ᵢ₋₁,ⱼ₎²)) * (z₍ᵢ₋₁,ⱼ₎ / w₍ᵢ₋₁,ⱼ₎)
= (w₍ᵢ,ⱼ₎ / w₍ᵢ₋₁,ⱼ₎) * (z₍ᵢ₋₁,ⱼ₎ / √(1 - z₍ᵢ₋₁,ⱼ₎²))

which is the above implementation.

Bijectors.bijectorMethod
bijector(d::Distribution)

Returns the constrained-to-unconstrained bijector for distribution d.

Bijectors.combineMethod
combine(m::PartitionMask, x_1, x_2, x_3)

Combines x_1, x_2, and x_3 into a single vector.

Bijectors.composelMethod
composel(ts::Bijector...)::Composed{<:Tuple}

Constructs Composed such that ts are applied left-to-right.

Bijectors.composerMethod
composer(ts::Bijector...)::Composed{<:Tuple}

Constructs Composed such that ts are applied right-to-left.

Bijectors.compute_rMethod
compute_r(y_minus_z0::AbstractVector{<:Real}, α, α_plus_β_hat)

Compute the unique solution $r$ to the equation

$$$\|y_minus_z0\|_2 = r \left(1 + \frac{α_plus_β_hat - α}{α + r}\right)$$$

subject to $r ≥ 0$ and $r ≠ α$.

Since $α > 0$ and $α_plus_β_hat > 0$, the solution is unique and given by

$$$r = (\sqrt{(α_plus_β_hat - γ)^2 + 4 α γ} - (α_plus_β_hat - γ)) / 2,$$$

where $γ = \|y_minus_z0\|_2$. For details see appendix A.2 of the reference.

References

D. Rezende, S. Mohamed (2015): Variational Inference with Normalizing Flows. arXiv:1505.05770

Bijectors.find_alphaMethod
find_alpha(wt_y, wt_u_hat, b)

Compute an (approximate) real-valued solution $α̂$ to the equation

$$$wt_y = α + wt_u_hat tanh(α + b)$$$

The uniqueness of the solution is guaranteed since $wt_u_hat ≥ -1$. For details see appendix A.1 of the reference.

Initial bracket

For all $α$, we have

$$$α - |wt_u_hat| - wt_y \leq α + wt_u_hat tanh(α + b) - wt_y \leq α + |wt_u_hat| - wt_y.$$$

Thus

$$$α̂ - |wt_u_hat| - wt_y \leq 0 \leq α̂ + |wt_u_hat| - wt_y,$$$

which implies $α̂ ∈ [wt_y - |wt_u_hat|, wt_y + |wt_u_hat|]$. To avoid floating point issues if $α̂ = wt_y ± |wt_u_hat|$, we use the more conservative interval $[wt_y - 2 |wt_u_hat|, wt_y + 2|wt_u_hat|]$ as initial bracket, at the cost of one additional iteration step.

References

D. Rezende, S. Mohamed (2015): Variational Inference with Normalizing Flows. arXiv:1505.05770

Bijectors.forwardMethod
forward(d::Distribution)
forward(d::Distribution, num_samples::Int)

Returns a NamedTuple with fields x, y, logabsdetjac and logpdf.

In the case where d isa TransformedDistribution, this means

• x = rand(d.dist)
• y = d.transform(x)
• logabsdetjac is the logabsdetjac of the "forward" transform.
• logpdf is the logpdf of y, not x

In the case where d isa Distribution, this means

• x = rand(d)
• y = x
• logabsdetjac = 0.0
• logpdf is logpdf of x
Bijectors.get_u_hatMethod
get_u_hat(u::AbstractVector{<:Real}, w::AbstractVector{<:Real})

Return a tuple of vector $û$ that guarantees invertibility of the planar layer, and scalar $wᵀ û$.

Mathematical background

According to appendix A.1, vector $û$ defined by

$$$û(w, u) = u + (\log(1 + \exp{(wᵀu)}) - 1 - wᵀu) \frac{w}{\|w\|²}$$$

guarantees that the planar layer $f(z) = z + û tanh(wᵀz + b)$ is invertible for all $w, u ∈ ℝᵈ$ and $b ∈ ℝ$. We can rewrite $û$ as

$$$û = u + (\log(1 + \exp{(-wᵀu)}) - 1) \frac{w}{\|w\|²}.$$$

$$$wᵀû = wᵀu + \log(1 + \exp{(-wᵀu)}) - 1 = \log(1 + \exp{(wᵀu)}) - 1.$$$

References

D. Rezende, S. Mohamed (2015): Variational Inference with Normalizing Flows. arXiv:1505.05770

Bijectors.isclosedformMethod
isclosedform(b::Bijector)::bool
isclosedform(b⁻¹::Inverse{<:Bijector})::bool

Returns true or false depending on whether or not evaluation of b has a closed-form implementation.

Most bijectors have closed-form evaluations, but there are cases where this is not the case. For example the inverse evaluation of PlanarLayer requires an iterative procedure to evaluate.

Bijectors.logabsdetjacMethod
logabsdetjac(b::Bijector, x)
logabsdetjac(ib::Inverse{<:Bijector}, y)

Computes the log(abs(det(J(b(x))))) where J is the jacobian of the transform. Similarily for the inverse-transform.

Default implementation for Inverse{<:Bijector} is implemented as - logabsdetjac of original Bijector.

Bijectors.logabsdetjacinvMethod
logabsdetjacinv(td::UnivariateTransformed, y::Real)
logabsdetjacinv(td::MultivariateTransformed, y::AbstractVector{<:Real})

Computes the logabsdetjac of the inverse transformation, since rand(td) returns the transformed random variable.

Bijectors.logpdf_with_jacMethod
logpdf_with_jac(td::UnivariateTransformed, y::Real)
logpdf_with_jac(td::MvTransformed, y::AbstractVector{<:Real})
logpdf_with_jac(td::MatrixTransformed, y::AbstractMatrix{<:Real})

Makes use of the forward method to potentially re-use computation and returns a tuple (logpdf, logabsdetjac).

Bijectors.orderedMethod
ordered(d::Distribution)

Return a Distribution whose support are ordered vectors, i.e., vectors with increasingly ordered elements.

Bijectors.transformedMethod
transformed(d::Distribution)
transformed(d::Distribution, b::Bijector)

Couples distribution d with the bijector b by returning a TransformedDistribution.

If no bijector is provided, i.e. transformed(d) is called, then transformed(d, bijector(d)) is returned.

ChangesOfVariables.with_logabsdet_jacobianMethod
with_logabsdet_jacobian(b::Bijector, x)

Computes both transform and logabsdetjac in one forward pass, and returns a named tuple (b(x), logabsdetjac(b, x)).

This defaults to the call above, but often one can re-use computation in the computation of the forward pass and the computation of the logabsdetjac. forward allows the user to take advantange of such efficiencies, if they exist.

## Types

Bijectors.ADBijectorType

Abstract type for a Bijector{N} making use of auto-differentation (AD) to implement jacobian and, by impliciation, logabsdetjac.

Bijectors.ComposedType
Composed(ts::A)

∘(b1::Bijector{N}, b2::Bijector{N})::Composed{<:Tuple}
composel(ts::Bijector{N}...)::Composed{<:Tuple}
composer(ts::Bijector{N}...)::Composed{<:Tuple}

where A refers to either

• Tuple{Vararg{<:Bijector{N}}}: a tuple of bijectors of dimensionality N
• AbstractArray{<:Bijector{N}}: an array of bijectors of dimensionality N

A Bijector representing composition of bijectors. composel and composer results in a Composed for which application occurs from left-to-right and right-to-left, respectively.

Note that all the alternative ways of constructing a Composed returns a Tuple of bijectors. This ensures type-stability of implementations of all relating methdos, e.g. inverse.

If you want to use an Array as the container instead you can do

Composed([b1, b2, ...])

In general this is not advised since you lose type-stability, but there might be cases where this is desired, e.g. if you have a insanely large number of bijectors to compose.

Examples

Simple example

Let's consider a simple example of Exp:

julia> using Bijectors: Exp

julia> b = Exp()
Exp{0}()

julia> b ∘ b
Composed{Tuple{Exp{0},Exp{0}},0}((Exp{0}(), Exp{0}()))

julia> (b ∘ b)(1.0) == exp(exp(1.0))    # evaluation
true

julia> inverse(b ∘ b)(exp(exp(1.0))) == 1.0 # inversion
true

julia> logabsdetjac(b ∘ b, 1.0)         # determinant of jacobian
3.718281828459045

Notes

Order

It's important to note that ∘ does what is expected mathematically, which means that the bijectors are applied to the input right-to-left, e.g. first applying b2 and then b1:

(b1 ∘ b2)(x) == b1(b2(x))     # => true

But in the Composed struct itself, we store the bijectors left-to-right, so that

cb1 = b1 ∘ b2                  # => Composed.ts == (b2, b1)
cb2 = composel(b2, b1)         # => Composed.ts == (b2, b1)
cb1(x) == cb2(x) == b1(b2(x))  # => true

Structure

∘ will result in "flatten" the composition structure while composel and composer preserve the compositional structure. This is most easily seen by an example:

julia> b = Exp()
Exp{0}()

julia> cb1 = b ∘ b; cb2 = b ∘ b;

julia> (cb1 ∘ cb2).ts # <= different
(Exp{0}(), Exp{0}(), Exp{0}(), Exp{0}())

julia> (cb1 ∘ cb2).ts isa NTuple{4, Exp{0}}
true

julia> Bijectors.composer(cb1, cb2).ts
(Composed{Tuple{Exp{0},Exp{0}},0}((Exp{0}(), Exp{0}())), Composed{Tuple{Exp{0},Exp{0}},0}((Exp{0}(), Exp{0}())))

julia> Bijectors.composer(cb1, cb2).ts isa Tuple{Composed, Composed}
true
Bijectors.CorrBijectorType
CorrBijector <: Bijector{2}

A bijector implementation of Stan's parametrization method for Correlation matrix: https://mc-stan.org/docs/2_23/reference-manual/correlation-matrix-transform-section.html

Basically, a unconstrained strictly upper triangular matrix y is transformed to a correlation matrix by following readable but not that efficient form:

K = size(y, 1)
z = tanh.(y)

for j=1:K, i=1:K
if i>j
w[i,j] = 0
elseif 1==i==j
w[i,j] = 1
elseif 1<i==j
w[i,j] = prod(sqrt(1 .- z[1:i-1, j].^2))
elseif 1==i<j
w[i,j] = z[i,j]
elseif 1<i<j
w[i,j] = z[i,j] * prod(sqrt(1 .- z[1:i-1, j].^2))
end
end

It is easy to see that every column is a unit vector, for example:

w3' w3 ==
w[1,3]^2 + w[2,3]^2 + w[3,3]^2 ==
z[1,3]^2 + (z[2,3] * sqrt(1 - z[1,3]^2))^2 + (sqrt(1-z[1,3]^2) * sqrt(1-z[2,3]^2))^2 ==
z[1,3]^2 + z[2,3]^2 * (1-z[1,3]^2) + (1-z[1,3]^2) * (1-z[2,3]^2) ==
z[1,3]^2 + z[2,3]^2 - z[2,3]^2 * z[1,3]^2 + 1 -z[1,3]^2 - z[2,3]^2 + z[1,3]^2 * z[2,3]^2 ==
1

And diagonal elements are positive, so w is a cholesky factor for a positive matrix.

x = w' * w

Consider block matrix representation for x

x = [w1'; w2'; ... wn'] * [w1 w2 ... wn] ==
[w1'w1 w1'w2 ... w1'wn;
w2'w1 w2'w2 ... w2'wn;
...
]

The diagonal elements are given by wk'wk = 1, thus x is a correlation matrix.

Every step is invertible, so this is a bijection(bijector).

Note: The implementation doesn't follow their "manageable expression" directly, because their equation seems wrong (7/30/2020). Insteadly it follows definition above the "manageable expression" directly, which is also described in above doc.

Bijectors.CouplingType
Coupling{F, M}(θ::F, mask::M)

Implements a coupling-layer as defined in .

Examples

julia> m = PartitionMask(3, , ) # <= going to use x to parameterize transform of x
[1, 1]  =  1.0,
[2, 1]  =  1.0,
[3, 1]  =  1.0)

julia> cl = Coupling(θ -> Shift(θ), m) # <= will do y[1:1] = x[1:1] + x[2:2];

julia> x = [1., 2., 3.];

julia> cl(x)
3-element Array{Float64,1}:
3.0
2.0
3.0

julia> inverse(cl)(cl(x))
3-element Array{Float64,1}:
1.0
2.0
3.0

julia> coupling(cl) # get the Bijector map θ -> b(⋅, θ)
Shift

julia> couple(cl, x) # get the Bijector resulting from x
Shift{Array{Float64,1},1}([2.0])

References

 Kobyzev, I., Prince, S., & Brubaker, M. A., Normalizing flows: introduction and ideas, CoRR, (), (2019).

Bijectors.InverseType
inverse(b::Bijector)
Inverse(b::Bijector)

A Bijector representing the inverse transform of b.

Bijectors.LeakyReLUType
LeakyReLU{T, N}(α::T) <: Bijector{N}

Defines the invertible mapping

x ↦ x if x ≥ 0 else αx

where α > 0.

Bijectors.NamedBijectorType
NamedBijector <: AbstractNamedBijector

Wraps a NamedTuple of key -> Bijector pairs, implementing evaluation, inversion, etc.

Examples

julia> using Bijectors: NamedBijector, Scale, Exp

julia> b = NamedBijector((a = Scale(2.0), b = Exp()));

julia> x = (a = 1., b = 0., c = 42.);

julia> b(x)
(a = 2.0, b = 1.0, c = 42.0)

julia> (a = 2 * x.a, b = exp(x.b), c = x.c)
(a = 2.0, b = 1.0, c = 42.0)
Bijectors.NamedCompositionType
NamedComposition <: AbstractNamedBijector

Wraps a tuple of array of AbstractNamedBijector and implements their composition.

This is very similar to Composed for Bijector, with the exception that we do not require the inputs to have the same "dimension", which in this case refers to the symbols for the NamedTuple that this takes as input.

See also: Composed

Bijectors.NamedCouplingType
NamedCoupling{target, deps, F} <: AbstractNamedBijector

Implements a coupling layer for named bijectors.

Examples

julia> using Bijectors: NamedCoupling, Scale

julia> b = NamedCoupling(:b, (:a, :c), (a, c) -> Scale(a + c))
NamedCoupling{:b,(:a, :c),var"#3#4"}(var"#3#4"())

julia> x = (a = 1., b = 2., c = 3.);

julia> b(x)
(a = 1.0, b = 8.0, c = 3.0)

julia> (a = x.a, b = (x.a + x.c) * x.b, c = x.c)
(a = 1.0, b = 8.0, c = 3.0)
Bijectors.PartitionMaskType
PartitionMask{A}(A_1::A, A_2::A, A_3::A) where {A}

This is used to partition and recombine a vector into 3 disjoint "subvectors".

Implements

• partition(m::PartitionMask, x): partitions x into 3 disjoint "subvectors"
• combine(m::PartitionMask, x_1, x_2, x_3): combines 3 disjoint vectors into a single one

Note that PartitionMask is not a Bijector. It is indeed a bijection, but does not follow the Bijector interface.

Its main use is in Coupling where we want to partition the input into 3 parts, one part to transform, one part to map into the parameter-space of the transform applied to the first part, and the last part of the vector is not used for anything.

Examples

julia> using Bijectors: PartitionMask, partition, combine

julia> m = PartitionMask(3, , ) # <= assumes input-length 3
[1, 1]  =  true,
[2, 1]  =  true,
[3, 1]  =  true)

julia> # Partition into 3 parts; the last part is inferred to be indices [3, ] from
# the fact that  and  does not make up all indices in 1:3.
x1, x2, x3 = partition(m, [1., 2., 3.])
([1.0], [2.0], [3.0])

julia> # Recombines the partitions into a vector
combine(m, x1, x2, x3)
3-element Array{Float64,1}:
1.0
2.0
3.0

Note that the underlying SparseMatrix is using Bool as the element type. We can also specify this to be some other type using the sp_type keyword:

julia> m = PartitionMask{Float32}(3, , )
[1, 1]  =  1.0,
[2, 1]  =  1.0,
[3, 1]  =  1.0)
Bijectors.PartitionMaskMethod
PartitionMask(n::Int, indices)

Assumes you want to split the vector, where indices refer to the parts of the vector you want to apply the bijector to.

Bijectors.PermuteType
Permute{A} <: Bijector{1}

A bijector implementation of a permutation. The permutation is performed using a matrix of type A. There are a couple of different ways to construct Permute:

Permute([0 1; 1 0])          # will map [1, 2] => [2, 1]
Permute([2, 1])              # will map [1, 2] => [2, 1]
Permute(2, 2 => 1, 1 => 2)   # will map [1, 2] => [2, 1]
Permute(2, [1, 2] => [2, 1]) # will map [1, 2] => [2, 1]

If this is not clear, the examples might be of help.

Examples

A simple example is permuting a vector of size 3.

julia> b1 = Permute([
0 1 0;
1 0 0;
0 0 1
])
Permute{Array{Int64,2}}([0 1 0; 1 0 0; 0 0 1])

julia> b2 = Permute([2, 1, 3])           # specify all elements at once
Permute{SparseArrays.SparseMatrixCSC{Float64,Int64}}(

[2, 1]  =  1.0
[1, 2]  =  1.0
[3, 3]  =  1.0)

julia> b3 = Permute(3, 2 => 1, 1 => 2)    # element-wise
Permute{SparseArrays.SparseMatrixCSC{Float64,Int64}}(
[2, 1]  =  1.0
[1, 2]  =  1.0
[3, 3]  =  1.0)

julia> b4 = Permute(3, [1, 2] => [2, 1])  # block-wise
Permute{SparseArrays.SparseMatrixCSC{Float64,Int64}}(
[2, 1]  =  1.0
[1, 2]  =  1.0
[3, 3]  =  1.0)

julia> b1.A == b2.A == b3.A == b4.A
true

julia> b1([1., 2., 3.])
3-element Array{Float64,1}:
2.0
1.0
3.0

julia> b2([1., 2., 3.])
3-element Array{Float64,1}:
2.0
1.0
3.0

julia> b3([1., 2., 3.])
3-element Array{Float64,1}:
2.0
1.0
3.0

julia> b4([1., 2., 3.])
3-element Array{Float64,1}:
2.0
1.0
3.0

julia> inverse(b1)
Permute{LinearAlgebra.Transpose{Int64,Array{Int64,2}}}([0 1 0; 1 0 0; 0 0 1])

julia> inverse(b1)(b1([1., 2., 3.]))
3-element Array{Float64,1}:
1.0
2.0
3.0
Bijectors.RationalQuadraticSplineType
RationalQuadraticSpline{T, 0} <: Bijector{0}
RationalQuadraticSpline{T, 1} <: Bijector{1}

Implementation of the Rational Quadratic Spline flow .

• Outside of the interval [minimum(widths), maximum(widths)], this mapping is given by the identity map.
• Inside the interval it's given by a monotonic spline (i.e. monotonic polynomials connected at intermediate points) with endpoints fixed so as to continuously transform into the identity map.

For the sake of efficiency, there are separate implementations for 0-dimensional and 1-dimensional inputs.

Notes

There are two constructors for RationalQuadraticSpline:

• RationalQuadraticSpline(widths, heights, derivatives): it is assumed that widths,

heights, and derivatives satisfy the constraints that makes this a valid bijector, i.e.

• widths: monotonically increasing and length(widths) == K,
• heights: monotonically increasing and length(heights) == K,
• derivatives: non-negative and derivatives == derivatives[end] == 1.
• RationalQuadraticSpline(widths, heights, derivatives, B): other than than the lengths, no assumptions are made on parameters. Therefore we will transform the parameters s.t.:
• widths_new ∈ [-B, B]ᴷ⁺¹, where K == length(widths),
• heights_new ∈ [-B, B]ᴷ⁺¹, where K == length(heights),
• derivatives_new ∈ (0, ∞)ᴷ⁺¹ with derivatives_new == derivates_new[end] == 1, where (K - 1) == length(derivatives).

Examples

Univariate

julia> using Bijectors: RationalQuadraticSpline

julia> K = 3; B = 2;

julia> # Monotonic spline on '[-B, B]' with K intermediate knots/"connection points".
b = RationalQuadraticSpline(randn(K), randn(K), randn(K - 1), B);

julia> b(0.5) # inside of [-B, B] → transformed
1.412300607463467

julia> b(5.) # outside of [-B, B] → not transformed
5.0

Or we can use the constructor with the parameters correctly constrained:

julia> b = RationalQuadraticSpline(b.widths, b.heights, b.derivatives);

julia> b(0.5) # inside of [-B, B] → transformed
1.412300607463467

Multivariate

julia> d = 2; K = 3; B = 2;

julia> b = RationalQuadraticSpline(randn(d, K), randn(d, K), randn(d, K - 1), B);

julia> b([-1., 1.])
2-element Array{Float64,1}:
-1.2568224171342797
0.5537259740554675

julia> b([-5., 5.])
2-element Array{Float64,1}:
-5.0
5.0

julia> b([-1., 5.])
2-element Array{Float64,1}:
-1.2568224171342797
5.0

References

 Durkan, C., Bekasov, A., Murray, I., & Papamakarios, G., Neural Spline Flows, CoRR, arXiv:1906.04032 [stat.ML], (2019).

Bijectors.StackedType
Stacked(bs)
Stacked(bs, ranges)
stack(bs::Bijector{0}...) # where 0 means 0-dim Bijector

A Bijector which stacks bijectors together which can then be applied to a vector where bs[i]::Bijector is applied to x[ranges[i]]::UnitRange{Int}.

Arguments

• bs can be either a Tuple or an AbstractArray of 0- and/or 1-dimensional bijectors
• If bs is a Tuple, implementations are type-stable using generated functions
• If bs is an AbstractArray, implementations are not type-stable and use iterative methods
• ranges needs to be an iterable consisting of UnitRange{Int}
• length(bs) == length(ranges) needs to be true.

Examples

b1 = Logit(0.0, 1.0)
b2 = Identity{0}()
b = stack(b1, b2)
b([0.0, 1.0]) == [b1(0.0), 1.0]  # => true