# ๐  2d Ginzburg-Landau equation (finite differences, codim 2, Hopf aBS)

This example is also treated in the MATLAB library pde2path.

We look at the Ginzburg-Landau equations in 2d. The code is very similar to the Brusselator example except that some special care has to be taken in order to cope with the "high" dimensionality of the problem.

Note that we try to be pedagogical here. Hence, we may write "bad" code that we improve later. Finally, we could use all sort of tricks to take advantage of the specificity of the problem. Rather, we stay quite close to the example in the MATLAB library pde2path (and discussed in Hopf Bifurcation and Time Periodic Orbits with Pde2path โ Algorithms and Applications., Uecker, Hannes, Communications in Computational Physics 25, no. 3 (2019)) for fair comparison.

Goal

We do not use automatic branch switching here. The goal is to show our to use the internals of the package to squeeze most of the performances, use tailored options...

The equations are as follows

$$$\partial_{t} u=\Delta u+(r+\mathrm{i} v) u-\left(c_{3}+\mathrm{i} \mu\right)|u|^{2} u-c_{5}|u|^{4} u+\gamma, \quad u=u(t, x) \in \mathbb{C},\quad \gamma\in\mathbb R$$$

with Dirichlet boundary conditions. We discretize the square $\Omega = (0,L_x)\times(0,L_y)$ with $2N_xN_y$ points. We start by writing the Laplacian:

using Revise
using DiffEqOperators, ForwardDiff
using BifurcationKit, LinearAlgebra, Plots, SparseArrays, Parameters
const BK = BifurcationKit

function Laplacian2D(Nx, Ny, lx, ly)
hx = 2lx/Nx; hy = 2ly/Ny
D2x = CenteredDifference(2, 2, hx, Nx)
D2y = CenteredDifference(2, 2, hy, Ny)

Qx = Dirichlet0BC(typeof(hx))
Qy = Dirichlet0BC(typeof(hy))

D2xsp = sparse(D2x * Qx)[1]
D2ysp = sparse(D2y * Qy)[1]

A = kron(sparse(I, Ny, Ny), D2xsp) + kron(D2ysp, sparse(I, Nx, Nx))
return A
end

It is then straightforward to write the vector field

# this encodes the nonlinearity
function NL(u, p)
@unpack r, ฮผ, ฮฝ, c3, c5, ฮณ = p
n = div(length(u), 2)
u1 = @view u[1:n]
u2 = @view u[n+1:2n]

ua = u1.^2 .+ u2.^2

f = similar(u)
f1 = @view f[1:n]
f2 = @view f[n+1:2n]

@. f1 .= r * u1 - ฮฝ * u2 - ua * (c3 * u1 - ฮผ * u2) - c5 * ua^2 * u1 + ฮณ
@. f2 .= r * u2 + ฮฝ * u1 - ua * (c3 * u2 + ฮผ * u1) - c5 * ua^2 * u2

return f
end

function Fcgl(u, p)
f = similar(u)
mul!(f, p.ฮ, u)
f .= f .+ NL(u, p)
end

and its jacobian:

function Jcgl(u, p)
@unpack r, ฮผ, ฮฝ, c3, c5, ฮ = p

n = div(length(u), 2)
u1 = @view u[1:n]
u2 = @view u[n+1:2n]

ua = u1.^2 .+ u2.^2

f1u = zero(u1)
f2u = zero(u1)
f1v = zero(u1)
f2v = zero(u1)

@. f1u =  r - 2 * u1 * (c3 * u1 - ฮผ * u2) - c3 * ua - 4 * c5 * ua * u1^2 - c5 * ua^2
@. f1v = -ฮฝ - 2 * u2 * (c3 * u1 - ฮผ * u2)  + ฮผ * ua - 4 * c5 * ua * u1 * u2
@. f2u =  ฮฝ - 2 * u1 * (c3 * u2 + ฮผ * u1)  - ฮผ * ua - 4 * c5 * ua * u1 * u2
@. f2v =  r - 2 * u2 * (c3 * u2 + ฮผ * u1) - c3 * ua - 4 * c5 * ua * u2 ^2 - c5 * ua^2

jacdiag = vcat(f1u, f2v)

ฮ + spdiagm(0 => jacdiag, n => f1v, -n => f2u)
end

We now define the parameters and the stationary solution:

Nx = 41
Ny = 21
n = Nx * Ny
lx = pi
ly = pi/2

ฮ = Laplacian2D(Nx, Ny, lx, ly)
par_cgl = (r = 0.5, ฮผ = 0.1, ฮฝ = 1.0, c3 = -1.0, c5 = 1.0, ฮ = blockdiag(ฮ, ฮ), ฮณ = 0.)
sol0 = zeros(2Nx, Ny)

# we make a problem
prob = BK.BifurcationProblem(Fcgl, vec(sol0), par_cgl, (@lens _.r); J = Jcgl)

and we continue it to find the Hopf bifurcation points. We use a Shift-Invert eigensolver.

# Shift-Invert eigensolver
eigls = EigArpack(1.0, :LM) # shift = 1.0
opt_newton = NewtonPar(tol = 1e-10, verbose = true, eigsolver = eigls)
opts_br = ContinuationPar(dsmin = 0.001, dsmax = 0.005, ds = 0.001, p_max = 2., detect_bifurcation = 3, nev = 5, plot_every_step = 50, newton_options = opt_newton, max_steps = 1060)

br = continuation(prob, PALC(), opts_br)
 โโ Curve type: EquilibriumCont
โโ Number of points: 216
โโ Type of vectors: Vector{Float64}
โโ Parameter r starts at 0.5, ends at 2.0
โโ Algo: PALC
โโ Special points:

- #  1,     hopf at r โ +1.14777610 โ (+1.14766562, +1.14777610), |ฮดp|=1e-04, [converged], ฮด = ( 2,  2), step =  94
- #  2,     hopf at r โ +1.86107007 โ (+1.86018618, +1.86107007), |ฮดp|=9e-04, [converged], ฮด = ( 2,  2), step = 195
- #  3, endpoint at r โ +2.00000000,                                                                     step = 215


which gives

scene = plot(br, ylims=(-0.1,0.1))

## Normal form computation

We compute the Hopf normal form of the first bifurcation point.

hopfpt = get_normal_form(br, 1)
SubCritical - Hopf bifurcation point at r โ 1.1477761028276166.
Frequency ฯ โ 1.0000000000000018
Period of the periodic orbit โ 6.283185307179575
Normal form zโ(iฯ + aโฮดp + bโ|z|ยฒ):
โโ a = 0.9999993693687919 - 9.533101401409105e-9im
โโ b = 0.00487012987012987 + 0.00048701298701298696im


So the Hopf branch is subcritical.

## Codim 2 Hopf continuation

Having detected 2 hopf bifurcation points, we now continue them in the plane $(\gamma, r)$.

Before we start the codim 2 continuation, we tell BifurcationKit.jl to use the spectral information start_with_eigen = true because the left eigenvector of the Jacobian is simply not the conjugate of the right one.

# we perform Hopf continuation of the first Hopf point in br
ind_hopf = 1
br_hopf = @time continuation(
br, ind_hopf, (@lens _.ฮณ),
ContinuationPar(dsmin = 0.001, dsmax = 0.02, ds= 0.01, p_max = 6.5, p_min = -10., newton_options = opts_br.newton_options, detect_bifurcation = 1); plot = true,
update_minaug_every_step = 1,
normC = norminf,
detect_codim2_bifurcation = 2,
start_with_eigen = true,
jacobian_ma = :minaug, # specific to high dimensions
bdlinsolver = BorderingBLS(solver = DefaultLS(), check_precision = false))

plot(br_hopf, branchlabel = "Hopf curve", legend = :topleft)

with detailed information

br_hopf
 โโ Curve type: HopfCont
โโ Number of points: 46
โโ Type of vectors: BorderedArray{Vector{Float64}, Vector{Float64}}
โโ Parameters (:r, :ฮณ)
โโ Parameter ฮณ starts at 0.0, ends at 1.0098353536481628
โโ Algo: PALC
โโ Special points:

- #  1,       gh at ฮณ โ +0.31388746 โ (+0.30785258, +0.31388746), |ฮดp|=6e-03, [converged], ฮด = ( 0,  0), step =  14
- #  2,       bt at ฮณ โ +1.00983535 โ (+0.98529855, +1.00983535), |ฮดp|=2e-02, [converged], ฮด = ( 0,  0), step =  45
- #  3, endpoint at ฮณ โ +1.00983535,                                                                     step =  45


We can now construct the curve of Fold points branching off the Bogdanov-Takens bifurcation we just detected.

# find the index of the BT point
indbt = findfirst(x -> x.type == :bt, br_hopf.specialpoint)
# branch from the BT point
brfold = continuation(br_hopf, indbt, setproperties(br_hopf.contparams; detect_bifurcation = 1, max_steps = 20);
update_minaug_every_step = 1,
detect_codim2_bifurcation = 2,
callback_newton = BK.cbMaxNorm(1e5),
bdlinsolver = BorderingBLS(solver = DefaultLS(), check_precision = false),
jacobian_ma = :minaug, # specific to high dimensions
bothside = true, normC = norminf)

plot(br_hopf, branchlabel = "Hopf"); plot!(brfold, legend = :topleft, branchlabel = "Fold")

## Periodic orbits continuation with stability

Having found two Hopf bifurcation points, we aim at computing the periodic orbits branching from them. Like for the Brusselator example, we need to find some educated guess for the periodic orbits in order to have a successful Newton call.

The following code is very close to the one explained in the tutorial 1d Brusselator (advanced user) so we won't give too much details here.

We focus on the first Hopf bifurcation point. Note that, we do not improve the guess for the Hopf bifurcation point, e.g. by calling newtonHopf, as this is not really needed.

# index of the Hopf point we want to branch from
ind_hopf = 1

# number of time slices in the periodic orbit
M = 30

# periodic orbit initial guess from Hopf point
r_hopf, Th, orbitguess2, hopfpt, eigvec = guess_from_hopf(br, ind_hopf, opt_newton.eigsolver,
# we pass the number of time slices M, the amplitude 22*sqrt(0.1) and phase
M, 22*sqrt(0.1); phase = 0.25)

# flatten the initial guess
orbitguess_f2 = reduce(hcat, orbitguess2)
orbitguess_f = vcat(vec(orbitguess_f2), Th) |> vec

We create a problem to hold the functional and compute Periodic orbits based on Trapezoidal rule

poTrap = PeriodicOrbitTrapProblem(
# vector field and sparse Jacobian
re_make(prob, params = (@set par_cgl.r = r_hopf - 0.01)),
# parameters for the phase condition
real.(eigvec),
hopfpt.u,
# number of time slices
M,
# space dimension
2n,
# jacobian of the periodic orbit functional
jacobian = :FullMatrixFree)

We can use this (family) problem poTrap with newton on our periodic orbit guess to find a periodic orbit. Hence, one can be tempted to use

Don't run this!!

It uses too much memory

opts_po_cont = ContinuationPar(dsmin = 0.0001, dsmax = 0.03, ds= 0.001, p_max = 2.5, max_steps = 250, plot_every_step = 3, newton_options = (@set opt_po.linsolver = DefaultLS()))
br_po = @time continuation(poTrap, vec(sol0), PALC(), opts_po_cont)

However, the linear system associated to the newton iterations will be solved by forming the sparse jacobian of size $(2N_xN_yM+1)^2$ and the use of \ (based on LU decomposition). It takes way too much time and memory.

Instead, we use a preconditioner. We build the jacobian once, compute its incomplete LU decomposition (ILU) and use it as a preconditioner.

using IncompleteLU

# Sparse matrix representation of the jacobian of the periodic orbit functional
Jpo = poTrap(Val(:JacFullSparse), orbitguess_f, @set par_cgl.r = r_hopf - 0.01)

# incomplete LU factorization with threshold
Precilu = @time ilu(Jpo, ฯ = 0.005);

# we define the linear solver with left preconditioner Precilu
ls = GMRESIterativeSolvers(verbose = false, reltol = 1e-3, N = size(Jpo,1), restart = 40, maxiter = 50, Pl = Precilu, log=true)

# we try the linear solver
ls(Jpo, rand(ls.N))

This converges in 7 iterations whereas, without the preconditioner, it does not converge after 100 iterations.

We set the parameters for the newton solve.

opt_po = @set opt_newton.verbose = true
outpo_f = @time newton(poTrap, orbitguess_f,  (@set opt_po.linsolver = ls); normN = norminf)
BK.converged(outpo_f) && printstyled(color=:red, "--> T = ", outpo_f.u[end], ", amplitude = ", BK.amplitude(outpo_f.u, Nx*Ny, M; ratio = 2),"\n")
BK.plotPeriodicPOTrap(outpo_f.u, M, Nx, Ny; ratio = 2);

which gives

โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Newton step         residual     linear iterations  โ
โโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโค
โ       0     โ       6.5432e-03     โ        0       โ
โ       1     โ       1.4372e-03     โ        8       โ
โ       2     โ       3.6731e-04     โ        8       โ
โ       3     โ       6.5658e-05     โ        9       โ
โ       4     โ       4.3028e-06     โ       10       โ
โ       5     โ       6.4509e-08     โ       11       โ
โ       6     โ       2.9713e-10     โ       12       โ
โ       7     โ       2.2181e-13     โ       14       โ
โโโโโโโโโโโโโโโดโโโโโโโโโโโโโโโโโโโโโโโดโโโโโโโโโโโโโโโโโ
1.780986 seconds (132.31 k allocations: 1.237 GiB, 12.13% gc time)
--> T = 6.532023020978835, amplitude = 0.2684635643839235

and

At this point, we are still wasting a lot of resources, because the matrix-free version of the jacobian of the functional uses the jacobian of the vector field x -> Jcgl(x, p). Hence, it builds M sparse matrices for each evaluation!! Let us create a problem which is fully Matrix Free:

# computation of the first derivative using automatic differentiation
d1Fcgl(x, p, dx) = ForwardDiff.derivative(t -> Fcgl(x .+ t .* dx, p), 0.)

# linear solver for solving Jcgl*x = rhs. Needed for Floquet multipliers computation
ls0 = GMRESIterativeSolvers(N = 2Nx*Ny, reltol = 1e-9, Pl = lu(I + par_cgl.ฮ))

poTrapMF = PeriodicOrbitTrapProblem(
# vector field and sparse Jacobian
re_make(prob, params = (@set par_cgl.r = r_hopf - 0.01), J = (x, p) ->  (dx -> d1Fcgl(x, p, dx))),
# parameters for the phase condition
real.(eigvec),
hopfpt.u,
# number of time slices
M,
# space dimension
2n,
ls0,
# jacobian of the periodic orbit functional
jacobian = :FullMatrixFree)

We can now use newton

outpo_f = @time newton(poTrapMF, orbitguess_f, (@set opt_po.linsolver = ls); normN = norminf)
BK.converged(outpo_f) && printstyled(color=:red, "--> T = ", outpo_f.u[end], ", amplitude = ", BK.amplitude(outpo_f.u, Nx*Ny, M; ratio = 2),"\n")

which gives

โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Newton step         residual     linear iterations  โ
โโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโค
โ       0     โ       6.5432e-03     โ        0       โ
โ       1     โ       1.4372e-03     โ        8       โ
โ       2     โ       3.6731e-04     โ        8       โ
โ       3     โ       6.5658e-05     โ        9       โ
โ       4     โ       4.3028e-06     โ       10       โ
โ       5     โ       6.4509e-08     โ       11       โ
โ       6     โ       2.9713e-10     โ       12       โ
โ       7     โ       2.2188e-13     โ       14       โ
โโโโโโโโโโโโโโโดโโโโโโโโโโโโโโโโโโโโโโโดโโโโโโโโโโโโโโโโโ
1.322440 seconds (35.03 k allocations: 459.996 MiB, 7.63% gc time)

The speedup will increase a lot for larger $N_x, N_y$. Also, for Floquet multipliers computation, the speedup will be substantial.

### Removing most allocations (Advanced and Experimental)

We show here how to remove most allocations and speed up the computations. This is an experimental feature as the Floquet multipliers computation is not yet readily available in this case. To this end, we rewrite the functional using inplace formulation and trying to avoid allocations. This can be done as follows:

# compute just the nonlinearity
function NL!(f, u, p, t = 0.)
@unpack r, ฮผ, ฮฝ, c3, c5 = p
n = div(length(u), 2)
u1v = @view u[1:n]
u2v = @view u[n+1:2n]

f1 = @view f[1:n]
f2 = @view f[n+1:2n]

@inbounds for ii = 1:n
u1 = u1v[ii]
u2 = u2v[ii]
ua = u1^2+u2^2
f1[ii] = r * u1 - ฮฝ * u2 - ua * (c3 * u1 - ฮผ * u2) - c5 * ua^2 * u1
f2[ii] = r * u2 + ฮฝ * u1 - ua * (c3 * u2 + ฮผ * u1) - c5 * ua^2 * u2
end
return f
end

# derivative of the nonlinearity
function dNL!(f, u, p, du)
@unpack r, ฮผ, ฮฝ, c3, c5 = p
n = div(length(u), 2)
u1v = @view u[1:n]
u2v = @view u[n+1:2n]

du1v = @view du[1:n]
du2v = @view du[n+1:2n]

f1 = @view f[1:n]
f2 = @view f[n+1:2n]

@inbounds for ii = 1:n
u1 = u1v[ii]
u2 = u2v[ii]
du1 = du1v[ii]
du2 = du2v[ii]
ua = u1^2+u2^2
f1[ii] = (-5*c5*u1^4 + (-6*c5*u2^2 - 3*c3)*u1^2 + 2*ฮผ*u1*u2 - c5*u2^4 - c3*u2^2 + r) * du1 +
(-4*c5*u2*u1^3 + ฮผ*u1^2 + (-4*c5*u2^3 - 2*c3*u2)*u1 + 3*u2^2*ฮผ - ฮฝ) * du2

f2[ii] = (-4*c5*u2*u1^3 - 3*ฮผ*u1^2 + (-4*c5*u2^3 - 2*c3*u2)*u1 - u2^2*ฮผ + ฮฝ) * du1 + (-c5*u1^4 + (-6*c5*u2^2 - c3)*u1^2 - 2*ฮผ*u1*u2 - 5*c5*u2^4 - 3*c3*u2^2 + r) * du2
end

return f
end

# inplace vector field
function Fcgl!(f, u, p, t = 0.)
NL!(f, u, p)
mul!(f, p.ฮ, u, 1., 1.)
end

# inplace derivative of the vector field
function dFcgl!(f, x, p, dx)
dNL!(f, x, p, dx)
mul!(f, p.ฮ, dx, 1., 1.)
end

# we create a problem inplace
probInp = BifurcationProblem(Fcgl!, vec(sol0), (@set par_cgl.r = r_hopf - 0.01), (@lens _.r); J = dFcgl!, inplace = true)

We can now define an inplace functional

ls0 = GMRESIterativeSolvers(N = 2Nx*Ny, reltol = 1e-9)#, Pl = lu(I + par_cgl.ฮ))

poTrapMFi = PeriodicOrbitTrapProblem(
# vector field and sparse Jacobian
probInp,
# parameters for the phase condition
real.(eigvec),
hopfpt.u,
# number of time slices
M,
# space dimension
2n,
ls0,
# jacobian of the periodic orbit functional
jacobian = :FullMatrixFree)

and run the newton method:

outpo_f = @time newton(poTrapMFi, orbitguess_f, (@set opt_po.linsolver = ls); normN = norminf)

It gives

โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Newton step         residual     linear iterations  โ
โโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโค
โ       0     โ       6.5432e-03     โ        0       โ
โ       1     โ       1.4372e-03     โ        8       โ
โ       2     โ       3.6731e-04     โ        8       โ
โ       3     โ       6.5658e-05     โ        9       โ
โ       4     โ       4.3028e-06     โ       10       โ
โ       5     โ       6.4509e-08     โ       11       โ
โ       6     โ       2.9713e-10     โ       12       โ
โ       7     โ       2.2143e-13     โ       14       โ
โโโโโโโโโโโโโโโดโโโโโโโโโโโโโโโโโโโโโโโดโโโโโโโโโโโโโโโโโ
1.179854 seconds (902 allocations: 151.500 MiB)

Notice the small speed boost but the reduced allocations. At this stage, further improvements could target the use of BlockBandedMatrices.jl for the Laplacian operator, etc.

### Other linear formulation

We could use another way to "invert" jacobian of the functional based on bordered techniques. We try to use an ILU preconditioner on the cyclic matrix $J_c$ (see Periodic orbits based on Trapezoidal rule) which has a smaller memory footprint:

Jpo2 = poTrap(Val(:JacCyclicSparse), orbitguess_f, @set par_cgl.r = r_hopf - 0.1)
Precilu = @time ilu(Jpo2, ฯ = 0.005)
ls2 = GMRESIterativeSolvers(verbose = false, reltol = 1e-3, N = size(Jpo2, 1), restart = 30, maxiter = 50, Pl = Precilu, log=true)

opt_po = @set opt_newton.verbose = true
outpo_f = @time newton((@set poTrapMF.jacobian = :BorderedMatrixFree),	orbitguess_f,
(@set opt_po.linsolver = ls2), normN = norminf)

but it gives:

โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Newton step         residual     linear iterations  โ
โโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโค
โ       0     โ       3.3298e-03     โ        0       โ
โ       1     โ       9.5088e-03     โ       34       โ
โ       2     โ       1.2807e-03     โ       26       โ
โ       3     โ       7.1393e-05     โ       29       โ
โ       4     โ       4.1625e-07     โ       36       โ
โ       5     โ       1.7924e-09     โ       44       โ
โ       6     โ       6.2725e-13     โ       60       โ
โโโโโโโโโโโโโโโดโโโโโโโโโโโโโโโโโโโโโโโดโโโโโโโโโโโโโโโโโ
3.479920 seconds (62.64 k allocations: 1009.778 MiB, 5.19% gc time)

Hence, it seems better to use the previous preconditioner.

## Continuation of periodic solutions

We can now perform continuation of the newly found periodic orbit and compute the Floquet multipliers using Matrix-Free methods.

# set the eigensolver for the computation of the Floquet multipliers
opt_po = @set opt_po.eigsolver = EigKrylovKit(tol = 1e-3, xโ = rand(2n), verbose = 2, dim = 25)

# parameters for the continuation
opts_po_cont = ContinuationPar(dsmin = 0.0001, dsmax = 0.02, ds = 0.001, p_max = 2.2, max_steps = 250, plot_every_step = 3, newton_options = (@set opt_po.linsolver = ls),
nev = 5, tol_stability = 1e-7, detect_bifurcation = 0)

br_po = @time continuation(poTrapMF, outpo_f.u, PALC(),	opts_po_cont;
verbosity = 2,	plot = true,
linear_algo = BorderingBLS(solver = ls, check_precision = false),
plot_solution = (x, p; kwargs...) -> BK.plot_periodic_potrap(x, M, Nx, Ny; ratio = 2, kwargs...),
record_from_solution = (u, p) -> BK.getamplitude(poTrapMF, u, par_cgl; ratio = 2), normC = norminf)

This gives the following bifurcation diagram:

Improved performances

Although it would be "cheating" for fair comparisons with existing packages, there is a trick to compute the bifurcation diagram without using preconditionners. We will not detail it here but it allows to handle the case Nx = 200; Ny = 110; M = 30 and above.

We did not change the preconditioner in the previous example as it does not seem needed. Let us show how to do this nevertheless:

# callback which will be sent to newton.
# iteration in the arguments refers to newton iterations
function callbackPO(state; linsolver = ls, prob = poTrap, p = par_cgl, kwargs...)
@show ls.N keys(kwargs)
# we update the preconditioner every 10 continuation steps
if mod(kwargs[:iterationC], 10) == 9 && state.it == 1
@info "update Preconditioner"
Jpo = poTrap(Val(:JacCyclicSparse), state.x, (@set p.r = state.p))
Precilu = @time ilu(Jpo, ฯ = 0.003)
ls.Pl = Precilu
end
true
end

br_po = @time continuation(poTrapMF, outpo_f.u, PALC(),	opts_po_cont;
verbosity = 2,	plot = true,
callback_newton = callbackPO,
linear_algo = BorderingBLS(solver = ls, check_precision = false),
plot_solution = (x, p; kwargs...) -> BK.plot_periodic_potrap(x, M, Nx, Ny; ratio = 2, kwargs...),
record_from_solution = (u, p) -> BK.getamplitude(poTrapMF, u, par_cgl; ratio = 2), normC = norminf)

## Continuation of Fold of periodic orbits

We continue the Fold point of the first branch of the previous bifurcation diagram in the parameter plane $(r, c_5)$. To this end, we need to be able to compute the Hessian of the periodic orbit functional. This is not yet readily available so we turn to automatic differentiation.

using ForwardDiff

# computation of the second derivative of a function f
function d2Fcglpb(f, x, dx1, dx2)
return ForwardDiff.derivative(t2 -> ForwardDiff.derivative( t1-> f(x .+ t1 .* dx1 .+ t2 .* dx2,), 0.), 0.)
end

We select the Fold point from the branch br_po and redefine our linear solver to get the ILU preconditioner tuned close to the Fold point.

indfold = 2
foldpt = foldpoint(br_po, indfold)

Jpo = poTrap(Val(:JacFullSparse), orbitguess_f, (@set par_cgl.r = r_hopf - 0.1))
Precilu = @time ilu(Jpo, ฯ = 0.005);
ls = GMRESIterativeSolvers(verbose = false, reltol = 1e-4, N = size(Jpo, 1), restart = 40, maxiter = 60, Pl = Precilu);

We can then use our functional to call newtonFold unlike for a regular function (see Tutorial 1). Indeed, we specify the change the parameters too much to rely on a generic algorithm.

# define a bifurcation problem for the fold
# this will be made automatic in the future
probFold = BifurcationProblem((x, p) -> poTrap(x, p), foldpt, getparams(br), getlens(br);
J = (x, p) -> poTrap(Val(:JacFullSparse), x, p),
d2F = (x, p, dx1, dx2) -> d2Fcglpb(z -> poTrap(z, p), x, dx1, dx2))

outfold = @time BK.newton_fold(
br_po, indfold; #index of the fold point
prob = probFold,
# we change the linear solver for the one we
# defined above
options = (@set opt_po.linsolver = ls),
bdlinsolver = BorderingBLS(solver = ls, check_precision = false))
BK.converged(outfold) && printstyled(color=:red, "--> We found a Fold Point at ฮฑ = ", outfold.u.p," from ", br_po.specialpoint[indfold].param,"\n")

and this gives

โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Newton step         residual     linear iterations  โ
โโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโค
โ       0     โ       4.5937e-01     โ        0       โ
โ       1     โ       5.6013e-01     โ       20       โ
โ       2     โ       3.1385e-02     โ       23       โ
โ       3     โ       6.0620e-05     โ       29       โ
โ       4     โ       2.7839e-08     โ       39       โ
โ       5     โ       8.1593e-12     โ       45       โ
โโโโโโโโโโโโโโโดโโโโโโ-โโโโโโโโโโโโโโโโดโโโโโโโโโโโโโโโโโ
27.289005 seconds (1.07 M allocations: 24.444 GiB, 10.12% gc time)
--> We found a Fold Point at ฮฑ = 0.9470569704262517 from 0.9481896723164748

Finally, one can perform continuation of the Fold bifurcation point as follows CA NE MARCHE PAS

optcontfold = ContinuationPar(dsmin = 0.001, dsmax = 0.05, ds= 0.01, p_max = 40.1, p_min = -10., newton_options = (@set opt_po.linsolver = ls), max_steps = 20, detect_bifurcation = 0)

outfoldco = BK.continuation_fold(probFold,
br_po, indfold, (@lens _.c5),
optcontfold;
jacobian_ma = :minaug,
bdlinsolver = BorderingBLS(solver = ls, check_precision = false),
plot = true, verbosity = 2)

which yields:

There is still room for a lot of improvements here. Basically, the idea would be to use full Matrix-Free the jacobian functional and its transpose.

## Continuation of periodic orbits on the GPU (Advanced)

This is a very neat example all done on the GPU using the following ingredients: Matrix-Free computation of periodic orbits using preconditioners.

We now take advantage of the computing power of GPUs. The section is run on an NVIDIA Tesla V100. Given the small number of unknowns, we can (only) expect significant speedup in the application of the big preconditioner.

Note that we use the parameters Nx = 82; Ny = 42; M=30.

# computation of the first derivative
d1Fcgl(x, p, dx) = ForwardDiff.derivative(t -> Fcgl(x .+ t .* dx, p), 0.)

d1NL(x, p, dx) = ForwardDiff.derivative(t -> NL(x .+ t .* dx, p), 0.)

function dFcgl(x, p, dx)
f = similar(dx)
mul!(f, p.ฮ, dx)
nl = d1NL(x, p, dx)
f .= f .+ nl
end

We first load CuArrays

using CUDA
CUDA.allowscalar(false)
import LinearAlgebra: mul!, axpby!
mul!(x::CuArray, y::CuArray, ฮฑ::T) where {T <: Number} = (x .= ฮฑ .* y)
mul!(x::CuArray, ฮฑ::T, y::CuArray) where {T <: Number} = (x .= ฮฑ .* y)
axpby!(a::T, X::CuArray, b::T, Y::CuArray) where {T <: Number} = (Y .= a .* X .+ b .* Y)

and update the parameters

par_cgl_gpu = @set par_cgl.ฮ = CUDA.CUSPARSE.CuSparseMatrixCSC(par_cgl.ฮ);

Then, we precompute the preconditioner on the CPU:

Jpo = poTrap(Val(:JacFullSparse), orbitguess_f, @set par_cgl.r = r_hopf - 0.01)
Precilu = @time ilu(Jpo, ฯ = 0.003)

To invert Precilu on the GPU, we need to define a few functions which are not in CuArrays and which are related to LU decomposition:

struct LUperso
L
Ut	# transpose of U in LU decomposition
end

import Base: ldiv!
function LinearAlgebra.ldiv!(_lu::LUperso, rhs::CUDA.CuArray)
_x = UpperTriangular(_lu.Ut) \ (LowerTriangular(_lu.L) \ rhs)
rhs .= vec(_x)
CUDA.unsafe_free!(_x)
rhs
end

Finally, for the methods in PeriodicOrbitTrapProblem to work, we need to redefine the following method. Indeed, we disable the use of scalar on the GPU to increase the speed.

import BifurcationKit: extractPeriodFDTrap
extractPeriodFDTrap(x::CuArray) = x[end:end]

We can now define our functional:

# we create a problem for the vector field on the GUP
probGPU = BifurcationProblem(Fcgl,
vec(CuArrays(zeros((x, p) ->  (dx -> dFcgl(x, p, dx))))),
(@set par_cgl_gpu.r = r_hopf - 0.01),
(@lens _.r);
J = (x, p) ->  (dx -> dFcgl(x, p, dx)))

# matrix-free problem on the gpu
ls0gpu = GMRESKrylovKit(rtol = 1e-9)
poTrapMFGPU = PeriodicOrbitTrapProblem(
probGPU,
CuArray(real.(eigvec)),
CuArray(hopfpt.u),
M, 2n, ls0gpu;
jacobian = :FullMatrixFree,
ongpu = true) # this is required to alter the way the constraint is handled

Let us have a look at the linear solvers and compare the speed on CPU and GPU:

ls = GMRESKrylovKit(verbose = 2, Pl = Precilu, rtol = 1e-3, dim  = 20)
# runs in 	2.990495 seconds (785 allocations: 31.564 MiB, 0.98% gc time)
outh, = @time ls((Jpo), orbitguess_f)

Precilu_gpu = LUperso(LowerTriangular(CUDA.CUSPARSE.CuSparseMatrixCSR(I+Precilu.L)), UpperTriangular(CUDA.CUSPARSE.CuSparseMatrixCSR(sparse(Precilu.U'))));
lsgpu = GMRESKrylovKit(verbose = 2, Pl = Precilu_gpu, rtol = 1e-3, dim  = 20)
Jpo_gpu = CUDA.CUSPARSE.CuSparseMatrixCSR(Jpo);
orbitguess_cu = CuArray(orbitguess_f)
# runs in 1.751230 seconds (6.54 k allocations: 188.500 KiB, 0.43% gc time)
outd, = @time lsgpu(Jpo_gpu, orbitguess_cu)

So we can expect a pretty descent x2 speed up in computing the periodic orbits. We can thus call newton:

opt_po = @set opt_newton.verbose = true
outpo_f = @time newton(poTrapMFGPU, orbitguess_cu,
(@set opt_po.linsolver = lsgpu);
normN = x->maximum(abs.(x)))

The computing time is 6.914367 seconds (2.94 M allocations: 130.348 MiB, 1.10% gc time). The same computation on the CPU, runs in 13.972836 seconds (551.41 k allocations: 1.300 GiB, 1.05% gc time).

You can also perform continuation, here is a simple example:

opts_po_cont = ContinuationPar(dsmin = 0.0001, dsmax = 0.02, ds= 0.001, p_max = 2.2, max_steps = 250, plot_every_step = 3, newton_options = (@set opt_po.linsolver = lsgpu), detect_bifurcation = 0)
br_po = @time continuation(poTrapMFGPU, orbitguess_cu, PALC(),
opts_po_cont;
verbosity = 2,
record_from_solution = (u, p) -> getAmplitude(poTrapMFGPU, u, par_cgl_gpu), normC = x->maximum(abs.(x)))
Preconditioner update

For now, the preconditioner has been precomputed on the CPU which forbids its (efficient) update during continuation of a branch of periodic orbits. This could be improved using ilu0! and friends in CuArrays.