Generative Modeling & Efficient Sampling for LQCD

Sam Foreman

2023-12-21

Generative Modeling & Efficient Sampling
(for Lattice QCD)

 

Sam Foreman
Xiao-Yong Jin, James C. Osborn
saforem2/lqcd-pasc23

Standard Model

  • Electricity & Magnetism
  • Quantum Field Theory
    • Nuclear interactions
      • Strong + Weak Force
      • Observed particles
    • Quantum Chromodynamics (QCD):
      • Quark / gluon interactions in the nucleus
      • Analytic progress is difficult
        • Lattice QCD to the rescue! 🚀
  • known to be incomplete!

Figure 1: Feynman diagram for e^{+} + e^{-} \rightarrow 2\gamma

Magnetic Moment of the Muon

a_{\mu} = \frac{(g_{\mu} - 2)}{2}

a_{\mu}^{\mathrm{exp}} - a_{\mu}^{\mathrm{SM}} = \left(25.1 \pm 5.9\right)\cdot 10^{-10}

Can Lattice QCD resolve this?

Markov Chain Monte Carlo (MCMC)

Goal

Generate independent samples \{x_{i}\}, such that \{x_{i}\} \sim p(x) \propto e^{-S(x)}, where S(x) is the action (or potential energy)

  • Want to calculate observables \mathcal{O}:
    \left\langle \mathcal{O}\right\rangle \propto \int \left[\mathcal{D}x\right]\hspace{4pt} {\mathcal{O}(x)\, p(x)}

If these were independent, we could approximate: \left\langle\mathcal{O}\right\rangle \simeq \frac{1}{N}\sum^{N}_{n=1}\mathcal{O}(x_{n})
\sigma_{\mathcal{O}}^{2} = \frac{1}{N}\mathrm{Var}{\left[\mathcal{O} (x) \right]}\Longrightarrow \sigma_{\mathcal{O}} \propto \frac{1}{\sqrt{N}}

Markov Chain Monte Carlo (MCMC)

Goal

Generate independent samples \{x_{i}\}, such that \{x_{i}\} \sim p(x) \propto e^{-S(x)}, where S(x) is the action (or potential energy)

  • Want to calculate observables \mathcal{O}:
    \left\langle \mathcal{O}\right\rangle \propto \int \left[\mathcal{D}x\right]\hspace{4pt} {\mathcal{O}(x)\, p(x)}

Instead, nearby configs are correlated, and we incur a factor of \textcolor{#FF5252}{\tau^{\mathcal{O}}_{\mathrm{int}}}: \sigma_{\mathcal{O}}^{2} = \frac{\textcolor{#FF5252}{\tau^{\mathcal{O}}_{\mathrm{int}}}}{N}\mathrm{Var}{\left[\mathcal{O} (x) \right]}
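To make the cost of correlations concrete, here is a minimal NumPy sketch of estimating \tau_{\mathrm{int}} from a chain of measurements. The self-consistent window cut (stop once t \geq c\,\tau) is a common heuristic, and c = 5 is an illustrative choice, not the procedure used for the results in this talk:

```python
import numpy as np

def tau_int(obs: np.ndarray, c: float = 5.0) -> float:
    """Estimate the integrated autocorrelation time of a 1D chain,
    using tau = 1 + 2 * sum_t rho(t) with a self-consistent window."""
    x = obs - obs.mean()
    n = len(x)
    f = np.fft.rfft(x, 2 * n)                    # autocorrelation via FFT
    acf = np.fft.irfft(f * np.conj(f))[:n].real
    acf /= acf[0]                                # rho(0) = 1
    tau = 1.0
    for t in range(1, n):
        tau += 2.0 * acf[t]
        if t >= c * tau:                         # self-consistent window cut
            break
    return tau

# correlated-error estimate, matching the slide:
# sigma_O**2 = tau_int(O_chain) / N * O_chain.var()
```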

Hamiltonian Monte Carlo (HMC) (Duane et al. 1987)

  • Want to (sequentially) construct a chain of states: x_{0} \rightarrow x_{1} \rightarrow x_{2} \rightarrow \cdots \rightarrow x_{N}\hspace{10pt}

    such that, as N \rightarrow \infty: \left\{x_{i}, x_{i+1}, x_{i+2}, \cdots, x_{N} \right\} \xrightarrow[]{N\rightarrow\infty} p(x) \propto e^{-S(x)}

Trick

  • Introduce fictitious momentum v \sim \mathcal{N}(0, \mathbb{1})
    • Normally distributed, independent of x, i.e. \begin{align*} p(x, v) &\textcolor{#02b875}{=} p(x)p(v) \propto e^{-S{(x)}} e^{-\frac{1}{2} v^{T}v} = e^{-\left[S(x) + \frac{1}{2} v^{T}{v}\right]} \textcolor{#02b875}{=} e^{-H(x, v)} \end{align*}

Hamiltonian Monte Carlo (HMC)

  • Idea: Evolve the (\dot{x}, \dot{v}) system to get new states \{x_{i}\}

  • Write the joint distribution p(x, v): p(x, v) \propto e^{-S[x]} e^{-\frac{1}{2}v^{T} v} = e^{-H(x, v)}

Hamiltonian Dynamics

H = S[x] + \frac{1}{2} v^{T} v \Longrightarrow \dot{x} = +\partial_{v} H, \,\,\dot{v} = -\partial_{x} H

Figure 2: Overview of HMC algorithm

Leapfrog Integrator (HMC)

Hamiltonian Dynamics

\left(\dot{x}, \dot{v}\right) = \left(\partial_{v} H, -\partial_{x} H\right)

Leapfrog Step

input \,\left(x, v\right) \rightarrow \left(x', v'\right)\, output

\begin{align*} \tilde{v} &:= \textcolor{#F06292}{\Gamma}(x; v)\hspace{2.2pt} = v - \frac{\varepsilon}{2} \partial_{x} S(x) \\ x' &:= \textcolor{#FD971F}{\Lambda}(x; \tilde{v}) \, = x + \varepsilon \, \tilde{v} \\ v' &:= \textcolor{#F06292}{\Gamma}(x'; \tilde{v}) = \tilde{v} - \frac{\varepsilon}{2} \partial_{x} S(x') \end{align*}

Warning!

Resample v_{0} \sim \mathcal{N}(0, \mathbb{1})
at the beginning of each trajectory

Note: \partial_{x} S(x) is the force

HMC Update

  • We build a trajectory of N_{\mathrm{LF}} leapfrog steps \begin{equation*} (x_{0}, v_{0}) \rightarrow (x_{1}, v_{1}) \rightarrow \cdots \rightarrow (x', v') \end{equation*}

  • And propose x' as the next state in our chain

\begin{align*} \textcolor{#F06292}{\Gamma}: (x, v) \textcolor{#F06292}{\rightarrow} v' &:= v - \frac{\varepsilon}{2} \partial_{x} S(x) \\ \textcolor{#FD971F}{\Lambda}: (x, v) \textcolor{#FD971F}{\rightarrow} x' &:= x + \varepsilon v \end{align*}

  • We then accept / reject x' using the Metropolis-Hastings criterion (see the sketch below),
    A(x'|x) = \min\left\{1, \frac{p(x')}{p(x)}\left|\frac{\partial x'}{\partial x}\right|\right\}
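A minimal NumPy sketch of this update for a flat configuration vector x, assuming user-supplied S and grad_S callables. For HMC the leapfrog Jacobian is 1, so the acceptance reduces to \min\{1, e^{-\Delta H}\}:

```python
import numpy as np

def hmc_step(x, S, grad_S, eps=0.1, n_lf=10, rng=None):
    """One HMC update: resample v, run n_lf leapfrog steps,
    then accept/reject via Metropolis-Hastings."""
    rng = rng or np.random.default_rng()
    v = rng.normal(size=x.shape)             # v ~ N(0, 1)
    x1, v1 = x.copy(), v.copy()
    v1 -= 0.5 * eps * grad_S(x1)             # Gamma: half-step in v
    for _ in range(n_lf - 1):
        x1 += eps * v1                       # Lambda: full step in x
        v1 -= eps * grad_S(x1)               # Gamma: full step in v
    x1 += eps * v1                           # final full step in x
    v1 -= 0.5 * eps * grad_S(x1)             # Gamma: final half-step in v
    dH = (S(x1) + 0.5 * v1 @ v1) - (S(x) + 0.5 * v @ v)
    return x1 if rng.random() < np.exp(-dH) else x
```

For example, S = lambda x: 0.5 * x @ x with grad_S = lambda x: x samples a standard Gaussian.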

HMC Demo

Figure 3: HMC Demo

Issues with HMC

  • What do we want in a good sampler?
    • Fast mixing (small autocorrelations)
    • Fast burn-in (quick convergence)
  • Problems with HMC:
    • Energy levels selected randomly \rightarrow slow mixing
    • Cannot easily traverse low-density zones \rightarrow slow convergence

HMC Samples with \varepsilon=0.25

HMC Samples with \varepsilon=0.5

Figure 4: HMC Samples generated with varying step sizes \varepsilon

Can we do better?

L2HMC

  • Generalize HMC by introducing 6 functions:
    • x-update: \psi_{\theta}: (x, v) \longrightarrow \left(s_{x},\, t_{x},\, q_{x}\right)
    • v-update: \varphi_{\theta}: (x, v) \longrightarrow \left(s_{v},\, t_{v},\, q_{v}\right) where \psi_{\theta}, \varphi_{\theta} are NNs, parameterized by weights \theta
  • These functions, \left(s_{k}, t_{k}, q_{k}\right) \quad \text{ with } \quad k \in \{x, v\} are then used in a generalized leapfrog update to generate (x, v) \longrightarrow (x', v')

L2HMC: Leapfrog Layer

Algorithm: L2HMC Update

  1. input: \mathbf{x}

    • Resample {\mathbf{v} \sim \mathcal{N}(0, \mathbb{1})}, {d\sim\mathcal{U}(+, -)}, and construct \textcolor{#00CCFF}{\xi} =(\mathbf{x}, {\mathbf{v}}, {\pm})
  2. forward: Generate proposal \xi^{\ast} by passing initial \xi through N_{\mathrm{LF}} leapfrog layers
    \textcolor{#00CCFF}{\xi} \hspace{1pt}\xrightarrow[]{\tiny{\mathrm{LF} \text{ layer}}}\xi_{1} \longrightarrow\cdots \longrightarrow \xi_{N_{\mathrm{LF}}} = \textcolor{#AE81FF}{\xi^{\ast}}

    • Compute the Metropolis-Hastings (MH) acceptance (with Jacobian \mathcal{J}) \begin{equation} A({\textcolor{#AE81FF}{\xi^{\ast}}}|{\textcolor{#00CCFF}{\xi}})= \mathrm{min}\left\{1, \frac{p(\textcolor{#AE81FF}{\xi^{\ast}})}{p(\textcolor{#00CCFF}{\xi})} \left| \mathcal{J}\left(\textcolor{#AE81FF}{\xi^{\ast}},\textcolor{#00CCFF}{\xi}\right)\right| \right\} \end{equation}
  3. backward (if training):

    • Evaluate the loss function \mathcal{L}\gets \mathcal{L}_{\theta}(\textcolor{#AE81FF}{\xi^{\ast}}, \textcolor{#00CCFF}{\xi}) and backprop
  4. return: Evaluate the MH criterion (1) and return the accepted config, \mathbf{x}_{i+1} \mathbf{x}_{i+1}\gets \begin{cases} \textcolor{#AE81FF}{\mathbf{x}^{\ast}} \small{\text{ w/ prob }} A(\textcolor{#AE81FF}{\xi^{\ast}}|\textcolor{#00CCFF}{\xi}) \hspace{26pt} ✅ \\ \textcolor{#00CCFF}{\mathbf{x}} \hspace{5pt}\small{\text{ w/ prob }} 1 - A(\textcolor{#AE81FF}{\xi^{\ast}}|{\textcolor{#00CCFF}{\xi}}) \hspace{10pt} 🚫 \end{cases}
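A high-level PyTorch sketch of this update. Here leapfrog_layer (one generalized leapfrog layer returning its log-Jacobian contribution) and loss_fn are hypothetical stand-ins for the trained components, and S is the action:

```python
import torch

def l2hmc_update(x, S, leapfrog_layer, n_lf, loss_fn=None):
    """One L2HMC update (sketch): xi = (x, v, d) -> xi* -> accept/reject."""
    v = torch.randn_like(x)                        # v ~ N(0, 1)
    d = 1 if torch.rand(()) < 0.5 else -1          # d ~ U(+, -)
    H = lambda x_, v_: S(x_) + 0.5 * (v_ * v_).sum()
    xp, vp, logdet = x, v, torch.zeros(())
    for _ in range(n_lf):                          # xi -> xi_1 -> ... -> xi*
        xp, vp, ld = leapfrog_layer(xp, vp, d)
        logdet = logdet + ld
    # MH acceptance with the Jacobian factor |J| = exp(logdet)
    acc = torch.exp(torch.clamp(H(x, v) - H(xp, vp) + logdet, max=0.0))
    if loss_fn is not None:                        # training mode
        loss_fn(xp, x, acc).backward()
    return xp.detach() if torch.rand(()) < acc else x
```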

Toy Example: GMM \in \mathbb{R}^{2}

Lattice Gauge Theory (2D U(1))

Link Variables

U_{\mu}(n) = e^{i x_{\mu}(n)}\in \mathbb{C},\quad \text{where}\quad x_{\mu}(n) \in [-\pi,\pi)

Wilson Action

S_{\beta}(x) = \beta\sum_{P} \cos \textcolor{#00CCFF}{x_{P}},

\textcolor{#00CCFF}{x_{P}} = \left[x_{\mu}(n) + x_{\nu}(n+\hat{\mu}) - x_{\mu}(n+\hat{\nu})-x_{\nu}(n)\right]

Note: \textcolor{#00CCFF}{x_{P}} is the phase of the product of links around a 1\times 1 square, called a “plaquette”
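A minimal PyTorch sketch of x_{P} and the action above, assuming link phases stored as x.shape = [2, Nt, Nx] with periodic boundaries (torch.roll):

```python
import torch

def plaquettes(x: torch.Tensor) -> torch.Tensor:
    """x_P = x_mu(n) + x_nu(n + mu) - x_mu(n + nu) - x_nu(n),
    for link phases x of shape [2, Nt, Nx] (mu=0: t-links, mu=1: x-links)."""
    x0, x1 = x[0], x[1]
    return (x0 + torch.roll(x1, shifts=-1, dims=0)
              - torch.roll(x0, shifts=-1, dims=1) - x1)

def wilson_action(x: torch.Tensor, beta: float) -> torch.Tensor:
    """S_beta(x) = beta * sum_P cos(x_P), as defined on the slide."""
    return beta * torch.cos(plaquettes(x)).sum()
```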

2D Lattice

Physical Quantities

  • To estimate physical quantities, we:
    • calculate physical observables at increasing spatial resolution
    • perform extrapolation to continuum limit

Figure 5: Increasing the physical resolution (a \rightarrow 0) allows us to make predictions about numerical values of physical quantities in the continuum limit.

Topological Freezing

Topological Charge: Q = \frac{1}{2\pi}\sum_{P}\left\lfloor x_{P}\right\rfloor \in \mathbb{Z}

note: \left\lfloor x_{P} \right\rfloor = x_{P} - 2\pi \left\lfloor\frac{x_{P} + \pi}{2\pi}\right\rfloor
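The same projection in code: a sketch that takes the plaquette phases x_{P} (e.g. from the sketch above) and returns the integer-valued charge:

```python
import math
import torch

def topological_charge(xP: torch.Tensor) -> torch.Tensor:
    """Q = (1 / 2pi) * sum_P proj(xP), where proj wraps each plaquette
    phase into [-pi, pi); the sum lands on an integer."""
    proj = xP - 2 * math.pi * torch.floor((xP + math.pi) / (2 * math.pi))
    return proj.sum() / (2 * math.pi)
```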

Critical Slowing Down

  • Q gets stuck!
    • as \beta\longrightarrow \infty:
      • Q \longrightarrow \text{const.}
      • \delta Q = \left(Q^{\ast} - Q\right) \rightarrow 0 \textcolor{#FF5252}{\Longrightarrow}
    • # configs required to estimate errors
      grows exponentially: \tau_{\mathrm{int}}^{Q} \longrightarrow \infty

Note \delta Q \rightarrow 0 at increasing \beta

Loss Function

  • Want to maximize the expected squared charge difference (see the sketch below): \begin{equation*} \mathcal{L}_{\theta}\left(\xi^{\ast}, \xi\right) = {\mathbb{E}_{p(\xi)}}\big[-\textcolor{#FA5252}{{\delta Q}}^{2} \left(\xi^{\ast}, \xi \right)\cdot A(\xi^{\ast}|\xi)\big] \end{equation*}

  • Where:

    • \delta Q is the tunneling rate: \begin{equation*} \textcolor{#FA5252}{\delta Q}(\xi^{\ast},\xi)=\left|Q^{\ast} - Q\right| \end{equation*}

    • A(\xi^{\ast}|\xi) is the probability of accepting the proposal \xi^{\ast}: \begin{equation*} A(\xi^{\ast}|\xi) = \mathrm{min}\left( 1, \frac{p(\xi^{\ast})}{p(\xi)}\left|\frac{\partial \xi^{\ast}}{\partial \xi^{T}}\right|\right) \end{equation*}
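A sketch of this loss in PyTorch for a batch of chains. Note that for gradients to flow, Q must in practice be replaced by a differentiable surrogate; that choice is not shown on the slide:

```python
import torch

def loss_fn(q_prop: torch.Tensor, q_init: torch.Tensor,
            acc: torch.Tensor) -> torch.Tensor:
    """L_theta = E[ -(dQ)^2 * A(xi*|xi) ]: minimizing this maximizes
    the expected squared charge difference across the batch."""
    dq2 = (q_prop - q_init) ** 2   # dQ^2 = |Q* - Q|^2
    return -(dq2 * acc).mean()
```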

Integrated Autocorrelation Time: \tau_{\mathrm{int}}

Improvement

We can measure the performance by comparing \tau_{\mathrm{int}} for the trained model vs. HMC.

Note: lower is better

Integrated Autocorrelation Time

Figure 6: Plot of the integrated autocorrelation time for both the trained model (colored) and HMC (greyscale).

Interpretation

Deviation in x_{P}

Topological charge mixing

Artificial influx of energy

Figure 7: Illustration of how different observables evolve over a single L2HMC trajectory.

Interpretation

Average plaquette: \langle x_{P}\rangle vs LF step

Average energy: H - \sum\log|\mathcal{J}|

Figure 8: The trained model artificially increases the energy towards the middle of the trajectory, allowing the sampler to tunnel between isolated sectors.

Plaquette analysis: x_{P}

Deviation from V\rightarrow\infty limit, x_{P}^{\ast}

Average \langle x_{P}\rangle, with x_{P}^{\ast} (dotted-lines)

Figure 9: Plot showing how the average plaquette \left\langle x_{P}\right\rangle varies over a single trajectory for models trained at different \beta, with varying trajectory lengths N_{\mathrm{LF}}.

Comparison

(a) Trained model

(b) Generic HMC

Figure 10: Comparison of \langle \delta Q\rangle = \frac{1}{N}\sum_{i=k}^{N} \delta Q_{i} for the trained model (a) vs. generic HMC (b).

2D U(1) Model

  • lattice.shape: [2, Nt, Nx]
  • maintain buffer of Nb chains,
    updated in parallel
    • x.shape: [Nb, 2, Nt, Nx]
  • to deal with U = e^{ix} \in \mathbb{C}, stack as:
    [x.cos(), x.sin()] (see the sketch below)
  • final shapes:
    • v.shape: [Nb, 2, Nt, Nx]
    • x.shape: [Nb, 2, 2, Nt, Nx]
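A small PyTorch sketch of these shapes (the Nb, Nt, Nx values are illustrative):

```python
import math
import torch

Nb, Nt, Nx = 64, 16, 16
x = 2 * math.pi * torch.rand(Nb, 2, Nt, Nx) - math.pi  # link phases in [-pi, pi)
v = torch.randn(Nb, 2, Nt, Nx)                         # fictitious momenta

xs = torch.stack([x.cos(), x.sin()], dim=1)            # encode U = e^{ix}
print(v.shape)   # torch.Size([64, 2, 16, 16])
print(xs.shape)  # torch.Size([64, 2, 2, 16, 16])
```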

4D SU(3) Model

  • link variables:
    • U_{\mu}(x) \in SU(3),
  • + conjugate momenta:
    • P_{\mu}(x) \in \mathfrak{su}(3)
  • We maintain a batch of Nb lattices, all updated in parallel
    • lattice.shape:
      • [4, Nt, Nx, Ny, Nz, 3, 3]
    • batch.shape:
      • [Nb, *lattice.shape]

Thank you!

samforeman.me
saforem2
@saforem2
foremans@anl.gov

Acknowledgements

This research used resources of the Argonne Leadership Computing Facility,
which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357.

Acknowledgements

  • Huge thank you to:
    • Yannick Meurice
    • Norman Christ
    • Akio Tomiya
    • Luchang Jin
    • Chulwoo Jung
    • Peter Boyle
    • Taku Izubuchi
    • ECP-CSD group
    • ALCF Staff + Datascience Group

References

Boyda, Denis, et al. 2022. “Applications of Machine Learning to Lattice Quantum Field Theory.” In Snowmass 2021. https://arxiv.org/abs/2202.05838.
Duane, S., A. D. Kennedy, B. J. Pendleton, and D. Roweth. 1987. “Hybrid Monte Carlo.” Phys. Lett. B 195: 216–22. https://doi.org/10.1016/0370-2693(87)91197-X.
Foreman, Sam, Taku Izubuchi, Luchang Jin, Xiao-Yong Jin, James C. Osborn, and Akio Tomiya. 2022. “HMC with Normalizing Flows.” PoS LATTICE2021: 073. https://doi.org/10.22323/1.396.0073.
Foreman, Sam, Xiao-Yong Jin, and James C. Osborn. 2021. “Deep Learning Hamiltonian Monte Carlo.” In 9th International Conference on Learning Representations. https://arxiv.org/abs/2105.03418.
———. 2022. “LeapfrogLayers: A Trainable Framework for Effective Topological Sampling.” PoS LATTICE2021: 508. https://doi.org/10.22323/1.396.0508.
Shanahan, Phiala, et al. 2022. “Snowmass 2021 Computational Frontier CompF03 Topical Group Report: Machine Learning.” September. https://arxiv.org/abs/2209.07559.

Extras

Figure 11: Jupyter Notebook

Networks 2D U(1)

  • Stack gauge links as shape\left(U_{\mu}\right)=[Nb, 2, Nt, Nx] \in \mathbb{C}

    x_{\mu}(n) ≔ \left[\cos(x), \sin(x)\right]

    with shape\left(x_{\mu}\right)= [Nb, 2, Nt, Nx, 2] \in \mathbb{R}

  • x-Network:

    • \Lambda^{\pm}_{k}(x, v) \rightarrow \left[s^{k}_{x}, t^{k}_{x}, q^{k}_{x}\right]
  • v-Network:

    • \Gamma_{k}^{\pm}(x, v) \rightarrow \left[s^{k}_{v}, t^{k}_{v}, q^{k}_{v}\right]

v-update (pt. 1)

  • network: \left(x, \partial_{x} S(x)\right) \coloneqq \left(x, F \right)\rightarrow (s_{v}, t_{v}, q_{v}), where \begin{align*} h_{0} &= \sigma\left( w_{x} x + w_{F} F + b \right) \\ h_{1} &= \sigma\left( w_{1} h_{0} + b_{1} \right) \\ &\vdots \\ h_{n} &= \sigma\left(w_{n} h_{n-1} + b_{n}\right) \longrightarrow \\ \end{align*} s_{v} = \lambda_{s} \tanh\left(w_{s} h_{n} + b_{s}\right), \quad t_{v} = w_{t} h_{n} + b_{t}, \quad q_{v} = \lambda_{q}\tanh\left(w_{q} h_{n} + b_{q}\right)
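A sketch of this network as a PyTorch module; the hidden width, depth, and choice of ReLU for \sigma are illustrative, and dim is the flattened lattice size:

```python
import torch
import torch.nn as nn

class VNetwork(nn.Module):
    """(x, F) -> (s_v, t_v, q_v), following the equations above."""
    def __init__(self, dim: int, hidden: int = 256):
        super().__init__()
        self.wx = nn.Linear(dim, hidden)            # w_x x + w_F F + b
        self.wF = nn.Linear(dim, hidden, bias=False)
        self.w1 = nn.Linear(hidden, hidden)         # h_1 = sigma(w_1 h_0 + b_1)
        self.ws = nn.Linear(hidden, dim)            # output heads
        self.wt = nn.Linear(hidden, dim)
        self.wq = nn.Linear(hidden, dim)
        self.lam_s = nn.Parameter(torch.ones(dim))  # lambda_s
        self.lam_q = nn.Parameter(torch.ones(dim))  # lambda_q

    def forward(self, x, F):
        h = torch.relu(self.wx(x) + self.wF(F))
        h = torch.relu(self.w1(h))
        s = self.lam_s * torch.tanh(self.ws(h))
        t = self.wt(h)
        q = self.lam_q * torch.tanh(self.wq(h))
        return s, t, q
```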

v-update (pt. 2)

  • Network outputs: s_{v} = \lambda_{s} \tanh\left(w_{s}\, h_{n} + b_{s}\right), \quad t_{v} = w_{t} h_{n} + b_{t}, \quad q_{v} = \lambda_{q}\tanh\left(w_{q} h_{n} + b_{q}\right)

  • Use these to update \Gamma^{\pm}: (x, v) \rightarrow \left(x, v_{\pm}\right):

    • forward (d = \textcolor{#FF5252}{+}): \Gamma^{\textcolor{#FF5252}{+}}(x, v) \coloneqq v_{\textcolor{#FF5252}{+}} = v \cdot e^{\frac{\varepsilon}{2} s_{v}} - \frac{\varepsilon}{2}\left[ F \cdot e^{\varepsilon q_{v}} + t_{v} \right]

    • backward (d = \textcolor{#1A8FFF}{-}): \Gamma^{\textcolor{#1A8FFF}{-}}(x, v) \coloneqq v_{\textcolor{#1A8FFF}{-}} = e^{-\frac{\varepsilon}{2} s_{v}} \left\{v + \frac{\varepsilon}{2}\left[ F \cdot e^{\varepsilon q_{v}} + t_{v} \right]\right\}
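These two updates in code (a sketch; F = \partial_{x}S(x) and the (s_{v}, t_{v}, q_{v}) come from the v-network above). Note that gamma_minus is the exact inverse of gamma_plus, which is what makes the Jacobian tractable:

```python
import torch

def gamma_plus(v, F, s_v, t_v, q_v, eps):
    """Forward (d = +) momentum update.
    log|J| contribution: +(eps / 2) * s_v.sum()."""
    return v * torch.exp(0.5 * eps * s_v) - 0.5 * eps * (
        F * torch.exp(eps * q_v) + t_v)

def gamma_minus(v, F, s_v, t_v, q_v, eps):
    """Backward (d = -) update: inverts gamma_plus exactly.
    log|J| contribution: -(eps / 2) * s_v.sum()."""
    return torch.exp(-0.5 * eps * s_v) * (
        v + 0.5 * eps * (F * torch.exp(eps * q_v) + t_v))
```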

Annealing Schedule

  • Introduce an annealing schedule during the training phase:

    \left\{ \gamma_{t} \right\}_{t=0}^{N} = \left\{\gamma_{0}, \gamma_{1}, \ldots, \gamma_{N-1}, \gamma_{N} \right\}

    where \gamma_{0} < \gamma_{1} < \cdots < \gamma_{N} \equiv 1, and \left|\gamma_{t+1} - \gamma_{t}\right| \ll 1

  • Note:

    • for \left|\gamma_{t}\right| < 1, this rescaling reduces the height of the energy barriers \Longrightarrow
    • makes it easier for our sampler to explore previously inaccessible regions of phase space (see the sketch below)
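A minimal sketch of such a schedule; the linear ramp, starting value, and number of steps are illustrative, and we assume (as the rescaling above suggests) that step t trains against the scaled action \gamma_{t} S(x):

```python
import numpy as np

n_steps, gamma_0 = 100, 0.1
gammas = np.linspace(gamma_0, 1.0, n_steps + 1)  # gamma_0 < ... < gamma_N = 1

# step t targets p_t(x) ∝ exp(-gamma_t * S(x)); small increments
# |gamma_{t+1} - gamma_t| << 1 keep successive targets close
assert np.all(np.diff(gammas) > 0) and gammas[-1] == 1.0
```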

LQCD @ ALCF (2008)

The Blue Gene/P at the ALCF has tremendously accelerated the generation of the gauge configurations—in many cases, by a factor of 5 to 10 over what has been possible with other machines. Significant progress has been made in simulations with two different implementations of the quarks—domain wall and staggered.

  — Mike Papka, 2008

HMC

flowchart LR
  v("v") --> G1{"Γ"}
  x0("x") -. "∂𝑆 / ∂𝒙" -.-> G1{"Γ"}
  G1 -.-> L{"Λ"}
  x0 --> L{"Λ"}
  G1 --> G2{"Γ"}
  L -.-> G2
  L ---> x'("x*")
  G2 --> v'("v*")
  style x0  color:#805AC1
  style x' color:#805AC1
  style v' color:#308E40
  style v color:#308E40
  style L color:#FD971F
  style G1 color:#ec407a
  style G2 color:#ec407a
  classDef default fill:#1c1c1c