Generative Modeling & Efficient Sampling
(for Lattice QCD)

 

Sam Foreman
Xiao-Yong Jin, James C. Osborn
saforem2/lqcd-pasc23

Standard Model

  • Electricity & Magnetism
  • Quantum Field Theory
    • Nuclear interactions
      • Strong + Weak Force
      • Observed particles
    • Quantum Chromodynamics (QCD):
      • Quark / gluon interactions in the nucleus
      • Analytic progress is difficult
        • Lattice QCD to the rescue! 🚀
  • known to be incomplete!

Figure 1: Feynman diagram for e^{+} + e^{-} \rightarrow 2\gamma

Magnetic Moment of the Muon

a_{\mu} = \frac{(g_{\mu} - 2)}{2}

a_{\mu}^{\mathrm{exp}} - a_{\mu}^{\mathrm{SM}} = \left(25.1 \pm 5.9\right)\cdot 10^{-10}

Can Lattice QCD resolve this?

Fermilab: Muon g-2

Figure 2: The Muon g-2 ring sits in its detector hall amidst electronics racks, the muon beamline, and other equipment. This impressive experiment operates at negative 450 degrees Fahrenheit and studies the precession (or wobble) of muons as they travel through the magnetic field.

Figure 3: In the summer of 2013, the Muon g-2 team successfully transported a 50-foot-wide electromagnet from Long Island to the Chicago suburbs in one piece. The move took 35 days and traversed 3,200 miles over land and sea.

Figure 4: Thousands of people followed the move of the ring, and thousands were on hand to greet it upon its arrival at Fermilab.

Tension

Obviously, an independent cross-check of the BMW lattice result for a_{\mu}^{\mathrm{hvp, LO}} with sub-percent precision is badly needed. 
(Wittig 2023)

Figure 5: Full cartoon

Markov Chain Monte Carlo (MCMC)

Goal

Generate independent samples \{x_{i}\}, such that \{x_{i}\} \sim p(x) \propto e^{-S(x)}, where S(x) is the action (or potential energy)

  • Want to calculate observables \mathcal{O}:
    \left\langle \mathcal{O}\right\rangle \propto \int \left[\mathcal{D}x\right]\hspace{4pt} {\mathcal{O}(x)\, p(x)}

If these were independent, we could approximate: \left\langle\mathcal{O}\right\rangle \simeq \frac{1}{N}\sum^{N}_{n=1}\mathcal{O}(x_{n})
\sigma_{\mathcal{O}}^{2} = \frac{1}{N}\mathrm{Var}{\left[\mathcal{O} (x) \right]}\Longrightarrow \sigma_{\mathcal{O}} \propto \frac{1}{\sqrt{N}}

Instead, nearby configs are correlated, and we incur a factor of \textcolor{#FF5252}{\tau^{\mathcal{O}}_{\mathrm{int}}}: \sigma_{\mathcal{O}}^{2} = \frac{\textcolor{#FF5252}{\tau^{\mathcal{O}}_{\mathrm{int}}}}{N}\mathrm{Var}{\left[\mathcal{O} (x) \right]}
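
In code, the correction is a one-line change to the naive error estimate (a minimal NumPy sketch, assuming \tau_{\mathrm{int}} has already been estimated for the observable):

import numpy as np

def observable_estimate(measurements, tau_int=1.0):
    """Estimate <O> and its error from a chain of measurements O(x_n).

    tau_int = 1.0 recovers the independent-sample result sigma_O ~ 1/sqrt(N).
    """
    n = len(measurements)
    mean = measurements.mean()
    # Correlated samples inflate the naive variance by a factor of tau_int:
    # sigma_O^2 = (tau_int / N) * Var[O(x)]
    err = np.sqrt(tau_int * measurements.var(ddof=1) / n)
    return mean, err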

Hamiltonian Monte Carlo (HMC) (Duane et al. 1987)

  • Want to (sequentially) construct a chain of states: x_{0} \rightarrow x_{1} \rightarrow \cdots \rightarrow x_{i} \rightarrow \cdots \rightarrow x_{N}\hspace{10pt}

    such that, as N \rightarrow \infty: \left\{x_{i}, x_{i+1}, x_{i+2}, \cdots, x_{N} \right\} \xrightarrow[]{N\rightarrow\infty} p(x) \propto e^{-S(x)}

Trick

  • Introduce fictitious momentum v \sim \mathcal{N}(0, \mathbb{1})
    • Normally distributed and independent of x, i.e. \begin{align*} p(x, v) &\textcolor{#02b875}{=} p(x)p(v) \propto e^{-S{(x)}} e^{-\frac{1}{2} v^{T}v} = e^{-\left[S(x) + \frac{1}{2} v^{T}{v}\right]} \textcolor{#02b875}{=} e^{-H(x, v)} \end{align*}

Hamiltonian Monte Carlo (HMC)

  • Idea: Evolve the (\dot{x}, \dot{v}) system to get new states \{x_{i}\}

  • Write the joint distribution p(x, v): p(x, v) \propto e^{-S[x]} e^{-\frac{1}{2}v^{T} v} = e^{-H(x, v)}

Hamiltonian Dynamics

H = S[x] + \frac{1}{2} v^{T} v \Longrightarrow \dot{x} = +\partial_{v} H, \,\,\dot{v} = -\partial_{x} H

Figure 6: Overview of HMC algorithm

Leapfrog Integrator (HMC)

Hamiltonian Dynamics

\left(\dot{x}, \dot{v}\right) = \left(\partial_{v} H, -\partial_{x} H\right)

Leapfrog Step

input \,\left(x, v\right) \rightarrow \left(x', v'\right)\, output

\begin{align*} \tilde{v} &:= \textcolor{#F06292}{\Gamma}(x; v)\hspace{2.2pt} = v - \frac{\varepsilon}{2} \partial_{x} S(x) \\ x' &:= \textcolor{#FD971F}{\Lambda}(x; \tilde{v}) \, = x + \varepsilon \, \tilde{v} \\ v' &:= \textcolor{#F06292}{\Gamma}(x'; \tilde{v}) = \tilde{v} - \frac{\varepsilon}{2} \partial_{x} S(x') \end{align*}

Warning!

Resample v_{0} \sim \mathcal{N}(0, \mathbb{1})
at the beginning of each trajectory

Note: \partial_{x} S(x) is the force
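
A minimal NumPy sketch of the \left(\Gamma, \Lambda, \Gamma\right) update above, where grad_S is a user-supplied callable returning \partial_{x} S(x):

import numpy as np

def leapfrog(x, v, grad_S, eps):
    """One leapfrog step (x, v) -> (x', v') for H(x, v) = S(x) + v.v / 2."""
    v_half = v - 0.5 * eps * grad_S(x)            # Gamma: half-step in v
    x_new = x + eps * v_half                      # Lambda: full-step in x
    v_new = v_half - 0.5 * eps * grad_S(x_new)    # Gamma: half-step in v
    return x_new, v_new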

HMC Update

  • We build a trajectory of N_{\mathrm{LF}} leapfrog steps \begin{equation*} (x_{0}, v_{0}) \rightarrow (x_{1}, v_{1}) \rightarrow \cdots \rightarrow (x', v') \end{equation*}

  • And propose x' as the next state in our chain

\begin{align*} \textcolor{#F06292}{\Gamma}: (x, v) \textcolor{#F06292}{\rightarrow} v' &:= v - \frac{\varepsilon}{2} \partial_{x} S(x) \\ \textcolor{#FD971F}{\Lambda}: (x, v) \textcolor{#FD971F}{\rightarrow} x' &:= x + \varepsilon v \end{align*}

  • We then accept / reject x' using the Metropolis-Hastings criterion (sketched in code below),
    A(x'|x) = \min\left\{1, \frac{p(x')}{p(x)}\left|\frac{\partial x'}{\partial x}\right|\right\}
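
Putting the pieces together, a sketch of one full HMC update, reusing the leapfrog step from the previous slide (S and grad_S are user-supplied callables for the action and force; rng is e.g. np.random.default_rng()):

import numpy as np

def hmc_update(x, S, grad_S, eps, n_lf, rng):
    """One HMC update: resample v, run N_LF leapfrog steps, accept/reject."""
    v = rng.normal(size=x.shape)              # resample v ~ N(0, 1)
    H0 = S(x) + 0.5 * np.sum(v * v)           # initial energy
    x_new, v_new = x, v
    for _ in range(n_lf):
        x_new, v_new = leapfrog(x_new, v_new, grad_S, eps)
    H1 = S(x_new) + 0.5 * np.sum(v_new * v_new)
    # Leapfrog is volume-preserving (|dx'/dx| = 1), so the Metropolis-Hastings
    # test reduces to comparing energies: A(x'|x) = min{1, exp(H0 - H1)}
    return x_new if rng.random() < np.exp(H0 - H1) else x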

HMC Demo

Figure 7: HMC Demo

Issues with HMC

  • What do we want in a good sampler?
    • Fast mixing (small autocorrelations)
    • Fast burn-in (quick convergence)
  • Problems with HMC:
    • Energy levels selected randomly \rightarrow slow mixing
    • Cannot easily traverse low-density zones \rightarrow slow convergence

Figure 8: HMC samples generated with varying step sizes, \varepsilon=0.25 and \varepsilon=0.5

Can we do better?

L2HMC

  • Generalize HMC by introducing 6 functions:
    • x-update: \psi_{\theta}: (x, v) \longrightarrow \left(s_{x},\, t_{x},\, q_{x}\right)
    • v-update: \varphi_{\theta}: (x, v) \longrightarrow \left(s_{v},\, t_{v},\, q_{v}\right) where \psi_{\theta}, \varphi_{\theta} are NNs, parameterized by weights \theta
  • These functions, \left(s_{k}, t_{k}, q_{k}\right) \quad \text{ with } \quad k \in \{x, v\} are then used in a generalized leapfrog update to generate (x, v) \longrightarrow (x', v')

L2HMC: Leapfrog Layer

Algorithm: L2HMC Update

  1. input: \mathbf{x}

    • Resample {\mathbf{v} \sim \mathcal{N}(0, \mathbb{1})}, {d\sim\mathcal{U}(+, -)}, and construct \textcolor{#00CCFF}{\xi} =(\mathbf{x}, {\mathbf{v}}, {\pm})
  2. forward: Generate proposal \xi^{\ast} by passing initial \xi through N_{\mathrm{LF}} leapfrog layers
    \textcolor{#00CCFF}{\xi} \hspace{1pt}\xrightarrow[]{\tiny{\mathrm{LF} \text{ layer}}}\xi_{1} \longrightarrow\cdots \longrightarrow \xi_{N_{\mathrm{LF}}} = \textcolor{#AE81FF}{\xi^{\ast}}

    • Compute the Metropolis-Hastings (MH) acceptance (with Jacobian \mathcal{J}) \begin{equation} A({\textcolor{#AE81FF}{\xi^{\ast}}}|{\textcolor{#00CCFF}{\xi}})= \mathrm{min}\left\{1, \frac{p(\textcolor{#AE81FF}{\xi^{\ast}})}{p(\textcolor{#00CCFF}{\xi})} \left| \mathcal{J}\left(\textcolor{#AE81FF}{\xi^{\ast}},\textcolor{#00CCFF}{\xi}\right)\right| \right\} \end{equation}
  3. backward (if training):

    • Evaluate the loss function \mathcal{L}\gets \mathcal{L}_{\theta}(\textcolor{#AE81FF}{\xi^{\ast}}, \textcolor{#00CCFF}{\xi}) and backprop
  4. return: Evaluate MH criteria (1) and return accepted config, \mathbf{x}_{i+1} \mathbf{x}_{i+1}\gets \begin{cases} \textcolor{#AE81FF}{\mathbf{x}^{\ast}} \small{\text{ w/ prob }} A(\textcolor{#AE81FF}{\xi^{\ast}}|\textcolor{#00CCFF}{\xi}) \hspace{26pt} ✅ \\ \textcolor{#00CCFF}{\mathbf{x}} \hspace{5pt}\small{\text{ w/ prob }} 1 - A(\textcolor{#AE81FF}{\xi^{\ast}}|{\textcolor{#00CCFF}{\xi}}) \hspace{10pt} 🚫 \end{cases}
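
Schematically, one training iteration might look as follows in PyTorch. This is a sketch only: dynamics (the stack of N_{\mathrm{LF}} leapfrog layers returning (x^{\ast}, v^{\ast}, \sum\log|\mathcal{J}|)), action, and loss_fn are assumed helpers, not the actual l2hmc-qcd API:

import torch

def l2hmc_train_step(x, dynamics, action, loss_fn, optimizer):
    """One L2HMC update for a batch of chains x with shape [Nb, D]."""
    v = torch.randn_like(x)                       # 1. resample v ~ N(0, 1)
    x_star, v_star, logdet = dynamics(x, v)       # 2. forward: xi -> xi*
    # Metropolis-Hastings acceptance, including the Jacobian of the layers:
    dH = ((action(x_star) + 0.5 * (v_star ** 2).sum(-1))
          - (action(x) + 0.5 * (v ** 2).sum(-1)))
    acc = torch.exp(-dH + logdet).clamp(max=1.0)  # A(xi*|xi), eq. (1)
    loss = loss_fn(x_star, x, acc)                # 3. backward (if training)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # 4. accept/reject each chain in the batch independently:
    mask = (torch.rand_like(acc) < acc).to(x.dtype).unsqueeze(-1)
    return (mask * x_star + (1.0 - mask) * x).detach()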

Toy Example: GMM \in \mathbb{R}^{2}

Lattice Gauge Theory (2D U(1))

Link Variables

U_{\mu}(n) = e^{i x_{\mu}(n)}\in \mathbb{C},\quad \text{where}\quad x_{\mu}(n) \in [-\pi,\pi)

Wilson Action

S_{\beta}(x) = \beta\sum_{P} \cos \textcolor{#00CCFF}{x_{P}},

\textcolor{#00CCFF}{x_{P}} = \left[x_{\mu}(n) + x_{\nu}(n+\hat{\mu}) - x_{\mu}(n+\hat{\nu})-x_{\nu}(n)\right]

Note: \textcolor{#00CCFF}{x_{P}} is the phase of the product of links around an elementary 1\times 1 square, called a “plaquette”
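
A minimal PyTorch sketch, assuming the link angles x_{\mu}(n) are stored as a tensor of shape [2, Nt, Nx], indexed as [\mu, t, x]:

import torch

def plaquette_phases(x):
    """x_P(n) = x_0(n) + x_1(n + e0) - x_0(n + e1) - x_1(n), for all sites n."""
    x0, x1 = x[0], x[1]
    # torch.roll with shift -1 along a dimension fetches the neighbor at
    # n + e_mu (periodic boundary conditions come for free):
    return (x0 + torch.roll(x1, shifts=-1, dims=0)
               - torch.roll(x0, shifts=-1, dims=1) - x1)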

2D Lattice

Physical Quantities

  • To estimate physical quantities, we:
    • calculate physical observables at increasing spatial resolution
    • perform extrapolation to continuum limit

Figure 9: Increasing the physical resolution (a \rightarrow 0) allows us to make predictions about numerical values of physical quantities in the continuum limit.

Topological Freezing

Topological Charge: Q = \frac{1}{2\pi}\sum_{P}\left\lfloor x_{P}\right\rfloor \in \mathbb{Z}

note: \left\lfloor x_{P} \right\rfloor = x_{P} - 2\pi \left\lfloor\frac{x_{P} + \pi}{2\pi}\right\rfloor
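
A sketch of the corresponding computation, reusing plaquette_phases from the previous slide:

import math

import torch

def topological_charge(x):
    """Q = (1 / 2pi) * sum_P proj(x_P), with each x_P projected into [-pi, pi)."""
    xP = plaquette_phases(x)
    proj = xP - 2 * math.pi * torch.floor((xP + math.pi) / (2 * math.pi))
    return proj.sum() / (2 * math.pi)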

Critical Slowing Down

  • Q gets stuck!
    • as \beta\longrightarrow \infty:
      • Q \longrightarrow \text{const.}
      • \delta Q = \left(Q^{\ast} - Q\right) \rightarrow 0 \textcolor{#FF5252}{\Longrightarrow}
    • the # of configs required to estimate errors
      grows exponentially: \tau_{\mathrm{int}}^{Q} \longrightarrow \infty

Note: \delta Q \rightarrow 0 as \beta increases

Loss Function

  • Want to maximize the expected squared charge difference: \begin{equation*} \mathcal{L}_{\theta}\left(\xi^{\ast}, \xi\right) = {\mathbb{E}_{p(\xi)}}\big[-\textcolor{#FA5252}{{\delta Q}}^{2} \left(\xi^{\ast}, \xi \right)\cdot A(\xi^{\ast}|\xi)\big] \end{equation*}

  • Where:

    • \delta Q is the tunneling rate: \begin{equation*} \textcolor{#FA5252}{\delta Q}(\xi^{\ast},\xi)=\left|Q^{\ast} - Q\right| \end{equation*}

    • A(\xi^{\ast}|\xi) is the probability of accepting the proposal \xi^{\ast}: \begin{equation*} A(\xi^{\ast}|\xi) = \mathrm{min}\left( 1, \frac{p(\xi^{\ast})}{p(\xi)}\left|\frac{\partial \xi^{\ast}}{\partial \xi^{T}}\right|\right) \end{equation*}
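
In code, the loss reduces to a one-liner over the batch (a sketch; q_star, q, and acc are per-chain tensors of charges and acceptance probabilities):

import torch

def charge_loss(q_star, q, acc):
    """L = -E[ dQ^2 * A ], with dQ = |Q* - Q| per chain (minimized in training)."""
    dq = (q_star - q).abs()
    return -(dq ** 2 * acc).mean()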

Integrated Autocorrelation Time: \tau_{\mathrm{int}}

Improvement

We can measure the performance by comparing \tau_{\mathrm{int}} for the trained model vs. HMC.

Note: lower is better

Integrated Autocorrelation Time

Figure 10: Plot of the integrated autocorrelation time for both the trained model (colored) and HMC (greyscale).

Interpretation

Figure 11: Illustration of how different observables (deviation in x_{P}, topological charge mixing, and an artificial influx of energy) evolve over a single L2HMC trajectory.

Interpretation

Figure 12: Average plaquette \langle x_{P}\rangle and average energy H - \sum\log|\mathcal{J}|, per leapfrog step. The trained model artificially increases the energy toward the middle of the trajectory, allowing the sampler to tunnel between isolated sectors.

Plaquette analysis: x_{P}

Figure 13: Deviation from the V\rightarrow\infty limit x_{P}^{\ast}, and average \langle x_{P}\rangle (with x_{P}^{\ast} as dotted lines), over a single trajectory for models trained at different \beta with varying trajectory lengths N_{\mathrm{LF}}.

Comparison

(a) Trained model

(b) Generic HMC

Figure 14: Comparison of \langle \delta Q\rangle = \frac{1}{N}\sum_{i=k}^{N} \delta Q_{i} for (a) the trained model vs. (b) generic HMC.

2D U(1) Model

  • lattice.shape: [2, Nt, Nx]
  • maintain buffer of Nb chains,
    updated in parallel
    • x.shape: [Nb, 2, Nt, Nx]
  • to deal with U_{\mu} = e^{i x_{\mu}} \in \mathbb{C}, stack as:
    [x.cos(), x.sin()]
  • final shapes:
    • v.shape: [Nb, 2, Nt, Nx]
    • x.shape: [Nb, 2, 2, Nt, Nx]
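
Illustrative PyTorch shapes (the values of Nb, Nt, Nx are arbitrary):

import math

import torch

Nb, Nt, Nx = 128, 16, 16
x = torch.empty(Nb, 2, Nt, Nx).uniform_(-math.pi, math.pi)  # link angles
v = torch.randn(Nb, 2, Nt, Nx)                  # momenta: [Nb, 2, Nt, Nx]
x_net = torch.stack([x.cos(), x.sin()], dim=1)  # [Nb, 2, 2, Nt, Nx]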

4D SU(3) Model

  • link variables:
    • U_{\mu}(x) \in SU(3),
  • + conjugate momenta:
    • P_{\mu}(x) \in \mathfrak{su}(3)
  • We maintain a batch of Nb lattices, all updated in parallel
    • lattice.shape:
      • [4, Nt, Nx, Ny, Nz, 3, 3]
    • batch.shape:
      • [Nb, *lattice.shape]
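
The analogous shapes for 4D SU(3), again with illustrative sizes (a "cold start" with all links set to the identity):

import torch

Nb, Nt, Nx, Ny, Nz = 8, 16, 8, 8, 8
# One lattice: a 3x3 complex matrix U_mu(x) per site and direction
lattice = torch.eye(3, dtype=torch.complex64).expand(4, Nt, Nx, Ny, Nz, 3, 3)
batch = lattice.unsqueeze(0).repeat(Nb, 1, 1, 1, 1, 1, 1, 1)  # [Nb, *lattice.shape]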

Thank you!

samforeman.me
saforem2
@saforem2
foremans@anl.gov

Acknowledgements

This research used resources of the Argonne Leadership Computing Facility,
which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357.

Acknowledgements

  • Huge thank you to:
    • Yannick Meurice
    • Norman Christ
    • Akio Tomiya
    • Luchang Jin
    • Chulwoo Jung
    • Peter Boyle
    • Taku Izubuchi
    • ECP-CSD group
    • ALCF Staff + Datascience Group

References

Boyda, Denis et al. 2022. “Applications of Machine Learning to Lattice Quantum Field Theory.” In Snowmass 2021. https://arxiv.org/abs/2202.05838.
Duane, S., A. D. Kennedy, B. J. Pendleton, and D. Roweth. 1987. “Hybrid Monte Carlo.” Phys. Lett. B 195: 216–22. https://doi.org/10.1016/0370-2693(87)91197-X.
Foreman, Sam, Taku Izubuchi, Luchang Jin, Xiao-Yong Jin, James C. Osborn, and Akio Tomiya. 2022. “HMC with Normalizing Flows.” PoS LATTICE2021: 073. https://doi.org/10.22323/1.396.0073.
Foreman, Sam, Xiao-Yong Jin, and James C. Osborn. 2021. “Deep Learning Hamiltonian Monte Carlo.” In 9th International Conference on Learning Representations. https://arxiv.org/abs/2105.03418.
———. 2022. “LeapfrogLayers: A Trainable Framework for Effective Topological Sampling.” PoS LATTICE2021: 508. https://doi.org/10.22323/1.396.0508.
Shanahan, Phiala et al. 2022. “Snowmass 2021 Computational Frontier CompF03 Topical Group Report: Machine Learning,” September. https://arxiv.org/abs/2209.07559.
Wittig, Hartmut. 2023. “Progress on (g-2)_\mu from Lattice QCD.” https://arxiv.org/abs/2306.04165.

Extras

Figure 15: Jupyter Notebook

Networks 2D U(1)

  • Stack gauge links as shape\left(U_{\mu}\right)=[Nb, 2, Nt, Nx] \in \mathbb{C}

    x_{\mu}(n) \coloneqq \left[\cos(x), \sin(x)\right]

    with shape\left(x_{\mu}\right)= [Nb, 2, Nt, Nx, 2] \in \mathbb{R}

  • x-Network:

    • \Lambda^{\pm}_{k}(x, v) \rightarrow \left[s^{k}_{x}, t^{k}_{x}, q^{k}_{x}\right]
  • v-Network:

    • \Gamma_{k}^{\pm}(x, v) \rightarrow \left[s^{k}_{v}, t^{k}_{v}, q^{k}_{v}\right]

v-update (pt. 1)

  • network: \left(x, \partial_{x} S(x)\right) \coloneqq \left(x, F \right)\rightarrow (s_{v}, t_{v}, q_{v}), where \begin{align*} h_{0} &= \sigma\left( w_{x} x + w_{F} F + b \right) \\ h_{1} &= \sigma\left( w_{1} h_{0} + b_{1} \right) \\ &\vdots \\ h_{n} &= \sigma\left(w_{n} h_{n-1} + b_{n}\right) \longrightarrow \\ \end{align*} s_{v} = \lambda_{s} \tanh\left(w_{s} h_{n} + b_{s}\right), \quad t_{v} = w_{t} h_{n} + b_{t}, \quad q_{v} = \lambda_{q}\tanh\left(w_{q} h_{n} + b_{q}\right)

v-update (pt. 2)

  • Network outputs: s_{v} = \lambda_{s} \tanh\left(w_{s}\, h_{n} + b_{s}\right), \quad t_{v} = w_{t} h_{n} + b_{t}, \quad q_{v} = \lambda_{q}\tanh\left(w_{q} h_{n} + b_{q}\right)

  • Use these to update \Gamma^{\pm}: (x, v) \rightarrow \left(x, v_{\pm}\right):

    • forward (d = \textcolor{#FF5252}{+}): \Gamma^{\textcolor{#FF5252}{+}}(x, v) \coloneqq v_{\textcolor{#FF5252}{+}} = v \cdot e^{\frac{\varepsilon}{2} s_{v}} - \frac{\varepsilon}{2}\left[ F \cdot e^{\varepsilon q_{v}} + t_{v} \right]

    • backward (d = \textcolor{#1A8FFF}{-}): \Gamma^{\textcolor{#1A8FFF}{-}}(x, v) \coloneqq v_{\textcolor{#1A8FFF}{-}} = e^{-\frac{\varepsilon}{2} s_{v}} \left\{v + \frac{\varepsilon}{2}\left[ F \cdot e^{\varepsilon q_{v}} + t_{v} \right]\right\}
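
A PyTorch sketch of the v-network and the forward update (layer sizes and names are illustrative; \lambda_{s}, \lambda_{q} are trainable scale parameters):

import torch
import torch.nn as nn

class VNetwork(nn.Module):
    """(x, F) -> (s_v, t_v, q_v), where F = dS/dx is the force."""
    def __init__(self, dim, hidden):
        super().__init__()
        self.wx = nn.Linear(dim, hidden)              # w_x x + b
        self.wF = nn.Linear(dim, hidden, bias=False)  # w_F F
        self.w1 = nn.Linear(hidden, hidden)
        self.ws = nn.Linear(hidden, dim)
        self.wt = nn.Linear(hidden, dim)
        self.wq = nn.Linear(hidden, dim)
        self.lam_s = nn.Parameter(torch.ones(dim))    # lambda_s
        self.lam_q = nn.Parameter(torch.ones(dim))    # lambda_q

    def forward(self, x, F):
        h = torch.relu(self.wx(x) + self.wF(F))       # h_0
        h = torch.relu(self.w1(h))                    # h_1 ... h_n
        s_v = self.lam_s * torch.tanh(self.ws(h))
        t_v = self.wt(h)
        q_v = self.lam_q * torch.tanh(self.wq(h))
        return s_v, t_v, q_v

def gamma_forward(v, F, s_v, t_v, q_v, eps):
    """v_+ = v * exp(eps/2 * s_v) - eps/2 * (F * exp(eps * q_v) + t_v)"""
    return (v * torch.exp(0.5 * eps * s_v)
            - 0.5 * eps * (F * torch.exp(eps * q_v) + t_v))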

Annealing Schedule

  • Introduce an annealing schedule during the training phase:

    \left\{ \gamma_{t} \right\}_{t=0}^{N} = \left\{\gamma_{0}, \gamma_{1}, \ldots, \gamma_{N-1}, \gamma_{N} \right\}

    where \gamma_{0} < \gamma_{1} < \cdots < \gamma_{N} \equiv 1, and \left|\gamma_{t+1} - \gamma_{t}\right| \ll 1

  • Note:

    • for \left|\gamma_{t}\right| < 1, this rescaling helps to reduce the height of the energy barriers \Longrightarrow
    • easier for our sampler to explore previously inaccessible regions of the phase space
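
A minimal sketch of such a schedule, assuming the action is rescaled as S(x) \rightarrow \gamma_{t}\, S(x) at training step t:

import numpy as np

def annealing_schedule(n_steps, gamma0=0.1):
    """Monotonically increasing gamma_0 < ... < gamma_N = 1, in small increments."""
    return np.linspace(gamma0, 1.0, n_steps + 1)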

LQCD @ ALCF (2008)

The Blue Gene/P at the ALCF has tremendously accelerated the generation of the gauge configurations—in many cases, by a factor of 5 to 10 over what has been possible with other machines. Significant progress has been made in simulations with two different implementations of the quarks—domain wall and staggered.

  — Mike Papka, 2008

HMC

flowchart LR
  v("v") --> G1{"Γ"}
  x0("x") -. "∂𝑆 / ∂𝒙" -.-> G1{"Γ"}
  G1 -.-> L{"Λ"}
  x0 --> L{"Λ"}
  G1 --> G2{"Γ"}
  L -.-> G2
  L ---> x'("x*")
  G2 --> v'("v*")
  style x0  color:#805AC1
  style x' color:#805AC1
  style v' color:#308E40
  style v color:#308E40
  style L color:#FD971F
  style G1 color:#ec407a
  style G2 color:#ec407a
  classDef default fill:#1c1c1c