Generative Modeling & Efficient Sampling
(for Lattice QCD)

 

Sam Foreman
Xiao-Yong Jin, James C. Osborn
saforem2/lqcd-pasc23

Standard Model

  • Electricity & Magnetism
  • Quantum Field Theory
    • Nuclear interactions
      • Strong + Weak Force
      • Observed particles
    • Quantum Chromodynamics (QCD):
      • Quark / gluon interactions in the nucleus
      • Analytic progress is difficult
        • Lattice QCD to the rescue! 🚀
  • known to be incomplete!

Figure 1: Feynman diagram for e^{+} + e^{-} \rightarrow 2\gamma

Magnetic Moment of the Muon

a_{\mu} = \frac{(g_{\mu} - 2)}{2}

a_{\mu}^{\mathrm{exp}} - a_{\mu}^{\mathrm{SM}} = \left(25.1 \pm 5.9\right)\cdot 10^{-10}

Can Lattice QCD resolve this?

Fermilab: Muon g-2

Figure 2: The Muon g-2 ring sits in its detector hall amidst electronics racks, the muon beamline, and other equipment. This impressive experiment operates at negative 450 degrees Fahrenheit and studies the precession (or wobble) of muons as they travel through the magnetic field.

Figure 3: In the summer of 2013, the Muon g-2 team successfully transported a 50-foot-wide electromagnet from Long Island to the Chicago suburbs in one piece. The move took 35 days and traversed 3,200 miles over land and sea.

Figure 4: Thousands of people followed the move of the ring, and thousands were on hand to greet it upon its arrival at Fermilab.

Tension

Obviously, an independent cross-check of the BMW lattice result for a_{\mu}^{\mathrm{hvp, LO}} with sub-percent precision is badly needed. 
(Wittig 2023)

Figure 5: Full cartoon

Markov Chain Monte Carlo (MCMC)

Goal

Generate independent samples \{x_{i}\}, such that \{x_{i}\} \sim p(x) \propto e^{-S(x)}, where S(x) is the action (or potential energy)

  • Want to calculate observables \mathcal{O}:
    \left\langle \mathcal{O}\right\rangle \propto \int \left[\mathcal{D}x\right]\hspace{4pt} {\mathcal{O}(x)\, p(x)}

If these were independent, we could approximate: \left\langle\mathcal{O}\right\rangle \simeq \frac{1}{N}\sum^{N}_{n=1}\mathcal{O}(x_{n})
\sigma_{\mathcal{O}}^{2} = \frac{1}{N}\mathrm{Var}{\left[\mathcal{O} (x) \right]}\Longrightarrow \sigma_{\mathcal{O}} \propto \frac{1}{\sqrt{N}}

Instead, nearby configs are correlated, and we incur a factor of \textcolor{#FF5252}{\tau^{\mathcal{O}}_{\mathrm{int}}}: \sigma_{\mathcal{O}}^{2} = \frac{\textcolor{#FF5252}{\tau^{\mathcal{O}}_{\mathrm{int}}}}{N}\mathrm{Var}{\left[\mathcal{O} (x) \right]}
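
In code, the correction is a one-line change to the naive error estimate (a minimal NumPy sketch, assuming \tau_{\mathrm{int}} has already been estimated for the observable):

import numpy as np

def observable_estimate(measurements, tau_int=1.0):
    """Estimate <O> and its error from a chain of measurements O(x_n).

    tau_int = 1.0 recovers the independent-sample result sigma_O ~ 1/sqrt(N).
    """
    n = len(measurements)
    mean = measurements.mean()
    # Correlated samples inflate the naive variance by a factor of tau_int:
    # sigma_O^2 = (tau_int / N) * Var[O(x)]
    err = np.sqrt(tau_int * measurements.var(ddof=1) / n)
    return mean, err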

Hamiltonian Monte Carlo (HMC) (Duane et al. 1987)

  • Want to (sequentially) construct a chain of states: x_{0} \rightarrow x_{1} \rightarrow \cdots \rightarrow x_{i} \rightarrow \cdots \rightarrow x_{N}\hspace{10pt}

    such that, as N \rightarrow \infty: \left\{x_{i}, x_{i+1}, x_{i+2}, \cdots, x_{N} \right\} \xrightarrow[]{N\rightarrow\infty} p(x) \propto e^{-S(x)}

Trick

  • Introduce fictitious momentum v \sim \mathcal{N}(0, \mathbb{1})
    • Normally distributed and independent of x, i.e. \begin{align*} p(x, v) &\textcolor{#02b875}{=} p(x)p(v) \propto e^{-S{(x)}} e^{-\frac{1}{2} v^{T}v} = e^{-\left[S(x) + \frac{1}{2} v^{T}{v}\right]} \textcolor{#02b875}{=} e^{-H(x, v)} \end{align*}

Hamiltonian Monte Carlo (HMC)

  • Idea: Evolve the (\dot{x}, \dot{v}) system to get new states \{x_{i}\}

  • Write the joint distribution p(x, v): p(x, v) \propto e^{-S[x]} e^{-\frac{1}{2}v^{T} v} = e^{-H(x, v)}

Hamiltonian Dynamics

H = S[x] + \frac{1}{2} v^{T} v \Longrightarrow \dot{x} = +\partial_{v} H, \,\,\dot{v} = -\partial_{x} H

Figure 6: Overview of HMC algorithm

Leapfrog Integrator (HMC)

Hamiltonian Dynamics

\left(\dot{x}, \dot{v}\right) = \left(\partial_{v} H, -\partial_{x} H\right)

Leapfrog Step

input \,\left(x, v\right) \rightarrow \left(x', v'\right)\, output

\begin{align*} \tilde{v} &:= \textcolor{#F06292}{\Gamma}(x; v)\hspace{2.2pt} = v - \frac{\varepsilon}{2} \partial_{x} S(x) \\ x' &:= \textcolor{#FD971F}{\Lambda}(x; \tilde{v}) \, = x + \varepsilon \, \tilde{v} \\ v' &:= \textcolor{#F06292}{\Gamma}(x'; \tilde{v}) = \tilde{v} - \frac{\varepsilon}{2} \partial_{x} S(x') \end{align*}

Warning!

Resample v_{0} \sim \mathcal{N}(0, \mathbb{1})
at the beginning of each trajectory

Note: \partial_{x} S(x) is the force
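
A minimal NumPy sketch of the \left(\Gamma, \Lambda, \Gamma\right) update above, where grad_S is a user-supplied callable returning \partial_{x} S(x):

import numpy as np

def leapfrog(x, v, grad_S, eps):
    """One leapfrog step (x, v) -> (x', v') for H(x, v) = S(x) + v.v / 2."""
    v_half = v - 0.5 * eps * grad_S(x)            # Gamma: half-step in v
    x_new = x + eps * v_half                      # Lambda: full-step in x
    v_new = v_half - 0.5 * eps * grad_S(x_new)    # Gamma: half-step in v
    return x_new, v_new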

HMC Update

  • We build a trajectory of N_{\mathrm{LF}} leapfrog steps \begin{equation*} (x_{0}, v_{0}) \rightarrow (x_{1}, v_{1}) \rightarrow \cdots \rightarrow (x', v') \end{equation*}

  • And propose x' as the next state in our chain

\begin{align*} \textcolor{#F06292}{\Gamma}: (x, v) \textcolor{#F06292}{\rightarrow} v' &:= v - \frac{\varepsilon}{2} \partial_{x} S(x) \\ \textcolor{#FD971F}{\Lambda}: (x, v) \textcolor{#FD971F}{\rightarrow} x' &:= x + \varepsilon v \end{align*}

  • We then accept / reject x' using the Metropolis-Hastings criterion (sketched in code below),
    A(x'|x) = \min\left\{1, \frac{p(x')}{p(x)}\left|\frac{\partial x'}{\partial x}\right|\right\}
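
Putting the pieces together, a sketch of one full HMC update, reusing the leapfrog step from the previous slide (S and grad_S are user-supplied callables for the action and force; rng is e.g. np.random.default_rng()):

import numpy as np

def hmc_update(x, S, grad_S, eps, n_lf, rng):
    """One HMC update: resample v, run N_LF leapfrog steps, accept/reject."""
    v = rng.normal(size=x.shape)              # resample v ~ N(0, 1)
    H0 = S(x) + 0.5 * np.sum(v * v)           # initial energy
    x_new, v_new = x, v
    for _ in range(n_lf):
        x_new, v_new = leapfrog(x_new, v_new, grad_S, eps)
    H1 = S(x_new) + 0.5 * np.sum(v_new * v_new)
    # Leapfrog is volume-preserving (|dx'/dx| = 1), so the Metropolis-Hastings
    # test reduces to comparing energies: A(x'|x) = min{1, exp(H0 - H1)}
    return x_new if rng.random() < np.exp(H0 - H1) else x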

HMC Demo

Figure 7: HMC Demo

Issues with HMC

  • What do we want in a good sampler?
    • Fast mixing (small autocorrelations)
    • Fast burn-in (quick convergence)
  • Problems with HMC:
    • Energy levels selected randomly \rightarrow slow mixing
    • Cannot easily traverse low-density zones \rightarrow slow convergence

Figure 8: HMC samples generated with varying step sizes, \varepsilon=0.25 and \varepsilon=0.5

Can we do better?

L2HMC

  • Generalize HMC by introducing 6 functions:
    • x-update: \psi_{\theta}: (x, v) \longrightarrow \left(s_{x},\, t_{x},\, q_{x}\right)
    • v-update: \varphi_{\theta}: (x, v) \longrightarrow \left(s_{v},\, t_{v},\, q_{v}\right) where \psi_{\theta}, \varphi_{\theta} are NNs, parameterized by weights \theta
  • These functions, \left(s_{k}, t_{k}, q_{k}\right) \quad \text{ with } \quad k \in \{x, v\} are then used in a generalized leapfrog update to generate (x, v) \longrightarrow (x', v')

L2HMC: Leapfrog Layer

Algorithm: L2HMC Update

  1. input: \mathbf{x}

    • Resample {\mathbf{v} \sim \mathcal{N}(0, \mathbb{1})}, {d\sim\mathcal{U}(+, -)}, and construct \textcolor{#00CCFF}{\xi} =(\mathbf{x}, {\mathbf{v}}, {\pm})
  2. forward: Generate proposal \xi^{\ast} by passing initial \xi through N_{\mathrm{LF}} leapfrog layers
    \textcolor{#00CCFF}{\xi} \hspace{1pt}\xrightarrow[]{\tiny{\mathrm{LF} \text{ layer}}}\xi_{1} \longrightarrow\cdots \longrightarrow \xi_{N_{\mathrm{LF}}} = \textcolor{#AE81FF}{\xi^{\ast}}

    • Compute the Metropolis-Hastings (MH) acceptance (with Jacobian \mathcal{J}) \begin{equation} A({\textcolor{#AE81FF}{\xi^{\ast}}}|{\textcolor{#00CCFF}{\xi}})= \mathrm{min}\left\{1, \frac{p(\textcolor{#AE81FF}{\xi^{\ast}})}{p(\textcolor{#00CCFF}{\xi})} \left| \mathcal{J}\left(\textcolor{#AE81FF}{\xi^{\ast}},\textcolor{#00CCFF}{\xi}\right)\right| \right\} \end{equation}
  3. backward (if training):

    • Evaluate the loss function \mathcal{L}\gets \mathcal{L}_{\theta}(\textcolor{#AE81FF}{\xi^{\ast}}, \textcolor{#00CCFF}{\xi}) and backprop
  4. return: Evaluate MH criteria (1) and return accepted config, \mathbf{x}_{i+1} \mathbf{x}_{i+1}\gets \begin{cases} \textcolor{#AE81FF}{\mathbf{x}^{\ast}} \small{\text{ w/ prob }} A(\textcolor{#AE81FF}{\xi^{\ast}}|\textcolor{#00CCFF}{\xi}) \hspace{26pt} ✅ \\ \textcolor{#00CCFF}{\mathbf{x}} \hspace{5pt}\small{\text{ w/ prob }} 1 - A(\textcolor{#AE81FF}{\xi^{\ast}}|{\textcolor{#00CCFF}{\xi}}) \hspace{10pt} 🚫 \end{cases}
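
Schematically, one training iteration might look as follows in PyTorch. This is a sketch only: dynamics (the stack of N_{\mathrm{LF}} leapfrog layers returning (x^{\ast}, v^{\ast}, \sum\log|\mathcal{J}|)), action, and loss_fn are assumed helpers, not the actual l2hmc-qcd API:

import torch

def l2hmc_train_step(x, dynamics, action, loss_fn, optimizer):
    """One L2HMC update for a batch of chains x with shape [Nb, D]."""
    v = torch.randn_like(x)                       # 1. resample v ~ N(0, 1)
    x_star, v_star, logdet = dynamics(x, v)       # 2. forward: xi -> xi*
    # Metropolis-Hastings acceptance, including the Jacobian of the layers:
    dH = ((action(x_star) + 0.5 * (v_star ** 2).sum(-1))
          - (action(x) + 0.5 * (v ** 2).sum(-1)))
    acc = torch.exp(-dH + logdet).clamp(max=1.0)  # A(xi*|xi), eq. (1)
    loss = loss_fn(x_star, x, acc)                # 3. backward (if training)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # 4. accept/reject each chain in the batch independently:
    mask = (torch.rand_like(acc) < acc).to(x.dtype).unsqueeze(-1)
    return (mask * x_star + (1.0 - mask) * x).detach()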

Toy Example: GMM \in \mathbb{R}^{2}

Lattice Gauge Theory (2D U(1))

Link Variables

U_{\mu}(n) = e^{i x_{\mu}(n)}\in \mathbb{C},\quad \text{where}\quad x_{\mu}(n) \in [-\pi,\pi)

Wilson Action

S_{\beta}(x) = \beta\sum_{P} \cos \textcolor{#00CCFF}{x_{P}},

\textcolor{#00CCFF}{x_{P}} = \left[x_{\mu}(n) + x_{\nu}(n+\hat{\mu}) - x_{\mu}(n+\hat{\nu})-x_{\nu}(n)\right]

Note: \textcolor{#00CCFF}{x_{P}} is the phase of the product of links around an elementary 1\times 1 square, called a “plaquette”
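
A minimal PyTorch sketch, assuming the link angles x_{\mu}(n) are stored as a tensor of shape [2, Nt, Nx], indexed as [\mu, t, x]:

import torch

def plaquette_phases(x):
    """x_P(n) = x_0(n) + x_1(n + e0) - x_0(n + e1) - x_1(n), for all sites n."""
    x0, x1 = x[0], x[1]
    # torch.roll with shift -1 along a dimension fetches the neighbor at
    # n + e_mu (periodic boundary conditions come for free):
    return (x0 + torch.roll(x1, shifts=-1, dims=0)
               - torch.roll(x0, shifts=-1, dims=1) - x1)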

2D Lattice

Physical Quantities

  • To estimate physical quantities, we:
    • calculate physical observables at increasing spatial resolution
    • perform extrapolation to continuum limit

Figure 9: Increasing the physical resolution (a \rightarrow 0) allows us to make predictions about numerical values of physical quantities in the continuum limit.

Topological Freezing

Topological Charge: Q = \frac{1}{2\pi}\sum_{P}\left\lfloor x_{P}\right\rfloor \in \mathbb{Z}

note: \left\lfloor x_{P} \right\rfloor = x_{P} - 2\pi \left\lfloor\frac{x_{P} + \pi}{2\pi}\right\rfloor
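
A sketch of the corresponding computation, reusing plaquette_phases from the previous slide:

import math

import torch

def topological_charge(x):
    """Q = (1 / 2pi) * sum_P proj(x_P), with each x_P projected into [-pi, pi)."""
    xP = plaquette_phases(x)
    proj = xP - 2 * math.pi * torch.floor((xP + math.pi) / (2 * math.pi))
    return proj.sum() / (2 * math.pi)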

Critical Slowing Down

  • Q gets stuck!
    • as \beta\longrightarrow \infty:
      • Q \longrightarrow \text{const.}
      • \delta Q = \left(Q^{\ast} - Q\right) \rightarrow 0 \textcolor{#FF5252}{\Longrightarrow}
    • the # of configs required to estimate errors
      grows exponentially: \tau_{\mathrm{int}}^{Q} \longrightarrow \infty

Note: \delta Q \rightarrow 0 as \beta increases

Loss Function

  • Want to maximize the expected squared charge difference: \begin{equation*} \mathcal{L}_{\theta}\left(\xi^{\ast}, \xi\right) = {\mathbb{E}_{p(\xi)}}\big[-\textcolor{#FA5252}{{\delta Q}}^{2} \left(\xi^{\ast}, \xi \right)\cdot A(\xi^{\ast}|\xi)\big] \end{equation*}

  • Where:

    • \delta Q is the tunneling rate: \begin{equation*} \textcolor{#FA5252}{\delta Q}(\xi^{\ast},\xi)=\left|Q^{\ast} - Q\right| \end{equation*}

    • A(\xi^{\ast}|\xi) is the probability of accepting the proposal \xi^{\ast}: \begin{equation*} A(\xi^{\ast}|\xi) = \mathrm{min}\left( 1, \frac{p(\xi^{\ast})}{p(\xi)}\left|\frac{\partial \xi^{\ast}}{\partial \xi^{T}}\right|\right) \end{equation*}
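
In code, the loss reduces to a one-liner over the batch (a sketch; q_star, q, and acc are per-chain tensors of charges and acceptance probabilities):

import torch

def charge_loss(q_star, q, acc):
    """L = -E[ dQ^2 * A ], with dQ = |Q* - Q| per chain (minimized in training)."""
    dq = (q_star - q).abs()
    return -(dq ** 2 * acc).mean()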

Integrated Autocorrelation Time: \tau_{\mathrm{int}}

Improvement

We can measure the performance by comparing \tau_{\mathrm{int}} for the trained model vs. HMC.

Note: lower is better

Integrated Autocorrelation Time

Figure 10: Plot of the integrated autocorrelation time for both the trained model (colored) and HMC (greyscale).

Interpretation

Figure 11: Illustration of how different observables (deviation in x_{P}, topological charge mixing, and an artificial influx of energy) evolve over a single L2HMC trajectory.

Interpretation

Figure 12: Average plaquette \langle x_{P}\rangle and average energy H - \sum\log|\mathcal{J}|, per leapfrog step. The trained model artificially increases the energy toward the middle of the trajectory, allowing the sampler to tunnel between isolated sectors.

Plaquette analysis: x_{P}

Figure 13: Deviation from the V\rightarrow\infty limit x_{P}^{\ast}, and average \langle x_{P}\rangle (with x_{P}^{\ast} as dotted lines), over a single trajectory for models trained at different \beta with varying trajectory lengths N_{\mathrm{LF}}.

Comparison

(a) Trained model

(b) Generic HMC

Figure 14: Comparison of \langle \delta Q\rangle = \frac{1}{N}\sum_{i=k}^{N} \delta Q_{i} for (a) the trained model vs. (b) generic HMC.

2D U(1) Model

  • lattice.shape: [2, Nt, Nx]
  • maintain buffer of Nb chains,
    updated in parallel
    • x.shape: [Nb, 2, Nt, Nx]
  • to deal with U_{\mu} = e^{i x_{\mu}} \in \mathbb{C}, stack as:
    [x.cos(), x.sin()]
  • final shapes:
    • v.shape: [Nb, 2, Nt, Nx]
    • x.shape: [Nb, 2, 2, Nt, Nx]
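
Illustrative PyTorch shapes (the values of Nb, Nt, Nx are arbitrary):

import math

import torch

Nb, Nt, Nx = 128, 16, 16
x = torch.empty(Nb, 2, Nt, Nx).uniform_(-math.pi, math.pi)  # link angles
v = torch.randn(Nb, 2, Nt, Nx)                  # momenta: [Nb, 2, Nt, Nx]
x_net = torch.stack([x.cos(), x.sin()], dim=1)  # [Nb, 2, 2, Nt, Nx]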

4D SU(3) Model

  • link variables:
    • U_{\mu}(x) \in SU(3),
  • + conjugate momenta:
    • P_{\mu}(x) \in \mathfrak{su}(3)
  • We maintain a batch of Nb lattices, all updated in parallel
    • lattice.shape:
      • [4, Nt, Nx, Ny, Nz, 3, 3]
    • batch.shape:
      • [Nb, *lattice.shape]
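
The analogous shapes for 4D SU(3), again with illustrative sizes (a "cold start" with all links set to the identity):

import torch

Nb, Nt, Nx, Ny, Nz = 8, 16, 8, 8, 8
# One lattice: a 3x3 complex matrix U_mu(x) per site and direction
lattice = torch.eye(3, dtype=torch.complex64).expand(4, Nt, Nx, Ny, Nz, 3, 3)
batch = lattice.unsqueeze(0).repeat(Nb, 1, 1, 1, 1, 1, 1, 1)  # [Nb, *lattice.shape]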

Thank you!

samforeman.me
saforem2
@saforem2
foremans@anl.gov

Acknowledgements

This research used resources of the Argonne Leadership Computing Facility,
which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357.

Acknowledgements

  • Huge thank you to:
    • Yannick Meurice
    • Norman Christ
    • Akio Tomiya
    • Luchang Jin
    • Chulwoo Jung
    • Peter Boyle
    • Taku Izubuchi
    • ECP-CSD group
    • ALCF Staff + Datascience Group

References

Boyda, Denis et al. 2022. “Applications of Machine Learning to Lattice Quantum Field Theory.” In Snowmass 2021. https://arxiv.org/abs/2202.05838.
Duane, S., A. D. Kennedy, B. J. Pendleton, and D. Roweth. 1987. “Hybrid Monte Carlo.” Phys. Lett. B 195: 216–22. https://doi.org/10.1016/0370-2693(87)91197-X.
Foreman, Sam, Taku Izubuchi, Luchang Jin, Xiao-Yong Jin, James C. Osborn, and Akio Tomiya. 2022. “HMC with Normalizing Flows.” PoS LATTICE2021: 073. https://doi.org/10.22323/1.396.0073.
Foreman, Sam, Xiao-Yong Jin, and James C. Osborn. 2021. “Deep Learning Hamiltonian Monte Carlo.” In 9th International Conference on Learning Representations. https://arxiv.org/abs/2105.03418.
———. 2022. “LeapfrogLayers: A Trainable Framework for Effective Topological Sampling.” PoS LATTICE2021: 508. https://doi.org/10.22323/1.396.0508.
Shanahan, Phiala et al. 2022. “Snowmass 2021 Computational Frontier CompF03 Topical Group Report: Machine Learning,” September. https://arxiv.org/abs/2209.07559.
Wittig, Hartmut. 2023. “Progress on (g-2)_\mu from Lattice QCD.” https://arxiv.org/abs/2306.04165.

Extras

Figure 15: Jupyter Notebook

Networks 2D U(1)

  • Stack gauge links as shape\left(U_{\mu}\right)=[Nb, 2, Nt, Nx] \in \mathbb{C}

    x_{\mu}(n) \coloneqq \left[\cos(x), \sin(x)\right]

    with shape\left(x_{\mu}\right)= [Nb, 2, Nt, Nx, 2] \in \mathbb{R}

  • x-Network:

    • \Lambda^{\pm}_{k}(x, v) \rightarrow \left[s^{k}_{x}, t^{k}_{x}, q^{k}_{x}\right]
  • v-Network:

    • \Gamma_{k}^{\pm}(x, v) \rightarrow \left[s^{k}_{v}, t^{k}_{v}, q^{k}_{v}\right]

v-update (pt. 1)

  • network: \left(x, \partial_{x} S(x)\right) \coloneqq \left(x, F \right)\rightarrow (s_{v}, t_{v}, q_{v}), where \begin{align*} h_{0} &= \sigma\left( w_{x} x + w_{F} F + b \right) \\ h_{1} &= \sigma\left( w_{1} h_{0} + b_{1} \right) \\ &\vdots \\ h_{n} &= \sigma\left(w_{n} h_{n-1} + b_{n}\right) \longrightarrow \\ \end{align*} s_{v} = \lambda_{s} \tanh\left(w_{s} h_{n} + b_{s}\right), \quad t_{v} = w_{t} h_{n} + b_{t}, \quad q_{v} = \lambda_{q}\tanh\left(w_{q} h_{n} + b_{q}\right)

v-update (pt. 2)

  • Network outputs: s_{v} = \lambda_{s} \tanh\left(w_{s}\, h_{n} + b_{s}\right), \quad t_{v} = w_{t} h_{n} + b_{t}, \quad q_{v} = \lambda_{q}\tanh\left(w_{q} h_{n} + b_{q}\right)

  • Use these to update \Gamma^{\pm}: (x, v) \rightarrow \left(x, v_{\pm}\right):

    • forward (d = \textcolor{#FF5252}{+}): \Gamma^{\textcolor{#FF5252}{+}}(x, v) \coloneqq v_{\textcolor{#FF5252}{+}} = v \cdot e^{\frac{\varepsilon}{2} s_{v}} - \frac{\varepsilon}{2}\left[ F \cdot e^{\varepsilon q_{v}} + t_{v} \right]

    • backward (d = \textcolor{#1A8FFF}{-}): \Gamma^{\textcolor{#1A8FFF}{-}}(x, v) \coloneqq v_{\textcolor{#1A8FFF}{-}} = e^{-\frac{\varepsilon}{2} s_{v}} \left\{v + \frac{\varepsilon}{2}\left[ F \cdot e^{\varepsilon q_{v}} + t_{v} \right]\right\}
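
A PyTorch sketch of the v-network and the forward update (layer sizes and names are illustrative; \lambda_{s}, \lambda_{q} are trainable scale parameters):

import torch
import torch.nn as nn

class VNetwork(nn.Module):
    """(x, F) -> (s_v, t_v, q_v), where F = dS/dx is the force."""
    def __init__(self, dim, hidden):
        super().__init__()
        self.wx = nn.Linear(dim, hidden)              # w_x x + b
        self.wF = nn.Linear(dim, hidden, bias=False)  # w_F F
        self.w1 = nn.Linear(hidden, hidden)
        self.ws = nn.Linear(hidden, dim)
        self.wt = nn.Linear(hidden, dim)
        self.wq = nn.Linear(hidden, dim)
        self.lam_s = nn.Parameter(torch.ones(dim))    # lambda_s
        self.lam_q = nn.Parameter(torch.ones(dim))    # lambda_q

    def forward(self, x, F):
        h = torch.relu(self.wx(x) + self.wF(F))       # h_0
        h = torch.relu(self.w1(h))                    # h_1 ... h_n
        s_v = self.lam_s * torch.tanh(self.ws(h))
        t_v = self.wt(h)
        q_v = self.lam_q * torch.tanh(self.wq(h))
        return s_v, t_v, q_v

def gamma_forward(v, F, s_v, t_v, q_v, eps):
    """v_+ = v * exp(eps/2 * s_v) - eps/2 * (F * exp(eps * q_v) + t_v)"""
    return (v * torch.exp(0.5 * eps * s_v)
            - 0.5 * eps * (F * torch.exp(eps * q_v) + t_v))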

Annealing Schedule

  • Introduce an annealing schedule during the training phase:

    \left\{ \gamma_{t} \right\}_{t=0}^{N} = \left\{\gamma_{0}, \gamma_{1}, \ldots, \gamma_{N-1}, \gamma_{N} \right\}

    where \gamma_{0} < \gamma_{1} < \cdots < \gamma_{N} \equiv 1, and \left|\gamma_{t+1} - \gamma_{t}\right| \ll 1

  • Note:

    • for \left|\gamma_{t}\right| < 1, this rescaling helps to reduce the height of the energy barriers \Longrightarrow
    • easier for our sampler to explore previously inaccessible regions of the phase space
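
A minimal sketch of such a schedule, assuming the action is rescaled as S(x) \rightarrow \gamma_{t}\, S(x) at training step t:

import numpy as np

def annealing_schedule(n_steps, gamma0=0.1):
    """Monotonically increasing gamma_0 < ... < gamma_N = 1, in small increments."""
    return np.linspace(gamma0, 1.0, n_steps + 1)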

LQCD @ ALCF (2008)

The Blue Gene/P at the ALCF has tremendously accelerated the generation of the gauge configurations—in many cases, by a factor of 5 to 10 over what has been possible with other machines. Significant progress has been made in simulations with two different implementations of the quarks—domain wall and staggered.

  — Mike Papka, 2008

HMC

flowchart LR
  v("v") --> G1{"Γ"}
  x0("x") -. "∂𝑆 / ∂𝒙" -.-> G1{"Γ"}
  G1 -.-> L{"Λ"}
  x0 --> L{"Λ"}
  G1 --> G2{"Γ"}
  L -.-> G2
  L ---> x'("x*")
  G2 --> v'("v*")
  style x0  color:#805AC1
  style x' color:#805AC1
  style v' color:#308E40
  style v color:#308E40
  style L color:#FD971F
  style G1 color:#ec407a
  style G2 color:#ec407a
  classDef default fill:#1c1c1c