MLMC: Machine Learning Monte Carlo
for Lattice Gauge Theory

2023-08-03

 

Sam Foreman
Xiao-Yong Jin, James C. Osborn
saforem2/{lattice23, l2hmc-qcd}

Overview

  1. Background: {MCMC,HMC}
  2. L2HMC: Generalizing MD
  3. References
  4. Extras

Markov Chain Monte Carlo (MCMC)

Goal

Generate independent samples \{x_{i}\}, such that \{x_{i}\} \sim p(x) \propto e^{-S(x)}, where S(x) is the action (or potential energy)

  • Want to calculate observables \mathcal{O}:
    \left\langle \mathcal{O}\right\rangle \propto \int \left[\mathcal{D}x\right]\hspace{4pt} {\mathcal{O}(x)\, p(x)}

If these were independent, we could approximate: \left\langle\mathcal{O}\right\rangle \simeq \frac{1}{N}\sum^{N}_{n=1}\mathcal{O}(x_{n})
\sigma_{\mathcal{O}}^{2} = \frac{1}{N}\mathrm{Var}{\left[\mathcal{O} (x) \right]}\Longrightarrow \sigma_{\mathcal{O}} \propto \frac{1}{\sqrt{N}}

Instead, nearby configs are correlated, and we incur a factor of \textcolor{#FF5252}{\tau^{\mathcal{O}}_{\mathrm{int}}}: \sigma_{\mathcal{O}}^{2} = \frac{\textcolor{#FF5252}{\tau^{\mathcal{O}}_{\mathrm{int}}}}{N}\mathrm{Var}{\left[\mathcal{O} (x) \right]}
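As a quick illustration of why \tau^{\mathcal{O}}_{\mathrm{int}} matters, here is a minimal numpy sketch (the toy AR(1) chain and the fixed summation window are both assumptions) of estimating \tau_{\mathrm{int}} and correcting the naive error:

```python
import numpy as np

def integrated_autocorr_time(obs, window=100):
    """Estimate tau_int from a 1D chain of measurements of an observable."""
    x = obs - obs.mean()
    # normalized autocorrelation function up to `window` lags
    acf = np.array([np.mean(x[:len(x) - t] * x[t:]) for t in range(window)]) / x.var()
    return 1.0 + 2.0 * np.sum(acf[1:])

# toy correlated chain standing in for MCMC measurements of O(x)
rng = np.random.default_rng(0)
obs = np.zeros(10_000)
for i in range(1, len(obs)):
    obs[i] = 0.95 * obs[i - 1] + rng.normal()

tau = integrated_autocorr_time(obs)
naive = obs.std() / np.sqrt(len(obs))          # valid only for independent samples
print(f"tau_int ~ {tau:.1f}; corrected error ~ {naive * np.sqrt(tau):.4f}")
```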

Hamiltonian Monte Carlo (HMC)

  • Want to (sequentially) construct a chain of states: x_{0} \rightarrow x_{1} \rightarrow \cdots \rightarrow x_{N}

    such that, as N \rightarrow \infty: \left\{x_{i}, x_{i+1}, x_{i+2}, \cdots, x_{N} \right\} \xrightarrow[]{N\rightarrow\infty} p(x) \propto e^{-S(x)}

Trick

  • Introduce fictitious momentum v \sim \mathcal{N}(0, \mathbb{1})
    • Normally distributed, independent of x, i.e. \begin{align*} p(x, v) &\textcolor{#02b875}{=} p(x)\,p(v) \propto e^{-S{(x)}} e^{-\frac{1}{2} v^{T}v} = e^{-\left[S(x) + \frac{1}{2} v^{T}{v}\right]} \textcolor{#02b875}{=} e^{-H(x, v)} \end{align*}

Hamiltonian Monte Carlo (HMC)

  • Idea: Evolve the (\dot{x}, \dot{v}) system to get new states \{x_{i}\}❗

  • Write the joint distribution p(x, v): p(x, v) \propto e^{-S[x]} e^{-\frac{1}{2}v^{T} v} = e^{-H(x, v)}

Hamiltonian Dynamics

H = S[x] + \frac{1}{2} v^{T} v \Longrightarrow \dot{x} = +\partial_{v} H, \,\,\dot{v} = -\partial_{x} H

Figure 1: Overview of HMC algorithm

Leapfrog Integrator (HMC)

Hamiltonian Dynamics

\left(\dot{x}, \dot{v}\right) = \left(\partial_{v} H, -\partial_{x} H\right)

Leapfrog Step

input \,\left(x, v\right) \rightarrow \left(x', v'\right)\, output

\begin{align*} \tilde{v} &:= \textcolor{#F06292}{\Gamma}(x, v)\hspace{2.2pt} = v - \frac{\varepsilon}{2} \partial_{x} S(x) \\ x' &:= \textcolor{#FD971F}{\Lambda}(x, \tilde{v}) \, = x + \varepsilon \, \tilde{v} \\ v' &:= \textcolor{#F06292}{\Gamma}(x', \tilde{v}) = \tilde{v} - \frac{\varepsilon}{2} \partial_{x} S(x') \end{align*}

Warning!

Resample v_{0} \sim \mathcal{N}(0, \mathbb{1})
at the beginning of each trajectory

Note: \partial_{x} S(x) is the force
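A minimal numpy sketch of one leapfrog step for a generic action (the force \partial_{x}S is passed in as a callable grad_S; nothing here is specific to l2hmc-qcd):

```python
import numpy as np

def leapfrog_step(x, v, eps, grad_S):
    """One leapfrog step (x, v) -> (x', v'): half-kick, drift, half-kick."""
    v_tilde = v - 0.5 * eps * grad_S(x)       # Gamma: half-step momentum update
    x_prime = x + eps * v_tilde               # Lambda: full-step position update
    v_prime = v_tilde - 0.5 * eps * grad_S(x_prime)
    return x_prime, v_prime
```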

HMC Update

  • We build a trajectory of N_{\mathrm{LF}} leapfrog steps \begin{equation*} (x_{0}, v_{0}) \rightarrow (x_{1}, v_{1}) \rightarrow \cdots \rightarrow (x', v') \end{equation*}

  • And propose x' as the next state in our chain

\begin{align*} \textcolor{#F06292}{\Gamma}: (x, v) \textcolor{#F06292}{\rightarrow} v' &:= v - \frac{\varepsilon}{2} \partial_{x} S(x) \\ \textcolor{#FD971F}{\Lambda}: (x, v) \textcolor{#FD971F}{\rightarrow} x' &:= x + \varepsilon v \end{align*}

  • We then accept / reject x' using the Metropolis-Hastings criterion,
    A(x'|x) = \min\left\{1, \frac{p(x')}{p(x)}\left|\frac{\partial x'}{\partial x}\right|\right\}
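Putting the pieces together, a sketch of one full HMC update (reusing leapfrog_step from the sketch above; since leapfrog is volume-preserving, the Jacobian factor is 1 and only the energy difference enters):

```python
import numpy as np

def hmc_update(x, eps, n_lf, S, grad_S, rng):
    """One HMC trajectory: resample v, integrate N_LF steps, accept/reject."""
    v = rng.normal(size=np.shape(x))     # v ~ N(0, 1), resampled every trajectory
    H0 = S(x) + 0.5 * np.sum(v * v)
    x_new, v_new = x, v
    for _ in range(n_lf):
        x_new, v_new = leapfrog_step(x_new, v_new, eps, grad_S)
    H1 = S(x_new) + 0.5 * np.sum(v_new * v_new)
    # Metropolis-Hastings: A = min{1, e^(H0 - H1)} since |dx'/dx| = 1
    if rng.uniform() < min(1.0, np.exp(H0 - H1)):
        return x_new, True
    return x, False
```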

HMC Demo

Figure 2: HMC Demo

Issues with HMC

  • What do we want in a good sampler?
    • Fast mixing (small autocorrelations)
    • Fast burn-in (quick convergence)
  • Problems with HMC:
    • Energy levels selected randomly \rightarrow slow mixing
    • Cannot easily traverse low-density zones \rightarrow slow convergence

Figure 3: HMC samples generated with varying step sizes, \varepsilon=0.25 and \varepsilon=0.5

Topological Freezing

Topological Charge: Q = \frac{1}{2\pi}\sum_{P}\left\lfloor x_{P}\right\rfloor \in \mathbb{Z}

note: \left\lfloor x_{P} \right\rfloor = x_{P} - 2\pi \left\lfloor\frac{x_{P} + \pi}{2\pi}\right\rfloor
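For the 2D U(1) theory discussed later, a numpy sketch of this charge (x_plaq is assumed to hold the plaquette angles x_{P}):

```python
import numpy as np

def project(x):
    """The bracket above: project angles onto [-pi, pi)."""
    return x - 2 * np.pi * np.floor((x + np.pi) / (2 * np.pi))

def topological_charge(x_plaq):
    """Q = (1/2pi) * sum over plaquettes of the projected angles (integer-valued)."""
    return np.rint(np.sum(project(x_plaq)) / (2 * np.pi))
```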

Critical Slowing Down

  • Q gets stuck!
    • as \beta\longrightarrow \infty:
      • Q \longrightarrow \text{const.}
      • \delta Q = \left(Q^{\ast} - Q\right) \rightarrow 0 \textcolor{#FF5252}{\Longrightarrow}
    • # configs required to estimate errors
      grows exponentially: \tau_{\mathrm{int}}^{Q} \longrightarrow \infty

Note: \delta Q \rightarrow 0 as \beta increases

Can we do better?

  • Introduce two invertible NNs, vNet and xNet:
    • vNet: (x, F) \longrightarrow \left(s_{v},\, t_{v},\, q_{v}\right)
    • xNet: (x, v) \longrightarrow \left(s_{x},\, t_{x},\, q_{x}\right)

 

  • Use these (s, t, q) in the generalized MD update:
    • \Gamma_{\theta}^{\pm} : ({x}, \textcolor{#07B875}{v}) \xrightarrow[]{\textcolor{#F06292}{s_{v}, t_{v}, q_{v}}} (x, \textcolor{#07B875}{v'})
    • \Lambda_{\theta}^{\pm} : (\textcolor{#AE81FF}{x}, v) \xrightarrow[]{\textcolor{#FD971F}{s_{x}, t_{x}, q_{x}}} (\textcolor{#AE81FF}{x'}, v)

Figure 4: Generalized MD update where \Lambda_{\theta}^{\pm}, \Gamma_{\theta}^{\pm} are invertible NNs

L2HMC: Generalizing the MD Update

L2HMC Update

  • Introduce d \sim \mathcal{U}(\pm) to determine the direction of our update
  1. \textcolor{#07B875}{v'} = \Gamma^{\pm}({x}, \textcolor{#07B875}{v}) \hspace{46pt} update v

  2. \textcolor{#AE81FF}{x'} = x_{B}\,+\,\Lambda^{\pm}(x_{A}, {v'}) \hspace{10pt} update first half: x_{A}

  3. \textcolor{#AE81FF}{x''} = x'_{A}\,+\,\Lambda^{\pm}(x'_{B}, {v'}) \hspace{8pt} update other half: x_{B} (see the mask sketch after this list)

  4. \textcolor{#07B875}{v''} = \Gamma^{\pm}({x''}, \textcolor{#07B875}{v'}) \hspace{36pt} update v
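In practice the x_{A} / x_{B} split is implemented with a binary mask; a minimal sketch (the mask convention and the way \Lambda is applied are assumptions based on steps 2-3 above, with the network update passed in as a callable):

```python
def half_update(x, v, mask, Lambda):
    """Update the masked half of x, passing the complement through unchanged.

    With x_A = mask * x and x_B = (1 - mask) * x, this is step 2 above;
    calling it again with (1 - mask) gives step 3.
    """
    x_A, x_B = mask * x, (1 - mask) * x
    return x_B + mask * Lambda(x_A, v)
```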

 

Figure 5: Generalized MD update with \Lambda_{\theta}^{\pm}, \Gamma_{\theta}^{\pm} invertible NNs

L2HMC: Leapfrog Layer

L2HMC Update

Algorithm

  1. input: x

    • Resample: \textcolor{#07B875}{v} \sim \mathcal{N}(0, \mathbb{1}); \,\,{d\sim\mathcal{U}(\pm)}
    • Construct initial state: \textcolor{#939393}{\xi} =(\textcolor{#AE81FF}{x}, \textcolor{#07B875}{v}, {\pm})
  2. forward: Generate proposal \xi' by passing initial \xi through N_{\mathrm{LF}} leapfrog layers
    \textcolor{#939393}{\xi} \xrightarrow[]{\tiny{\mathrm{LF}\text{ layer}}} \xi_{1} \longrightarrow \cdots \longrightarrow \xi_{N_{\mathrm{LF}}} = \textcolor{#f8f8f8}{\xi'} := (\textcolor{#AE81FF}{x''}, \textcolor{#07B875}{v''})

    • Accept / Reject: \begin{equation*} A({\textcolor{#f8f8f8}{\xi'}}|{\textcolor{#939393}{\xi}})= \mathrm{min}\left\{1, \frac{\pi(\textcolor{#f8f8f8}{\xi'})}{\pi(\textcolor{#939393}{\xi})} \left| \mathcal{J}\left(\textcolor{#f8f8f8}{\xi'},\textcolor{#939393}{\xi}\right)\right| \right\} \end{equation*}
  3. backward (if training):

    • Evaluate the loss function \mathcal{L}\gets \mathcal{L}_{\theta}(\textcolor{#f8f8f8}{\xi'}, \textcolor{#939393}{\xi}) and backprop
  4. return: \textcolor{#AE81FF}{x}_{i+1}
    Evaluate the MH criterion above and return the accepted config, \textcolor{#AE81FF}{{x}_{i+1}}\gets \begin{cases} \textcolor{#AE81FF}{x''} \small{\text{ w/ prob }} A(\textcolor{#f8f8f8}{\xi'}|\textcolor{#939393}{\xi}) \hspace{26pt} ✅ \\ \textcolor{#AE81FF}{x} \hspace{5pt}\small{\text{ w/ prob }} 1 - A(\textcolor{#f8f8f8}{\xi'}|{\textcolor{#939393}{\xi}}) \hspace{10pt} 🚫 \end{cases}

Figure 6: Leapfrog Layer used in generalized MD update

4D SU(3) Model

Link Variables

  • Write link variables U_{\mu}(x) \in SU(3):

    \begin{align*} U_{\mu}(x) &= \mathrm{exp}\left[{i\, \textcolor{#AE81FF}{\omega^{k}_{\mu}(x)} \lambda^{k}}\right]\\ &= e^{i \textcolor{#AE81FF}{Q}},\quad \text{with} \quad \textcolor{#AE81FF}{Q} \in \mathfrak{su}(3) \end{align*}

    where \omega^{k}_{\mu}(x) \in \mathbb{R}, and \lambda^{k} are the generators of SU(3)

Conjugate Momenta

  • Introduce P_{\mu}(x) = P^{k}_{\mu}(x) \lambda^{k} conjugate to \omega^{k}_{\mu}(x)

Wilson Action

S_{G} = -\frac{\beta}{6} \sum \mathrm{Tr}\left[U_{\mu\nu}(x) + U^{\dagger}_{\mu\nu}(x)\right]

where U_{\mu\nu}(x) = U_{\mu}(x) U_{\nu}(x+\hat{\mu}) U^{\dagger}_{\mu}(x+\hat{\nu}) U^{\dagger}_{\nu}(x)

Figure 7: Illustration of the lattice

HMC: 4D SU(3)

Hamiltonian: H[P, U] = \frac{1}{2} P^{2} + S[U] \Longrightarrow

  • U update: \frac{d\omega^{k}}{dt} = \frac{\partial H}{\partial P^{k}} \Longrightarrow \frac{d\omega^{k}}{dt}\lambda^{k} = P^{k}\lambda^{k} \Longrightarrow \frac{dQ}{dt} = P \begin{align*} Q(\textcolor{#FFEE58}{\varepsilon}) &= Q(0) + \textcolor{#FFEE58}{\varepsilon} P(0) \Longrightarrow\\ -i\, \log U(\textcolor{#FFEE58}{\varepsilon}) &= -i\, \log U(0) + \textcolor{#FFEE58}{\varepsilon} P(0) \\ U(\textcolor{#FFEE58}{\varepsilon}) &= e^{i\,\textcolor{#FFEE58}{\varepsilon} P(0)} U(0) \Longrightarrow \\ \textcolor{#FD971F}{\Lambda}:\,\, U \longrightarrow U' &\coloneqq e^{i\varepsilon P'} U \end{align*}

\textcolor{#FFEE58}{\varepsilon} is the step size

  • P update: \frac{dP^{k}}{dt} = -\frac{\partial H}{\partial \omega^{k}} \Longrightarrow \frac{dP}{dt} = -\frac{\partial H}{\partial Q} = -\frac{dS}{dQ} \Longrightarrow \begin{align*} P(\textcolor{#FFEE58}{\varepsilon}) &= P(0) - \textcolor{#FFEE58}{\varepsilon} \left.\frac{dS}{dQ}\right|_{t=0} = P(0) - \textcolor{#FFEE58}{\varepsilon} \,\textcolor{#E599F7}{F[U]} \\ \textcolor{#F06292}{\Gamma}:\,\, P \longrightarrow P' &\coloneqq P - \frac{\varepsilon}{2} F[U] \end{align*}

\textcolor{#E599F7}{F[U]} is the force term

HMC: 4D SU(3)

  • Momentum Update: \textcolor{#F06292}{\Gamma}: P \longrightarrow P' := P - \frac{\varepsilon}{2} F[U]

  • Link Update: \textcolor{#FD971F}{\Lambda}: U \longrightarrow U' := e^{i\varepsilon P'} U\quad\quad

  • We maintain a batch of Nb lattices, all updated in parallel

    • U.dtype = complex128
    • U.shape = [Nb, 4, Nt, Nx, Ny, Nz, 3, 3]
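In PyTorch this batch is a single complex tensor; a sketch with hypothetical sizes:

```python
import torch

Nb, Nt, Nx, Ny, Nz = 8, 16, 8, 8, 8   # batch size and lattice dims (example values)
# one SU(3) matrix per (direction, site): U.shape = [Nb, 4, Nt, Nx, Ny, Nz, 3, 3]
U = torch.zeros(Nb, 4, Nt, Nx, Ny, Nz, 3, 3, dtype=torch.complex128)
U[:] = torch.eye(3, dtype=torch.complex128)   # cold start: every link = identity
```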

Networks 4D SU(3)

U-Network:

UNet: (U, P) \longrightarrow \left(s_{U},\, t_{U},\, q_{U}\right)

P-Network:

PNet: (U, P) \longrightarrow \left(s_{P},\, t_{P},\, q_{P}\right) \hspace{2pt}\longleftarrow let’s look at this

P-Network (pt. 1)

  • input: \hspace{7pt}\left(U, F\right) \coloneqq (e^{i Q}, F) \begin{align*} h_{0} &= \sigma\left( w_{Q} Q + w_{F} F + b \right) \\ h_{1} &= \sigma\left( w_{1} h_{0} + b_{1} \right) \\ &\vdots \\ h_{n} &= \sigma\left(w_{n} h_{n-1} + b_{n}\right) \\ \textcolor{#FF5252}{z} & \coloneqq \sigma\left(w_{n+1} h_{n} + b_{n+1}\right) \longrightarrow \\ \end{align*}
  • output: \hspace{7pt} (s_{P}, t_{P}, q_{P}) (network sketched below)

    • s_{P} = \lambda_{s} \tanh(w_s \textcolor{#FF5252}{z} + b_s)
    • t_{P} = w_{t} \textcolor{#FF5252}{z} + b_{t}
    • q_{P} = \lambda_{q} \tanh(w_{q} \textcolor{#FF5252}{z} + b_{q})
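A minimal PyTorch sketch of this architecture (the hidden width, activation, and the flattening of Q and F into feature vectors are all assumptions; \lambda_{s}, \lambda_{q} are trainable scale coefficients):

```python
import torch
import torch.nn as nn

class PNet(nn.Module):
    """Maps flattened (Q, F) to (s_P, t_P, q_P), following the equations above."""

    def __init__(self, dim, hidden=256):
        super().__init__()
        self.w_Q = nn.Linear(dim, hidden)   # w_Q Q + b
        self.w_F = nn.Linear(dim, hidden)   # w_F F
        self.body = nn.Sequential(          # h_1 ... h_n
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.w_s = nn.Linear(hidden, dim)
        self.w_t = nn.Linear(hidden, dim)
        self.w_q = nn.Linear(hidden, dim)
        self.lambda_s = nn.Parameter(torch.ones(dim))  # trainable scale coefficients
        self.lambda_q = nn.Parameter(torch.ones(dim))

    def forward(self, Q, F):
        h0 = torch.relu(self.w_Q(Q) + self.w_F(F))
        z = self.body(h0)
        s = self.lambda_s * torch.tanh(self.w_s(z))   # s_P
        t = self.w_t(z)                               # t_P
        q = self.lambda_q * torch.tanh(self.w_q(z))   # q_P
        return s, t, q
```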

P-Network (pt. 2)

  • Use (s_{P}, t_{P}, q_{P}) to update \Gamma^{\pm}: (U, P) \rightarrow \left(U, P_{\pm}\right) (invertibility checked in the sketch below):

    • forward (d = \textcolor{#FF5252}{+}): \Gamma^{\textcolor{#FF5252}{+}}(U, P) \coloneqq P_{\textcolor{#FF5252}{+}} = P \cdot e^{\frac{\varepsilon}{2} s_{P}} - \frac{\varepsilon}{2}\left[ F \cdot e^{\varepsilon q_{P}} + t_{P} \right]

    • backward (d = \textcolor{#1A8FFF}{-}): \Gamma^{\textcolor{#1A8FFF}{-}}(U, P) \coloneqq P_{\textcolor{#1A8FFF}{-}} = e^{-\frac{\varepsilon}{2} s_{P}} \left\{P + \frac{\varepsilon}{2}\left[ F \cdot e^{\varepsilon q_{P}} + t_{P} \right]\right\}
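For fixed (s_{P}, t_{P}, q_{P}) the backward map undoes the forward map exactly; a scalar numpy check with toy values standing in for the real tensors:

```python
import numpy as np

def gamma_fwd(P, F, s, t, q, eps):
    return P * np.exp(0.5 * eps * s) - 0.5 * eps * (F * np.exp(eps * q) + t)

def gamma_bwd(P, F, s, t, q, eps):
    return np.exp(-0.5 * eps * s) * (P + 0.5 * eps * (F * np.exp(eps * q) + t))

rng = np.random.default_rng(1)
P, F, s, t, q = rng.normal(size=(5, 4))   # toy stand-ins for the real tensors
eps = 0.1
assert np.allclose(gamma_bwd(gamma_fwd(P, F, s, t, q, eps), F, s, t, q, eps), P)
```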

Results: 2D U(1)

Improvement

We can measure the performance by comparing \tau_{\mathrm{int}} for the trained model vs. HMC.

Note: lower is better

Interpretation

Figure 8: Illustration of how different observables evolve over a single L2HMC trajectory: the deviation in x_{P}, topological charge mixing, and an artificial influx of energy.

Interpretation

Figure 9: Average plaquette \langle x_{P}\rangle and average energy H - \sum\log|\mathcal{J}|, plotted against LF step. The trained model artificially increases the energy towards the middle of the trajectory, allowing the sampler to tunnel between isolated sectors.

4D SU(3) Results

Figure 10: \log|\mathcal{J}| vs. N_{\mathrm{LF}} during training, after (a) 100, (b) 500, and (c) 1000 train iterations

4D SU(3) Results: \delta U_{\mu\nu}

Figure 11: The difference in the average plaquette \left|\delta U_{\mu\nu}\right|^{2} between the trained model and HMC

Figure 12: The difference in the average plaquette \left|\delta U_{\mu\nu}\right|^{2} between the trained model and HMC

Next Steps

  • Further code development

  • Continue to use / test different network architectures

    • Gauge equivariant NNs for U_{\mu}(x) update
  • Continue to test different loss functions for training

  • Scaling:

    • Lattice volume
    • Network size
    • Batch size
    • # of GPUs

Thank you!

 

Acknowledgements

This research used resources of the Argonne Leadership Computing Facility,
which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357.


Acknowledgements

  • Huge thank you to:
    • Yannick Meurice
    • Norman Christ
    • Akio Tomiya
    • Nobuyuki Matsumoto
    • Richard Brower
    • Luchang Jin
    • Chulwoo Jung
    • Peter Boyle
    • Taku Izubuchi
    • Denis Boyda
    • Dan Hackett
    • ECP-CSD group
    • ALCF Staff + Datascience Group

References


Boyda, Denis et al. 2022. β€œApplications of Machine Learning to Lattice Quantum Field Theory.” In Snowmass 2021. https://arxiv.org/abs/2202.05838.
Foreman, Sam, Taku Izubuchi, Luchang Jin, Xiao-Yong Jin, James C. Osborn, and Akio Tomiya. 2022. β€œHMC with Normalizing Flows.” PoS LATTICE2021: 073. https://doi.org/10.22323/1.396.0073.
Foreman, Sam, Xiao-Yong Jin, and James C. Osborn. 2021. β€œDeep Learning Hamiltonian Monte Carlo.” In 9th International Conference on Learning Representations. https://arxiv.org/abs/2105.03418.
β€”β€”β€”. 2022. β€œLeapfrogLayers: A Trainable Framework for Effective Topological Sampling.” PoS LATTICE2021 (May): 508. https://doi.org/10.22323/1.396.0508.
Shanahan, Phiala et al. 2022. β€œSnowmass 2021 Computational Frontier CompF03 Topical Group Report: Machine Learning,” September. https://arxiv.org/abs/2209.07559.

Extras

Integrated Autocorrelation Time

Figure 13: Plot of the integrated autocorrelation time for both the trained model (colored) and HMC (greyscale).

Comparison

Figure 14: Comparison of \langle \delta Q\rangle = \frac{1}{N}\sum_{i=k}^{N} \delta Q_{i} for (a) the trained model vs. (b) generic HMC

Plaquette analysis: x_{P}

Figure 15: How the average plaquette \left\langle x_{P}\right\rangle varies over a single trajectory for models trained at different \beta, with varying trajectory lengths N_{\mathrm{LF}}: (a) the deviation from the V\rightarrow\infty limit, x_{P}^{\ast}; (b) the average \langle x_{P}\rangle, with x_{P}^{\ast} shown as dotted lines

Loss Function

  • Want to maximize the expected squared charge difference: \begin{equation*} \mathcal{L}_{\theta}\left(\xi^{\ast}, \xi\right) = {\mathbb{E}_{p(\xi)}}\big[-\textcolor{#FA5252}{{\delta Q}}^{2} \left(\xi^{\ast}, \xi \right)\cdot A(\xi^{\ast}|\xi)\big] \end{equation*}

  • Where:

    • \delta Q is the tunneling rate: \begin{equation*} \textcolor{#FA5252}{\delta Q}(\xi^{\ast},\xi)=\left|Q^{\ast} - Q\right| \end{equation*}

    • A(\xi^{\ast}|\xi) is the probability of accepting the proposal \xi^{\ast}: \begin{equation*} A(\xi^{\ast}|\xi) = \mathrm{min}\left( 1, \frac{p(\xi^{\ast})}{p(\xi)}\left|\frac{\partial \xi^{\ast}}{\partial \xi^{T}}\right|\right) \end{equation*}
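A PyTorch sketch of this loss for a batch of chains (Q_init, Q_prop, and acc_prob are assumed to be per-chain tensors computed elsewhere in the sampler):

```python
import torch

def l2hmc_loss(Q_init: torch.Tensor, Q_prop: torch.Tensor, acc_prob: torch.Tensor):
    """Batch estimate of E[-dQ^2 * A]; minimizing this maximizes tunneling."""
    dQ2 = (Q_prop - Q_init) ** 2        # squared charge difference per chain
    return -(dQ2 * acc_prob).mean()     # expectation approximated by batch average
```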

Networks 2D U(1)

  • Stack gauge links as shape\left(U_{\mu}\right)=[Nb, 2, Nt, Nx] \in \mathbb{C}

    x_{\mu}(n) ≔ \left[\cos(x), \sin(x)\right]

    with shape\left(x_{\mu}\right)= [Nb, 2, Nt, Nx, 2] \in \mathbb{R}

  • x-Network:

    • \psi_{\theta}: (x, v) \longrightarrow \left(s_{x},\, t_{x},\, q_{x}\right)
  • v-Network:

    • \varphi_{\theta}: (x, v) \longrightarrow \left(s_{v},\, t_{v},\, q_{v}\right) \hspace{2pt}\longleftarrow let’s look at this

v-Update

  • forward (d = \textcolor{#FF5252}{+}):

\Gamma^{\textcolor{#FF5252}{+}}: (x, v) \rightarrow v' \coloneqq v \cdot e^{\frac{\varepsilon}{2} s_{v}} - \frac{\varepsilon}{2}\left[ F \cdot e^{\varepsilon q_{v}} + t_{v} \right]

  • backward (d = \textcolor{#1A8FFF}{-}):

\Gamma^{\textcolor{#1A8FFF}{-}}: (x, v) \rightarrow v' \coloneqq e^{-\frac{\varepsilon}{2} s_{v}} \left\{v + \frac{\varepsilon}{2}\left[ F \cdot e^{\varepsilon q_{v}} + t_{v} \right]\right\}

x-Update

  • forward (d = \textcolor{#FF5252}{+}):

\Lambda^{\textcolor{#FF5252}{+}}(x, v) = x \cdot e^{\frac{\varepsilon}{2} s_{x}} - \frac{\varepsilon}{2}\left[ v \cdot e^{\varepsilon q_{x}} + t_{x} \right]

  • backward (d = \textcolor{#1A8FFF}{-}):

\Lambda^{\textcolor{#1A8FFF}{-}}(x, v) = e^{-\frac{\varepsilon}{2} s_{x}} \left\{x + \frac{\varepsilon}{2}\left[ v \cdot e^{\varepsilon q_{x}} + t_{x} \right]\right\}

Lattice Gauge Theory (2D U(1))

Link Variables

U_{\mu}(n) = e^{i x_{\mu}(n)}\in \mathbb{C},\quad \text{where}\quad x_{\mu}(n) \in [-\pi,\pi)

Wilson Action

S_{\beta}(x) = \beta\sum_{P} \cos \textcolor{#00CCFF}{x_{P}},

\textcolor{#00CCFF}{x_{P}} = \left[x_{\mu}(n) + x_{\nu}(n+\hat{\mu}) - x_{\mu}(n+\hat{\nu})-x_{\nu}(n)\right]

Note: \textcolor{#00CCFF}{x_{P}} is the product of links around 1\times 1 square, called a β€œplaquette”
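A numpy sketch of the plaquette angles and this action for a batch of link angles x with shape [Nb, 2, Nt, Nx] (the axis convention, with axis 1 indexing \mu, is an assumption):

```python
import numpy as np

def plaquettes(x):
    """x_P = x_0(n) + x_1(n + mu_hat) - x_0(n + nu_hat) - x_1(n), periodic BCs."""
    x0, x1 = x[:, 0], x[:, 1]
    return (x0
            + np.roll(x1, -1, axis=1)    # x_nu(n + mu_hat)
            - np.roll(x0, -1, axis=2)    # x_mu(n + nu_hat)
            - x1)

def wilson_action(x, beta):
    """S_beta(x) = beta * sum_P cos(x_P), per batch element (slide's sign convention)."""
    return beta * np.sum(np.cos(plaquettes(x)), axis=(1, 2))
```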

2D Lattice

Figure 16: Jupyter Notebook

Annealing Schedule

  • Introduce an annealing schedule during the training phase:

    \left\{ \gamma_{t} \right\}_{t=0}^{N} = \left\{\gamma_{0}, \gamma_{1}, \ldots, \gamma_{N-1}, \gamma_{N} \right\}

    where \gamma_{0} < \gamma_{1} < \cdots < \gamma_{N} \equiv 1, and \left|\gamma_{t+1} - \gamma_{t}\right| \ll 1

  • Note:

    • for \left|\gamma_{t}\right| < 1, this rescaling helps to reduce the height of the energy barriers \Longrightarrow
    • easier for our sampler to explore previously inaccessible regions of the phase space
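As a sketch, a linear schedule satisfying these constraints, and the rescaled target it induces during training (the starting value \gamma_{0} and the linear spacing are assumptions):

```python
import numpy as np

def annealing_schedule(n_steps, gamma0=0.1):
    """Monotone gamma_0 < gamma_1 < ... < gamma_N = 1 with small increments."""
    return np.linspace(gamma0, 1.0, n_steps + 1)

# at training step t we target p_t(x) ~ exp(-gamma_t * S(x)):
# for gamma_t < 1 the energy barriers of S are scaled down, easing exploration
```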


Toy Example: GMM \in \mathbb{R}^{2}

Physical Quantities

  • To estimate physical quantities, we:
    • Calculate physical observables at increasing spatial resolution
    • Perform extrapolation to the continuum limit (see the sketch below)
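A numpy sketch of the extrapolation step (hypothetical data; leading discretization errors assumed to be O(a^2), so we fit in a^2 and read off the intercept):

```python
import numpy as np

# observable measured at decreasing lattice spacings a (hypothetical values)
a = np.array([0.4, 0.3, 0.2, 0.1])
obs = np.array([1.52, 1.38, 1.29, 1.24])

# fit obs = c0 + c1 * a^2 and extrapolate to a -> 0
c1, c0 = np.polyfit(a**2, obs, deg=1)
print(f"continuum estimate: {c0:.3f}")
```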

Figure 17: Increasing the physical resolution (a \rightarrow 0) allows us to make predictions about numerical values of physical quantities in the continuum limit.