MLMC: Machine Learning Monte Carlo

2024-04-03

MLMC: Machine Learning Monte Carlo
for Lattice Gauge Theory

 

Sam Foreman
Xiao-Yong Jin, James C. Osborn
saforem2/{lattice23, l2hmc-qcd}

Overview

  1. Background: {MCMC,HMC}
  2. L2HMC: Generalizing MD
  3. References
  4. Extras

Markov Chain Monte Carlo (MCMC)

Goal

Generate independent samples \{x_{i}\}, such that \{x_{i}\} \sim p(x) \propto e^{-S(x)}, where S(x) is the action (or potential energy)

  • Want to calculate observables \mathcal{O}:
    \left\langle \mathcal{O}\right\rangle \propto \int \left[\mathcal{D}x\right]\hspace{4pt} {\mathcal{O}(x)\, p(x)}

If these were independent, we could approximate: \left\langle\mathcal{O}\right\rangle \simeq \frac{1}{N}\sum^{N}_{n=1}\mathcal{O}(x_{n})
\sigma_{\mathcal{O}}^{2} = \frac{1}{N}\mathrm{Var}{\left[\mathcal{O} (x) \right]}\Longrightarrow \sigma_{\mathcal{O}} \propto \frac{1}{\sqrt{N}}
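To make the 1/\sqrt{N} scaling concrete, here is a minimal numpy sketch (illustration only, not from the talk) that estimates \langle x^{2}\rangle for a toy Gaussian action S(x) = x^{2}/2, where independent samples can be drawn directly:

```python
import numpy as np

# Toy example: p(x) ∝ exp(-S(x)) with S(x) = x^2 / 2 (a standard normal),
# observable O(x) = x^2.  The statistical error shrinks like 1 / sqrt(N).
rng = np.random.default_rng(0)

def estimate(n_samples: int) -> tuple[float, float]:
    x = rng.normal(0.0, 1.0, size=n_samples)       # independent draws from p(x)
    obs = x**2                                     # O(x) = x^2
    mean = obs.mean()
    err = obs.std(ddof=1) / np.sqrt(n_samples)     # sigma_O ∝ 1 / sqrt(N)
    return mean, err

for n in (100, 10_000, 1_000_000):
    mean, err = estimate(n)
    print(f"N = {n:>9d}   <x^2> = {mean:.4f} ± {err:.4f}")
```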

Instead, nearby configs are correlated, and we incur a factor of \textcolor{#FF5252}{\tau^{\mathcal{O}}_{\mathrm{int}}}: \sigma_{\mathcal{O}}^{2} = \frac{\textcolor{#FF5252}{\tau^{\mathcal{O}}_{\mathrm{int}}}}{N}\mathrm{Var}{\left[\mathcal{O} (x) \right]}
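A naive sketch of how \tau_{\mathrm{int}} can be estimated from a chain of measurements (assumptions: an AR(1) toy chain and a simple positive-window truncation; not the estimator used in the talk):

```python
import numpy as np

def integrated_autocorr_time(chain: np.ndarray, max_lag: int | None = None) -> float:
    """Naive estimate of tau_int = 1 + 2 * sum_t rho(t) for a 1D chain of measurements."""
    x = np.asarray(chain, dtype=float)
    x = x - x.mean()
    n = len(x)
    max_lag = max_lag or n // 4
    var = x.var()
    tau = 1.0
    for t in range(1, max_lag):
        rho = np.dot(x[:-t], x[t:]) / ((n - t) * var)   # normalized autocorrelation at lag t
        if rho <= 0:                                    # simple truncation window
            break
        tau += 2.0 * rho
    return tau

# Correlated toy chain (AR(1) process): tau_int grows as the correlations grow.
rng = np.random.default_rng(0)
x, chain = 0.0, []
for _ in range(50_000):
    x = 0.95 * x + rng.normal()
    chain.append(x)
print("tau_int ≈", integrated_autocorr_time(np.array(chain)))
```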

Hamiltonian Monte Carlo (HMC)

  • Want to (sequentially) construct a chain of states: x_{0} \rightarrow x_{1} \rightarrow \cdots \rightarrow x_{i} \rightarrow \cdots \rightarrow x_{N}\hspace{10pt}

    such that, as N \rightarrow \infty: \left\{x_{i}, x_{i+1}, x_{i+2}, \cdots, x_{N} \right\} \xrightarrow[]{N\rightarrow\infty} p(x) \propto e^{-S(x)}

Trick

  • Introduce fictitious momentum v \sim \mathcal{N}(0, \mathbb{1})
    • Normally distributed, independent of x, i.e. \begin{align*} p(x, v) &\textcolor{#02b875}{=} p(x)\,p(v) \propto e^{-S{(x)}} e^{-\frac{1}{2} v^{T}v} = e^{-\left[S(x) + \frac{1}{2} v^{T}{v}\right]} \textcolor{#02b875}{=} e^{-H(x, v)} \end{align*}

Hamiltonian Monte Carlo (HMC)

  • Idea: Evolve the (\dot{x}, \dot{v}) system to get new states \{x_{i}\}❗

  • Write the joint distribution p(x, v): p(x, v) \propto e^{-S[x]} e^{-\frac{1}{2}v^{T} v} = e^{-H(x, v)}

Hamiltonian Dynamics

H = S[x] + \frac{1}{2} v^{T} v \Longrightarrow \dot{x} = +\partial_{v} H, \,\,\dot{v} = -\partial_{x} H

Figure 1: Overview of HMC algorithm

Leapfrog Integrator (HMC)

Hamiltonian Dynamics

\left(\dot{x}, \dot{v}\right) = \left(\partial_{v} H, -\partial_{x} H\right)

Leapfrog Step

input \,\left(x, v\right) \rightarrow \left(x', v'\right)\, output

\begin{align*} \tilde{v} &:= \textcolor{#F06292}{\Gamma}(x, v)\hspace{2.2pt} = v - \frac{\varepsilon}{2} \partial_{x} S(x) \\ x' &:= \textcolor{#FD971F}{\Lambda}(x, \tilde{v}) \, = x + \varepsilon \, \tilde{v} \\ v' &:= \textcolor{#F06292}{\Gamma}(x', \tilde{v}) = \tilde{v} - \frac{\varepsilon}{2} \partial_{x} S(x') \end{align*}

Warning!

  • Resample v_{0} \sim \mathcal{N}(0, \mathbb{1})
    at the beginning of each trajectory

Note: \partial_{x} S(x) is the force
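A minimal numpy sketch of a single leapfrog step (\Gamma, \Lambda, \Gamma), matching the update above; grad_S stands in for \partial_{x} S(x):

```python
import numpy as np

def leapfrog_step(x, v, grad_S, eps):
    """Single leapfrog step (Gamma, Lambda, Gamma), as in the update above."""
    v_half = v - 0.5 * eps * grad_S(x)              # Gamma:  v~ = v - (eps/2) dS/dx
    x_new = x + eps * v_half                        # Lambda: x' = x + eps * v~
    v_new = v_half - 0.5 * eps * grad_S(x_new)      # Gamma:  v' = v~ - (eps/2) dS/dx'
    return x_new, v_new
```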

HMC Update

  • We build a trajectory of N_{\mathrm{LF}} leapfrog steps \begin{equation*} (x_{0}, v_{0}) \rightarrow (x_{1}, v_{1})\rightarrow \cdots \rightarrow (x', v') \end{equation*}

  • And propose x' as the next state in our chain

\begin{align*} \textcolor{#F06292}{\Gamma}: (x, v) \textcolor{#F06292}{\rightarrow} v' &:= v - \frac{\varepsilon}{2} \partial_{x} S(x) \\ \textcolor{#FD971F}{\Lambda}: (x, v) \textcolor{#FD971F}{\rightarrow} x' &:= x + \varepsilon v \end{align*}

  • We then accept / reject x' using the Metropolis-Hastings criterion,
    A(x'|x) = \min\left\{1, \frac{p(x')}{p(x)}\left|\frac{\partial x'}{\partial x}\right|\right\}
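Putting the pieces together, a schematic numpy version of one full HMC update (resample v, run N_{\mathrm{LF}} leapfrog steps, Metropolis-Hastings accept/reject); S and grad_S are user-supplied callables, and the loop is a sketch rather than the l2hmc-qcd implementation:

```python
import numpy as np

def hmc_update(x, S, grad_S, eps=0.1, n_lf=10, rng=np.random.default_rng()):
    """One HMC update: resample v, N_LF leapfrog steps, then accept/reject."""
    v = rng.normal(size=np.shape(x))                    # v ~ N(0, 1)
    x0, h0 = x, S(x) + 0.5 * np.sum(v * v)              # initial H(x, v)
    for _ in range(n_lf):                               # N_LF leapfrog steps
        v = v - 0.5 * eps * grad_S(x)                   # Gamma
        x = x + eps * v                                 # Lambda
        v = v - 0.5 * eps * grad_S(x)                   # Gamma
    h1 = S(x) + 0.5 * np.sum(v * v)                     # proposed H(x', v')
    if rng.uniform() < min(1.0, np.exp(h0 - h1)):       # A(x'|x) = min{1, e^{-dH}}; |dx'/dx| = 1
        return x, True                                  # accept x'
    return x0, False                                    # reject: keep x
```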

HMC Demo

Figure 2: HMC Demo

Issues with HMC

  • What do we want in a good sampler?
    • Fast mixing (small autocorrelations)
    • Fast burn-in (quick convergence)
  • Problems with HMC:
    • Energy levels selected randomly \rightarrow slow mixing
    • Cannot easily traverse low-density zones \rightarrow slow convergence

Figure 3: HMC samples generated with varying step sizes (\varepsilon = 0.25 and \varepsilon = 0.5)

Topological Freezing

Topological Charge: Q = \frac{1}{2\pi}\sum_{P}\left\lfloor x_{P}\right\rfloor \in \mathbb{Z}

note: \left\lfloor x_{P} \right\rfloor = x_{P} - 2\pi \left\lfloor\frac{x_{P} + \pi}{2\pi}\right\rfloor
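A small numpy sketch of these two formulas (array of plaquette angles in, (near-)integer charge out):

```python
import numpy as np

def project_angle(x_p: np.ndarray) -> np.ndarray:
    """floor(x_P) = x_P - 2*pi*floor((x_P + pi) / (2*pi)): map angles into [-pi, pi)."""
    return x_p - 2 * np.pi * np.floor((x_p + np.pi) / (2 * np.pi))

def topological_charge(x_p: np.ndarray) -> float:
    """Q = (1 / 2*pi) * sum_P floor(x_P), which takes (near-)integer values."""
    return float(np.sum(project_angle(x_p)) / (2 * np.pi))
```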

Critical Slowing Down

  • Q gets stuck!
    • as \beta\longrightarrow \infty:
      • Q \longrightarrow \text{const.}
      • \delta Q = \left(Q^{\ast} - Q\right) \rightarrow 0 \textcolor{#FF5252}{\Longrightarrow}
    • # configs required to estimate errors
      grows exponentially: \tau_{\mathrm{int}}^{Q} \longrightarrow \infty

Note \delta Q \rightarrow 0 at increasing \beta

Can we do better?

  • Introduce two invertible neural networks (NNs), vNet and xNet:
    • vNet: (x, F) \longrightarrow \left(s_{v},\, t_{v},\, q_{v}\right)
    • xNet: (x, v) \longrightarrow \left(s_{x},\, t_{x},\, q_{x}\right)

 

  • Use these (s, t, q) in the generalized MD update:
    • \Gamma_{\theta}^{\pm} : ({x}, \textcolor{#07B875}{v}) \xrightarrow[]{\textcolor{#F06292}{s_{v}, t_{v}, q_{v}}} (x, \textcolor{#07B875}{v'})
    • \Lambda_{\theta}^{\pm} : (\textcolor{#AE81FF}{x}, v) \xrightarrow[]{\textcolor{#FD971F}{s_{x}, t_{x}, q_{x}}} (\textcolor{#AE81FF}{x'}, v)
Figure 4: Generalized MD update where \Lambda_{\theta}^{\pm}, \Gamma_{\theta}^{\pm} are invertible NNs

L2HMC: Generalizing the MD Update

L2HMC Update

  • Introduce d \sim \mathcal{U}(\pm) to determine the direction of our update (a schematic sketch follows this list)

    1. \textcolor{#07B875}{v'} = \Gamma^{\pm}({x}, \textcolor{#07B875}{v}) \hspace{46pt} update v

    2. \textcolor{#AE81FF}{x'} = x_{B}\,+\,\Lambda^{\pm}(x_{A}, {v'}) \hspace{10pt} update first half: x_{A}

    3. \textcolor{#AE81FF}{x''} = x'_{A}\,+\,\Lambda^{\pm}(x'_{B}, {v'}) \hspace{8pt} update other half: x_{B}

    4. \textcolor{#07B875}{v''} = \Gamma^{\pm}({x''}, \textcolor{#07B875}{v'}) \hspace{36pt} update v
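A schematic numpy sketch of one forward (d = +) generalized leapfrog layer implementing steps 1–4, using the \Gamma^{\pm} / \Lambda^{\pm} expressions given later in the deck; v_net, x_net, and the mask convention are placeholders, not the l2hmc-qcd implementation:

```python
import numpy as np

def l2hmc_leapfrog_layer(x, v, grad_S, v_net, x_net, eps, mask):
    """One forward (d = +) generalized leapfrog layer, following steps 1-4 above.

    v_net(x, F) -> (s_v, t_v, q_v) and x_net(x, v) -> (s_x, t_x, q_x) stand in
    for the trained networks; `mask` selects the x_A sites, (1 - mask) the x_B sites.
    """
    m, mb = mask, 1.0 - mask

    def gamma(x_, v_, F_):                       # Gamma^+: momentum update
        s, t, q = v_net(x_, F_)
        return v_ * np.exp(0.5 * eps * s) - 0.5 * eps * (F_ * np.exp(eps * q) + t)

    def lam(x_, v_):                             # Lambda^+: position update
        s, t, q = x_net(x_, v_)
        return x_ * np.exp(0.5 * eps * s) - 0.5 * eps * (v_ * np.exp(eps * q) + t)

    v1 = gamma(x, v, grad_S(x))                  # 1. v'  = Gamma^+(x, v)
    x1 = mb * x + m * lam(x, v1)                 # 2. x'  = x_B + Lambda^+(x_A, v')
    x2 = m * x1 + mb * lam(x1, v1)               # 3. x'' = x'_A + Lambda^+(x'_B, v')
    v2 = gamma(x2, v1, grad_S(x2))               # 4. v'' = Gamma^+(x'', v')
    return x2, v2
```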

🎲 Re-Sampling

  • Resample both v\sim \mathcal{N}(0, 1), and d \sim \mathcal{U}(\pm) at the beginning of each trajectory
    • To ensure ergodicity + reversibility, we split the x update into sequential (complementary) updates
  • Introduce directional variable d \sim \mathcal{U}(\pm), resampled at the beginning of each trajectory:
    • Note that \left(\Gamma^{+}\right)^{-1} = \Gamma^{-}, i.e. \Gamma^{+}\left[\Gamma^{-}(x, v)\right] = \Gamma^{-}\left[\Gamma^{+}(x, v)\right] = (x, v)
Figure 5: Generalized MD update with \Lambda_{\theta}^{\pm}, \Gamma_{\theta}^{\pm} invertible NNs

L2HMC: Leapfrog Layer

L2HMC Update

Algorithm

  1. input: x

    • Resample: \textcolor{#07B875}{v} \sim \mathcal{N}(0, \mathbb{1}); \,\,{d\sim\mathcal{U}(\pm)}
    • Construct initial state: \textcolor{#939393}{\xi} =(\textcolor{#AE81FF}{x}, \textcolor{#07B875}{v}, {\pm})
  2. forward: Generate proposal \xi' by passing initial \xi through N_{\mathrm{LF}} leapfrog layers
    \textcolor{#939393}{\xi} \xrightarrow[]{\tiny{\mathrm{LF} \text{ layer}}} \xi_{1} \longrightarrow \cdots \longrightarrow \xi_{N_{\mathrm{LF}}} = \textcolor{#f8f8f8}{\xi'} := (\textcolor{#AE81FF}{x''}, \textcolor{#07B875}{v''})

    • Accept / Reject: \begin{equation*} A({\textcolor{#f8f8f8}{\xi'}}|{\textcolor{#939393}{\xi}})= \mathrm{min}\left\{1, \frac{\pi(\textcolor{#f8f8f8}{\xi'})}{\pi(\textcolor{#939393}{\xi})} \left| \mathcal{J}\left(\textcolor{#f8f8f8}{\xi'},\textcolor{#939393}{\xi}\right)\right| \right\} \end{equation*}
  3. backward (if training):

    • Evaluate the loss function \mathcal{L}\gets \mathcal{L}_{\theta}(\textcolor{#f8f8f8}{\xi'}, \textcolor{#939393}{\xi}) and backprop
  4. return: \textcolor{#AE81FF}{x}_{i+1}
    Evaluate the MH criterion from step 2 and return the accepted config, \textcolor{#AE81FF}{{x}_{i+1}}\gets \begin{cases} \textcolor{#AE81FF}{x''} \small{\text{ w/ prob }} A(\textcolor{#f8f8f8}{\xi'}|\textcolor{#939393}{\xi}) \hspace{26pt} ✅ \\ \textcolor{#AE81FF}{x} \hspace{5pt}\small{\text{ w/ prob }} 1 - A(\textcolor{#f8f8f8}{\xi'}|{\textcolor{#939393}{\xi}}) \hspace{10pt} 🚫 \end{cases}

Figure 6: Leapfrog Layer used in generalized MD update
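For orientation, a hedged PyTorch sketch of what one training iteration could look like; dynamics is a hypothetical callable returning the proposal, the acceptance probability, and \delta Q, and is not the actual l2hmc-qcd API:

```python
import torch

def train_step(x, dynamics, optimizer):
    """One training iteration: forward through the N_LF leapfrog layers, evaluate
    the loss on (xi', xi), backprop, then Metropolis-Hastings accept/reject.

    `dynamics(x)` is assumed to return (x_proposal, acceptance_prob, delta_Q);
    these names are placeholders for illustration only.
    """
    optimizer.zero_grad()
    x_prop, acc_prob, dq = dynamics(x)               # forward: xi -> xi'
    loss = -(dq**2 * acc_prob).mean()                # maximize E[dQ^2 * A]
    loss.backward()                                  # backward (if training)
    optimizer.step()
    # accept x'' with probability A, otherwise keep x
    accept = (torch.rand_like(acc_prob) < acc_prob).to(x.dtype)
    x_next = accept[:, None] * x_prop + (1.0 - accept[:, None]) * x
    return x_next.detach(), loss.item()
```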

4D SU(3) Model

Link Variables

  • Write link variables U_{\mu}(x) \in SU(3):

    \begin{align*} U_{\mu}(x) &= \mathrm{exp}\left[{i\, \textcolor{#AE81FF}{\omega^{k}_{\mu}(x)} \lambda^{k}}\right]\\ &= e^{i \textcolor{#AE81FF}{Q}},\quad \text{with} \quad \textcolor{#AE81FF}{Q} \in \mathfrak{su}(3) \end{align*}

    where \omega^{k}_{\mu}(x) \in \mathbb{R}, and \lambda^{k} are the generators of SU(3)

Conjugate Momenta

  • Introduce P_{\mu}(x) = P^{k}_{\mu}(x) \lambda^{k} conjugate to \omega^{k}_{\mu}(x)

Wilson Action

S_{G} = -\frac{\beta}{6} \sum \mathrm{Tr}\left[U_{\mu\nu}(x) + U^{\dagger}_{\mu\nu}(x)\right]

where U_{\mu\nu}(x) = U_{\mu}(x) U_{\nu}(x+\hat{\mu}) U^{\dagger}_{\mu}(x+\hat{\nu}) U^{\dagger}_{\nu}(x)

Figure 7: Illustration of the lattice

HMC: 4D SU(3)

Hamiltonian: H[P, U] = \frac{1}{2} P^{2} + S[U] \Longrightarrow

  • U update: \frac{d\omega^{k}}{dt} = \frac{\partial H}{\partial P^{k}} \Longrightarrow \frac{d\omega^{k}}{dt}\lambda^{k} = P^{k}\lambda^{k} \Longrightarrow \frac{dQ}{dt} = P \begin{align*} Q(\textcolor{#FFEE58}{\varepsilon}) &= Q(0) + \textcolor{#FFEE58}{\varepsilon} P(0) \Longrightarrow \\ -i\, \log U(\textcolor{#FFEE58}{\varepsilon}) &= -i\, \log U(0) + \textcolor{#FFEE58}{\varepsilon} P(0) \\ U(\textcolor{#FFEE58}{\varepsilon}) &= e^{i\,\textcolor{#FFEE58}{\varepsilon} P(0)}\, U(0) \Longrightarrow \\ \textcolor{#FD971F}{\Lambda}:\,\, U \longrightarrow U' &:= e^{i\varepsilon P'} U \end{align*}
  • P update: \frac{dP^{k}}{dt} = -\frac{\partial H}{\partial \omega^{k}} = -\frac{\partial H}{\partial Q} = -\frac{dS}{dQ} \Longrightarrow \begin{align*} P(\textcolor{#FFEE58}{\varepsilon}) &= P(0) - \textcolor{#FFEE58}{\varepsilon} \left.\frac{dS}{dQ}\right|_{t=0} \\ &= P(0) - \textcolor{#FFEE58}{\varepsilon}\, \textcolor{#E599F7}{F[U]} \\ \textcolor{#F06292}{\Gamma}:\,\, P \longrightarrow P' &:= P - \frac{\varepsilon}{2} F[U] \end{align*}

HMC: 4D SU(3)

  • Momentum Update: \textcolor{#F06292}{\Gamma}: P \longrightarrow P' := P - \frac{\varepsilon}{2} F[U]

  • Link Update: \textcolor{#FD971F}{\Lambda}: U \longrightarrow U' := e^{i\varepsilon P'} U\quad\quad

  • We maintain a batch of Nb lattices, all updated in parallel

    • U.dtype = complex128
    • U.shape = [Nb, 4, Nt, Nx, Ny, Nz, 3, 3]
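A minimal PyTorch sketch of allocating such a batched field (identity links, zero momenta), just to make the shapes and dtype above concrete; the lattice extents are illustrative:

```python
import torch

# Batched 4D SU(3) gauge field: one 3x3 complex matrix per (batch, direction, site).
Nb, Nt, Nx, Ny, Nz = 4, 8, 8, 8, 8
U = torch.eye(3, dtype=torch.complex128)             # 3x3 identity in SU(3)
U = U.expand(Nb, 4, Nt, Nx, Ny, Nz, 3, 3).clone()    # shape [Nb, 4, Nt, Nx, Ny, Nz, 3, 3]
P = torch.zeros(Nb, 4, Nt, Nx, Ny, Nz, 3, 3, dtype=torch.complex128)  # conjugate momenta
print(U.shape, U.dtype)
```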

Networks 4D SU(3)

 

 

U-Network:

UNet: (U, P) \longrightarrow \left(s_{U},\, t_{U},\, q_{U}\right)

 

P-Network:

PNet: (U, P) \longrightarrow \left(s_{P},\, t_{P},\, q_{P}\right)

\uparrow
let’s look at this

P-Network (pt. 1)

  • input: \hspace{7pt}\left(U, F\right) := (e^{i Q}, F) \begin{align*} h_{0} &= \sigma\left( w_{Q} Q + w_{F} F + b \right) \\ h_{1} &= \sigma\left( w_{1} h_{0} + b_{1} \right) \\ &\vdots \\ h_{n} &= \sigma\left(w_{n} h_{n-1} + b_{n}\right) \\ \textcolor{#FF5252}{z} &:= \sigma\left(w_{z} h_{n} + b_{z}\right) \longrightarrow \end{align*}
  • output: \hspace{7pt} (s_{P}, t_{P}, q_{P})

    • s_{P} = \lambda_{s} \tanh(w_s \textcolor{#FF5252}{z} + b_s)
    • t_{P} = w_{t} \textcolor{#FF5252}{z} + b_{t}
    • q_{P} = \lambda_{q} \tanh(w_{q} \textcolor{#FF5252}{z} + b_{q})
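A schematic PyTorch version of this stack of dense layers; the widths, depth, and flattening of (Q, F) are illustrative choices rather than the actual l2hmc-qcd architecture:

```python
import torch
import torch.nn as nn

class PNetwork(nn.Module):
    """Schematic P-network: (Q, F) -> (s_P, t_P, q_P), as sketched above."""
    def __init__(self, dim: int, hidden: int = 256, lam_s: float = 1.0, lam_q: float = 1.0):
        super().__init__()
        self.input_layer = nn.Linear(2 * dim, hidden)      # w_Q Q + w_F F + b
        self.hidden_layers = nn.ModuleList(
            [nn.Linear(hidden, hidden) for _ in range(2)]  # h_1 ... h_n
        )
        self.scale = nn.Linear(hidden, dim)                # -> s_P
        self.transl = nn.Linear(hidden, dim)               # -> t_P
        self.transf = nn.Linear(hidden, dim)               # -> q_P
        self.lam_s, self.lam_q = lam_s, lam_q

    def forward(self, q: torch.Tensor, f: torch.Tensor):
        z = torch.relu(self.input_layer(torch.cat([q, f], dim=-1)))
        for layer in self.hidden_layers:
            z = torch.relu(layer(z))
        s = self.lam_s * torch.tanh(self.scale(z))          # s_P = lambda_s tanh(w_s z + b_s)
        t = self.transl(z)                                   # t_P = w_t z + b_t
        q_out = self.lam_q * torch.tanh(self.transf(z))      # q_P = lambda_q tanh(w_q z + b_q)
        return s, t, q_out
```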

P-Network (pt. 2)

  • Use (s_{P}, t_{P}, q_{P}) to update \Gamma^{\pm}: (U, P) \rightarrow \left(U, P_{\pm}\right):

    • forward (d = \textcolor{#FF5252}{+}): \Gamma^{\textcolor{#FF5252}{+}}(U, P) := P_{\textcolor{#FF5252}{+}} = P \cdot e^{\frac{\varepsilon}{2} s_{P}} - \frac{\varepsilon}{2}\left[ F \cdot e^{\varepsilon q_{P}} + t_{P} \right]

    • backward (d = \textcolor{#1A8FFF}{-}): \Gamma^{\textcolor{#1A8FFF}{-}}(U, P) := P_{\textcolor{#1A8FFF}{-}} = e^{-\frac{\varepsilon}{2} s_{P}} \left\{P + \frac{\varepsilon}{2}\left[ F \cdot e^{\varepsilon q_{P}} + t_{P} \right]\right\}
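Written as code, the forward / backward momentum updates above might look like the following PyTorch sketch (a direct transcription of the two formulas, not the production implementation):

```python
import torch

def gamma_update(p, force, s_p, t_p, q_p, eps, d=+1):
    """Generalized momentum update Gamma^{+/-}, matching the expressions above."""
    if d > 0:   # forward (d = +)
        return p * torch.exp(0.5 * eps * s_p) - 0.5 * eps * (force * torch.exp(eps * q_p) + t_p)
    # backward (d = -): exact inverse of the forward update
    return torch.exp(-0.5 * eps * s_p) * (p + 0.5 * eps * (force * torch.exp(eps * q_p) + t_p))
```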

Results: 2D U(1)

Improvement

We can measure the performance by comparing \tau_{\mathrm{int}} for the trained model vs. HMC.

Note: lower is better

Interpretation

Figure 8: Illustration of how different observables (deviation in x_{P}, topological charge mixing, artificial influx of energy) evolve over a single L2HMC trajectory.

Interpretation

Figure 9: Average plaquette \langle x_{P}\rangle and average energy H - \sum\log|\mathcal{J}| vs. LF step. The trained model artificially increases the energy towards the middle of the trajectory, allowing the sampler to tunnel between isolated sectors.

4D SU(3) Results

  • Distribution of \log|\mathcal{J}| over all chains, at each leapfrog step, N_{\mathrm{LF}} (= 0, 1, \ldots, 8) during training:
Figure 10: 100 train iters
Figure 11: 500 train iters
Figure 12: 1000 train iters

4D SU(3) Results: \delta U_{\mu\nu}

Figure 13: The difference in the average plaquette \left|\delta U_{\mu\nu}\right|^{2} between the trained model and HMC

4D SU(3) Results: \delta U_{\mu\nu}

Figure 14: The difference in the average plaquette \left|\delta U_{\mu\nu}\right|^{2} between the trained model and HMC

Next Steps

  • Further code development

  • Continue to use / test different network architectures

    • Gauge equivariant NNs for U_{\mu}(x) update
  • Continue to test different loss functions for training

  • Scaling:

    • Lattice volume
    • Network size
    • Batch size
    • # of GPUs

Thank you!

 

Acknowledgements

This research used resources of the Argonne Leadership Computing Facility,
which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357.


Acknowledgements

  • Huge thank you to:
    • Yannick Meurice
    • Norman Christ
    • Akio Tomiya
    • Nobuyuki Matsumoto
    • Richard Brower
    • Luchang Jin
    • Chulwoo Jung
    • Peter Boyle
    • Taku Izubuchi
    • Denis Boyda
    • Dan Hackett
    • ECP-CSD group
    • ALCF Staff + Datascience Group

References


Boyda, Denis et al. 2022. β€œApplications of Machine Learning to Lattice Quantum Field Theory.” In Snowmass 2021. https://arxiv.org/abs/2202.05838.
Foreman, Sam, Taku Izubuchi, Luchang Jin, Xiao-Yong Jin, James C. Osborn, and Akio Tomiya. 2022. β€œHMC with Normalizing Flows.” PoS LATTICE2021: 073. https://doi.org/10.22323/1.396.0073.
Foreman, Sam, Xiao-Yong Jin, and James C. Osborn. 2021. β€œDeep Learning Hamiltonian Monte Carlo.” In 9th International Conference on Learning Representations. https://arxiv.org/abs/2105.03418.
β€”β€”β€”. 2022. β€œLeapfrogLayers: A Trainable Framework for Effective Topological Sampling.” PoS LATTICE2021 (May): 508. https://doi.org/10.22323/1.396.0508.
Shanahan, Phiala et al. 2022. β€œSnowmass 2021 Computational Frontier CompF03 Topical Group Report: Machine Learning,” September. https://arxiv.org/abs/2209.07559.

Extras

Integrated Autocorrelation Time

Figure 15: Plot of the integrated autocorrelation time for both the trained model (colored) and HMC (greyscale).

Comparison

(a) Trained model
(b) Generic HMC
Figure 16: Comparison of \langle \delta Q\rangle = \frac{1}{N}\sum_{i=k}^{N} \delta Q_{i} for the trained model (a) vs. HMC (b)

Plaquette analysis: x_{P}

Figure 17: Average plaquette \left\langle x_{P}\right\rangle over a single trajectory for models trained at different \beta, with varying trajectory lengths N_{\mathrm{LF}}. Panels: deviation from the V\rightarrow\infty limit x_{P}^{\ast}; average \langle x_{P}\rangle with x_{P}^{\ast} shown as dotted lines.

Loss Function

  • Want to maximize the expected squared charge difference: \begin{equation*} \mathcal{L}_{\theta}\left(\xi^{\ast}, \xi\right) = {\mathbb{E}_{p(\xi)}}\big[-\textcolor{#FA5252}{{\delta Q}}^{2} \left(\xi^{\ast}, \xi \right)\cdot A(\xi^{\ast}|\xi)\big] \end{equation*}

  • Where:

    • \delta Q is the tunneling rate: \begin{equation*} \textcolor{#FA5252}{\delta Q}(\xi^{\ast},\xi)=\left|Q^{\ast} - Q\right| \end{equation*}

    • A(\xi^{\ast}|\xi) is the probability of accepting the proposal \xi^{\ast}: \begin{equation*} A(\xi^{\ast}|\xi) = \mathrm{min}\left( 1, \frac{p(\xi^{\ast})}{p(\xi)}\left|\frac{\partial \xi^{\ast}}{\partial \xi^{T}}\right|\right) \end{equation*}
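As a PyTorch sketch, the loss is a one-liner once \delta Q and A(\xi^{\ast}|\xi) are available (negated so that minimizing the loss maximizes the expected squared charge difference):

```python
import torch

def l2hmc_loss(q_init: torch.Tensor, q_prop: torch.Tensor, acc_prob: torch.Tensor) -> torch.Tensor:
    """Negative expected squared charge difference: minimizing this maximizes
    E_{p(xi)}[ dQ^2(xi*, xi) * A(xi*|xi) ]."""
    dq = torch.abs(q_prop - q_init)          # tunneling rate |Q* - Q|
    return -(dq**2 * acc_prob).mean()
```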

Networks 2D U(1)

  • Stack gauge links as shape\left(U_{\mu}\right)=[Nb, 2, Nt, Nx] \in \mathbb{C}

    x_{\mu}(n) ≔ \left[\cos(x), \sin(x)\right]

    with shape\left(x_{\mu}\right)= [Nb, 2, Nt, Nx, 2] \in \mathbb{R}

  • x-Network:

    • \psi_{\theta}: (x, v) \longrightarrow \left(s_{x},\, t_{x},\, q_{x}\right)
  • v-Network:

    • \varphi_{\theta}: (x, v) \longrightarrow \left(s_{v},\, t_{v},\, q_{v}\right) \hspace{2pt}\longleftarrow let’s look at this

v-Update

  • forward (d = \textcolor{#FF5252}{+}):

\Gamma^{\textcolor{#FF5252}{+}}: (x, v) \rightarrow v' := v \cdot e^{\frac{\varepsilon}{2} s_{v}} - \frac{\varepsilon}{2}\left[ F \cdot e^{\varepsilon q_{v}} + t_{v} \right]

  • backward (d = \textcolor{#1A8FFF}{-}):

\Gamma^{\textcolor{#1A8FFF}{-}}: (x, v) \rightarrow v' := e^{-\frac{\varepsilon}{2} s_{v}} \left\{v + \frac{\varepsilon}{2}\left[ F \cdot e^{\varepsilon q_{v}} + t_{v} \right]\right\}

x-Update

  • forward (d = \textcolor{#FF5252}{+}):

\Lambda^{\textcolor{#FF5252}{+}}(x, v) = x \cdot e^{\frac{\varepsilon}{2} s_{x}} - \frac{\varepsilon}{2}\left[ v \cdot e^{\varepsilon q_{x}} + t_{x} \right]

  • backward (d = \textcolor{#1A8FFF}{-}):

\Lambda^{\textcolor{#1A8FFF}{-}}(x, v) = e^{-\frac{\varepsilon}{2} s_{x}} \left\{x + \frac{\varepsilon}{2}\left[ v \cdot e^{\varepsilon q_{x}} + t_{x} \right]\right\}
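A PyTorch sketch of the masked x-update, applying \Lambda^{\pm} (the expressions above) only to the selected half of the sites and passing the complementary half through unchanged; the mask convention is an illustrative choice, not the l2hmc-qcd code:

```python
import torch

def lambda_update(x, v, s_x, t_x, q_x, eps, mask, d=+1):
    """Masked x-update: sites with mask == 1 are transformed with Lambda^{+/-},
    the complementary sites are left untouched (the x_A / x_B split)."""
    if d > 0:   # forward (d = +)
        x_new = x * torch.exp(0.5 * eps * s_x) - 0.5 * eps * (v * torch.exp(eps * q_x) + t_x)
    else:       # backward (d = -): inverse of the forward update
        x_new = torch.exp(-0.5 * eps * s_x) * (x + 0.5 * eps * (v * torch.exp(eps * q_x) + t_x))
    return mask * x_new + (1.0 - mask) * x
```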

Lattice Gauge Theory (2D U(1))

Link Variables

U_{\mu}(n) = e^{i x_{\mu}(n)}\in \mathbb{C},\quad \text{where}\quad x_{\mu}(n) \in [-\pi,\pi)

Wilson Action

S_{\beta}(x) = \beta\sum_{P} \cos \textcolor{#00CCFF}{x_{P}},

\textcolor{#00CCFF}{x_{P}} = \left[x_{\mu}(n) + x_{\nu}(n+\hat{\mu}) - x_{\mu}(n+\hat{\nu})-x_{\nu}(n)\right]

Note: \textcolor{#00CCFF}{x_{P}} is the phase of the product of links around the elementary 1\times 1 square, called a “plaquette”
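A small numpy sketch of x_{P} and S_{\beta}(x) for link angles stored as an array of shape [2, Nt, Nx] with periodic boundaries (the storage convention is an assumption for illustration):

```python
import numpy as np

def plaquettes(x: np.ndarray) -> np.ndarray:
    """x_P = x_mu(n) + x_nu(n + mu) - x_mu(n + nu) - x_nu(n) at every site n.

    `x` holds the link angles with shape [2, Nt, Nx] (direction, t, x);
    periodic boundary conditions are implemented with np.roll.
    """
    x0, x1 = x[0], x[1]                           # mu = 0 (t) and nu = 1 (x) links
    return (x0 + np.roll(x1, -1, axis=0)          # x_nu(n + mu_hat)
               - np.roll(x0, -1, axis=1)          # x_mu(n + nu_hat)
               - x1)

def wilson_action(x: np.ndarray, beta: float) -> float:
    """S_beta(x) = beta * sum_P cos(x_P), as written above."""
    return float(beta * np.sum(np.cos(plaquettes(x))))
```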

2D Lattice

Figure 18: Jupyter Notebook

Annealing Schedule

  • Introduce an annealing schedule during the training phase:

    \left\{ \gamma_{t} \right\}_{t=0}^{N} = \left\{\gamma_{0}, \gamma_{1}, \ldots, \gamma_{N-1}, \gamma_{N} \right\}

    where \gamma_{0} < \gamma_{1} < \cdots < \gamma_{N} \equiv 1, and \left|\gamma_{t+1} - \gamma_{t}\right| \ll 1

  • Note:

    • for \left|\gamma_{t}\right| < 1, this rescaling helps to reduce the height of the energy barriers \Longrightarrow
    • easier for our sampler to explore previously inaccessible regions of the phase space
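A minimal sketch of such a schedule (a linear ramp; the starting value and step count are illustrative), with the rescaling applied, in the spirit of the note above, by scaling the action as S \rightarrow \gamma_{t}\, S during training:

```python
import numpy as np

def annealing_schedule(n_steps: int, gamma_0: float = 0.1) -> np.ndarray:
    """Monotonically increasing {gamma_t}, ending at gamma_N = 1, with small steps."""
    return np.linspace(gamma_0, 1.0, n_steps)

gammas = annealing_schedule(1000)
# During training step t, use the rescaled action gamma_t * S (equivalently a
# rescaled beta), which lowers the energy barriers while gamma_t < 1.
```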


Toy Example: GMM \in \mathbb{R}^{2}

Figure 19

Physical Quantities

  • To estimate physical quantities, we:
    • Calculate physical observables at increasing spatial resolution
    • Perform extrapolation to continuum limit

Figure 20: Increasing the physical resolution (a \rightarrow 0) allows us to make predictions about numerical values of physical quantities in the continuum limit.
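A toy numpy sketch (synthetic data, purely for illustration) of the second step: fit the a-dependence of an observable, here assumed to be linear in a^{2}, and extrapolate to a \rightarrow 0:

```python
import numpy as np

rng = np.random.default_rng(0)
a = np.array([0.10, 0.08, 0.06, 0.04])                      # lattice spacings (synthetic)
obs = 1.0 + 0.5 * a**2 + 0.002 * rng.normal(size=a.size)    # synthetic O(a) = O_0 + c a^2 + noise
coeffs = np.polyfit(a**2, obs, deg=1)                       # linear fit in a^2
print(f"continuum estimate O(a -> 0) ≈ {coeffs[-1]:.4f}")   # intercept at a^2 = 0
```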
