[Open in Colab](https://colab.research.google.com/github/saforem2/intro-hpc-bootcamp-2025/blob/main/docs/00-intro-AI-HPC/5-mcmc-example/index.ipynb)

**Parallel computing** refers to the process of breaking down larger problems into smaller, independent, often similar parts that can be executed simultaneously by multiple processors communicating via network or shared memory; the results are combined upon completion as part of an overall algorithm.
## Example: Estimate $\pi$
We can calculate the value of $\pi$ using an MPI-parallelized version of the Monte Carlo method. The basic idea is to estimate $\pi$ by randomly sampling points within a square and determining how many fall inside a circle inscribed within that square.
The ratio between the area of the circle and the area of the square is

$$
\frac{N_\text{in}}{N_\text{total}} = \frac{\pi r^2}{4 r^2} = \frac{\pi}{4}
$$

Therefore, we can calculate $\pi$ using $\pi = \frac{4 N_\text{in}}{N_\text{total}}$.
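The serial implementation below samples `N` points uniformly in the square $[-0.5, 0.5]^2$, plots each point in red if it falls inside the inscribed circle (blue otherwise), and updates the running estimate of $\pi$ as it goes:

```python
import random
import time

import ambivalent
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from IPython.display import clear_output, display
from rich import print

plt.style.use(ambivalent.STYLES['ambivalent'])
sns.set_context("notebook")
plt.rcParams["figure.figsize"] = [6.4, 4.8]

# Draw the inscribed circle of radius 0.5 inside the unit square
fig, ax = plt.subplots()
circle = plt.Circle((0.0, 0.0), 0.5)
plt.xlim(-0.5, 0.5)
plt.ylim(-0.5, 0.5)
ax.add_patch(circle)
ax.set_aspect('equal')

N = 500
Nin = 0
t0 = time.time()
for i in range(1, N + 1):
    x = random.uniform(-0.5, 0.5)
    y = random.uniform(-0.5, 0.5)
    if np.sqrt(x * x + y * y) < 0.5:
        Nin += 1
        plt.plot([x], [y], 'o', color='r', markersize=3)
    else:
        plt.plot([x], [y], 'o', color='b', markersize=3)
    display(fig)
    plt.xlabel("$\\pi$ = %3.4f\n N_in / N_total = %5d/%5d" % (Nin * 4.0 / i, Nin, i))
    clear_output(wait=True)
res = np.array(Nin, dtype='d')
t1 = time.time()
print(f"Pi = {res / float(N / 4.0)}")
print("Time: %s" % (t1 - t0))
```

### MPI example

| Nodes  | PyTorch-2.5 | PyTorch-2.7 | PyTorch-2.8 |
| :----: | :---------: | :---------: | :---------: |
| N1xR12 | 17.39       | 31.01       | 33.09       |
| N2xR12 | 3.81        | 32.71       | 33.26       |

With `mpi4py`, each rank samples its own share of the `N` points, the per-rank counts are summed with an `Allreduce`, and only rank 0 prints the result:

```python
import random
import time

import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD

N = 5000000
Nin = 0
t0 = time.time()
# Each rank handles a strided subset of the N samples
for i in range(comm.rank, N, comm.size):
    x = random.uniform(-0.5, 0.5)
    y = random.uniform(-0.5, 0.5)
    if np.sqrt(x * x + y * y) < 0.5:
        Nin += 1
res = np.array(Nin, dtype='d')
res_tot = np.array(Nin, dtype='d')
comm.Allreduce(res, res_tot, op=MPI.SUM)
t1 = time.time()
if comm.rank == 0:
    print(res_tot / float(N / 4.0))
    print("Time: %s" % (t1 - t0))
```

### Running $\pi$ example on Google Colab

* Go to <https://colab.research.google.com/>, sign in or sign up
* "File" -> "Open notebook"
* Choose `01_intro_AI_on_Supercomputer/00_mpi.ipynb` from the list

```python
! wget https://raw.githubusercontent.com/argonne-lcf/ai-science-training-series/main/01_intro_AI_on_Supercomputer/mpi_pi.py
! pip install mpi4py
```

```python
! mpirun -np 1 --allow-run-as-root python mpi_pi.py
```

```python
! mpirun -np 2 --allow-run-as-root --oversubscribe python mpi_pi.py
```

```python
! mpirun -np 4 --allow-run-as-root --oversubscribe python mpi_pi.py
```

### Running $\pi$ on Polaris

```bash
ssh <username>@polaris.alcf.anl.gov
qsub -A ALCFAITP -l select=1 -q ALCFAITP -l walltime=0:30:00 -l filesystems=home:eagle
# choose the debug queue outside of the class:
# qsub -A ALCFAITP -l select=1 -q debug -l walltime=0:30:00 -l filesystems=home:eagle
module load conda/2023-10-04
conda activate /soft/datascience/ALCFAITP/2023-10-04
git clone git@github.com:argonne-lcf/ai-science-training-series.git
cd ai-science-training-series/01_intro_AI_on_Supercomputer/
mpirun -np 1 python mpi_pi.py  # 3.141988   8.029037714004517 s
mpirun -np 2 python mpi_pi.py  # 3.1415096  4.212774038314819 s
mpirun -np 4 python mpi_pi.py  # 3.1425632  2.093632459640503 s
mpirun -np 8 python mpi_pi.py  # 3.1411632  1.0610620975494385 s
```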
## Parallel computing in AI

Parallel computing in AI is usually referred to as distributed training. Distributed training is the process of training AI models across multiple GPUs or other accelerators, with the goal of speeding up the training process and enabling the training of larger models on larger datasets.
There are two main ways of parallelizing distributed training:

* **Data parallelism**:
  - Each worker (GPU) holds a complete copy of the model.
  - Different workers work on different subsets of the data (see the sketch below).
* **Model parallelism**:
  - The model is split into different parts that are stored on different workers.
  - Different workers perform the computation involved in their part of the model.
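To make data parallelism concrete, here is a minimal sketch (not part of the original example) using PyTorch's `DistributedDataParallel`. The toy linear model, the random dataset, and the launch command `torchrun --nproc_per_node=<num_gpus> ddp_sketch.py` are illustrative assumptions; `torchrun` sets the `RANK`, `LOCAL_RANK`, and `WORLD_SIZE` environment variables that `init_process_group` reads:

```python
# Minimal data-parallel training sketch with PyTorch DistributedDataParallel (DDP).
# Assumed launch: torchrun --nproc_per_node=<num_gpus> ddp_sketch.py
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset


def main():
    # One process per GPU; all processes join the same process group.
    dist.init_process_group(backend="nccl" if torch.cuda.is_available() else "gloo")
    rank = dist.get_rank()
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    device = torch.device(f"cuda:{local_rank}" if torch.cuda.is_available() else "cpu")

    # Every worker holds a complete copy of the (toy) model ...
    model = nn.Linear(16, 1).to(device)
    model = DDP(model, device_ids=[local_rank] if torch.cuda.is_available() else None)

    # ... but each worker sees a different shard of the data.
    dataset = TensorDataset(torch.randn(1024, 16), torch.randn(1024, 1))
    sampler = DistributedSampler(dataset)  # splits indices across ranks
    loader = DataLoader(dataset, batch_size=32, sampler=sampler)

    optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
    loss_fn = nn.MSELoss()

    for epoch in range(2):
        sampler.set_epoch(epoch)  # reshuffle the shards each epoch
        for xb, yb in loader:
            xb, yb = xb.to(device), yb.to(device)
            optimizer.zero_grad()
            loss = loss_fn(model(xb), yb)
            loss.backward()  # DDP averages gradients across ranks here
            optimizer.step()
        if rank == 0:
            print(f"epoch {epoch}: loss = {loss.item():.4f}")

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

Every rank keeps a full copy of the model, `DistributedSampler` hands each rank a different shard of the dataset, and DDP all-reduces the gradients during `backward()` so the copies stay synchronized. Model parallelism would instead place different parts of the model (for example, different layers, or shards of individual layers) on different workers.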
Figure 1: PI
Figure 2: 3D LLM
## Citation

BibTeX citation:

    @online{foreman2025,
      author = {Foreman, Sam},
      title = {Example: {Approximating} $\pi$ Using {Markov} {Chain} {Monte}
        {Carlo} {(MCMC)}},
      date = {2025-07-15},
      url = {https://saforem2.github.io/hpc-bootcamp-2025/00-intro-AI-HPC/5-mcmc-example/},
      langid = {en}
    }