Quickstart
ezpz provides a set of dynamic, lightweight utilities that simplify
running experiments with distributed PyTorch.
These can be broken down roughly into two distinct categories:
- Shell Environment and Setup:
  - Bash script at ezpz/bin/utils.sh.
    This script contains utilities for automatic:
    - Job scheduler detection with Slurm and PBS
    - Module loading and base Python environment setup
    - Virtual environment creation and activation
    - ... and more!
  - Check out Shell Environment for additional information.
- Python Library:
  - Launching and running distributed PyTorch code (from Python!)
  - Device management, and running on different {cuda, xpu, mps, cpu} devices
  - Experiment tracking and tools for automatically recording, saving, and plotting metrics.
Pick and Choose
Each of these components is designed so that you can pick and choose only those tools that are useful for you.
For example, if you're only interested in the automatic device detection, you can use that piece on its own without adopting anything else.
Write Hardware-Agnostic Distributed PyTorch Code
- Accelerator detection: ezpz.get_torch_device_type() and ezpz.setup_torch() normalize CUDA/XPU/MPS/CPU selection.
- Scheduler smarts: detects PBS/Slurm automatically; otherwise falls back to mpirun with sensible env forwarding.
  For launcher-only flags/env (e.g., -x FOO=bar), place them before --; everything after -- is the command to run. For example, the launcher options can specify -n 8 processes, forward a specific PYTHONPATH, and set EZPZ_LOG_LEVEL=DEBUG.
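The "launcher flags before --, command after --" convention described above can be illustrated with a minimal, standard-library sketch. This is not the ezpz launcher itself: the function name split_launch_args, the script name train.py, and the choice to treat an argument list without -- as "all command" are assumptions made purely for illustration.

```python
from typing import List, Tuple


def split_launch_args(argv: List[str]) -> Tuple[List[str], List[str]]:
    """Split argv at the first "--" into (launcher_args, command).

    Hypothetical helper: everything before "--" is meant for the launcher,
    everything after it is the command to run.
    """
    if "--" not in argv:
        # Assumption: with no separator, treat the whole list as the command.
        return [], list(argv)
    idx = argv.index("--")
    return list(argv[:idx]), list(argv[idx + 1:])


# Example: 8 processes, one forwarded environment variable, then the command.
# "train.py" is a placeholder script name.
launcher_args, command = split_launch_args(
    ["-n", "8", "-x", "EZPZ_LOG_LEVEL=DEBUG", "--", "python3", "train.py"]
)
print("launcher:", launcher_args)
print("command: ", command)
```

Splitting at the first -- means the command itself may contain further -- tokens without confusing the split.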
Using ezpz in Your Application
The real value of ezpz comes from using it inside other applications.
- ezpz.setup_torch() replaces manual torch.distributed initialization
- ezpz.get_local_rank() replaces manual os.environ["LOCAL_RANK"]
- ezpz.get_rank() replaces manual os.environ["RANK"]
- ezpz.get_world_size() replaces manual os.environ["WORLD_SIZE"]
- ezpz.get_torch_device() replaces manual device assignment
- ezpz.wrap_model() replaces manual DistributedDataParallel wrapping
- ezpz.synchronize() replaces manual device synchronization
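For context, the manual boilerplate that the helpers listed above are described as replacing might look roughly like the following sketch. It does not show how ezpz implements anything: read_dist_env and pick_device_type are hypothetical names, the single-process defaults (rank 0, world size 1) are assumptions, and the device check falls back to "cpu" when PyTorch is not installed.

```python
import os
from typing import Dict, Mapping


def read_dist_env(env: Mapping[str, str] = os.environ) -> Dict[str, int]:
    """Read RANK / LOCAL_RANK / WORLD_SIZE by hand.

    Assumption: missing variables mean a single-process run.
    """
    return {
        "rank": int(env.get("RANK", 0)),
        "local_rank": int(env.get("LOCAL_RANK", 0)),
        "world_size": int(env.get("WORLD_SIZE", 1)),
    }


def pick_device_type() -> str:
    """Return one of "cuda", "xpu", "mps", or "cpu" (hand-rolled check)."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed; nothing but the CPU is usable.
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    # torch.xpu only exists in newer PyTorch builds, so guard the lookup.
    xpu = getattr(torch, "xpu", None)
    if xpu is not None and xpu.is_available():
        return "xpu"
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


info = read_dist_env()
print(f"rank {info['rank']}/{info['world_size']} on {pick_device_type()}")
```

Every script that runs under more than one launcher tends to repeat this kind of code, which is the duplication the helper functions are meant to remove.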
Track metrics with ezpz.History
Capture metrics across all ranks, persist JSONL, generate text/PNG plots, and (when configured) log to Weights & Biases, with no extra code on worker ranks.
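To make "persist JSONL" concrete, here is a small standard-library sketch of appending one JSON object per line for each training step. It is not the ezpz.History API: the class JsonlMetricLog, its update method, and the file name metrics.jsonl are hypothetical, and the sketch covers only a single process, with no cross-rank aggregation, plotting, or Weights & Biases logging.

```python
import json
import os
import tempfile
from typing import Dict, List


class JsonlMetricLog:
    """Hypothetical helper: keep metrics in memory and append them to a JSONL file."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.records: List[Dict[str, float]] = []

    def update(self, metrics: Dict[str, float]) -> None:
        # Store a copy in memory, then append one JSON object per line on disk.
        self.records.append(dict(metrics))
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(metrics) + "\n")


def load_jsonl(path: str) -> List[Dict[str, float]]:
    """Read the file back, one record per non-empty line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Placeholder training loop writing to a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "metrics.jsonl")
log = JsonlMetricLog(path)
for step in range(3):
    log.update({"step": step, "loss": 1.0 / (step + 1)})

print(len(load_jsonl(path)), "records written to", path)
```

Because each line is a complete JSON object, a partially written run can still be read back and plotted up to its last complete line.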