# Training LLMs with 🍋 ezpz + 🤗 Trainer
The `src/ezpz/examples/hf_trainer.py` module provides a mechanism for distributed training with 🤗 `huggingface/transformers`.
In particular, it allows for distributed training using the `transformers.Trainer` object with any (compatible) combination of {`models`, `datasets`}.
> [!NOTE]
> **Quick start**:
>
> ```bash
> # setup env
> source <(curl -sL https://bit.ly/ezpz-utils)
> ezpz_setup_env
>
> # install ezpz
> uv pip install --no-cache --link-mode=copy "git+https://github.com/saforem2/ezpz"
>
> # launch ezpz.examples.hf_trainer
> ezpz launch -- python3 -m ezpz.examples.hf_trainer \
>     --streaming \
>     --dataset_name=stanfordnlp/imdb \
>     --tokenizer_name meta-llama/Llama-3.2-1B \
>     --model_name_or_path meta-llama/Llama-3.2-1B \
>     --bf16=true \
>     --do_train=true \
>     --do_eval=true \
>     --report-to=wandb \
>     --logging-steps=1 \
>     --include-tokens-per-second=true \
>     --max-steps=50000 \
>     --include-num-input-tokens-seen=true \
>     --optim=adamw_torch \
>     --logging-first-step \
>     --include-for-metrics='inputs,loss' \
>     --max-eval-samples=50 \
>     --per_device_train_batch_size=1 \
>     --block-size=8192 \
>     --gradient_checkpointing=true  # --fsdp=shard_grad_op
> ```
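The same entry point works with other (compatible) {model, dataset} pairs from the Hugging Face Hub: swap the `--model_name_or_path`, `--tokenizer_name`, and `--dataset_name` flags. For example, a minimal sketch pairing a small GPT-2 model with a streamed web-text dataset (both names are illustrative choices, not requirements):

```bash
# illustrative {model, dataset} pair; any compatible combination should work
ezpz launch -- python3 -m ezpz.examples.hf_trainer \
    --streaming \
    --dataset_name=HuggingFaceFW/fineweb-edu \
    --tokenizer_name=openai-community/gpt2 \
    --model_name_or_path=openai-community/gpt2 \
    --bf16=true \
    --do_train=true \
    --logging-steps=1 \
    --max-steps=1000 \
    --per_device_train_batch_size=1 \
    --block-size=1024
```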
## 🐣 Getting Started
💡 **Setup environment** (on ANY {Intel, NVIDIA, AMD} accelerator):

```bash
source <(curl -L https://bit.ly/ezpz-utils)
ezpz_setup_env
```
📦 **Install dependencies**:

Install 🍋 ezpz (from GitHub):

```bash
uv pip install --no-cache --link-mode=copy "git+https://github.com/saforem2/ezpz"
# or:
# python3 -m pip install "git+https://github.com/saforem2/ezpz" --require-virtualenv
```
<!--
2. Update {tiktoken, sentencepiece, transformers, evaluate}:

```bash
python3 -m pip install --upgrade tiktoken sentencepiece transformers evaluate
```
-->
**Details**:

⚙️ **Build DeepSpeed config**:

```bash
python3 -c 'import ezpz; ezpz.utils.write_deepspeed_zero12_auto_config(zero_stage=1)'
```
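To sanity-check the result before launching, you can pretty-print the generated file (the path matches the `--deepspeed` argument used in the launch command below):

```bash
python3 -m json.tool ds_configs/deepspeed_zero1_auto_config.json
```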
🚀 **Launch training**:

```bash
TSTAMP=$(date +%s)
python3 -m ezpz.launch -m ezpz.examples.hf_trainer \
    --model_name_or_path meta-llama/Llama-3.2-1B \
    --dataset_name stanfordnlp/imdb \
    --deepspeed=ds_configs/deepspeed_zero1_auto_config.json \
    --auto-find-batch-size=true \
    --bf16=true \
    --block-size=4096 \
    --do-eval=true \
    --do-predict=true \
    --do-train=true \
    --gradient-checkpointing=true \
    --include-for-metrics=inputs,loss \
    --include-num-input-tokens-seen=true \
    --include-tokens-per-second=true \
    --log-level=info \
    --logging-steps=1 \
    --max-steps=10000 \
    --output_dir="hf-trainer-output/${TSTAMP}" \
    --report-to=wandb \
    | tee "hf-trainer-output-${TSTAMP}.log"
```
> **🪄 Magic**:
> Behind the scenes, this will 🪄 automagically determine the specifics of the
> running job, and use this information to construct (and subsequently run) the
> appropriate:
>
> ```bash
> mpiexec <mpi-args> $(which python3) <cmd-to-launch>
> ```
>
> across all of our available accelerators.
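For example, on a 2-node PBS job with 4 accelerators per node, the constructed command might look roughly like the following; the launcher flags and rank counts here are illustrative assumptions, not ezpz's literal output:

```bash
# hypothetical expansion on a PBS-managed system (values are illustrative)
mpiexec -n 8 --ppn 4 --hostfile "${PBS_NODEFILE}" \
    $(which python3) -m ezpz.examples.hf_trainer \
    --model_name_or_path meta-llama/Llama-3.2-1B \
    --dataset_name stanfordnlp/imdb  # ...plus the rest of your hf_trainer flags
```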
> **Tip**:
> Call:
>
> ```bash
> python3 -m ezpz.examples.hf_trainer --help
> ```
>
> to see the full list of supported arguments.
> In particular, any `transformers.TrainingArguments` should be supported.
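For example, optimizer- and checkpoint-related `transformers.TrainingArguments` can be appended directly to the launch command (the values below are arbitrary placeholders, not recommendations):

```bash
python3 -m ezpz.launch -m ezpz.examples.hf_trainer \
    --model_name_or_path meta-llama/Llama-3.2-1B \
    --dataset_name stanfordnlp/imdb \
    --do_train=true \
    --bf16=true \
    --learning_rate=2e-5 \
    --lr_scheduler_type=cosine \
    --warmup_ratio=0.05 \
    --weight_decay=0.1 \
    --save_steps=500 \
    --seed=42
```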
## 🚀 DeepSpeed Support

Additionally, DeepSpeed is fully supported and can be configured by specifying the path to a compatible DeepSpeed config JSON file, e.g.:
**Build a DeepSpeed config**:

```bash
python3 -c 'import ezpz; ezpz.utils.write_deepspeed_zero12_auto_config(zero_stage=2)'
```
**Train**:

```bash
python3 -m ezpz.launch -m ezpz.examples.hf_trainer \
    --dataset_name stanfordnlp/imdb \
    --model_name_or_path meta-llama/Llama-3.2-1B \
    --bf16 \
    --do_train \
    --report-to=wandb \
    --logging-steps=1 \
    --include-tokens-per-second=true \
    --auto-find-batch-size=true \
    --deepspeed=ds_configs/deepspeed_zero2_auto_config.json
```
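A hand-written config works the same way, provided its values are compatible with the training arguments. Below is a minimal sketch of a ZeRO-2 config that defers sizes to the 🤗 Trainer via `"auto"` values (the file name and contents here are illustrative):

```bash
# write a minimal ZeRO-2 config by hand (illustrative), then pass it via --deepspeed
mkdir -p ds_configs
cat > ds_configs/my_zero2_config.json <<'EOF'
{
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "bf16": { "enabled": "auto" },
  "zero_optimization": { "stage": 2 }
}
EOF
```

Then pass `--deepspeed=ds_configs/my_zero2_config.json` in the train command above.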
🍋 2 ez