ML Engineering
February 20, 2024

Re-train HF Hub Models From Scratch Using Finetuning Examples

HF Transformers has awesome finetuning examples (https://github.com/huggingface/transformers/tree/main/examples/pytorch) that cover pretty much any modality, and these examples work out of the box.

But what if you wanted to re-train from scratch rather than finetune?

Here is a simple hack to accomplish that.

As an example, we will use facebook/opt-1.3b and plan to train it in the bf16 regime:

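cat << 'EOT' > prep-bf16.py
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer
import torch

mname = "facebook/opt-1.3b"

# Fetch only the architecture config; no pretrained weights are downloaded.
config = AutoConfig.from_pretrained(mname)
# Build the model from the config, so its weights are freshly (randomly) initialized.
# AutoModelForCausalLM matches what run_clm.py loads (it includes the LM head).
model = AutoModelForCausalLM.from_config(config, torch_dtype=torch.bfloat16)
# Reuse the original tokenizer so the vocabulary matches the config.
tokenizer = AutoTokenizer.from_pretrained(mname)

path = "opt-1.3b-bf16"

model.save_pretrained(path)
tokenizer.save_pretrained(path)
EOT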

Now run:

python prep-bf16.py

This creates a folder, opt-1.3b-bf16, with everything you need to train the model from scratch. In other words, you get a pretrained-like model, except that its weights have only been initialized and it has not been trained at all.

Adjust the script above to use torch.float16 or torch.float32 if that’s what you plan to use instead.
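A quick way to see what the dtype choice means for the saved folder is to multiply the parameter count by the bytes per element. This sketch assumes a nominal 1.3 billion parameters for facebook/opt-1.3b (the exact count differs slightly) and ignores tokenizer files and metadata, so treat the numbers as rough estimates.

```python
# Bytes per element for the dtypes mentioned above.
BYTES_PER_ELEMENT = {
    "float32": 4,
    "float16": 2,
    "bfloat16": 2,
}


def checkpoint_size_gb(n_params: int, dtype: str) -> float:
    """Approximate size of the saved weights in GB (10**9 bytes)."""
    return n_params * BYTES_PER_ELEMENT[dtype] / 10**9


# Nominal parameter count assumed for facebook/opt-1.3b (approximate, not exact).
N_PARAMS = 1_300_000_000

for dtype in ("bfloat16", "float16", "float32"):
    print(f"{dtype:>8}: ~{checkpoint_size_gb(N_PARAMS, dtype):.1f} GB")
```

Under this assumption the bf16 and fp16 weights come to roughly 2.6 GB, while fp32 roughly doubles that.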

Now you can proceed with finetuning this saved model as usual (the script path below is relative to the root of a transformers repository checkout):

python -m torch.distributed.run \
--nproc_per_node=1 --nnodes=1 --node_rank=0 \
--master_addr=127.0.0.1 --master_port=9901 \
examples/pytorch/language-modeling/run_clm.py --bf16 \
--seed 42 --model_name_or_path opt-1.3b-bf16 \
--dataset_name wikitext --dataset_config_name wikitext-103-raw-v1 \
--per_device_train_batch_size 12 --per_device_eval_batch_size 12 \
--gradient_accumulation_steps 1 --do_train --do_eval --logging_steps 10 \
--save_steps 1000 --eval_steps 100 --weight_decay 0.1 --num_train_epochs 1 \
--adam_beta1 0.9 --adam_beta2 0.95 --learning_rate 0.0002 \
--lr_scheduler_type linear --warmup_steps 500 --report_to tensorboard \
--output_dir save_dir
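A few of the flags above are worth a sanity check before launching. The sketch below computes the number of sequences per optimizer step from --per_device_train_batch_size, --gradient_accumulation_steps, --nproc_per_node and --nnodes, and the learning rate at a given step under --lr_scheduler_type linear with --warmup_steps 500. linear_lr is a standalone re-implementation of linear warmup followed by linear decay to zero, written to mirror the shape of the HF linear scheduler rather than to call it, and TOTAL_STEPS is a hypothetical value: the real number depends on the dataset size and --num_train_epochs.

```python
def global_batch_size(per_device: int, grad_accum: int, procs_per_node: int, nnodes: int) -> int:
    """Sequences consumed per optimizer step across all processes."""
    return per_device * grad_accum * procs_per_node * nnodes


def linear_lr(step: int, peak_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Linear warmup from 0 to peak_lr, then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = total_steps - step
    # Clamp at 0 so steps past total_steps never produce a negative LR.
    return peak_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))


# Values taken from the command above.
print(global_batch_size(per_device=12, grad_accum=1, procs_per_node=1, nnodes=1))  # 12

# Hypothetical total: the real value depends on the dataset and number of epochs.
TOTAL_STEPS = 10_000
for step in (0, 250, 500, 5_250, 10_000):
    print(step, linear_lr(step, peak_lr=2e-4, warmup_steps=500, total_steps=TOTAL_STEPS))
```

For example, moving to 8 GPUs on each of 2 nodes with --gradient_accumulation_steps 4 would raise the global batch to 12 × 4 × 8 × 2 = 768 sequences, which may call for revisiting --learning_rate as well.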

The key argument is:

--model_name_or_path opt-1.3b-bf16

where opt-1.3b-bf16 is the local directory you just generated in the previous step.
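Since --model_name_or_path points at a local folder, it can be worth checking that the folder actually contains a config and weights before queueing a long job. This is a minimal sketch with a hypothetical helper, check_model_dir; the file names it looks for (config.json, model.safetensors, pytorch_model.bin, and their *.index.json variants for sharded checkpoints) are the usual save_pretrained outputs, but the exact set can vary across transformers versions.

```python
from pathlib import Path

# Usual weight file names written by save_pretrained (assumed; may vary by version).
WEIGHT_FILES = (
    "model.safetensors",
    "model.safetensors.index.json",
    "pytorch_model.bin",
    "pytorch_model.bin.index.json",
)


def check_model_dir(path: str) -> list[str]:
    """Return a list of problems found; an empty list means the folder looks usable."""
    root = Path(path)
    if not root.is_dir():
        return [f"{path} is not a directory"]
    problems = []
    if not (root / "config.json").is_file():
        problems.append("missing config.json")
    if not any((root / name).is_file() for name in WEIGHT_FILES):
        problems.append("no model weights found")
    return problems


# Prints the problems found, or "OK" if the folder looks complete.
print(check_model_dir("opt-1.3b-bf16") or "OK")
```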

Sometimes it’s possible to find the same dataset that the original model was trained on; other times you have to use an alternative dataset.

The rest of the hyper-parameters can often be found in the paper or documentation that came with the model.
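Once the hyper-parameters are collected, keeping them in one place and rendering them into command-line flags makes runs easier to compare. This is a hypothetical helper, to_cli_args; the flag names are the ones used in the command above, and the values are simply those from the example, not values taken from the OPT paper.

```python
def to_cli_args(hparams: dict) -> list[str]:
    """Render a dict of hyper-parameters as run_clm.py-style --flag value pairs.

    Boolean True values become bare flags (e.g. --bf16); False values are dropped.
    """
    args = []
    for name, value in hparams.items():
        if value is True:
            args.append(f"--{name}")
        elif value is False:
            continue
        else:
            args.extend([f"--{name}", str(value)])
    return args


# Values copied from the example command above, not from any paper.
hparams = {
    "bf16": True,
    "learning_rate": 2e-4,
    "weight_decay": 0.1,
    "adam_beta1": 0.9,
    "adam_beta2": 0.95,
    "lr_scheduler_type": "linear",
    "warmup_steps": 500,
}
print(" ".join(to_cli_args(hparams)))
```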

To summarize, this recipe allows you to use finetuning examples to re-train whatever model you can find on the HF hub.


Citation

BibTeX citation:
@online{bekman2024,
  author = {Bekman, Stas and Foreman, Sam},
  title = {ML {Engineering}},
  date = {2024-02-20},
  url = {https://saforem2.github.io/ml-engineering},
  langid = {en}
}
For attribution, please cite this work as:
Bekman, Stas, and Sam Foreman. 2024. “ML Engineering.” February 20, 2024. https://saforem2.github.io/ml-engineering.