Training
Subsections:
- Emulate a multi-node setup using just a single node - instructions on how to do this with the deepspeed launcher.
- Re-train HF hub models from scratch using finetuning examples
Tools:
- printflock.py - a tiny library that makes your print calls non-interleaved in a multi-GPU environment.
- multi-gpu-non-interleaved-print.py - a flock-based wrapper around print that prevents messages from getting interleaved when multiple processes print at the same time, which is the case when torch.distributed is used with multiple GPUs.
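The flock-based idea behind both tools can be sketched roughly as follows. This is a minimal illustration and not necessarily how printflock.py itself is written: the lock-file location (`LOCK_PATH`) is an assumption made for this example, and the `fcntl` module it relies on is available on Unix-like systems only.

```python
import fcntl
import os
import tempfile

# Hypothetical lock-file location chosen for this sketch; any file that all
# participating processes on the node can open would serve the same purpose.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "printflock-sketch.lock")


def printflock(*args, **kwargs):
    """Call print() while holding an exclusive advisory lock on LOCK_PATH."""
    # Flush inside the critical section so buffered output cannot be
    # written after the lock has already been released.
    kwargs.setdefault("flush", True)
    with open(LOCK_PATH, "a") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)  # blocks until no other process holds the lock
        try:
            print(*args, **kwargs)
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)
```

Each process would call `printflock(...)` wherever it would otherwise call `print(...)`. The lock is advisory, so it only helps when every process that prints goes through the same wrapper and opens the same lock file on the same node.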
Citation
BibTeX citation:
@online{bekman2024,
author = {Bekman, Stas and Foreman, Sam},
title = {ML {Engineering}},
date = {2024-02-20},
url = {https://saforem2.github.io/ml-engineering},
langid = {en}
}
For attribution, please cite this work as:
Bekman, Stas, and Sam Foreman. 2024. “ML Engineering.”
February 20, 2024. https://saforem2.github.io/ml-engineering.