Checkpoints
torch-checkpoint-convert-to-bf16 - converts an existing fp32 torch checkpoint to bf16. If safetensors files are found, they are converted as well. It should be easy to adapt to similar use cases.
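The core of such a conversion can be sketched in a few lines of Python. This is only an illustration of the idea, not the script itself: it assumes the checkpoint is a flat state dict mapping names to tensors, and the function name `convert_state_dict_to_bf16` is hypothetical.

```python
import torch


def convert_state_dict_to_bf16(state_dict):
    """Return a new dict in which every fp32 tensor is cast to bf16.

    Assumption: `state_dict` is a flat mapping of names to values.
    Non-tensor entries and tensors of other dtypes (for example
    integer step counters) are passed through unchanged.
    """
    converted = {}
    for name, value in state_dict.items():
        if torch.is_tensor(value) and value.dtype == torch.float32:
            converted[name] = value.to(torch.bfloat16)
        else:
            converted[name] = value
    return converted


if __name__ == "__main__":
    # Tiny in-memory example instead of a real checkpoint file.
    example = {"weight": torch.ones(2, 3), "step": torch.tensor(5)}
    result = convert_state_dict_to_bf16(example)
    print({name: value.dtype for name, value in result.items()})
```

For a file on disk, the same function would typically sit between `torch.load(path, map_location="cpu")` and `torch.save(converted, new_path)`. Nested checkpoints, such as those that also hold optimizer state, would need a recursive walk instead of this flat loop.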
torch-checkpoint-shrink.py - fixes checkpoints in which, for some reason, tensors were saved with a storage larger than their view at the time of saving. It clones each tensor's current view and re-saves it with only the storage that view needs.
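The following sketch illustrates the problem and the cloning fix on a toy example. It is not the script's actual code: it assumes a flat state dict, and the helper names `shrink_state_dict` and `saved_size` are hypothetical.

```python
import io

import torch


def shrink_state_dict(state_dict):
    """Clone every tensor so that it owns a storage sized to its view.

    Assumption: `state_dict` is a flat mapping of names to values.
    `clone()` copies only the elements visible through the view, so the
    clone no longer drags the larger original storage along when saved.
    """
    return {
        name: value.clone() if torch.is_tensor(value) else value
        for name, value in state_dict.items()
    }


def saved_size(obj):
    """Return how many bytes `torch.save` writes for `obj`."""
    buffer = io.BytesIO()
    torch.save(obj, buffer)
    return buffer.getbuffer().nbytes


if __name__ == "__main__":
    big = torch.zeros(100_000)
    # A 10-element view that still references the 100,000-element storage.
    bloated = {"weight": big[:10]}
    print("before:", saved_size(bloated), "bytes")
    print("after: ", saved_size(shrink_state_dict(bloated)), "bytes")
```

Saving a view serializes the whole underlying storage, which is why the first size is much larger than the second even though both dicts hold the same ten values.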
Citation
BibTeX citation:
@online{bekman2024,
author = {Bekman, Stas and Foreman, Sam},
title = {ML {Engineering}},
date = {2024-02-20},
url = {https://saforem2.github.io/ml-engineering},
langid = {en}
}
For attribution, please cite this work as:
Bekman, Stas, and Sam Foreman. 2024. “ML Engineering.”
February 20, 2024. https://saforem2.github.io/ml-engineering.