Last Updated: 02/02/2024 @ 21:53:52
wordplay
🎮 💬
A set of simple, scalable and highly configurable tools for working with LLMs.
Background
What started as some simple modifications to Andrej Karpathy's nanoGPT
has now grown into the wordplay
project.
If you're curious…
While nanoGPT
is a great project and an excellent resource, it is, by design, very minimal and limited in its flexibility.
Working through the code, I found myself making minor changes here and there to test new ideas and run variations on different experiments. These changes eventually built up to the point where my {goals, scope, code}
for the project had diverged significantly from the original vision.
As a result, I figured it made more sense to move things to a new project, wordplay.
I've prioritized adding functionality that I have found to be useful or interesting, but am absolutely open to input or suggestions for improvement.
Different aspects of this project have been motivated by some of my recent work on LLMs.
- Projects:
  - ezpz: Painless distributed training with your favorite {framework, backend} combo.
  - Megatron-DeepSpeed: Ongoing research training transformer language models at scale, including: BERT & GPT-2
- Collaboration(s):
- DeepSpeed4Science (2023-09)
- Loooooooong Sequence Lengths
- Project Website
- Preprint Song et al. (2023)
- Blog Post
- Tutorial
- GenSLMs:
- DeepSpeed4Science (2023-09)
- Talks / Workshops:
Completed
In Progress
Install
Grab-n-Go
The easiest way to get the most recent version is to:
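A minimal sketch of the install, assuming the package can be installed straight from the GitHub repository (the repository URL is inferred from the project site in the citation below and may differ):

```shell
# Install the latest version of wordplay directly from GitHub
python3 -m pip install "git+https://github.com/saforem2/wordplay"
```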
Development
If youโd like to work with the project and run / change things yourself, Iโd recommend installing from a local (editable) clone of this repository:
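For example, assuming the repository lives at the GitHub URL inferred from the citation below:

```shell
# Clone a local copy of the repository
git clone https://github.com/saforem2/wordplay
cd wordplay

# Install it in editable ("development") mode so local changes
# take effect without reinstalling
python3 -m pip install -e .
```

With an editable install, edits to the cloned source are picked up immediately by the installed package.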
References
Citation
@online{foreman2024,
author = {Foreman, Sam},
  title = {`Wordplay` 🎮 💬},
date = {2024-02-02},
url = {https://saforem2.github.io/wordplay},
langid = {en}
}