# [`wordplay` 🎮 💬](https://github.com/saforem2/wordplay): Shakespeare

[Sam Foreman](https://samforeman.me)
([ALCF](https://alcf.anl.gov/about/people/sam-foreman))  
2025-07-22


We will be using the [Shakespeare
dataset](https://github.com/saforem2/wordplay/blob/main/data/shakespeare/readme.md)
to train a small (~10M parameter) LLM *from scratch*.
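Here is a back-of-the-envelope estimate of where a "~10M parameter" figure can come from for a nanoGPT-style, decoder-only transformer. It is a sketch, not wordplay's own accounting. It ignores biases and LayerNorm weights and assumes the token embedding is tied to the output head. The hyperparameters (`n_layer=6`, `n_embd=384`, `vocab_size=65`, `block_size=256`) are hypothetical values, not necessarily what `model=shakespeare` uses.

``` python
def estimate_gpt_params(n_layer: int, n_embd: int, vocab_size: int, block_size: int) -> dict:
    """Rough parameter count for a GPT-style decoder-only transformer.

    Assumptions: no biases or LayerNorm weights are counted, and the token
    embedding is tied to the LM head (so it is counted only once).
    """
    # Per block: attention (q, k, v + output projection) = 4 * d^2,
    # MLP (d -> 4d -> d) = 8 * d^2, for 12 * d^2 in total.
    per_block = 12 * n_embd ** 2
    non_embedding = n_layer * per_block
    token_emb = vocab_size * n_embd   # shared with the output head (weight tying)
    pos_emb = block_size * n_embd     # learned absolute position embeddings
    return {
        "non_embedding": non_embedding,
        "total": non_embedding + token_emb + pos_emb,
    }


# Hypothetical configuration, for illustration only
counts = estimate_gpt_params(n_layer=6, n_embd=384, vocab_size=65, block_size=256)
print(f"non-embedding: {counts['non_embedding'] / 1e6:.2f}M")  # non-embedding: 10.62M
print(f"total:         {counts['total'] / 1e6:.2f}M")          # total:         10.74M
```

The embedding tables scale with `vocab_size`. A sub-word tokenizer with tens of thousands of tokens therefore adds far more parameters than the 65-character vocabulary assumed here.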

<img src="https://github.com/saforem2/wordplay/blob/main/assets/shakespeare.jpeg?raw=true" width="45%" align="center" /><br>

Image generated from
[stabilityai/stable-diffusion](https://huggingface.co/spaces/stabilityai/stable-diffusion)
on [ü§ó Spaces](https://huggingface.co/spaces).<br>

Prompt Details

Prompt:

> Shakespeare himself, dressed in full Shakespearean garb, writing
> code at a modern workstation with multiple monitors, hacking away
> profusely, backlit, high quality for publication

Negative Prompt:

> low quality, 3d, photorealistic, ugly

## Install / Setup

<b>Warning!</b><br>

**IF YOU ARE EXECUTING ON GOOGLE COLAB**:

You will need to restart your runtime (`Runtime` → `Restart runtime`)
*after* executing the following cell, so that the newly installed
packages can be imported:

``` bash
%%bash

# Check whether `wordplay` is already importable (hide the error if it is not)
python3 -c 'import wordplay; print(wordplay.__file__)' 2> '/dev/null'

if [[ $? -eq 0 ]]; then
    echo "Has wordplay installed. Nothing to do."
else
    echo "Does not have wordplay installed. Installing..."
    git clone 'https://github.com/saforem2/wordplay'
    # Install dependencies first so the data-prep scripts can import them
    python3 -m pip install deepspeed
    python3 -m pip install -e wordplay
    # Download and preprocess the Shakespeare datasets
    python3 wordplay/data/shakespeare_char/prepare.py
    python3 wordplay/data/shakespeare/prepare.py
fi
```

```
/content/wordplay/src/wordplay/__init__.py
Has wordplay installed. Nothing to do.
```

## Post Install

If installed correctly, you should be able to:

``` python
>>> import wordplay
>>> wordplay.__file__
'/path/to/wordplay/src/wordplay/__init__.py'
```
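To check for the package without raising an `ImportError` (or paying the cost of actually importing it), you can use the standard library's `importlib.util.find_spec` to locate it first. This is a small sketch. `find_module_path` is a hypothetical helper name and is not part of `wordplay`.

``` python
import importlib.util
from typing import Optional


def find_module_path(name: str) -> Optional[str]:
    """Return the file a top-level module would be loaded from, or None if it is not installed."""
    spec = importlib.util.find_spec(name)  # locates the module without executing it
    return spec.origin if spec is not None else None


path = find_module_path("wordplay")
if path is None:
    print("wordplay is not installed; run the install cell above.")
else:
    print(f"wordplay found at: {path}")
```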

``` python
%load_ext autoreload
%autoreload 2
import os
import sys
import ezpz

# Advertise 24-bit color support to terminal / logging libraries
os.environ['COLORTERM'] = 'truecolor'
if sys.platform == 'darwin':
    # If running on macOS, fall back to the CPU:
    # os.environ['PYTORCH_ENABLE_MPS_FALLBACK'] = '1'
    os.environ['TORCH_DEVICE'] = 'cpu'
# -----------------------------------------------

logger = ezpz.get_logger()

import wordplay
logger.info(wordplay.__file__)
```

## Build Trainer

Explicitly, we:

1.  Set up PyTorch (and the distributed backend) with `setup(...)` from `ezpz`
2.  Build `cfg: DictConfig = get_config(...)`
3.  Instantiate `config: ExperimentConfig = instantiate(cfg)`
4.  Build `trainer = Trainer(config)`

``` python
import wordplay
print(wordplay.__file__)
```

```
/content/wordplay/src/wordplay/__init__.py
```

``` python
import os
from ezpz import setup
from hydra.utils import instantiate
from wordplay.configs import get_config, PROJECT_ROOT
from wordplay.trainer import Trainer

# Keep the 🤗 datasets cache inside the project directory
HF_DATASETS_CACHE = PROJECT_ROOT.joinpath('.cache', 'huggingface')
HF_DATASETS_CACHE.mkdir(exist_ok=True, parents=True)

os.environ['HF_DATASETS_CACHE'] = HF_DATASETS_CACHE.as_posix()

BACKEND = 'DDP'

# Initialize PyTorch (device, distributed backend, random seed)
rank = setup(
    framework='pytorch',
    backend=BACKEND,
    seed=1234,
)

# Compose the Hydra config from the `shakespeare` presets plus overrides
cfg = get_config(
    [
        'data=shakespeare',
        'model=shakespeare',
        'model.batch_size=8',
        'model.block_size=1024',
        'optimizer=shakespeare',
        'train=shakespeare',
        f'train.backend={BACKEND}',
        'train.compile=false',
        'train.dtype=bfloat16',
        'train.max_iters=1000',
        'train.log_interval=10',
        'train.eval_interval=100',
    ]
)
# Turn the `DictConfig` into an `ExperimentConfig` object
config = instantiate(cfg)
```
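The strings passed to `get_config` use Hydra's `key=value` override syntax. A dotted key such as `model.batch_size=8` sets the `batch_size` field inside the `model` node. The toy function below only illustrates that nesting; `apply_overrides` is a hypothetical name. It is not Hydra's parser and does not handle config-group selections such as `model=shakespeare`, which in Hydra pick a whole preset. The sketch also works out how many tokens one `batch_size × block_size` batch holds, assuming a single process and no gradient accumulation.

``` python
from typing import Any, Dict, List


def apply_overrides(overrides: List[str]) -> Dict[str, Any]:
    """Toy illustration: map dotted `key=value` strings onto a nested dict.

    NOT Hydra's parser: no config groups, lists, or interpolation.
    """
    cfg: Dict[str, Any] = {}
    for item in overrides:
        key, _, raw = item.partition("=")
        # Minimal type coercion: bools, then ints, then floats, else keep the string
        value: Any
        if raw.lower() in ("true", "false"):
            value = raw.lower() == "true"
        else:
            try:
                value = int(raw)
            except ValueError:
                try:
                    value = float(raw)
                except ValueError:
                    value = raw
        # Walk or create the intermediate nodes, then set the leaf
        *parents, leaf = key.split(".")
        node = cfg
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return cfg


# Named `toy_cfg` so it does not clobber the real `cfg` in the notebook
toy_cfg = apply_overrides([
    "model.batch_size=8",
    "model.block_size=1024",
    "train.compile=false",
    "train.dtype=bfloat16",
    "train.max_iters=1000",
])
print(toy_cfg)

# Tokens in one batch (single process, no gradient accumulation assumed)
tokens_per_iter = toy_cfg["model"]["batch_size"] * toy_cfg["model"]["block_size"]
print(tokens_per_iter)  # 8192
```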

### Build `Trainer` object

``` python
trainer = Trainer(config)
```

## Prompt (**prior** to training)

``` python
# Sample a single 256-token completion from the (untrained) model
query = "What is an LLM?"
outputs = trainer.evaluate(
    query,
    num_samples=1,
    max_new_tokens=256,
    top_k=16,
    display=False
)
logger.info(f"['prompt']: '{query}'")
logger.info("['response']:\n\n" + fr"{outputs['0']['raw']}")
```
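The `top_k` argument to `trainer.evaluate` presumably controls top-k sampling, as in nanoGPT's `generate`. At each step only the `k` highest-scoring next tokens are kept, and one is sampled from their renormalized softmax. Below is a minimal pure-Python sketch of that rule. The helper `sample_top_k` is hypothetical and is not wordplay's implementation.

``` python
import math
import random
from typing import List, Optional


def sample_top_k(logits: List[float], k: int, temperature: float = 1.0,
                 rng: Optional[random.Random] = None) -> int:
    """Sample a token index from the k highest-scoring logits (top-k sampling)."""
    if rng is None:
        rng = random.Random()
    # Indices of the k largest logits
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the surviving logits (subtract the max for numerical stability)
    scaled = [logits[i] / temperature for i in top]
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]
    # random.choices normalizes the weights internally
    return rng.choices(top, weights=weights, k=1)[0]


logits = [2.0, 0.5, 1.5, -1.0, 0.0]
rng = random.Random(0)
print([sample_top_k(logits, k=2, rng=rng) for _ in range(10)])  # only indices 0 or 2
print(sample_top_k(logits, k=1))  # always 0 (greedy decoding)
```

With `k=1` this reduces to greedy decoding. A smaller `k`, such as the `top_k=2` used in the final evaluation, makes the output less random than `top_k=16`.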

## Train Model

|  name  |             description             |
|:------:|:-----------------------------------:|
| `step` |        Current training step        |
| `loss` |             Loss value              |
|  `dt`  |      Time per step (in **ms**)      |
| `sps`  |         Samples per second          |
| `mtps` |   Tokens per second (in millions)   |
| `mfu`  | Model FLOPs utilization (MFU)\[1\]  |

\[1\] Relative to the peak `bfloat16` FLOPS of an NVIDIA A100.
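The sketch below shows how the `sps`, `mtps`, and `mfu` columns can be derived from `dt`. It assumes a single process and a nanoGPT-style FLOPs estimate: 6N FLOPs per token for the forward and backward pass, plus an attention term. MFU is measured against the A100's 312 TFLOPS dense `bfloat16` peak. The helper `step_stats` and the example values are hypothetical, and wordplay's exact bookkeeping (for example, across multiple ranks) may differ.

``` python
from dataclasses import dataclass

A100_BF16_PEAK_FLOPS = 312e12  # NVIDIA A100 dense bfloat16 peak


@dataclass
class StepStats:
    sps: float   # samples per second
    mtps: float  # millions of tokens per second
    mfu: float   # fraction of A100 bfloat16 peak FLOPS


def step_stats(dt_ms: float, batch_size: int, block_size: int, n_params: int,
               n_layer: int, n_head: int, n_embd: int, grad_accum: int = 1) -> StepStats:
    """Per-step throughput metrics (single process assumed)."""
    dt = dt_ms / 1e3                   # ms -> s
    samples = batch_size * grad_accum  # sequences processed per optimizer step
    tokens = samples * block_size
    # nanoGPT-style estimate: 6N per token (fwd + bwd) plus the attention term 12*L*H*Q*T
    head_dim = n_embd // n_head
    flops_per_token = 6 * n_params + 12 * n_layer * n_head * head_dim * block_size
    flops_per_iter = flops_per_token * tokens
    return StepStats(
        sps=samples / dt,
        mtps=tokens / dt / 1e6,
        mfu=flops_per_iter / dt / A100_BF16_PEAK_FLOPS,
    )


# Hypothetical step: 100 ms, batch of 8 x 1024 tokens, ~10M-parameter model
stats = step_stats(dt_ms=100.0, batch_size=8, block_size=1024,
                   n_params=10_000_000, n_layer=6, n_head=6, n_embd=384)
print(f"sps={stats.sps:.1f}  mtps={stats.mtps:.4f}  mfu={stats.mfu:.2%}")
# sps=80.0  mtps=0.0819  mfu=2.32%
```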

``` python
trainer.config.device_type
```

```
'cuda'
```

``` python
# `rich`'s `print` pretty-prints the model's module tree
from rich import print

print(trainer.model)
```

## (Partial) Training

We'll first train for 500 iterations and then evaluate the model's
performance on the same prompt:

> What is an LLM?

``` python
trainer.train(train_iters=500)
```

``` python
import time

query = "What is an LLM?"
t0 = time.perf_counter()
outputs = trainer.evaluate(
    query,
    num_samples=1,
    max_new_tokens=256,
    top_k=16,
    display=False
)
logger.info(f'took: {time.perf_counter() - t0:.4f}s')
logger.info(f"['prompt']: '{query}'")
logger.info("['response']:\n\n" + fr"{outputs['0']['raw']}")
```

## Resume Training…

``` python
trainer.train()
```

## Evaluate Model

``` python
import time

query = "What is an LLM?"
t0 = time.perf_counter()
outputs = trainer.evaluate(
    query,
    num_samples=1,
    max_new_tokens=256,
    top_k=2,  # sample from only the 2 most likely tokens at each step
    display=False
)
logger.info(f'took: {time.perf_counter() - t0:.4f}s')
logger.info(f"['prompt']: '{query}'")
logger.info("['response']:\n\n" + fr"{outputs['0']['raw']}")
```