wordplay 🎮 💬: Shakespeare

Author: Sam Foreman ([ALCF](https://alcf.anl.gov/about/people/sam-foreman))
Published: July 22, 2025
Modified: August 5, 2025

We will use the Shakespeare dataset to train a small (~10M parameter) LLM from scratch.


Image generated from stabilityai/stable-diffusion on 🤗 Spaces.

Prompt Details

  • Prompt: Shakespeare himself, dressed in full Shakespearean garb, writing code at a modern workstation with multiple monitors, hacking away profusely, backlit, high quality for publication

  • Negative Prompt: low quality, 3d, photorealistic, ugly

Install / Setup

Warning!

IF YOU ARE EXECUTING ON GOOGLE COLAB:

You will need to restart your runtime (Runtime → Restart runtime)
after executing the following cell:

%%bash
# Install wordplay (and prepare the Shakespeare datasets) only if needed
if python3 -c 'import wordplay; print(wordplay.__file__)' 2> /dev/null; then
    echo "Has wordplay installed. Nothing to do."
else
    echo "Does not have wordplay installed. Installing..."
    git clone 'https://github.com/saforem2/wordplay'
    python3 wordplay/data/shakespeare_char/prepare.py
    python3 wordplay/data/shakespeare/prepare.py
    python3 -m pip install deepspeed
    python3 -m pip install -e wordplay
fi
/content/wordplay/src/wordplay/__init__.py
Has wordplay installed. Nothing to do.
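
What did the install actually do? The two prepare.py scripts tokenize the raw Shakespeare text and write it to disk as train.bin / val.bin (the files you will see loaded in the config logs below). As a minimal sketch of the character-level version, assuming the nanoGPT-style convention that wordplay follows (the real script also downloads input.txt and saves vocabulary metadata alongside the binaries):

import numpy as np

# Read the raw corpus (the real prepare.py downloads tiny-shakespeare first)
with open('input.txt', 'r') as f:
    text = f.read()

# Character-level vocabulary: 65 unique characters for this corpus
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}

# Encode each character as a uint16 id, then split 90/10 into train/val
ids = np.array([stoi[c] for c in text], dtype=np.uint16)
n = int(0.9 * len(ids))
ids[:n].tofile('train.bin')
ids[n:].tofile('val.bin')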

Post Install

If installed correctly, you should be able to:

>>> import wordplay
>>> wordplay.__file__
'/path/to/wordplay/src/wordplay/__init__.py'
%load_ext autoreload
%autoreload 2
import os
import sys
import ezpz

os.environ['COLORTERM'] = 'truecolor'
if sys.platform == 'darwin':
    # If running on macOS:
    # os.environ['PYTORCH_ENABLE_MPS_FALLBACK'] = '1'
    os.environ['TORCH_DEVICE'] = 'cpu'
# -----------------------------------------------

logger = ezpz.get_logger()

import wordplay
logger.info(wordplay.__file__)
[2025-07-23 17:07:07,066155][I][ezpz/__init__:265:ezpz] Setting logging level to 'INFO' on 'RANK == 0'
[2025-07-23 17:07:07,072771][I][ezpz/__init__:266:ezpz] Setting logging level to 'CRITICAL' on all others 'RANK != 0'
[2025-07-23 17:07:07,079375][I][tmp/ipython-input-2-2338663768:17:ezpz.log] /content/wordplay/src/wordplay/__init__.py

Build Trainer

Explicitly, we:

  1. Setup torch via rank = setup(...)
  2. Build cfg: DictConfig = get_config(...)
  3. Instantiate config: ExperimentConfig = instantiate(cfg)
  4. Build trainer = Trainer(config)
import wordplay
print(wordplay.__file__)
/content/wordplay/src/wordplay/__init__.py
import os
import numpy as np
from ezpz import setup
from hydra.utils import instantiate
from wordplay.configs import get_config, PROJECT_ROOT
from wordplay.trainer import Trainer

HF_DATASETS_CACHE = PROJECT_ROOT.joinpath('.cache', 'huggingface')
HF_DATASETS_CACHE.mkdir(exist_ok=True, parents=True)

os.environ['HF_DATASETS_CACHE'] = HF_DATASETS_CACHE.as_posix()

BACKEND = 'DDP'

rank = setup(
    framework='pytorch',
    backend=BACKEND,
    seed=1234,
)

cfg = get_config(
    [
        'data=shakespeare',
        'model=shakespeare',
        'model.batch_size=8',
        'model.block_size=1024',
        'optimizer=shakespeare',
        'train=shakespeare',
        f'train.backend={BACKEND}',
        'train.compile=false',
        'train.dtype=bfloat16',
        'train.max_iters=1000',
        'train.log_interval=10',
        'train.eval_interval=100',
    ]
)
config = instantiate(cfg)
[2025-07-23 17:07:07,409437][I][wordplay/configs:81] Setting HF_DATASETS_CACHE to /content/wordplay/.cache/huggingface/datasets
[2025-07-23 17:07:07,435593][I][ezpz/dist:1159] Using fw='ddp' with torch_{device,backend}= {cuda, nccl}
[2025-07-23 17:07:07,438150][I][ezpz/dist:1026] Caught MASTER_PORT=41765 from environment!
[2025-07-23 17:07:07,440989][I][ezpz/dist:1042] Using torch.distributed.init_process_group with
- master_addr='588b3fb1cb70'
- master_port='41765'
- world_size=1
- rank=0
- local_rank=0
- timeout=datetime.timedelta(seconds=3600)
- backend='nccl'
[2025-07-23 17:07:07,447590][I][ezpz/dist:759] Calling torch.distributed.init_process_group_with: rank=0 world_size=1 backend=nccl
[2025-07-23 17:07:07,462711][I][ezpz/dist:1377] Using device='cuda' with backend='nccl' + 'nccl' for distributed training.
[2025-07-23 17:07:07,465933][I][ezpz/dist:1422] ['588b3fb1cb70'][0/0] 
[2025-07-23 17:07:08,215788][I][wordplay/configs:317] Loading val from /content/wordplay/data/shakespeare_char/val.bin
[2025-07-23 17:07:08,221368][I][wordplay/configs:317] Loading train from /content/wordplay/data/shakespeare_char/train.bin
[2025-07-23 17:07:08,226696][I][wordplay/configs:442] Tokens per iteration: 8,192
[2025-07-23 17:07:08,231221][I][wordplay/configs:465] Using self.ptdtype=torch.bfloat16 on self.device_type='cuda'
[2025-07-23 17:07:08,234866][I][wordplay/configs:471] Initializing a new model from scratch
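
Note that the 8,192 tokens per iteration reported above is just the micro-batch size times the sequence length (times gradient-accumulation steps and world size, both 1 in this single-GPU run):

# Tokens per optimizer step; grad_accum and world_size are assumed 1 here
batch_size, block_size = 8, 1024
grad_accum, world_size = 1, 1
print(batch_size * block_size * grad_accum * world_size)  # 8192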

Build Trainer object

trainer = Trainer(config)
[2025-07-23 17:07:08,315621][I][wordplay/trainer:248] Initializing a new model from scratch
[2025-07-23 17:07:08,654618][I][wordplay/model:255] number of parameters: 10.65M
[2025-07-23 17:07:08,675995][I][wordplay/trainer:266] Model size: num_params=10646784
[2025-07-23 17:07:08,686453][I][wordplay/model:445] num decayed parameter tensors: 26, with 11,035,008 parameters
[2025-07-23 17:07:08,690282][I][wordplay/model:449] num non-decayed parameter tensors: 13, with 4,992 parameters
[2025-07-23 17:07:08,696244][I][wordplay/model:465] using fused AdamW: True
[2025-07-23 17:07:08,699647][C][wordplay/trainer:318] "devid='cuda:0'"
[2025-07-23 17:07:08,703940][I][wordplay/trainer:358] • self.model=GPT(
  (transformer): ModuleDict(
    (wte): Embedding(65, 384)
    (wpe): Embedding(1024, 384)
    (drop): Dropout(p=0.2, inplace=False)
    (h): ModuleList(
      (0-5): 6 x Block(
        (ln_1): LayerNorm()
        (attn): CausalSelfAttention(
          (c_attn): Linear(in_features=384, out_features=1152, bias=False)
          (c_proj): Linear(in_features=384, out_features=384, bias=False)
          (attn_dropout): Dropout(p=0.2, inplace=False)
          (resid_dropout): Dropout(p=0.2, inplace=False)
        )
        (ln_2): LayerNorm()
        (mlp): MLP(
          (c_fc): Linear(in_features=384, out_features=1536, bias=False)
          (act_fn): GELU(approximate='none')
          (c_proj): Linear(in_features=1536, out_features=384, bias=False)
          (dropout): Dropout(p=0.2, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm()
  )
  (lm_head): Linear(in_features=384, out_features=65, bias=False)
)
[2025-07-23 17:07:08,731597][I][wordplay/trainer:359] • self.grad_scaler=<torch.cuda.amp.grad_scaler.GradScaler object at 0x7cbd3c9a85d0>
[2025-07-23 17:07:08,737375][I][wordplay/trainer:360] • self.model_engine=GPT(
  (transformer): ModuleDict(
    (wte): Embedding(65, 384)
    (wpe): Embedding(1024, 384)
    (drop): Dropout(p=0.2, inplace=False)
    (h): ModuleList(
      (0-5): 6 x Block(
        (ln_1): LayerNorm()
        (attn): CausalSelfAttention(
          (c_attn): Linear(in_features=384, out_features=1152, bias=False)
          (c_proj): Linear(in_features=384, out_features=384, bias=False)
          (attn_dropout): Dropout(p=0.2, inplace=False)
          (resid_dropout): Dropout(p=0.2, inplace=False)
        )
        (ln_2): LayerNorm()
        (mlp): MLP(
          (c_fc): Linear(in_features=384, out_features=1536, bias=False)
          (act_fn): GELU(approximate='none')
          (c_proj): Linear(in_features=1536, out_features=384, bias=False)
          (dropout): Dropout(p=0.2, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm()
  )
  (lm_head): Linear(in_features=384, out_features=65, bias=False)
)
[2025-07-23 17:07:08,760469][I][wordplay/trainer:361] • self.optimizer=AdamW (
Parameter Group 0
    amsgrad: False
    betas: (0.9, 0.99)
    capturable: False
    differentiable: False
    eps: 1e-08
    foreach: None
    fused: True
    lr: 0.001
    maximize: False
    weight_decay: 0.1

Parameter Group 1
    amsgrad: False
    betas: (0.9, 0.99)
    capturable: False
    differentiable: False
    eps: 1e-08
    foreach: None
    fused: True
    lr: 0.001
    maximize: False
    weight_decay: 0.0
)
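
The two parameter groups follow the nanoGPT-style convention (consistent with the 26 decayed / 13 non-decayed tensor counts logged above): weight decay is applied to matrices (linear and embedding weights) but not to 1-D tensors (LayerNorm weights and biases). A minimal sketch of that grouping, not wordplay's actual implementation:

import torch

def make_param_groups(model: torch.nn.Module, weight_decay: float = 0.1):
    """Decay >=2-D tensors (matrices); leave 1-D tensors (norms, biases) alone."""
    params = [p for p in model.parameters() if p.requires_grad]
    return [
        {'params': [p for p in params if p.dim() >= 2], 'weight_decay': weight_decay},
        {'params': [p for p in params if p.dim() < 2], 'weight_decay': 0.0},
    ]

# e.g.: torch.optim.AdamW(make_param_groups(model), lr=1e-3,
#                         betas=(0.9, 0.99), fused=True)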

Prompt (prior to training)

query = "What is an LLM?"
outputs = trainer.evaluate(
    query,
    num_samples=1,
    max_new_tokens=256,
    top_k=16,
    display=False
)
logger.info(f"['prompt']: '{query}'")
logger.info("['response']:\n\n" + fr"{outputs['0']['raw']}")
[2025-07-23 17:07:10,765047][I][tmp/ipython-input-6-3496000222:9:ezpz.log] ['prompt']: 'What is an LLM?'
[2025-07-23 17:07:10,767795][I][tmp/ipython-input-6-3496000222:10:ezpz.log] ['response']:

What is an LLM?ouuu'fU?UUUU-LLlVmoYY;?U$IMwwYDjMYYXSSdIss;I''DPOjHhooooMZtmkoGXjZ
BDDddZkydVPcM'MAWILMDDP'''!A'Vzl;R
dtA$ttoXttJJffobJJ;b-vkwwJJOHHwQFccddlobAGGnM'''$kW;kzZlSwZkAoR;wmooo$J-fffoYDd'UBooXYB;JSf?P'MJ..t'hPffID;R.XXo'''SPZkXXXe'VS.JoMdkXSffo''RHQklK''UUUSoMn

Train Model

name  description
----  -------------------------------
step  Current training step
loss  Loss value
dt    Time per step (in seconds)
sps   Samples per second
mtps  Tokens per second (in millions)
mfu   Model FLOPS utilization[1]

[1] in units of A100 bfloat16 peak FLOPS
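
As a back-of-the-envelope check on the mfu column, here is the PaLM-style estimate popularized by nanoGPT (a sketch; wordplay's exact bookkeeping may differ), plugging in numbers from the logs below:

# MFU ~= achieved FLOPS / peak FLOPS (A100 bf16 peak = 312 TFLOPS)
N = 10_646_784                 # reported parameter count
L, H, Q, T = 6, 6, 64, 1024    # layers, heads (assumed 6), head dim, seq length
batch, dt = 8, 0.3825          # samples per step, ~seconds per step (from logs)

flops_per_token = 6 * N + 12 * L * H * Q * T
flops_per_iter = flops_per_token * T * batch
print(f'{flops_per_iter / dt / 312e12:.2%}')  # ~0.63%: the mfu column is in percent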

trainer.config.device_type
'cuda'
from rich import print

print(trainer.model)
GPT(
  (transformer): ModuleDict(
    (wte): Embedding(65, 384)
    (wpe): Embedding(1024, 384)
    (drop): Dropout(p=0.2, inplace=False)
    (h): ModuleList(
      (0-5): 6 x Block(
        (ln_1): LayerNorm()
        (attn): CausalSelfAttention(
          (c_attn): Linear(in_features=384, out_features=1152, bias=False)
          (c_proj): Linear(in_features=384, out_features=384, bias=False)
          (attn_dropout): Dropout(p=0.2, inplace=False)
          (resid_dropout): Dropout(p=0.2, inplace=False)
        )
        (ln_2): LayerNorm()
        (mlp): MLP(
          (c_fc): Linear(in_features=384, out_features=1536, bias=False)
          (act_fn): GELU(approximate='none')
          (c_proj): Linear(in_features=1536, out_features=384, bias=False)
          (dropout): Dropout(p=0.2, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm()
  )
  (lm_head): Linear(in_features=384, out_features=65, bias=False)
)
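
The reported num_params=10646784 can be recovered from the shapes above, assuming the nanoGPT conventions that wordplay inherits: the lm_head weight is tied to wte, and position embeddings are excluded from the reported count:

n_embd, n_layer, vocab_size, block_size = 384, 6, 65, 1024

wte = vocab_size * n_embd      # token embeddings (shared with lm_head)
wpe = block_size * n_embd      # position embeddings
per_block = (
    2 * n_embd                 # ln_1 + ln_2 (weight only, no bias)
    + n_embd * 3 * n_embd      # attn.c_attn
    + n_embd * n_embd          # attn.c_proj
    + n_embd * 4 * n_embd      # mlp.c_fc
    + 4 * n_embd * n_embd      # mlp.c_proj
)
total = wte + wpe + n_layer * per_block + n_embd   # + final LayerNorm
print(total - wpe)             # 10646784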

(partial) Training:

We’ll first train for 500 iterations and then evaluate the model’s performance on the same prompt:

What is an LLM?

trainer.train(train_iters=500)
                Training Legend                 
┏━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃        abbr  desc                           ┃
┡━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│        step │ Current training iteration     │
│        loss │ Loss value                     │
│          dt │ Elapsed time per training step │
│         dtf │ Elapsed time per forward step  │
│         dtb │ Elapsed time per backward step │
│         sps │ Samples per second             │
│ sps_per_gpu │ Samples per second (per GPU)   │
│         tps │ Tokens per second              │
│ tps_per_gpu │ Tokens per second (per GPU)    │
│         mfu │ Model flops utilization        │
└─────────────┴────────────────────────────────┘
[2025-07-23 17:07:12,567707][I][wordplay/trainer:827] ['prompt']: 'What is an LLM?'
[2025-07-23 17:07:12,572514][I][wordplay/trainer:831] ['response']:

What is an LLM?ZIoZo-om';-'MAhB,RcOVP!JJhhkkJnnUzI''&D&jH!ddWJJhfUUVkRhZoZ:MoJRtDjkkhhdMM'Sdd-dbUoXXLSfyXXXRb3ZOS''$!o&&jnVJ3MMkjJ'Mffe-cm..J3Oa;'$hooJ3z!jUSDn
'DqBJtHH;!ozZIZokzoooYlMKLJm.DDmkkXRX'NnhMSccJsH;Ude.tRzDoUtm'JmCd;Jd&j'Qo&'$$DAJTPPVv&j'jjtmmtdls;wNNoooJ3$DDJ
[2025-07-23 17:08:14,213943][I][wordplay/trainer:894] step=10 loss=3.28901 dt=0.388647 dtf=0.0077605 dtb=0.0102481 sps=2.57303 sps_per_gpu=2.57303 tps=21078.3 tps_per_gpu=21078.3 mfu=0.622837
[2025-07-23 17:08:18,050755][I][wordplay/trainer:894] step=20 loss=2.82665 dt=0.392386 dtf=0.0123749 dtb=0.0163346 sps=2.54851 sps_per_gpu=2.54851 tps=20877.4 tps_per_gpu=20877.4 mfu=0.622244
[2025-07-23 17:08:21,869708][I][wordplay/trainer:894] step=30 loss=2.64874 dt=0.379033 dtf=0.00770909 dtb=0.0103789 sps=2.6383 sps_per_gpu=2.6383 tps=21612.9 tps_per_gpu=21612.9 mfu=0.623883
[2025-07-23 17:08:25,681515][I][wordplay/trainer:894] step=40 loss=2.58119 dt=0.375823 dtf=0.00982569 dtb=0.0116637 sps=2.66083 sps_per_gpu=2.66083 tps=21797.5 tps_per_gpu=21797.5 mfu=0.625904
[2025-07-23 17:08:29,489842][I][wordplay/trainer:894] step=50 loss=2.5564 dt=0.381329 dtf=0.00818184 dtb=0.0101487 sps=2.6224 sps_per_gpu=2.6224 tps=21482.7 tps_per_gpu=21482.7 mfu=0.626792
[2025-07-23 17:08:33,295135][I][wordplay/trainer:894] step=60 loss=2.55377 dt=0.37768 dtf=0.00809329 dtb=0.00990252 sps=2.64775 sps_per_gpu=2.64775 tps=21690.3 tps_per_gpu=21690.3 mfu=0.628205
[2025-07-23 17:08:37,094848][I][wordplay/trainer:894] step=70 loss=2.53792 dt=0.37185 dtf=0.00804143 dtb=0.010255 sps=2.68926 sps_per_gpu=2.68926 tps=22030.4 tps_per_gpu=22030.4 mfu=0.630482
[2025-07-23 17:08:40,894946][I][wordplay/trainer:894] step=80 loss=2.56441 dt=0.380709 dtf=0.00861202 dtb=0.0100984 sps=2.62668 sps_per_gpu=2.62668 tps=21517.8 tps_per_gpu=21517.8 mfu=0.631016
[2025-07-23 17:08:44,697477][I][wordplay/trainer:894] step=90 loss=2.5338 dt=0.368932 dtf=0.00809296 dtb=0.00962644 sps=2.71053 sps_per_gpu=2.71053 tps=22204.6 tps_per_gpu=22204.6 mfu=0.633527
[2025-07-23 17:08:48,500289][I][wordplay/trainer:894] step=100 loss=2.53127 dt=0.376976 dtf=0.00801782 dtb=0.0100192 sps=2.65269 sps_per_gpu=2.65269 tps=21730.8 tps_per_gpu=21730.8 mfu=0.634386
[2025-07-23 17:08:49,332883][I][wordplay/trainer:827] ['prompt']: 'What is an LLM?'
[2025-07-23 17:08:49,334601][I][wordplay/trainer:831] ['response']:

What is an LLM?
AREThe he anghangatr ho misen fave by the t fe wh w onk pe wns w s did s fithe s.

CHather s, t be angenont ofous sts se mathan se.


An s tr be the acice pllll is s anontharanonte as wakar s sthe toore sthe towar thag, tin toullon llly my makndheacove t 
[2025-07-23 17:09:47,060965][I][wordplay/trainer:762] Saving checkpoint to: /content
[2025-07-23 17:09:47,063008][I][wordplay/trainer:763] Saving model to: /content/model.pth
[2025-07-23 17:09:47,414828][I][wordplay/configs:141] Appending /content to /content/wordplay/src/ckpts/checkpoints.log
[2025-07-23 17:09:51,240684][I][wordplay/trainer:894] step=110 loss=2.50749 dt=0.380784 dtf=0.00766138 dtb=0.0102359 sps=2.62616 sps_per_gpu=2.62616 tps=21513.5 tps_per_gpu=21513.5 mfu=0.634517
[2025-07-23 17:09:55,063291][I][wordplay/trainer:894] step=120 loss=2.5274 dt=0.379459 dtf=0.00809937 dtb=0.010612 sps=2.63533 sps_per_gpu=2.63533 tps=21588.7 tps_per_gpu=21588.7 mfu=0.634857
[2025-07-23 17:09:58,886616][I][wordplay/trainer:894] step=130 loss=2.54362 dt=0.380395 dtf=0.00779761 dtb=0.00998153 sps=2.62885 sps_per_gpu=2.62885 tps=21535.5 tps_per_gpu=21535.5 mfu=0.635006
[2025-07-23 17:10:02,708605][I][wordplay/trainer:894] step=140 loss=2.50172 dt=0.381295 dtf=0.00778436 dtb=0.0100367 sps=2.62264 sps_per_gpu=2.62264 tps=21484.7 tps_per_gpu=21484.7 mfu=0.63499
[2025-07-23 17:10:06,528915][I][wordplay/trainer:894] step=150 loss=2.50335 dt=0.373231 dtf=0.0079468 dtb=0.0108304 sps=2.67931 sps_per_gpu=2.67931 tps=21948.9 tps_per_gpu=21948.9 mfu=0.636348
[2025-07-23 17:10:10,344712][I][wordplay/trainer:894] step=160 loss=2.48674 dt=0.372652 dtf=0.0117069 dtb=0.0104974 sps=2.68347 sps_per_gpu=2.68347 tps=21983 tps_per_gpu=21983 mfu=0.63767
[2025-07-23 17:10:14,168118][I][wordplay/trainer:894] step=170 loss=2.47736 dt=0.380656 dtf=0.00807191 dtb=0.0106655 sps=2.62705 sps_per_gpu=2.62705 tps=21520.8 tps_per_gpu=21520.8 mfu=0.637494
[2025-07-23 17:10:17,988492][I][wordplay/trainer:894] step=180 loss=2.46811 dt=0.380603 dtf=0.0078251 dtb=0.0103172 sps=2.62741 sps_per_gpu=2.62741 tps=21523.8 tps_per_gpu=21523.8 mfu=0.637345
[2025-07-23 17:10:21,810169][I][wordplay/trainer:894] step=190 loss=2.45376 dt=0.381434 dtf=0.013805 dtb=0.0137897 sps=2.62169 sps_per_gpu=2.62169 tps=21476.9 tps_per_gpu=21476.9 mfu=0.637072
[2025-07-23 17:10:25,634107][I][wordplay/trainer:894] step=200 loss=2.47938 dt=0.383512 dtf=0.00936293 dtb=0.0101239 sps=2.60748 sps_per_gpu=2.60748 tps=21360.5 tps_per_gpu=21360.5 mfu=0.636483
[2025-07-23 17:10:26,457547][I][wordplay/trainer:827] ['prompt']: 'What is an LLM?'
[2025-07-23 17:10:26,459401][I][wordplay/trainer:831] ['response']:

What is an LLM?
HLUS:
LII hethin.
TE: hast seatisurindo wiretyo benin tige, manens, br athetir hyors, blireriarond te me and, f llfes thes thor ists a m thives me windou,



HA oulince s muce oll sse s avelo the rurd p as aver themes l neas:
Heratho w ts the o w. thane r
[2025-07-23 17:11:24,182085][I][wordplay/trainer:762] Saving checkpoint to: /content
[2025-07-23 17:11:24,184071][I][wordplay/trainer:763] Saving model to: /content/model.pth
[2025-07-23 17:11:24,514195][I][wordplay/configs:141] Appending /content to /content/wordplay/src/ckpts/checkpoints.log
[2025-07-23 17:11:28,333182][I][wordplay/trainer:894] step=210 loss=2.45724 dt=0.380321 dtf=0.00789146 dtb=0.00988756 sps=2.62936 sps_per_gpu=2.62936 tps=21539.7 tps_per_gpu=21539.7 mfu=0.636482
[2025-07-23 17:11:32,159664][I][wordplay/trainer:894] step=220 loss=2.48242 dt=0.383149 dtf=0.00807603 dtb=0.0101043 sps=2.60995 sps_per_gpu=2.60995 tps=21380.7 tps_per_gpu=21380.7 mfu=0.636011
[2025-07-23 17:11:35,989095][I][wordplay/trainer:894] step=230 loss=2.48992 dt=0.381508 dtf=0.00775943 dtb=0.00976974 sps=2.62117 sps_per_gpu=2.62117 tps=21472.7 tps_per_gpu=21472.7 mfu=0.635859
[2025-07-23 17:11:39,818287][I][wordplay/trainer:894] step=240 loss=2.45306 dt=0.382383 dtf=0.00783342 dtb=0.0103981 sps=2.61518 sps_per_gpu=2.61518 tps=21423.5 tps_per_gpu=21423.5 mfu=0.635577
[2025-07-23 17:11:43,651793][I][wordplay/trainer:894] step=250 loss=2.48512 dt=0.381244 dtf=0.00790653 dtb=0.00995927 sps=2.623 sps_per_gpu=2.623 tps=21487.6 tps_per_gpu=21487.6 mfu=0.635512
[2025-07-23 17:11:47,488905][I][wordplay/trainer:894] step=260 loss=2.45921 dt=0.375016 dtf=0.0110469 dtb=0.0137554 sps=2.66655 sps_per_gpu=2.66655 tps=21844.4 tps_per_gpu=21844.4 mfu=0.636509
[2025-07-23 17:11:51,323856][I][wordplay/trainer:894] step=270 loss=2.46985 dt=0.38433 dtf=0.00785675 dtb=0.0111291 sps=2.60193 sps_per_gpu=2.60193 tps=21315 tps_per_gpu=21315 mfu=0.635841
[2025-07-23 17:11:55,157805][I][wordplay/trainer:894] step=280 loss=2.47304 dt=0.38265 dtf=0.00785524 dtb=0.010542 sps=2.61336 sps_per_gpu=2.61336 tps=21408.6 tps_per_gpu=21408.6 mfu=0.635517
[2025-07-23 17:11:58,985311][I][wordplay/trainer:894] step=290 loss=2.4519 dt=0.38073 dtf=0.0100743 dtb=0.0128665 sps=2.62653 sps_per_gpu=2.62653 tps=21516.5 tps_per_gpu=21516.5 mfu=0.635544
[2025-07-23 17:12:02,814627][I][wordplay/trainer:894] step=300 loss=2.44979 dt=0.383147 dtf=0.00804455 dtb=0.0103887 sps=2.60996 sps_per_gpu=2.60996 tps=21380.8 tps_per_gpu=21380.8 mfu=0.635167
[2025-07-23 17:12:03,628924][I][wordplay/trainer:827] ['prompt']: 'What is an LLM?'
[2025-07-23 17:12:03,630654][I][wordplay/trainer:831] ['response']:

What is an LLM? muroursee aril icalis

We lal pl mal.
CIO:

LESTerthe coprideve, y wingrenget mir bue powin ithe an w
AN:
INI heshas be, intaly ws avevethay aiourofourthelin wous ans ay ber IUS:
Wh f y have s n t.
IOLONThaventer the t at tho, I win thounepancke and find 
[2025-07-23 17:13:01,480227][I][wordplay/trainer:762] Saving checkpoint to: /content
[2025-07-23 17:13:01,482159][I][wordplay/trainer:763] Saving model to: /content/model.pth
[2025-07-23 17:13:01,816991][I][wordplay/configs:141] Appending /content to /content/wordplay/src/ckpts/checkpoints.log
[2025-07-23 17:13:05,641415][I][wordplay/trainer:894] step=310 loss=2.45647 dt=0.383054 dtf=0.00785093 dtb=0.00992947 sps=2.6106 sps_per_gpu=2.6106 tps=21386 tps_per_gpu=21386 mfu=0.634844
[2025-07-23 17:13:09,467371][I][wordplay/trainer:894] step=320 loss=2.45905 dt=0.382875 dtf=0.0081 dtb=0.010746 sps=2.61182 sps_per_gpu=2.61182 tps=21396 tps_per_gpu=21396 mfu=0.634582
[2025-07-23 17:13:13,297667][I][wordplay/trainer:894] step=330 loss=2.4555 dt=0.38572 dtf=0.0108775 dtb=0.0128777 sps=2.59256 sps_per_gpu=2.59256 tps=21238.2 tps_per_gpu=21238.2 mfu=0.63388
[2025-07-23 17:13:17,131895][I][wordplay/trainer:894] step=340 loss=2.4634 dt=0.384959 dtf=0.00957926 dtb=0.010189 sps=2.59768 sps_per_gpu=2.59768 tps=21280.2 tps_per_gpu=21280.2 mfu=0.633373
[2025-07-23 17:13:20,957109][I][wordplay/trainer:894] step=350 loss=2.49212 dt=0.38072 dtf=0.00796532 dtb=0.0103618 sps=2.6266 sps_per_gpu=2.6266 tps=21517.1 tps_per_gpu=21517.1 mfu=0.633616
[2025-07-23 17:13:24,791303][I][wordplay/trainer:894] step=360 loss=2.42521 dt=0.380351 dtf=0.00941999 dtb=0.0131558 sps=2.62915 sps_per_gpu=2.62915 tps=21538 tps_per_gpu=21538 mfu=0.633897
[2025-07-23 17:13:28,625122][I][wordplay/trainer:894] step=370 loss=2.46779 dt=0.383116 dtf=0.00759078 dtb=0.0105659 sps=2.61017 sps_per_gpu=2.61017 tps=21382.5 tps_per_gpu=21382.5 mfu=0.63369
[2025-07-23 17:13:32,456066][I][wordplay/trainer:894] step=380 loss=2.46751 dt=0.384732 dtf=0.00849637 dtb=0.0100098 sps=2.59921 sps_per_gpu=2.59921 tps=21292.8 tps_per_gpu=21292.8 mfu=0.633238
[2025-07-23 17:13:36,284446][I][wordplay/trainer:894] step=390 loss=2.47132 dt=0.390981 dtf=0.0104592 dtb=0.0141359 sps=2.55767 sps_per_gpu=2.55767 tps=20952.4 tps_per_gpu=20952.4 mfu=0.631826
[2025-07-23 17:13:40,120231][I][wordplay/trainer:894] step=400 loss=2.50043 dt=0.382461 dtf=0.00788739 dtb=0.011582 sps=2.61465 sps_per_gpu=2.61465 tps=21419.2 tps_per_gpu=21419.2 mfu=0.631935
[2025-07-23 17:13:40,955053][I][wordplay/trainer:827] ['prompt']: 'What is an LLM?'
[2025-07-23 17:13:40,956742][I][wordplay/trainer:831] ['response']:

What is an LLM?
HUSUS:
Wingens thent ndd the se thof heare oupeed s te ase harot anes hant wisthe het clor m at t somy th br his s he, thanononoun heco he bong were asesonor t wearesp



NUS: th ber d, ay sh thout wo pavavond ay touch the hastrd omer hes ias may perengor
[2025-07-23 17:14:38,666483][I][wordplay/trainer:762] Saving checkpoint to: /content
[2025-07-23 17:14:38,673966][I][wordplay/trainer:763] Saving model to: /content/model.pth
[2025-07-23 17:14:39,050214][I][wordplay/configs:141] Appending /content to /content/wordplay/src/ckpts/checkpoints.log
[2025-07-23 17:14:42,870655][I][wordplay/trainer:894] step=410 loss=2.48579 dt=0.380458 dtf=0.00763788 dtb=0.00967759 sps=2.62841 sps_per_gpu=2.62841 tps=21531.9 tps_per_gpu=21531.9 mfu=0.632366
[2025-07-23 17:14:46,696620][I][wordplay/trainer:894] step=420 loss=2.44756 dt=0.389089 dtf=0.0143081 dtb=0.0108978 sps=2.57011 sps_per_gpu=2.57011 tps=21054.3 tps_per_gpu=21054.3 mfu=0.631342
[2025-07-23 17:14:50,528406][I][wordplay/trainer:894] step=430 loss=2.46498 dt=0.383404 dtf=0.0097532 dtb=0.0132017 sps=2.60821 sps_per_gpu=2.60821 tps=21366.5 tps_per_gpu=21366.5 mfu=0.631343
[2025-07-23 17:14:54,360775][I][wordplay/trainer:894] step=440 loss=2.46993 dt=0.384899 dtf=0.00866323 dtb=0.0128457 sps=2.59808 sps_per_gpu=2.59808 tps=21283.5 tps_per_gpu=21283.5 mfu=0.631099
[2025-07-23 17:14:58,197581][I][wordplay/trainer:894] step=450 loss=2.45371 dt=0.383754 dtf=0.00799181 dtb=0.0108706 sps=2.60584 sps_per_gpu=2.60584 tps=21347 tps_per_gpu=21347 mfu=0.631067
[2025-07-23 17:15:02,033033][I][wordplay/trainer:894] step=460 loss=2.43378 dt=0.379863 dtf=0.0110734 dtb=0.0147297 sps=2.63253 sps_per_gpu=2.63253 tps=21565.6 tps_per_gpu=21565.6 mfu=0.631684
[2025-07-23 17:15:05,868916][I][wordplay/trainer:894] step=470 loss=2.41934 dt=0.378727 dtf=0.00844342 dtb=0.0111405 sps=2.64043 sps_per_gpu=2.64043 tps=21630.4 tps_per_gpu=21630.4 mfu=0.632431
[2025-07-23 17:15:09,703796][I][wordplay/trainer:894] step=480 loss=2.45929 dt=0.382927 dtf=0.00844033 dtb=0.0114589 sps=2.61146 sps_per_gpu=2.61146 tps=21393.1 tps_per_gpu=21393.1 mfu=0.632402
[2025-07-23 17:15:13,538234][I][wordplay/trainer:894] step=490 loss=2.4835 dt=0.383195 dtf=0.0079397 dtb=0.0104966 sps=2.60964 sps_per_gpu=2.60964 tps=21378.1 tps_per_gpu=21378.1 mfu=0.632332
[2025-07-23 17:15:17,374316][I][wordplay/trainer:894] step=500 loss=2.43789 dt=0.382541 dtf=0.00727845 dtb=0.0100782 sps=2.6141 sps_per_gpu=2.6141 tps=21414.7 tps_per_gpu=21414.7 mfu=0.632376
import time

query = "What is an LLM?"
t0 = time.perf_counter()
outputs = trainer.evaluate(
    query,
    num_samples=1,
    max_new_tokens=256,
    top_k=16,
    display=False
)
logger.info(f'took: {time.perf_counter() - t0:.4f}s')
logger.info(f"['prompt']: '{query}'")
logger.info("['response']:\n\n" + fr"{outputs['0']['raw']}")
[2025-07-23 17:15:18,240721][I][tmp/ipython-input-10-1425179755:12:ezpz.log] took: 0.8133s
[2025-07-23 17:15:18,242822][I][tmp/ipython-input-10-1425179755:13:ezpz.log] ['prompt']: 'What is an LLM?'
[2025-07-23 17:15:18,245933][I][tmp/ipython-input-10-1425179755:14:ezpz.log] ['response']:

What is an LLM? burthilio s in o th twiser mbalilis ar sis alincore tt t mes mpresofo m whe hary ht ourighothast omy pomithe d?




Bu le wie IUTore ll ishath tes d fr irme nco s f maksere,
IAn he ise wicouss s, areatath meangre the, my hare wis pay toth laut athe s,
Ano

Resume Training…

trainer.train()
[2025-07-23 17:15:19,128023][I][wordplay/trainer:827] ['prompt']: 'What is an LLM?'
[2025-07-23 17:15:19,129812][I][wordplay/trainer:831] ['response']:

What is an LLM?

POOSTOLENETES:
INIONEO: oft ffan yo pe hous tor ce me s here serste buthe he ase he


NENIO:
Whe arallin hatithofoull the, fousencay yont paris.
PENTER:
An o, that s f lllle ishan be be acer se war tha pe iopre is ore nckat, me my?

WI tofifre he llly po
[2025-07-23 17:16:16,858986][I][wordplay/trainer:762] Saving checkpoint to: /content
[2025-07-23 17:16:16,861272][I][wordplay/trainer:763] Saving model to: /content/model.pth
[2025-07-23 17:16:17,190207][I][wordplay/configs:141] Appending /content to /content/wordplay/src/ckpts/checkpoints.log
[2025-07-23 17:16:21,016363][I][wordplay/trainer:894] step=510 loss=2.46165 dt=0.380989 dtf=0.00765307 dtb=0.0102919 sps=2.62475 sps_per_gpu=2.62475 tps=21502 tps_per_gpu=21502 mfu=0.635357
[2025-07-23 17:16:24,851610][I][wordplay/trainer:894] step=520 loss=2.44981 dt=0.383659 dtf=0.00791765 dtb=0.0103253 sps=2.60648 sps_per_gpu=2.60648 tps=21352.3 tps_per_gpu=21352.3 mfu=0.634915
[2025-07-23 17:16:28,687465][I][wordplay/trainer:894] step=530 loss=2.45632 dt=0.388874 dtf=0.01204 dtb=0.0159266 sps=2.57153 sps_per_gpu=2.57153 tps=21066 tps_per_gpu=21066 mfu=0.633671
[2025-07-23 17:16:32,526883][I][wordplay/trainer:894] step=540 loss=2.45869 dt=0.38549 dtf=0.00823117 dtb=0.0103854 sps=2.5941 sps_per_gpu=2.5941 tps=21250.9 tps_per_gpu=21250.9 mfu=0.633098
[2025-07-23 17:16:36,360809][I][wordplay/trainer:894] step=550 loss=2.44677 dt=0.385398 dtf=0.00789234 dtb=0.0121862 sps=2.59472 sps_per_gpu=2.59472 tps=21256 tps_per_gpu=21256 mfu=0.632597
[2025-07-23 17:16:40,195560][I][wordplay/trainer:894] step=560 loss=2.43464 dt=0.385434 dtf=0.0106042 dtb=0.0129227 sps=2.59448 sps_per_gpu=2.59448 tps=21254 tps_per_gpu=21254 mfu=0.63214
[2025-07-23 17:16:44,032374][I][wordplay/trainer:894] step=570 loss=2.45685 dt=0.382214 dtf=0.00815606 dtb=0.0103625 sps=2.61633 sps_per_gpu=2.61633 tps=21433 tps_per_gpu=21433 mfu=0.632258
[2025-07-23 17:16:47,866282][I][wordplay/trainer:894] step=580 loss=2.42042 dt=0.383891 dtf=0.00803656 dtb=0.010343 sps=2.60491 sps_per_gpu=2.60491 tps=21339.4 tps_per_gpu=21339.4 mfu=0.632087
[2025-07-23 17:16:51,705365][I][wordplay/trainer:894] step=590 loss=2.45867 dt=0.381508 dtf=0.0139744 dtb=0.0143725 sps=2.62118 sps_per_gpu=2.62118 tps=21472.7 tps_per_gpu=21472.7 mfu=0.632328
[2025-07-23 17:16:55,543539][I][wordplay/trainer:894] step=600 loss=2.42416 dt=0.391623 dtf=0.0130454 dtb=0.0146926 sps=2.55347 sps_per_gpu=2.55347 tps=20918.1 tps_per_gpu=20918.1 mfu=0.630905
[2025-07-23 17:16:56,372045][I][wordplay/trainer:827] ['prompt']: 'What is an LLM?'
[2025-07-23 17:16:56,373698][I][wordplay/trainer:831] ['response']:

What is an LLM?



KILINGSBRK:
Ye oinot ath lord nous cke, iat ckin and;
Yor te, wad caco aver h
Tow, tom harrds, wer ow coon nalilllll th m thol s s heree, an sus alleris malatetoung ty nd mimarssin myeayelof f my bungrentind's bee and oulodo oter hendin ndind at
Ifowar
[2025-07-23 17:17:54,112942][I][wordplay/trainer:762] Saving checkpoint to: /content
[2025-07-23 17:17:54,116645][I][wordplay/trainer:763] Saving model to: /content/model.pth
[2025-07-23 17:17:54,560498][I][wordplay/configs:141] Appending /content to /content/wordplay/src/ckpts/checkpoints.log
[2025-07-23 17:17:58,382321][I][wordplay/trainer:894] step=610 loss=2.40125 dt=0.380516 dtf=0.00768699 dtb=0.0103504 sps=2.62801 sps_per_gpu=2.62801 tps=21528.7 tps_per_gpu=21528.7 mfu=0.63143
[2025-07-23 17:18:02,217685][I][wordplay/trainer:894] step=620 loss=2.38897 dt=0.382149 dtf=0.00761661 dtb=0.00966352 sps=2.61678 sps_per_gpu=2.61678 tps=21436.7 tps_per_gpu=21436.7 mfu=0.631629
[2025-07-23 17:18:06,047977][I][wordplay/trainer:894] step=630 loss=2.38868 dt=0.378137 dtf=0.00969834 dtb=0.0128937 sps=2.64454 sps_per_gpu=2.64454 tps=21664.1 tps_per_gpu=21664.1 mfu=0.632481
[2025-07-23 17:18:09,883308][I][wordplay/trainer:894] step=640 loss=2.4127 dt=0.382373 dtf=0.00796208 dtb=0.0101229 sps=2.61525 sps_per_gpu=2.61525 tps=21424.1 tps_per_gpu=21424.1 mfu=0.632539
[2025-07-23 17:18:13,722090][I][wordplay/trainer:894] step=650 loss=2.41445 dt=0.385077 dtf=0.00783048 dtb=0.0110297 sps=2.59688 sps_per_gpu=2.59688 tps=21273.7 tps_per_gpu=21273.7 mfu=0.632146
[2025-07-23 17:18:17,557001][I][wordplay/trainer:894] step=660 loss=2.38916 dt=0.397191 dtf=0.0126378 dtb=0.0280523 sps=2.51768 sps_per_gpu=2.51768 tps=20624.8 tps_per_gpu=20624.8 mfu=0.629875
[2025-07-23 17:18:21,395377][I][wordplay/trainer:894] step=670 loss=2.40125 dt=0.37982 dtf=0.00799165 dtb=0.0102509 sps=2.63282 sps_per_gpu=2.63282 tps=21568.1 tps_per_gpu=21568.1 mfu=0.630619
[2025-07-23 17:18:25,229485][I][wordplay/trainer:894] step=680 loss=2.36815 dt=0.367467 dtf=0.00798743 dtb=0.0101859 sps=2.72133 sps_per_gpu=2.72133 tps=22293.2 tps_per_gpu=22293.2 mfu=0.633431
[2025-07-23 17:18:29,069577][I][wordplay/trainer:894] step=690 loss=2.40319 dt=0.379338 dtf=0.00789747 dtb=0.0107017 sps=2.63617 sps_per_gpu=2.63617 tps=21595.5 tps_per_gpu=21595.5 mfu=0.6339
[2025-07-23 17:18:32,902179][I][wordplay/trainer:894] step=700 loss=2.4019 dt=0.382542 dtf=0.00746426 dtb=0.0101071 sps=2.61409 sps_per_gpu=2.61409 tps=21414.6 tps_per_gpu=21414.6 mfu=0.633787
[2025-07-23 17:18:33,732336][I][wordplay/trainer:827] ['prompt']: 'What is an LLM?'
[2025-07-23 17:18:33,734156][I][wordplay/trainer:831] ['response']:

What is an LLM?

Thile than bat ton dor nong mur,
NO belll lit lop gereing ichth ts heas fopoo l s fowis the

Wofores pis wiceris chithith d concofabththesthis t me t t of sis meagoury.

ARO:
Whe my m bo ar f s yourel s f ther thindusolofe s m s le iserangofothin thesith
[2025-07-23 17:19:31,494673][I][wordplay/trainer:762] Saving checkpoint to: /content
[2025-07-23 17:19:31,499399][I][wordplay/trainer:763] Saving model to: /content/model.pth
[2025-07-23 17:19:31,956960][I][wordplay/configs:141] Appending /content to /content/wordplay/src/ckpts/checkpoints.log
[2025-07-23 17:19:35,780584][I][wordplay/trainer:894] step=710 loss=2.41346 dt=0.378216 dtf=0.00835439 dtb=0.0104115 sps=2.64399 sps_per_gpu=2.64399 tps=21659.6 tps_per_gpu=21659.6 mfu=0.63441
[2025-07-23 17:19:39,611784][I][wordplay/trainer:894] step=720 loss=2.39009 dt=0.383217 dtf=0.00772173 dtb=0.010444 sps=2.60949 sps_per_gpu=2.60949 tps=21376.9 tps_per_gpu=21376.9 mfu=0.634135
[2025-07-23 17:19:43,450301][I][wordplay/trainer:894] step=730 loss=2.38395 dt=0.38477 dtf=0.0103028 dtb=0.0132564 sps=2.59896 sps_per_gpu=2.59896 tps=21290.6 tps_per_gpu=21290.6 mfu=0.633633
[2025-07-23 17:19:47,286173][I][wordplay/trainer:894] step=740 loss=2.35507 dt=0.382978 dtf=0.00775962 dtb=0.00999175 sps=2.61112 sps_per_gpu=2.61112 tps=21390.3 tps_per_gpu=21390.3 mfu=0.633475
[2025-07-23 17:19:51,122311][I][wordplay/trainer:894] step=750 loss=2.34116 dt=0.385881 dtf=0.00818335 dtb=0.0122375 sps=2.59147 sps_per_gpu=2.59147 tps=21229.3 tps_per_gpu=21229.3 mfu=0.632858
[2025-07-23 17:19:54,958706][I][wordplay/trainer:894] step=760 loss=2.35229 dt=0.395003 dtf=0.0133316 dtb=0.0176366 sps=2.53163 sps_per_gpu=2.53163 tps=20739.1 tps_per_gpu=20739.1 mfu=0.630854
[2025-07-23 17:19:58,793260][I][wordplay/trainer:894] step=770 loss=2.34521 dt=0.381653 dtf=0.00799117 dtb=0.0100162 sps=2.62018 sps_per_gpu=2.62018 tps=21464.5 tps_per_gpu=21464.5 mfu=0.631194
[2025-07-23 17:20:02,627603][I][wordplay/trainer:894] step=780 loss=2.31829 dt=0.384113 dtf=0.00808119 dtb=0.0106393 sps=2.6034 sps_per_gpu=2.6034 tps=21327.1 tps_per_gpu=21327.1 mfu=0.631093
[2025-07-23 17:20:06,463581][I][wordplay/trainer:894] step=790 loss=2.31021 dt=0.383535 dtf=0.00812252 dtb=0.0103508 sps=2.60732 sps_per_gpu=2.60732 tps=21359.2 tps_per_gpu=21359.2 mfu=0.631098
[2025-07-23 17:20:10,293805][I][wordplay/trainer:894] step=800 loss=2.30534 dt=0.376394 dtf=0.00790557 dtb=0.0103412 sps=2.65679 sps_per_gpu=2.65679 tps=21764.4 tps_per_gpu=21764.4 mfu=0.632299
[2025-07-23 17:20:11,127431][I][wordplay/trainer:827] ['prompt']: 'What is an LLM?'
[2025-07-23 17:20:11,129261][I][wordplay/trainer:831] ['response']:

What is an LLM?

HESSTY OMy MONN:
The as a thestop skin cof or we or bines best busplo cothe.

FORCAMPHY:
ANaracapat there t cathe dyou toraron

And ndinis aca t t dis tir.


STRENIO:
No ano or, where my sint stthe bllos t ho sow the the,
Tise sigan t.

YCLES:
Matacou f 
[2025-07-23 17:21:08,807775][I][wordplay/trainer:762] Saving checkpoint to: /content
[2025-07-23 17:21:08,810267][I][wordplay/trainer:763] Saving model to: /content/model.pth
[2025-07-23 17:21:09,289617][I][wordplay/configs:141] Appending /content to /content/wordplay/src/ckpts/checkpoints.log
[2025-07-23 17:21:13,120320][I][wordplay/trainer:894] step=810 loss=2.31587 dt=0.381621 dtf=0.00781348 dtb=0.0120922 sps=2.6204 sps_per_gpu=2.6204 tps=21466.3 tps_per_gpu=21466.3 mfu=0.6325
[2025-07-23 17:21:16,950119][I][wordplay/trainer:894] step=820 loss=2.32552 dt=0.378177 dtf=0.00779952 dtb=0.0102359 sps=2.64426 sps_per_gpu=2.64426 tps=21661.8 tps_per_gpu=21661.8 mfu=0.633258
[2025-07-23 17:21:20,780635][I][wordplay/trainer:894] step=830 loss=2.27354 dt=0.387149 dtf=0.0106936 dtb=0.0140346 sps=2.58298 sps_per_gpu=2.58298 tps=21159.8 tps_per_gpu=21159.8 mfu=0.632457
[2025-07-23 17:21:24,610506][I][wordplay/trainer:894] step=840 loss=2.26241 dt=0.383837 dtf=0.00787966 dtb=0.0108706 sps=2.60527 sps_per_gpu=2.60527 tps=21342.4 tps_per_gpu=21342.4 mfu=0.632275
[2025-07-23 17:21:28,446417][I][wordplay/trainer:894] step=850 loss=2.26027 dt=0.383713 dtf=0.00800034 dtb=0.0100456 sps=2.60611 sps_per_gpu=2.60611 tps=21349.3 tps_per_gpu=21349.3 mfu=0.632132
[2025-07-23 17:21:32,273517][I][wordplay/trainer:894] step=860 loss=2.25673 dt=0.382741 dtf=0.0083715 dtb=0.0101342 sps=2.61273 sps_per_gpu=2.61273 tps=21403.5 tps_per_gpu=21403.5 mfu=0.632164
[2025-07-23 17:21:36,109224][I][wordplay/trainer:894] step=870 loss=2.21383 dt=0.381168 dtf=0.00781913 dtb=0.0098429 sps=2.62351 sps_per_gpu=2.62351 tps=21491.8 tps_per_gpu=21491.8 mfu=0.632453
[2025-07-23 17:21:39,941412][I][wordplay/trainer:894] step=880 loss=2.21413 dt=0.380526 dtf=0.00772047 dtb=0.00999847 sps=2.62794 sps_per_gpu=2.62794 tps=21528.1 tps_per_gpu=21528.1 mfu=0.632821
[2025-07-23 17:21:43,768114][I][wordplay/trainer:894] step=890 loss=2.21783 dt=0.370921 dtf=0.00774233 dtb=0.0108925 sps=2.69599 sps_per_gpu=2.69599 tps=22085.6 tps_per_gpu=22085.6 mfu=0.634799
[2025-07-23 17:21:47,604118][I][wordplay/trainer:894] step=900 loss=2.20972 dt=0.389311 dtf=0.0136295 dtb=0.0109 sps=2.56864 sps_per_gpu=2.56864 tps=21042.3 tps_per_gpu=21042.3 mfu=0.633497
[2025-07-23 17:21:48,462679][I][wordplay/trainer:827] ['prompt']: 'What is an LLM?'
[2025-07-23 17:21:48,464365][I][wordplay/trainer:831] ['response']:

What is an LLM?

DURENCK:
Me so my nou, hou ward thes ler noms he he,
Oxt my the my de is by beperd.

HARY ORK:
Whe tho su win th ars at herd pedis.

KING RICHARD II:
That we sco arre,
Thade so frener sheran may or tot tremedonght oness.
GLUCER:
He le inest soul mok, son
[2025-07-23 17:22:46,157466][I][wordplay/trainer:762] Saving checkpoint to: /content
[2025-07-23 17:22:46,159878][I][wordplay/trainer:763] Saving model to: /content/model.pth
[2025-07-23 17:22:46,625786][I][wordplay/configs:141] Appending /content to /content/wordplay/src/ckpts/checkpoints.log
[2025-07-23 17:22:50,449927][I][wordplay/trainer:894] step=910 loss=2.17491 dt=0.379023 dtf=0.00754135 dtb=0.0100832 sps=2.63836 sps_per_gpu=2.63836 tps=21613.5 tps_per_gpu=21613.5 mfu=0.634012
[2025-07-23 17:22:54,284268][I][wordplay/trainer:894] step=920 loss=2.1536 dt=0.383239 dtf=0.0075398 dtb=0.00996356 sps=2.60934 sps_per_gpu=2.60934 tps=21375.7 tps_per_gpu=21375.7 mfu=0.633773
[2025-07-23 17:22:58,116915][I][wordplay/trainer:894] step=930 loss=2.15065 dt=0.381936 dtf=0.00785014 dtb=0.0114434 sps=2.61824 sps_per_gpu=2.61824 tps=21448.6 tps_per_gpu=21448.6 mfu=0.633774
[2025-07-23 17:23:01,953658][I][wordplay/trainer:894] step=940 loss=2.12782 dt=0.38311 dtf=0.00824185 dtb=0.0105607 sps=2.61022 sps_per_gpu=2.61022 tps=21382.9 tps_per_gpu=21382.9 mfu=0.633581
[2025-07-23 17:23:05,787479][I][wordplay/trainer:894] step=950 loss=2.18616 dt=0.38379 dtf=0.00788715 dtb=0.0103477 sps=2.60559 sps_per_gpu=2.60559 tps=21345 tps_per_gpu=21345 mfu=0.633295
[2025-07-23 17:23:09,621436][I][wordplay/trainer:894] step=960 loss=2.11422 dt=0.384061 dtf=0.00771515 dtb=0.00979936 sps=2.60376 sps_per_gpu=2.60376 tps=21330 tps_per_gpu=21330 mfu=0.632993
[2025-07-23 17:23:13,455949][I][wordplay/trainer:894] step=970 loss=2.05699 dt=0.383695 dtf=0.00807108 dtb=0.0107169 sps=2.60624 sps_per_gpu=2.60624 tps=21350.3 tps_per_gpu=21350.3 mfu=0.632781
[2025-07-23 17:23:17,284032][I][wordplay/trainer:894] step=980 loss=2.15509 dt=0.376189 dtf=0.00803431 dtb=0.0109163 sps=2.65824 sps_per_gpu=2.65824 tps=21776.3 tps_per_gpu=21776.3 mfu=0.633849
[2025-07-23 17:23:21,114368][I][wordplay/trainer:894] step=990 loss=2.1031 dt=0.393959 dtf=0.0123796 dtb=0.0165355 sps=2.53833 sps_per_gpu=2.53833 tps=20794 tps_per_gpu=20794 mfu=0.631908
[2025-07-23 17:23:24,949152][I][wordplay/trainer:894] step=1000 loss=2.05209 dt=0.371632 dtf=0.00834119 dtb=0.0110242 sps=2.69083 sps_per_gpu=2.69083 tps=22043.3 tps_per_gpu=22043.3 mfu=0.633853
[2025-07-23 17:23:25,760378][I][wordplay/trainer:827] ['prompt']: 'What is an LLM?'
[2025-07-23 17:23:25,762149][I][wordplay/trainer:831] ['response']:

What is an LLM?


WAMILLY:
And I tucke thimbok have doorcent mone,
Wavert mus of me the han hat the deant.
DEORK:
Far thall is coors sited not de ind,
But theat to ad coftitest fort sthengers,
They my thous sor was to yourte mee.
TARK:
I leer, men you, wit the by the the
[2025-07-23 17:24:23,438968][I][wordplay/trainer:762] Saving checkpoint to: /content
[2025-07-23 17:24:23,441364][I][wordplay/trainer:763] Saving model to: /content/model.pth
[2025-07-23 17:24:23,918157][I][wordplay/configs:141] Appending /content to /content/wordplay/src/ckpts/checkpoints.log
[2025-07-23 17:24:27,738089][I][wordplay/trainer:894] step=1010 loss=2.07161 dt=0.367348 dtf=0.00773713 dtb=0.0102199 sps=2.72222 sps_per_gpu=2.72222 tps=22300.4 tps_per_gpu=22300.4 mfu=0.636362
[2025-07-23 17:24:31,569441][I][wordplay/trainer:894] step=1020 loss=2.04552 dt=0.383316 dtf=0.00751949 dtb=0.01008 sps=2.60881 sps_per_gpu=2.60881 tps=21371.4 tps_per_gpu=21371.4 mfu=0.635876
[2025-07-23 17:24:35,396411][I][wordplay/trainer:894] step=1030 loss=2.03231 dt=0.384257 dtf=0.00816572 dtb=0.0102516 sps=2.60243 sps_per_gpu=2.60243 tps=21319.1 tps_per_gpu=21319.1 mfu=0.635284
[2025-07-23 17:24:39,228505][I][wordplay/trainer:894] step=1040 loss=2.05762 dt=0.383646 dtf=0.00790242 dtb=0.00997257 sps=2.60657 sps_per_gpu=2.60657 tps=21353 tps_per_gpu=21353 mfu=0.634851
[2025-07-23 17:24:43,061324][I][wordplay/trainer:894] step=1050 loss=2.03493 dt=0.378067 dtf=0.00783631 dtb=0.00984342 sps=2.64504 sps_per_gpu=2.64504 tps=21668.1 tps_per_gpu=21668.1 mfu=0.635392
[2025-07-23 17:24:46,898059][I][wordplay/trainer:894] step=1060 loss=1.99328 dt=0.383855 dtf=0.00812065 dtb=0.0102131 sps=2.60515 sps_per_gpu=2.60515 tps=21341.4 tps_per_gpu=21341.4 mfu=0.634914
[2025-07-23 17:24:50,734315][I][wordplay/trainer:894] step=1070 loss=2.02538 dt=0.38352 dtf=0.00975553 dtb=0.00995462 sps=2.60743 sps_per_gpu=2.60743 tps=21360.1 tps_per_gpu=21360.1 mfu=0.634539
[2025-07-23 17:24:54,571713][I][wordplay/trainer:894] step=1080 loss=1.98803 dt=0.383255 dtf=0.00790832 dtb=0.0101534 sps=2.60923 sps_per_gpu=2.60923 tps=21374.8 tps_per_gpu=21374.8 mfu=0.634245
[2025-07-23 17:24:58,396586][I][wordplay/trainer:894] step=1090 loss=2.05368 dt=0.379503 dtf=0.00809327 dtb=0.0106979 sps=2.63503 sps_per_gpu=2.63503 tps=21586.1 tps_per_gpu=21586.1 mfu=0.634605
[2025-07-23 17:25:02,230324][I][wordplay/trainer:894] step=1100 loss=1.99345 dt=0.386284 dtf=0.0115638 dtb=0.0162085 sps=2.58877 sps_per_gpu=2.58877 tps=21207.2 tps_per_gpu=21207.2 mfu=0.633809
[2025-07-23 17:25:03,086185][I][wordplay/trainer:827] ['prompt']: 'What is an LLM?'
[2025-07-23 17:25:03,088005][I][wordplay/trainer:831] ['response']:

What is an LLM? Godeel we ye the live courerd, mare you the sill:
This bent the do we shre yeat pert
So but yerter the him theely?

KING EDWARD IV:
Yis past whis to is witer gor miny,
To the corts a have could heret
This the the deears, so your cers tee a be.

CLESTER:
M
[2025-07-23 17:26:00,773544][I][wordplay/trainer:762] Saving checkpoint to: /content
[2025-07-23 17:26:00,776016][I][wordplay/trainer:763] Saving model to: /content/model.pth
[2025-07-23 17:26:01,284390][I][wordplay/configs:141] Appending /content to /content/wordplay/src/ckpts/checkpoints.log
[2025-07-23 17:26:05,114000][I][wordplay/trainer:894] step=1110 loss=1.95124 dt=0.377492 dtf=0.00786764 dtb=0.0100368 sps=2.64906 sps_per_gpu=2.64906 tps=21701.1 tps_per_gpu=21701.1 mfu=0.634553
[2025-07-23 17:26:08,948083][I][wordplay/trainer:894] step=1120 loss=1.98738 dt=0.381927 dtf=0.00748538 dtb=0.00989547 sps=2.6183 sps_per_gpu=2.6183 tps=21449.1 tps_per_gpu=21449.1 mfu=0.634477
[2025-07-23 17:26:12,776837][I][wordplay/trainer:894] step=1130 loss=1.89314 dt=0.374098 dtf=0.008244 dtb=0.0108253 sps=2.67309 sps_per_gpu=2.67309 tps=21898 tps_per_gpu=21898 mfu=0.635735
[2025-07-23 17:26:16,611706][I][wordplay/trainer:894] step=1140 loss=1.92855 dt=0.393585 dtf=0.0130297 dtb=0.014002 sps=2.54075 sps_per_gpu=2.54075 tps=20813.8 tps_per_gpu=20813.8 mfu=0.633664
[2025-07-23 17:26:20,447771][I][wordplay/trainer:894] step=1150 loss=1.83626 dt=0.384681 dtf=0.00807674 dtb=0.0107471 sps=2.59955 sps_per_gpu=2.59955 tps=21295.6 tps_per_gpu=21295.6 mfu=0.633223
[2025-07-23 17:26:24,282285][I][wordplay/trainer:894] step=1160 loss=1.90146 dt=0.383857 dtf=0.0082585 dtb=0.0104392 sps=2.60514 sps_per_gpu=2.60514 tps=21341.3 tps_per_gpu=21341.3 mfu=0.632962
[2025-07-23 17:26:28,116716][I][wordplay/trainer:894] step=1170 loss=1.88228 dt=0.382931 dtf=0.00886622 dtb=0.0120676 sps=2.61144 sps_per_gpu=2.61144 tps=21392.9 tps_per_gpu=21392.9 mfu=0.632879
[2025-07-23 17:26:31,951239][I][wordplay/trainer:894] step=1180 loss=1.88628 dt=0.381804 dtf=0.00750252 dtb=0.0100093 sps=2.61914 sps_per_gpu=2.61914 tps=21456 tps_per_gpu=21456 mfu=0.632991
[2025-07-23 17:26:35,788827][I][wordplay/trainer:894] step=1190 loss=1.91094 dt=0.383424 dtf=0.00741282 dtb=0.0106785 sps=2.60808 sps_per_gpu=2.60808 tps=21365.4 tps_per_gpu=21365.4 mfu=0.632824
[2025-07-23 17:26:39,625719][I][wordplay/trainer:894] step=1200 loss=1.90239 dt=0.388221 dtf=0.015037 dtb=0.01746 sps=2.57585 sps_per_gpu=2.57585 tps=21101.4 tps_per_gpu=21101.4 mfu=0.631894
[2025-07-23 17:26:40,480918][I][wordplay/trainer:827] ['prompt']: 'What is an LLM?'
[2025-07-23 17:26:40,482683][I][wordplay/trainer:831] ['response']:

What is an LLM?

LADY GLOUCESTES:
And when there to my liker of mady the:
It will the shall contre fature he
the day thengery'd one died me meanty:
Why, which ime dished your wind the oblod thus hemes,
I the conte the caition, fortuse whiches faings,
I her far will there
[2025-07-23 17:27:38,165528][I][wordplay/trainer:762] Saving checkpoint to: /content
[2025-07-23 17:27:38,167913][I][wordplay/trainer:763] Saving model to: /content/model.pth
[2025-07-23 17:27:38,675275][I][wordplay/configs:141] Appending /content to /content/wordplay/src/ckpts/checkpoints.log
[2025-07-23 17:27:42,506973][I][wordplay/trainer:894] step=1210 loss=1.83644 dt=0.379808 dtf=0.00813016 dtb=0.0106532 sps=2.63291 sps_per_gpu=2.63291 tps=21568.8 tps_per_gpu=21568.8 mfu=0.632438
[2025-07-23 17:27:46,341248][I][wordplay/trainer:894] step=1220 loss=1.85 dt=0.384616 dtf=0.00783149 dtb=0.0105751 sps=2.6  sps_per_gpu=2.6  tps=21299.2 tps_per_gpu=21299.2 mfu=0.63213
[2025-07-23 17:27:50,174132][I][wordplay/trainer:894] step=1230 loss=1.85794 dt=0.384468 dtf=0.00796023 dtb=0.0101979 sps=2.60099 sps_per_gpu=2.60099 tps=21307.3 tps_per_gpu=21307.3 mfu=0.631878
[2025-07-23 17:27:53,996352][I][wordplay/trainer:894] step=1240 loss=1.86443 dt=0.381407 dtf=0.00777995 dtb=0.00996514 sps=2.62187 sps_per_gpu=2.62187 tps=21478.3 tps_per_gpu=21478.3 mfu=0.632156
[2025-07-23 17:27:57,829111][I][wordplay/trainer:894] step=1250 loss=1.76382 dt=0.382476 dtf=0.00785835 dtb=0.0100383 sps=2.61454 sps_per_gpu=2.61454 tps=21418.3 tps_per_gpu=21418.3 mfu=0.632229
[2025-07-23 17:28:01,663291][I][wordplay/trainer:894] step=1260 loss=1.74205 dt=0.385531 dtf=0.00776372 dtb=0.0138436 sps=2.59382 sps_per_gpu=2.59382 tps=21248.6 tps_per_gpu=21248.6 mfu=0.631793
[2025-07-23 17:28:05,497559][I][wordplay/trainer:894] step=1270 loss=1.86381 dt=0.395746 dtf=0.0125432 dtb=0.0178912 sps=2.52688 sps_per_gpu=2.52688 tps=20700.2 tps_per_gpu=20700.2 mfu=0.62978
[2025-07-23 17:28:09,331924][I][wordplay/trainer:894] step=1280 loss=1.85107 dt=0.382921 dtf=0.0081101 dtb=0.00997405 sps=2.61151 sps_per_gpu=2.61151 tps=21393.5 tps_per_gpu=21393.5 mfu=0.630017
[2025-07-23 17:28:13,161160][I][wordplay/trainer:894] step=1290 loss=1.84071 dt=0.382439 dtf=0.00762057 dtb=0.0106278 sps=2.6148 sps_per_gpu=2.6148 tps=21420.4 tps_per_gpu=21420.4 mfu=0.630311
[2025-07-23 17:28:16,996729][I][wordplay/trainer:894] step=1300 loss=1.82688 dt=0.383368 dtf=0.0123784 dtb=0.0184451 sps=2.60846 sps_per_gpu=2.60846 tps=21368.5 tps_per_gpu=21368.5 mfu=0.630421
[2025-07-23 17:28:17,833682][I][wordplay/trainer:827] ['prompt']: 'What is an LLM?'
[2025-07-23 17:28:17,835402][I][wordplay/trainer:831] ['response']:

What is an LLM?

Good my RICHARD III:
He you will distent, I may
Is like pret to fort,
To some that fold my part they lok.
A farther's to consonce which sater,
And fater and him in the shall it them do her this,
The a my navin his more the with of haver,
But me and the a
[2025-07-23 17:29:15,475172][I][wordplay/trainer:762] Saving checkpoint to: /content
[2025-07-23 17:29:15,477113][I][wordplay/trainer:763] Saving model to: /content/model.pth
[2025-07-23 17:29:15,945199][I][wordplay/configs:141] Appending /content to /content/wordplay/src/ckpts/checkpoints.log
[2025-07-23 17:29:19,772665][I][wordplay/trainer:894] step=1310 loss=1.83877 dt=0.380515 dtf=0.00832942 dtb=0.00994461 sps=2.62802 sps_per_gpu=2.62802 tps=21528.7 tps_per_gpu=21528.7 mfu=0.630994
[2025-07-23 17:29:23,597414][I][wordplay/trainer:894] step=1320 loss=1.79997 dt=0.380789 dtf=0.00753653 dtb=0.0100344 sps=2.62613 sps_per_gpu=2.62613 tps=21513.2 tps_per_gpu=21513.2 mfu=0.631463
[2025-07-23 17:29:27,425373][I][wordplay/trainer:894] step=1330 loss=1.84227 dt=0.383599 dtf=0.00811679 dtb=0.0102277 sps=2.60689 sps_per_gpu=2.60689 tps=21355.6 tps_per_gpu=21355.6 mfu=0.63142
[2025-07-23 17:29:31,259289][I][wordplay/trainer:894] step=1340 loss=1.77032 dt=0.381153 dtf=0.00731168 dtb=0.00972694 sps=2.62362 sps_per_gpu=2.62362 tps=21492.7 tps_per_gpu=21492.7 mfu=0.631787
[2025-07-23 17:29:35,088601][I][wordplay/trainer:894] step=1350 loss=1.8076 dt=0.384321 dtf=0.00808188 dtb=0.0116733 sps=2.60199 sps_per_gpu=2.60199 tps=21315.5 tps_per_gpu=21315.5 mfu=0.631593
[2025-07-23 17:29:38,914972][I][wordplay/trainer:894] step=1360 loss=1.79383 dt=0.383019 dtf=0.00830957 dtb=0.0104623 sps=2.61084 sps_per_gpu=2.61084 tps=21388 tps_per_gpu=21388 mfu=0.631632
[2025-07-23 17:29:42,746913][I][wordplay/trainer:894] step=1370 loss=1.73757 dt=0.377326 dtf=0.009339 dtb=0.0118509 sps=2.65023 sps_per_gpu=2.65023 tps=21710.7 tps_per_gpu=21710.7 mfu=0.632622
[2025-07-23 17:29:46,582929][I][wordplay/trainer:894] step=1380 loss=1.74524 dt=0.373365 dtf=0.00773357 dtb=0.0100906 sps=2.67835 sps_per_gpu=2.67835 tps=21941 tps_per_gpu=21941 mfu=0.634193
[2025-07-23 17:29:50,410901][I][wordplay/trainer:894] step=1390 loss=1.75995 dt=0.382166 dtf=0.00797486 dtb=0.0104627 sps=2.61667 sps_per_gpu=2.61667 tps=21435.7 tps_per_gpu=21435.7 mfu=0.634113
[2025-07-23 17:29:54,241756][I][wordplay/trainer:894] step=1400 loss=1.81278 dt=0.391504 dtf=0.0126958 dtb=0.0182819 sps=2.55425 sps_per_gpu=2.55425 tps=20924.4 tps_per_gpu=20924.4 mfu=0.632531
[2025-07-23 17:29:55,175194][I][wordplay/trainer:827] ['prompt']: 'What is an LLM?'
[2025-07-23 17:29:55,177068][I][wordplay/trainer:831] ['response']:

What is an LLM?

ROHUMERS:
Citizen:
The's no worth bold of I heave is the port art.

SICINIUS:
Alay, sir, thou away the perfored,
Belie a hard set the of to your pakial;
Sirt are a a shall in thee.
Yet come, I chould cound thy king will.

BRATUS:
The good is heart thou t
[2025-07-23 17:30:52,849338][I][wordplay/trainer:762] Saving checkpoint to: /content
[2025-07-23 17:30:52,851168][I][wordplay/trainer:763] Saving model to: /content/model.pth
[2025-07-23 17:30:53,184812][I][wordplay/configs:141] Appending /content to /content/wordplay/src/ckpts/checkpoints.log
[2025-07-23 17:30:57,012134][I][wordplay/trainer:894] step=1410 loss=1.79791 dt=0.381525 dtf=0.00881983 dtb=0.0102005 sps=2.62106 sps_per_gpu=2.62106 tps=21471.7 tps_per_gpu=21471.7 mfu=0.632724
[2025-07-23 17:31:00,841188][I][wordplay/trainer:894] step=1420 loss=1.74375 dt=0.381039 dtf=0.00761951 dtb=0.0101972 sps=2.6244 sps_per_gpu=2.6244 tps=21499.1 tps_per_gpu=21499.1 mfu=0.632979
[2025-07-23 17:31:04,675786][I][wordplay/trainer:894] step=1430 loss=1.73401 dt=0.388151 dtf=0.00959453 dtb=0.0123491 sps=2.57631 sps_per_gpu=2.57631 tps=21105.2 tps_per_gpu=21105.2 mfu=0.632045
[2025-07-23 17:31:08,511906][I][wordplay/trainer:894] step=1440 loss=1.72673 dt=0.380442 dtf=0.00765078 dtb=0.00993138 sps=2.62852 sps_per_gpu=2.62852 tps=21532.8 tps_per_gpu=21532.8 mfu=0.632467
[2025-07-23 17:31:12,350823][I][wordplay/trainer:894] step=1450 loss=1.75055 dt=0.384587 dtf=0.00793686 dtb=0.0107903 sps=2.60019 sps_per_gpu=2.60019 tps=21300.8 tps_per_gpu=21300.8 mfu=0.632162
[2025-07-23 17:31:16,189335][I][wordplay/trainer:894] step=1460 loss=1.68073 dt=0.381957 dtf=0.00771424 dtb=0.00991214 sps=2.6181 sps_per_gpu=2.6181 tps=21447.4 tps_per_gpu=21447.4 mfu=0.63232
[2025-07-23 17:31:20,023731][I][wordplay/trainer:894] step=1470 loss=1.71749 dt=0.389038 dtf=0.0123934 dtb=0.016246 sps=2.57045 sps_per_gpu=2.57045 tps=21057.1 tps_per_gpu=21057.1 mfu=0.631309
[2025-07-23 17:31:23,858642][I][wordplay/trainer:894] step=1480 loss=1.72494 dt=0.380766 dtf=0.00802833 dtb=0.0109163 sps=2.62629 sps_per_gpu=2.62629 tps=21514.5 tps_per_gpu=21514.5 mfu=0.631751
[2025-07-23 17:31:27,693442][I][wordplay/trainer:894] step=1490 loss=1.72521 dt=0.384513 dtf=0.00979085 dtb=0.0104102 sps=2.60069 sps_per_gpu=2.60069 tps=21304.9 tps_per_gpu=21304.9 mfu=0.631529
[2025-07-23 17:31:31,528345][I][wordplay/trainer:894] step=1500 loss=1.70409 dt=0.385203 dtf=0.0109562 dtb=0.0163935 sps=2.59604 sps_per_gpu=2.59604 tps=21266.7 tps_per_gpu=21266.7 mfu=0.631217

Evaluate Model

import time

query = "What is an LLM?"
t0 = time.perf_counter()
outputs = trainer.evaluate(
    query,
    num_samples=1,
    max_new_tokens=256,
    top_k=2,
    display=False
)
logger.info(f'took: {time.perf_counter() - t0:.4f}s')
logger.info(f"['prompt']: '{query}'")
logger.info("['response']:\n\n" + fr"{outputs['0']['raw']}")
[2025-07-23 17:31:32,597792][I][tmp/ipython-input-12-582817405:12:ezpz.log] took: 0.9968s
[2025-07-23 17:31:32,599918][I][tmp/ipython-input-12-582817405:13:ezpz.log] ['prompt']: 'What is an LLM?'
[2025-07-23 17:31:32,601844][I][tmp/ipython-input-12-582817405:14:ezpz.log] ['response']:

What is an LLM? What, that the wild my lord,
And the shal to may so shal that the shall thee.

RICHARD:
What that there thee shal the const the shall so thine.

RICHARD:
The wil thee the shal shal that that the should.

RICHARD:
Then the shal too the show shal to thee.
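
Note the effect of top_k=2 here versus top_k=16 earlier: restricting sampling to the two most likely next characters yields noticeably more repetitive (if more "confident") text. A minimal sketch of top-k filtering for a single sampling step, assuming 1-D logits over the vocabulary:

import torch

def sample_top_k(logits: torch.Tensor, k: int) -> int:
    """Zero out everything below the k-th largest logit, then sample."""
    v, _ = torch.topk(logits, k)
    logits = logits.masked_fill(logits < v[-1], float('-inf'))
    return int(torch.multinomial(torch.softmax(logits, dim=-1), num_samples=1))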


Citation

BibTeX citation:
@online{foreman2025,
  author = {Foreman, Sam},
  title = {wordplay 🎮 💬: {Shakespeare}},
  date = {2025-07-22},
  url = {https://saforem2.github.io/hpc-bootcamp-2025/02-llms/08-shakespeare-example-colab/},
  langid = {en}
}

For attribution, please cite this work as:

Foreman, Sam. 2025. “[wordplay 🎮 💬](https://github.com/saforem2/wordplay): Shakespeare.” July 22, 2025. https://saforem2.github.io/hpc-bootcamp-2025/02-llms/08-shakespeare-example-colab/.