wordplay 🎮 💬: Shakespeare

Author: Sam Foreman ([ALCF](https://alcf.anl.gov/about/people/sam-foreman))
Published: July 22, 2025
Modified: August 5, 2025

We will use the Shakespeare dataset to train a small (~10M parameter) LLM from scratch.


Image generated from stabilityai/stable-diffusion on 🤗 Spaces.

Prompt Details

  • Prompt: Shakespeare himself, dressed in full Shakespearean garb, writing code at a modern workstation with multiple monitors, hacking away profusely, backlit, high quality for publication

  • Negative Prompt: low quality, 3d, photorealistic, ugly

Install / Setup

Warning!

IF YOU ARE EXECUTING ON GOOGLE COLAB:

You will need to restart your runtime (Runtime → Restart runtime)
after executing the following cell:

%%bash
# Install wordplay (and prepare the Shakespeare datasets) only if needed
if python3 -c 'import wordplay; print(wordplay.__file__)' 2> /dev/null; then
    echo "Has wordplay installed. Nothing to do."
else
    echo "Does not have wordplay installed. Installing..."
    git clone 'https://github.com/saforem2/wordplay'
    python3 wordplay/data/shakespeare_char/prepare.py
    python3 wordplay/data/shakespeare/prepare.py
    python3 -m pip install deepspeed
    python3 -m pip install -e wordplay
fi
/content/wordplay/src/wordplay/__init__.py
Has wordplay installed. Nothing to do.
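
What did the install actually do? The two prepare.py scripts tokenize the raw Shakespeare text and write it to disk as train.bin / val.bin (the files you will see loaded in the config logs below). As a minimal sketch of the character-level version, assuming the nanoGPT-style convention that wordplay follows (the real script also downloads input.txt and saves vocabulary metadata alongside the binaries):

import numpy as np

# Read the raw corpus (the real prepare.py downloads tiny-shakespeare first)
with open('input.txt', 'r') as f:
    text = f.read()

# Character-level vocabulary: 65 unique characters for this corpus
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}

# Encode each character as a uint16 id, then split 90/10 into train/val
ids = np.array([stoi[c] for c in text], dtype=np.uint16)
n = int(0.9 * len(ids))
ids[:n].tofile('train.bin')
ids[n:].tofile('val.bin')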

Post Install

If installed correctly, you should be able to:

>>> import wordplay
>>> wordplay.__file__
'/path/to/wordplay/src/wordplay/__init__.py'
%load_ext autoreload
%autoreload 2
import os
import sys
import ezpz

os.environ['COLORTERM'] = 'truecolor'
if sys.platform == 'darwin':
    # If running on macOS:
    # os.environ['PYTORCH_ENABLE_MPS_FALLBACK'] = '1'
    os.environ['TORCH_DEVICE'] = 'cpu'
# -----------------------------------------------

logger = ezpz.get_logger()

import wordplay
logger.info(wordplay.__file__)
[2025-07-23 17:07:07,066155][I][ezpz/__init__:265:ezpz] Setting logging level to 'INFO' on 'RANK == 0'
[2025-07-23 17:07:07,072771][I][ezpz/__init__:266:ezpz] Setting logging level to 'CRITICAL' on all others 'RANK != 0'
[2025-07-23 17:07:07,079375][I][tmp/ipython-input-2-2338663768:17:ezpz.log] /content/wordplay/src/wordplay/__init__.py

Build Trainer

Explicitly, we:

  1. Setup torch via rank = setup(...)
  2. Build cfg: DictConfig = get_config(...)
  3. Instantiate config: ExperimentConfig = instantiate(cfg)
  4. Build trainer = Trainer(config)
import wordplay
print(wordplay.__file__)
/content/wordplay/src/wordplay/__init__.py
import os
import numpy as np
from ezpz import setup
from hydra.utils import instantiate
from wordplay.configs import get_config, PROJECT_ROOT
from wordplay.trainer import Trainer

HF_DATASETS_CACHE = PROJECT_ROOT.joinpath('.cache', 'huggingface')
HF_DATASETS_CACHE.mkdir(exist_ok=True, parents=True)

os.environ['HF_DATASETS_CACHE'] = HF_DATASETS_CACHE.as_posix()

BACKEND = 'DDP'

rank = setup(
    framework='pytorch',
    backend=BACKEND,
    seed=1234,
)

cfg = get_config(
    [
        'data=shakespeare',
        'model=shakespeare',
        'model.batch_size=8',
        'model.block_size=1024',
        'optimizer=shakespeare',
        'train=shakespeare',
        f'train.backend={BACKEND}',
        'train.compile=false',
        'train.dtype=bfloat16',
        'train.max_iters=1000',
        'train.log_interval=10',
        'train.eval_interval=100',
    ]
)
config = instantiate(cfg)
[2025-07-23 17:07:07,409437][I][wordplay/configs:81] Setting HF_DATASETS_CACHE to /content/wordplay/.cache/huggingface/datasets
[2025-07-23 17:07:07,435593][I][ezpz/dist:1159] Using fw='ddp' with torch_{device,backend}= {cuda, nccl}
[2025-07-23 17:07:07,438150][I][ezpz/dist:1026] Caught MASTER_PORT=41765 from environment!
[2025-07-23 17:07:07,440989][I][ezpz/dist:1042] Using torch.distributed.init_process_group with
- master_addr='588b3fb1cb70'
- master_port='41765'
- world_size=1
- rank=0
- local_rank=0
- timeout=datetime.timedelta(seconds=3600)
- backend='nccl'
[2025-07-23 17:07:07,447590][I][ezpz/dist:759] Calling torch.distributed.init_process_group_with: rank=0 world_size=1 backend=nccl
[2025-07-23 17:07:07,462711][I][ezpz/dist:1377] Using device='cuda' with backend='nccl' + 'nccl' for distributed training.
[2025-07-23 17:07:07,465933][I][ezpz/dist:1422] ['588b3fb1cb70'][0/0] 
[2025-07-23 17:07:08,215788][I][wordplay/configs:317] Loading val from /content/wordplay/data/shakespeare_char/val.bin
[2025-07-23 17:07:08,221368][I][wordplay/configs:317] Loading train from /content/wordplay/data/shakespeare_char/train.bin
[2025-07-23 17:07:08,226696][I][wordplay/configs:442] Tokens per iteration: 8,192
[2025-07-23 17:07:08,231221][I][wordplay/configs:465] Using self.ptdtype=torch.bfloat16 on self.device_type='cuda'
[2025-07-23 17:07:08,234866][I][wordplay/configs:471] Initializing a new model from scratch
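
Note that the 8,192 tokens per iteration reported above is just the micro-batch size times the sequence length (times gradient-accumulation steps and world size, both 1 in this single-GPU run):

# Tokens per optimizer step; grad_accum and world_size are assumed 1 here
batch_size, block_size = 8, 1024
grad_accum, world_size = 1, 1
print(batch_size * block_size * grad_accum * world_size)  # 8192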

Build Trainer object

trainer = Trainer(config)
[2025-07-23 17:07:08,315621][I][wordplay/trainer:248] Initializing a new model from scratch
[2025-07-23 17:07:08,654618][I][wordplay/model:255] number of parameters: 10.65M
[2025-07-23 17:07:08,675995][I][wordplay/trainer:266] Model size: num_params=10646784
[2025-07-23 17:07:08,686453][I][wordplay/model:445] num decayed parameter tensors: 26, with 11,035,008 parameters
[2025-07-23 17:07:08,690282][I][wordplay/model:449] num non-decayed parameter tensors: 13, with 4,992 parameters
[2025-07-23 17:07:08,696244][I][wordplay/model:465] using fused AdamW: True
[2025-07-23 17:07:08,699647][C][wordplay/trainer:318] "devid='cuda:0'"
[2025-07-23 17:07:08,703940][I][wordplay/trainer:358] • self.model=GPT(
  (transformer): ModuleDict(
    (wte): Embedding(65, 384)
    (wpe): Embedding(1024, 384)
    (drop): Dropout(p=0.2, inplace=False)
    (h): ModuleList(
      (0-5): 6 x Block(
        (ln_1): LayerNorm()
        (attn): CausalSelfAttention(
          (c_attn): Linear(in_features=384, out_features=1152, bias=False)
          (c_proj): Linear(in_features=384, out_features=384, bias=False)
          (attn_dropout): Dropout(p=0.2, inplace=False)
          (resid_dropout): Dropout(p=0.2, inplace=False)
        )
        (ln_2): LayerNorm()
        (mlp): MLP(
          (c_fc): Linear(in_features=384, out_features=1536, bias=False)
          (act_fn): GELU(approximate='none')
          (c_proj): Linear(in_features=1536, out_features=384, bias=False)
          (dropout): Dropout(p=0.2, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm()
  )
  (lm_head): Linear(in_features=384, out_features=65, bias=False)
)
[2025-07-23 17:07:08,731597][I][wordplay/trainer:359] • self.grad_scaler=<torch.cuda.amp.grad_scaler.GradScaler object at 0x7cbd3c9a85d0>
[2025-07-23 17:07:08,737375][I][wordplay/trainer:360] • self.model_engine=GPT(
  (transformer): ModuleDict(
    (wte): Embedding(65, 384)
    (wpe): Embedding(1024, 384)
    (drop): Dropout(p=0.2, inplace=False)
    (h): ModuleList(
      (0-5): 6 x Block(
        (ln_1): LayerNorm()
        (attn): CausalSelfAttention(
          (c_attn): Linear(in_features=384, out_features=1152, bias=False)
          (c_proj): Linear(in_features=384, out_features=384, bias=False)
          (attn_dropout): Dropout(p=0.2, inplace=False)
          (resid_dropout): Dropout(p=0.2, inplace=False)
        )
        (ln_2): LayerNorm()
        (mlp): MLP(
          (c_fc): Linear(in_features=384, out_features=1536, bias=False)
          (act_fn): GELU(approximate='none')
          (c_proj): Linear(in_features=1536, out_features=384, bias=False)
          (dropout): Dropout(p=0.2, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm()
  )
  (lm_head): Linear(in_features=384, out_features=65, bias=False)
)
[2025-07-23 17:07:08,760469][I][wordplay/trainer:361] • self.optimizer=AdamW (
Parameter Group 0
    amsgrad: False
    betas: (0.9, 0.99)
    capturable: False
    differentiable: False
    eps: 1e-08
    foreach: None
    fused: True
    lr: 0.001
    maximize: False
    weight_decay: 0.1

Parameter Group 1
    amsgrad: False
    betas: (0.9, 0.99)
    capturable: False
    differentiable: False
    eps: 1e-08
    foreach: None
    fused: True
    lr: 0.001
    maximize: False
    weight_decay: 0.0
)
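
The two parameter groups follow the nanoGPT-style convention (consistent with the 26 decayed / 13 non-decayed tensor counts logged above): weight decay is applied to matrices (linear and embedding weights) but not to 1-D tensors (LayerNorm weights and biases). A minimal sketch of that grouping, not wordplay's actual implementation:

import torch

def make_param_groups(model: torch.nn.Module, weight_decay: float = 0.1):
    """Decay >=2-D tensors (matrices); leave 1-D tensors (norms, biases) alone."""
    params = [p for p in model.parameters() if p.requires_grad]
    return [
        {'params': [p for p in params if p.dim() >= 2], 'weight_decay': weight_decay},
        {'params': [p for p in params if p.dim() < 2], 'weight_decay': 0.0},
    ]

# e.g.: torch.optim.AdamW(make_param_groups(model), lr=1e-3,
#                         betas=(0.9, 0.99), fused=True)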

Prompt (prior to training)

query = "What is an LLM?"
outputs = trainer.evaluate(
    query,
    num_samples=1,
    max_new_tokens=256,
    top_k=16,
    display=False
)
logger.info(f"['prompt']: '{query}'")
logger.info("['response']:\n\n" + fr"{outputs['0']['raw']}")
[2025-07-23 17:07:10,765047][I][tmp/ipython-input-6-3496000222:9:ezpz.log] ['prompt']: 'What is an LLM?'
[2025-07-23 17:07:10,767795][I][tmp/ipython-input-6-3496000222:10:ezpz.log] ['response']:

What is an LLM?ouuu'fU?UUUU-LLlVmoYY;?U$IMwwYDjMYYXSSdIss;I''DPOjHhooooMZtmkoGXjZ
BDDddZkydVPcM'MAWILMDDP'''!A'Vzl;R
dtA$ttoXttJJffobJJ;b-vkwwJJOHHwQFccddlobAGGnM'''$kW;kzZlSwZkAoR;wmooo$J-fffoYDd'UBooXYB;JSf?P'MJ..t'hPffID;R.XXo'''SPZkXXXe'VS.JoMdkXSffo''RHQklK''UUUSoMn

Train Model

name  description
----  -------------------------------
step  Current training step
loss  Loss value
dt    Time per step (in seconds)
sps   Samples per second
mtps  Tokens per second (in millions)
mfu   Model FLOPS utilization[1]

[1] in units of A100 bfloat16 peak FLOPS
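
As a back-of-the-envelope check on the mfu column, here is the PaLM-style estimate popularized by nanoGPT (a sketch; wordplay's exact bookkeeping may differ), plugging in numbers from the logs below:

# MFU ~= achieved FLOPS / peak FLOPS (A100 bf16 peak = 312 TFLOPS)
N = 10_646_784                 # reported parameter count
L, H, Q, T = 6, 6, 64, 1024    # layers, heads (assumed 6), head dim, seq length
batch, dt = 8, 0.3825          # samples per step, ~seconds per step (from logs)

flops_per_token = 6 * N + 12 * L * H * Q * T
flops_per_iter = flops_per_token * T * batch
print(f'{flops_per_iter / dt / 312e12:.2%}')  # ~0.63%: the mfu column is in percent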

trainer.config.device_type
'cuda'
from rich import print

print(trainer.model)
GPT(
  (transformer): ModuleDict(
    (wte): Embedding(65, 384)
    (wpe): Embedding(1024, 384)
    (drop): Dropout(p=0.2, inplace=False)
    (h): ModuleList(
      (0-5): 6 x Block(
        (ln_1): LayerNorm()
        (attn): CausalSelfAttention(
          (c_attn): Linear(in_features=384, out_features=1152, bias=False)
          (c_proj): Linear(in_features=384, out_features=384, bias=False)
          (attn_dropout): Dropout(p=0.2, inplace=False)
          (resid_dropout): Dropout(p=0.2, inplace=False)
        )
        (ln_2): LayerNorm()
        (mlp): MLP(
          (c_fc): Linear(in_features=384, out_features=1536, bias=False)
          (act_fn): GELU(approximate='none')
          (c_proj): Linear(in_features=1536, out_features=384, bias=False)
          (dropout): Dropout(p=0.2, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm()
  )
  (lm_head): Linear(in_features=384, out_features=65, bias=False)
)
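
The reported num_params=10646784 can be recovered from the shapes above, assuming the nanoGPT conventions that wordplay inherits: the lm_head weight is tied to wte, and position embeddings are excluded from the reported count:

n_embd, n_layer, vocab_size, block_size = 384, 6, 65, 1024

wte = vocab_size * n_embd      # token embeddings (shared with lm_head)
wpe = block_size * n_embd      # position embeddings
per_block = (
    2 * n_embd                 # ln_1 + ln_2 (weight only, no bias)
    + n_embd * 3 * n_embd      # attn.c_attn
    + n_embd * n_embd          # attn.c_proj
    + n_embd * 4 * n_embd      # mlp.c_fc
    + 4 * n_embd * n_embd      # mlp.c_proj
)
total = wte + wpe + n_layer * per_block + n_embd   # + final LayerNorm
print(total - wpe)             # 10646784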

(partial) Training:

We’ll first train for 500 iterations and then evaluate the model’s performance on the same prompt:

What is an LLM?

trainer.train(train_iters=500)
                Training Legend                 
┏━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃        abbr  desc                           ┃
┡━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│        step │ Current training iteration     │
│        loss │ Loss value                     │
│          dt │ Elapsed time per training step │
│         dtf │ Elapsed time per forward step  │
│         dtb │ Elapsed time per backward step │
│         sps │ Samples per second             │
│ sps_per_gpu │ Samples per second (per GPU)   │
│         tps │ Tokens per second              │
│ tps_per_gpu │ Tokens per second (per GPU)    │
│         mfu │ Model flops utilization        │
└─────────────┴────────────────────────────────┘
[2025-07-23 17:07:12,567707][I][wordplay/trainer:827] ['prompt']: 'What is an LLM?'
[2025-07-23 17:07:12,572514][I][wordplay/trainer:831] ['response']:

What is an LLM?ZIoZo-om';-'MAhB,RcOVP!JJhhkkJnnUzI''&D&jH!ddWJJhfUUVkRhZoZ:MoJRtDjkkhhdMM'Sdd-dbUoXXLSfyXXXRb3ZOS''$!o&&jnVJ3MMkjJ'Mffe-cm..J3Oa;'$hooJ3z!jUSDn
'DqBJtHH;!ozZIZokzoooYlMKLJm.DDmkkXRX'NnhMSccJsH;Ude.tRzDoUtm'JmCd;Jd&j'Qo&'$$DAJTPPVv&j'jjtmmtdls;wNNoooJ3$DDJ
[2025-07-23 17:08:14,213943][I][wordplay/trainer:894] step=10 loss=3.28901 dt=0.388647 dtf=0.0077605 dtb=0.0102481 sps=2.57303 sps_per_gpu=2.57303 tps=21078.3 tps_per_gpu=21078.3 mfu=0.622837
[2025-07-23 17:08:18,050755][I][wordplay/trainer:894] step=20 loss=2.82665 dt=0.392386 dtf=0.0123749 dtb=0.0163346 sps=2.54851 sps_per_gpu=2.54851 tps=20877.4 tps_per_gpu=20877.4 mfu=0.622244
[2025-07-23 17:08:21,869708][I][wordplay/trainer:894] step=30 loss=2.64874 dt=0.379033 dtf=0.00770909 dtb=0.0103789 sps=2.6383 sps_per_gpu=2.6383 tps=21612.9 tps_per_gpu=21612.9 mfu=0.623883
[2025-07-23 17:08:25,681515][I][wordplay/trainer:894] step=40 loss=2.58119 dt=0.375823 dtf=0.00982569 dtb=0.0116637 sps=2.66083 sps_per_gpu=2.66083 tps=21797.5 tps_per_gpu=21797.5 mfu=0.625904
[2025-07-23 17:08:29,489842][I][wordplay/trainer:894] step=50 loss=2.5564 dt=0.381329 dtf=0.00818184 dtb=0.0101487 sps=2.6224 sps_per_gpu=2.6224 tps=21482.7 tps_per_gpu=21482.7 mfu=0.626792
[2025-07-23 17:08:33,295135][I][wordplay/trainer:894] step=60 loss=2.55377 dt=0.37768 dtf=0.00809329 dtb=0.00990252 sps=2.64775 sps_per_gpu=2.64775 tps=21690.3 tps_per_gpu=21690.3 mfu=0.628205
[2025-07-23 17:08:37,094848][I][wordplay/trainer:894] step=70 loss=2.53792 dt=0.37185 dtf=0.00804143 dtb=0.010255 sps=2.68926 sps_per_gpu=2.68926 tps=22030.4 tps_per_gpu=22030.4 mfu=0.630482
[2025-07-23 17:08:40,894946][I][wordplay/trainer:894] step=80 loss=2.56441 dt=0.380709 dtf=0.00861202 dtb=0.0100984 sps=2.62668 sps_per_gpu=2.62668 tps=21517.8 tps_per_gpu=21517.8 mfu=0.631016
[2025-07-23 17:08:44,697477][I][wordplay/trainer:894] step=90 loss=2.5338 dt=0.368932 dtf=0.00809296 dtb=0.00962644 sps=2.71053 sps_per_gpu=2.71053 tps=22204.6 tps_per_gpu=22204.6 mfu=0.633527
[2025-07-23 17:08:48,500289][I][wordplay/trainer:894] step=100 loss=2.53127 dt=0.376976 dtf=0.00801782 dtb=0.0100192 sps=2.65269 sps_per_gpu=2.65269 tps=21730.8 tps_per_gpu=21730.8 mfu=0.634386
[2025-07-23 17:08:49,332883][I][wordplay/trainer:827] ['prompt']: 'What is an LLM?'
[2025-07-23 17:08:49,334601][I][wordplay/trainer:831] ['response']:

What is an LLM?
AREThe he anghangatr ho misen fave by the t fe wh w onk pe wns w s did s fithe s.

CHather s, t be angenont ofous sts se mathan se.


An s tr be the acice pllll is s anontharanonte as wakar s sthe toore sthe towar thag, tin toullon llly my makndheacove t 
[2025-07-23 17:09:47,060965][I][wordplay/trainer:762] Saving checkpoint to: /content
[2025-07-23 17:09:47,063008][I][wordplay/trainer:763] Saving model to: /content/model.pth
[2025-07-23 17:09:47,414828][I][wordplay/configs:141] Appending /content to /content/wordplay/src/ckpts/checkpoints.log
[2025-07-23 17:09:51,240684][I][wordplay/trainer:894] step=110 loss=2.50749 dt=0.380784 dtf=0.00766138 dtb=0.0102359 sps=2.62616 sps_per_gpu=2.62616 tps=21513.5 tps_per_gpu=21513.5 mfu=0.634517
[2025-07-23 17:09:55,063291][I][wordplay/trainer:894] step=120 loss=2.5274 dt=0.379459 dtf=0.00809937 dtb=0.010612 sps=2.63533 sps_per_gpu=2.63533 tps=21588.7 tps_per_gpu=21588.7 mfu=0.634857
[2025-07-23 17:09:58,886616][I][wordplay/trainer:894] step=130 loss=2.54362 dt=0.380395 dtf=0.00779761 dtb=0.00998153 sps=2.62885 sps_per_gpu=2.62885 tps=21535.5 tps_per_gpu=21535.5 mfu=0.635006
[2025-07-23 17:10:02,708605][I][wordplay/trainer:894] step=140 loss=2.50172 dt=0.381295 dtf=0.00778436 dtb=0.0100367 sps=2.62264 sps_per_gpu=2.62264 tps=21484.7 tps_per_gpu=21484.7 mfu=0.63499
[2025-07-23 17:10:06,528915][I][wordplay/trainer:894] step=150 loss=2.50335 dt=0.373231 dtf=0.0079468 dtb=0.0108304 sps=2.67931 sps_per_gpu=2.67931 tps=21948.9 tps_per_gpu=21948.9 mfu=0.636348
[2025-07-23 17:10:10,344712][I][wordplay/trainer:894] step=160 loss=2.48674 dt=0.372652 dtf=0.0117069 dtb=0.0104974 sps=2.68347 sps_per_gpu=2.68347 tps=21983 tps_per_gpu=21983 mfu=0.63767
[2025-07-23 17:10:14,168118][I][wordplay/trainer:894] step=170 loss=2.47736 dt=0.380656 dtf=0.00807191 dtb=0.0106655 sps=2.62705 sps_per_gpu=2.62705 tps=21520.8 tps_per_gpu=21520.8 mfu=0.637494
[2025-07-23 17:10:17,988492][I][wordplay/trainer:894] step=180 loss=2.46811 dt=0.380603 dtf=0.0078251 dtb=0.0103172 sps=2.62741 sps_per_gpu=2.62741 tps=21523.8 tps_per_gpu=21523.8 mfu=0.637345
[2025-07-23 17:10:21,810169][I][wordplay/trainer:894] step=190 loss=2.45376 dt=0.381434 dtf=0.013805 dtb=0.0137897 sps=2.62169 sps_per_gpu=2.62169 tps=21476.9 tps_per_gpu=21476.9 mfu=0.637072
[2025-07-23 17:10:25,634107][I][wordplay/trainer:894] step=200 loss=2.47938 dt=0.383512 dtf=0.00936293 dtb=0.0101239 sps=2.60748 sps_per_gpu=2.60748 tps=21360.5 tps_per_gpu=21360.5 mfu=0.636483
[2025-07-23 17:10:26,457547][I][wordplay/trainer:827] ['prompt']: 'What is an LLM?'
[2025-07-23 17:10:26,459401][I][wordplay/trainer:831] ['response']:

What is an LLM?
HLUS:
LII hethin.
TE: hast seatisurindo wiretyo benin tige, manens, br athetir hyors, blireriarond te me and, f llfes thes thor ists a m thives me windou,



HA oulince s muce oll sse s avelo the rurd p as aver themes l neas:
Heratho w ts the o w. thane r
[2025-07-23 17:11:24,182085][I][wordplay/trainer:762] Saving checkpoint to: /content
[2025-07-23 17:11:24,184071][I][wordplay/trainer:763] Saving model to: /content/model.pth
[2025-07-23 17:11:24,514195][I][wordplay/configs:141] Appending /content to /content/wordplay/src/ckpts/checkpoints.log
[2025-07-23 17:11:28,333182][I][wordplay/trainer:894] step=210 loss=2.45724 dt=0.380321 dtf=0.00789146 dtb=0.00988756 sps=2.62936 sps_per_gpu=2.62936 tps=21539.7 tps_per_gpu=21539.7 mfu=0.636482
[2025-07-23 17:11:32,159664][I][wordplay/trainer:894] step=220 loss=2.48242 dt=0.383149 dtf=0.00807603 dtb=0.0101043 sps=2.60995 sps_per_gpu=2.60995 tps=21380.7 tps_per_gpu=21380.7 mfu=0.636011
[2025-07-23 17:11:35,989095][I][wordplay/trainer:894] step=230 loss=2.48992 dt=0.381508 dtf=0.00775943 dtb=0.00976974 sps=2.62117 sps_per_gpu=2.62117 tps=21472.7 tps_per_gpu=21472.7 mfu=0.635859
[2025-07-23 17:11:39,818287][I][wordplay/trainer:894] step=240 loss=2.45306 dt=0.382383 dtf=0.00783342 dtb=0.0103981 sps=2.61518 sps_per_gpu=2.61518 tps=21423.5 tps_per_gpu=21423.5 mfu=0.635577
[2025-07-23 17:11:43,651793][I][wordplay/trainer:894] step=250 loss=2.48512 dt=0.381244 dtf=0.00790653 dtb=0.00995927 sps=2.623 sps_per_gpu=2.623 tps=21487.6 tps_per_gpu=21487.6 mfu=0.635512
[2025-07-23 17:11:47,488905][I][wordplay/trainer:894] step=260 loss=2.45921 dt=0.375016 dtf=0.0110469 dtb=0.0137554 sps=2.66655 sps_per_gpu=2.66655 tps=21844.4 tps_per_gpu=21844.4 mfu=0.636509
[2025-07-23 17:11:51,323856][I][wordplay/trainer:894] step=270 loss=2.46985 dt=0.38433 dtf=0.00785675 dtb=0.0111291 sps=2.60193 sps_per_gpu=2.60193 tps=21315 tps_per_gpu=21315 mfu=0.635841
[2025-07-23 17:11:55,157805][I][wordplay/trainer:894] step=280 loss=2.47304 dt=0.38265 dtf=0.00785524 dtb=0.010542 sps=2.61336 sps_per_gpu=2.61336 tps=21408.6 tps_per_gpu=21408.6 mfu=0.635517
[2025-07-23 17:11:58,985311][I][wordplay/trainer:894] step=290 loss=2.4519 dt=0.38073 dtf=0.0100743 dtb=0.0128665 sps=2.62653 sps_per_gpu=2.62653 tps=21516.5 tps_per_gpu=21516.5 mfu=0.635544
[2025-07-23 17:12:02,814627][I][wordplay/trainer:894] step=300 loss=2.44979 dt=0.383147 dtf=0.00804455 dtb=0.0103887 sps=2.60996 sps_per_gpu=2.60996 tps=21380.8 tps_per_gpu=21380.8 mfu=0.635167
[2025-07-23 17:12:03,628924][I][wordplay/trainer:827] ['prompt']: 'What is an LLM?'
[2025-07-23 17:12:03,630654][I][wordplay/trainer:831] ['response']:

What is an LLM? muroursee aril icalis

We lal pl mal.
CIO:

LESTerthe coprideve, y wingrenget mir bue powin ithe an w
AN:
INI heshas be, intaly ws avevethay aiourofourthelin wous ans ay ber IUS:
Wh f y have s n t.
IOLONThaventer the t at tho, I win thounepancke and find 
[2025-07-23 17:13:01,480227][I][wordplay/trainer:762] Saving checkpoint to: /content
[2025-07-23 17:13:01,482159][I][wordplay/trainer:763] Saving model to: /content/model.pth
[2025-07-23 17:13:01,816991][I][wordplay/configs:141] Appending /content to /content/wordplay/src/ckpts/checkpoints.log
[2025-07-23 17:13:05,641415][I][wordplay/trainer:894] step=310 loss=2.45647 dt=0.383054 dtf=0.00785093 dtb=0.00992947 sps=2.6106 sps_per_gpu=2.6106 tps=21386 tps_per_gpu=21386 mfu=0.634844
[2025-07-23 17:13:09,467371][I][wordplay/trainer:894] step=320 loss=2.45905 dt=0.382875 dtf=0.0081 dtb=0.010746 sps=2.61182 sps_per_gpu=2.61182 tps=21396 tps_per_gpu=21396 mfu=0.634582
[2025-07-23 17:13:13,297667][I][wordplay/trainer:894] step=330 loss=2.4555 dt=0.38572 dtf=0.0108775 dtb=0.0128777 sps=2.59256 sps_per_gpu=2.59256 tps=21238.2 tps_per_gpu=21238.2 mfu=0.63388
[2025-07-23 17:13:17,131895][I][wordplay/trainer:894] step=340 loss=2.4634 dt=0.384959 dtf=0.00957926 dtb=0.010189 sps=2.59768 sps_per_gpu=2.59768 tps=21280.2 tps_per_gpu=21280.2 mfu=0.633373
[2025-07-23 17:13:20,957109][I][wordplay/trainer:894] step=350 loss=2.49212 dt=0.38072 dtf=0.00796532 dtb=0.0103618 sps=2.6266 sps_per_gpu=2.6266 tps=21517.1 tps_per_gpu=21517.1 mfu=0.633616
[2025-07-23 17:13:24,791303][I][wordplay/trainer:894] step=360 loss=2.42521 dt=0.380351 dtf=0.00941999 dtb=0.0131558 sps=2.62915 sps_per_gpu=2.62915 tps=21538 tps_per_gpu=21538 mfu=0.633897
[2025-07-23 17:13:28,625122][I][wordplay/trainer:894] step=370 loss=2.46779 dt=0.383116 dtf=0.00759078 dtb=0.0105659 sps=2.61017 sps_per_gpu=2.61017 tps=21382.5 tps_per_gpu=21382.5 mfu=0.63369
[2025-07-23 17:13:32,456066][I][wordplay/trainer:894] step=380 loss=2.46751 dt=0.384732 dtf=0.00849637 dtb=0.0100098 sps=2.59921 sps_per_gpu=2.59921 tps=21292.8 tps_per_gpu=21292.8 mfu=0.633238
[2025-07-23 17:13:36,284446][I][wordplay/trainer:894] step=390 loss=2.47132 dt=0.390981 dtf=0.0104592 dtb=0.0141359 sps=2.55767 sps_per_gpu=2.55767 tps=20952.4 tps_per_gpu=20952.4 mfu=0.631826
[2025-07-23 17:13:40,120231][I][wordplay/trainer:894] step=400 loss=2.50043 dt=0.382461 dtf=0.00788739 dtb=0.011582 sps=2.61465 sps_per_gpu=2.61465 tps=21419.2 tps_per_gpu=21419.2 mfu=0.631935
[2025-07-23 17:13:40,955053][I][wordplay/trainer:827] ['prompt']: 'What is an LLM?'
[2025-07-23 17:13:40,956742][I][wordplay/trainer:831] ['response']:

What is an LLM?
HUSUS:
Wingens thent ndd the se thof heare oupeed s te ase harot anes hant wisthe het clor m at t somy th br his s he, thanononoun heco he bong were asesonor t wearesp



NUS: th ber d, ay sh thout wo pavavond ay touch the hastrd omer hes ias may perengor
[2025-07-23 17:14:38,666483][I][wordplay/trainer:762] Saving checkpoint to: /content
[2025-07-23 17:14:38,673966][I][wordplay/trainer:763] Saving model to: /content/model.pth
[2025-07-23 17:14:39,050214][I][wordplay/configs:141] Appending /content to /content/wordplay/src/ckpts/checkpoints.log
[2025-07-23 17:14:42,870655][I][wordplay/trainer:894] step=410 loss=2.48579 dt=0.380458 dtf=0.00763788 dtb=0.00967759 sps=2.62841 sps_per_gpu=2.62841 tps=21531.9 tps_per_gpu=21531.9 mfu=0.632366
[2025-07-23 17:14:46,696620][I][wordplay/trainer:894] step=420 loss=2.44756 dt=0.389089 dtf=0.0143081 dtb=0.0108978 sps=2.57011 sps_per_gpu=2.57011 tps=21054.3 tps_per_gpu=21054.3 mfu=0.631342
[2025-07-23 17:14:50,528406][I][wordplay/trainer:894] step=430 loss=2.46498 dt=0.383404 dtf=0.0097532 dtb=0.0132017 sps=2.60821 sps_per_gpu=2.60821 tps=21366.5 tps_per_gpu=21366.5 mfu=0.631343
[2025-07-23 17:14:54,360775][I][wordplay/trainer:894] step=440 loss=2.46993 dt=0.384899 dtf=0.00866323 dtb=0.0128457 sps=2.59808 sps_per_gpu=2.59808 tps=21283.5 tps_per_gpu=21283.5 mfu=0.631099
[2025-07-23 17:14:58,197581][I][wordplay/trainer:894] step=450 loss=2.45371 dt=0.383754 dtf=0.00799181 dtb=0.0108706 sps=2.60584 sps_per_gpu=2.60584 tps=21347 tps_per_gpu=21347 mfu=0.631067
[2025-07-23 17:15:02,033033][I][wordplay/trainer:894] step=460 loss=2.43378 dt=0.379863 dtf=0.0110734 dtb=0.0147297 sps=2.63253 sps_per_gpu=2.63253 tps=21565.6 tps_per_gpu=21565.6 mfu=0.631684
[2025-07-23 17:15:05,868916][I][wordplay/trainer:894] step=470 loss=2.41934 dt=0.378727 dtf=0.00844342 dtb=0.0111405 sps=2.64043 sps_per_gpu=2.64043 tps=21630.4 tps_per_gpu=21630.4 mfu=0.632431
[2025-07-23 17:15:09,703796][I][wordplay/trainer:894] step=480 loss=2.45929 dt=0.382927 dtf=0.00844033 dtb=0.0114589 sps=2.61146 sps_per_gpu=2.61146 tps=21393.1 tps_per_gpu=21393.1 mfu=0.632402
[2025-07-23 17:15:13,538234][I][wordplay/trainer:894] step=490 loss=2.4835 dt=0.383195 dtf=0.0079397 dtb=0.0104966 sps=2.60964 sps_per_gpu=2.60964 tps=21378.1 tps_per_gpu=21378.1 mfu=0.632332
[2025-07-23 17:15:17,374316][I][wordplay/trainer:894] step=500 loss=2.43789 dt=0.382541 dtf=0.00727845 dtb=0.0100782 sps=2.6141 sps_per_gpu=2.6141 tps=21414.7 tps_per_gpu=21414.7 mfu=0.632376
import time

query = "What is an LLM?"
t0 = time.perf_counter()
outputs = trainer.evaluate(
    query,
    num_samples=1,
    max_new_tokens=256,
    top_k=16,
    display=False
)
logger.info(f'took: {time.perf_counter() - t0:.4f}s')
logger.info(f"['prompt']: '{query}'")
logger.info("['response']:\n\n" + fr"{outputs['0']['raw']}")
[2025-07-23 17:15:18,240721][I][tmp/ipython-input-10-1425179755:12:ezpz.log] took: 0.8133s
[2025-07-23 17:15:18,242822][I][tmp/ipython-input-10-1425179755:13:ezpz.log] ['prompt']: 'What is an LLM?'
[2025-07-23 17:15:18,245933][I][tmp/ipython-input-10-1425179755:14:ezpz.log] ['response']:

What is an LLM? burthilio s in o th twiser mbalilis ar sis alincore tt t mes mpresofo m whe hary ht ourighothast omy pomithe d?




Bu le wie IUTore ll ishath tes d fr irme nco s f maksere,
IAn he ise wicouss s, areatath meangre the, my hare wis pay toth laut athe s,
Ano

Resume Training…

trainer.train()
[2025-07-23 17:15:19,128023][I][wordplay/trainer:827] ['prompt']: 'What is an LLM?'
[2025-07-23 17:15:19,129812][I][wordplay/trainer:831] ['response']:

What is an LLM?

POOSTOLENETES:
INIONEO: oft ffan yo pe hous tor ce me s here serste buthe he ase he


NENIO:
Whe arallin hatithofoull the, fousencay yont paris.
PENTER:
An o, that s f lllle ishan be be acer se war tha pe iopre is ore nckat, me my?

WI tofifre he llly po
[2025-07-23 17:16:16,858986][I][wordplay/trainer:762] Saving checkpoint to: /content
[2025-07-23 17:16:16,861272][I][wordplay/trainer:763] Saving model to: /content/model.pth
[2025-07-23 17:16:17,190207][I][wordplay/configs:141] Appending /content to /content/wordplay/src/ckpts/checkpoints.log
[2025-07-23 17:16:21,016363][I][wordplay/trainer:894] step=510 loss=2.46165 dt=0.380989 dtf=0.00765307 dtb=0.0102919 sps=2.62475 sps_per_gpu=2.62475 tps=21502 tps_per_gpu=21502 mfu=0.635357
[2025-07-23 17:16:24,851610][I][wordplay/trainer:894] step=520 loss=2.44981 dt=0.383659 dtf=0.00791765 dtb=0.0103253 sps=2.60648 sps_per_gpu=2.60648 tps=21352.3 tps_per_gpu=21352.3 mfu=0.634915
[2025-07-23 17:16:28,687465][I][wordplay/trainer:894] step=530 loss=2.45632 dt=0.388874 dtf=0.01204 dtb=0.0159266 sps=2.57153 sps_per_gpu=2.57153 tps=21066 tps_per_gpu=21066 mfu=0.633671
[2025-07-23 17:16:32,526883][I][wordplay/trainer:894] step=540 loss=2.45869 dt=0.38549 dtf=0.00823117 dtb=0.0103854 sps=2.5941 sps_per_gpu=2.5941 tps=21250.9 tps_per_gpu=21250.9 mfu=0.633098
[2025-07-23 17:16:36,360809][I][wordplay/trainer:894] step=550 loss=2.44677 dt=0.385398 dtf=0.00789234 dtb=0.0121862 sps=2.59472 sps_per_gpu=2.59472 tps=21256 tps_per_gpu=21256 mfu=0.632597
[2025-07-23 17:16:40,195560][I][wordplay/trainer:894] step=560 loss=2.43464 dt=0.385434 dtf=0.0106042 dtb=0.0129227 sps=2.59448 sps_per_gpu=2.59448 tps=21254 tps_per_gpu=21254 mfu=0.63214
[2025-07-23 17:16:44,032374][I][wordplay/trainer:894] step=570 loss=2.45685 dt=0.382214 dtf=0.00815606 dtb=0.0103625 sps=2.61633 sps_per_gpu=2.61633 tps=21433 tps_per_gpu=21433 mfu=0.632258
[2025-07-23 17:16:47,866282][I][wordplay/trainer:894] step=580 loss=2.42042 dt=0.383891 dtf=0.00803656 dtb=0.010343 sps=2.60491 sps_per_gpu=2.60491 tps=21339.4 tps_per_gpu=21339.4 mfu=0.632087
[2025-07-23 17:16:51,705365][I][wordplay/trainer:894] step=590 loss=2.45867 dt=0.381508 dtf=0.0139744 dtb=0.0143725 sps=2.62118 sps_per_gpu=2.62118 tps=21472.7 tps_per_gpu=21472.7 mfu=0.632328
[2025-07-23 17:16:55,543539][I][wordplay/trainer:894] step=600 loss=2.42416 dt=0.391623 dtf=0.0130454 dtb=0.0146926 sps=2.55347 sps_per_gpu=2.55347 tps=20918.1 tps_per_gpu=20918.1 mfu=0.630905
[2025-07-23 17:16:56,372045][I][wordplay/trainer:827] ['prompt']: 'What is an LLM?'
[2025-07-23 17:16:56,373698][I][wordplay/trainer:831] ['response']:

What is an LLM?



KILINGSBRK:
Ye oinot ath lord nous cke, iat ckin and;
Yor te, wad caco aver h
Tow, tom harrds, wer ow coon nalilllll th m thol s s heree, an sus alleris malatetoung ty nd mimarssin myeayelof f my bungrentind's bee and oulodo oter hendin ndind at
Ifowar
[2025-07-23 17:17:54,112942][I][wordplay/trainer:762] Saving checkpoint to: /content
[2025-07-23 17:17:54,116645][I][wordplay/trainer:763] Saving model to: /content/model.pth
[2025-07-23 17:17:54,560498][I][wordplay/configs:141] Appending /content to /content/wordplay/src/ckpts/checkpoints.log
[2025-07-23 17:17:58,382321][I][wordplay/trainer:894] step=610 loss=2.40125 dt=0.380516 dtf=0.00768699 dtb=0.0103504 sps=2.62801 sps_per_gpu=2.62801 tps=21528.7 tps_per_gpu=21528.7 mfu=0.63143
[2025-07-23 17:18:02,217685][I][wordplay/trainer:894] step=620 loss=2.38897 dt=0.382149 dtf=0.00761661 dtb=0.00966352 sps=2.61678 sps_per_gpu=2.61678 tps=21436.7 tps_per_gpu=21436.7 mfu=0.631629
[2025-07-23 17:18:06,047977][I][wordplay/trainer:894] step=630 loss=2.38868 dt=0.378137 dtf=0.00969834 dtb=0.0128937 sps=2.64454 sps_per_gpu=2.64454 tps=21664.1 tps_per_gpu=21664.1 mfu=0.632481
[2025-07-23 17:18:09,883308][I][wordplay/trainer:894] step=640 loss=2.4127 dt=0.382373 dtf=0.00796208 dtb=0.0101229 sps=2.61525 sps_per_gpu=2.61525 tps=21424.1 tps_per_gpu=21424.1 mfu=0.632539
[2025-07-23 17:18:13,722090][I][wordplay/trainer:894] step=650 loss=2.41445 dt=0.385077 dtf=0.00783048 dtb=0.0110297 sps=2.59688 sps_per_gpu=2.59688 tps=21273.7 tps_per_gpu=21273.7 mfu=0.632146
[2025-07-23 17:18:17,557001][I][wordplay/trainer:894] step=660 loss=2.38916 dt=0.397191 dtf=0.0126378 dtb=0.0280523 sps=2.51768 sps_per_gpu=2.51768 tps=20624.8 tps_per_gpu=20624.8 mfu=0.629875
[2025-07-23 17:18:21,395377][I][wordplay/trainer:894] step=670 loss=2.40125 dt=0.37982 dtf=0.00799165 dtb=0.0102509 sps=2.63282 sps_per_gpu=2.63282 tps=21568.1 tps_per_gpu=21568.1 mfu=0.630619
[2025-07-23 17:18:25,229485][I][wordplay/trainer:894] step=680 loss=2.36815 dt=0.367467 dtf=0.00798743 dtb=0.0101859 sps=2.72133 sps_per_gpu=2.72133 tps=22293.2 tps_per_gpu=22293.2 mfu=0.633431
[2025-07-23 17:18:29,069577][I][wordplay/trainer:894] step=690 loss=2.40319 dt=0.379338 dtf=0.00789747 dtb=0.0107017 sps=2.63617 sps_per_gpu=2.63617 tps=21595.5 tps_per_gpu=21595.5 mfu=0.6339
[2025-07-23 17:18:32,902179][I][wordplay/trainer:894] step=700 loss=2.4019 dt=0.382542 dtf=0.00746426 dtb=0.0101071 sps=2.61409 sps_per_gpu=2.61409 tps=21414.6 tps_per_gpu=21414.6 mfu=0.633787
[2025-07-23 17:18:33,732336][I][wordplay/trainer:827] ['prompt']: 'What is an LLM?'
[2025-07-23 17:18:33,734156][I][wordplay/trainer:831] ['response']:

What is an LLM?

Thile than bat ton dor nong mur,
NO belll lit lop gereing ichth ts heas fopoo l s fowis the

Wofores pis wiceris chithith d concofabththesthis t me t t of sis meagoury.

ARO:
Whe my m bo ar f s yourel s f ther thindusolofe s m s le iserangofothin thesith
[2025-07-23 17:19:31,494673][I][wordplay/trainer:762] Saving checkpoint to: /content
[2025-07-23 17:19:31,499399][I][wordplay/trainer:763] Saving model to: /content/model.pth
[2025-07-23 17:19:31,956960][I][wordplay/configs:141] Appending /content to /content/wordplay/src/ckpts/checkpoints.log
[2025-07-23 17:19:35,780584][I][wordplay/trainer:894] step=710 loss=2.41346 dt=0.378216 dtf=0.00835439 dtb=0.0104115 sps=2.64399 sps_per_gpu=2.64399 tps=21659.6 tps_per_gpu=21659.6 mfu=0.63441
[2025-07-23 17:19:39,611784][I][wordplay/trainer:894] step=720 loss=2.39009 dt=0.383217 dtf=0.00772173 dtb=0.010444 sps=2.60949 sps_per_gpu=2.60949 tps=21376.9 tps_per_gpu=21376.9 mfu=0.634135
[2025-07-23 17:19:43,450301][I][wordplay/trainer:894] step=730 loss=2.38395 dt=0.38477 dtf=0.0103028 dtb=0.0132564 sps=2.59896 sps_per_gpu=2.59896 tps=21290.6 tps_per_gpu=21290.6 mfu=0.633633
[2025-07-23 17:19:47,286173][I][wordplay/trainer:894] step=740 loss=2.35507 dt=0.382978 dtf=0.00775962 dtb=0.00999175 sps=2.61112 sps_per_gpu=2.61112 tps=21390.3 tps_per_gpu=21390.3 mfu=0.633475
[2025-07-23 17:19:51,122311][I][wordplay/trainer:894] step=750 loss=2.34116 dt=0.385881 dtf=0.00818335 dtb=0.0122375 sps=2.59147 sps_per_gpu=2.59147 tps=21229.3 tps_per_gpu=21229.3 mfu=0.632858
[2025-07-23 17:19:54,958706][I][wordplay/trainer:894] step=760 loss=2.35229 dt=0.395003 dtf=0.0133316 dtb=0.0176366 sps=2.53163 sps_per_gpu=2.53163 tps=20739.1 tps_per_gpu=20739.1 mfu=0.630854
[2025-07-23 17:19:58,793260][I][wordplay/trainer:894] step=770 loss=2.34521 dt=0.381653 dtf=0.00799117 dtb=0.0100162 sps=2.62018 sps_per_gpu=2.62018 tps=21464.5 tps_per_gpu=21464.5 mfu=0.631194
[2025-07-23 17:20:02,627603][I][wordplay/trainer:894] step=780 loss=2.31829 dt=0.384113 dtf=0.00808119 dtb=0.0106393 sps=2.6034 sps_per_gpu=2.6034 tps=21327.1 tps_per_gpu=21327.1 mfu=0.631093
[2025-07-23 17:20:06,463581][I][wordplay/trainer:894] step=790 loss=2.31021 dt=0.383535 dtf=0.00812252 dtb=0.0103508 sps=2.60732 sps_per_gpu=2.60732 tps=21359.2 tps_per_gpu=21359.2 mfu=0.631098
[2025-07-23 17:20:10,293805][I][wordplay/trainer:894] step=800 loss=2.30534 dt=0.376394 dtf=0.00790557 dtb=0.0103412 sps=2.65679 sps_per_gpu=2.65679 tps=21764.4 tps_per_gpu=21764.4 mfu=0.632299
[2025-07-23 17:20:11,127431][I][wordplay/trainer:827] ['prompt']: 'What is an LLM?'
[2025-07-23 17:20:11,129261][I][wordplay/trainer:831] ['response']:

What is an LLM?

HESSTY OMy MONN:
The as a thestop skin cof or we or bines best busplo cothe.

FORCAMPHY:
ANaracapat there t cathe dyou toraron

And ndinis aca t t dis tir.


STRENIO:
No ano or, where my sint stthe bllos t ho sow the the,
Tise sigan t.

YCLES:
Matacou f 
[2025-07-23 17:21:08,807775][I][wordplay/trainer:762] Saving checkpoint to: /content
[2025-07-23 17:21:08,810267][I][wordplay/trainer:763] Saving model to: /content/model.pth
[2025-07-23 17:21:09,289617][I][wordplay/configs:141] Appending /content to /content/wordplay/src/ckpts/checkpoints.log
[2025-07-23 17:21:13,120320][I][wordplay/trainer:894] step=810 loss=2.31587 dt=0.381621 dtf=0.00781348 dtb=0.0120922 sps=2.6204 sps_per_gpu=2.6204 tps=21466.3 tps_per_gpu=21466.3 mfu=0.6325
[2025-07-23 17:21:16,950119][I][wordplay/trainer:894] step=820 loss=2.32552 dt=0.378177 dtf=0.00779952 dtb=0.0102359 sps=2.64426 sps_per_gpu=2.64426 tps=21661.8 tps_per_gpu=21661.8 mfu=0.633258
[2025-07-23 17:21:20,780635][I][wordplay/trainer:894] step=830 loss=2.27354 dt=0.387149 dtf=0.0106936 dtb=0.0140346 sps=2.58298 sps_per_gpu=2.58298 tps=21159.8 tps_per_gpu=21159.8 mfu=0.632457
[2025-07-23 17:21:24,610506][I][wordplay/trainer:894] step=840 loss=2.26241 dt=0.383837 dtf=0.00787966 dtb=0.0108706 sps=2.60527 sps_per_gpu=2.60527 tps=21342.4 tps_per_gpu=21342.4 mfu=0.632275
[2025-07-23 17:21:28,446417][I][wordplay/trainer:894] step=850 loss=2.26027 dt=0.383713 dtf=0.00800034 dtb=0.0100456 sps=2.60611 sps_per_gpu=2.60611 tps=21349.3 tps_per_gpu=21349.3 mfu=0.632132
[2025-07-23 17:21:32,273517][I][wordplay/trainer:894] step=860 loss=2.25673 dt=0.382741 dtf=0.0083715 dtb=0.0101342 sps=2.61273 sps_per_gpu=2.61273 tps=21403.5 tps_per_gpu=21403.5 mfu=0.632164
[2025-07-23 17:21:36,109224][I][wordplay/trainer:894] step=870 loss=2.21383 dt=0.381168 dtf=0.00781913 dtb=0.0098429 sps=2.62351 sps_per_gpu=2.62351 tps=21491.8 tps_per_gpu=21491.8 mfu=0.632453
[2025-07-23 17:21:39,941412][I][wordplay/trainer:894] step=880 loss=2.21413 dt=0.380526 dtf=0.00772047 dtb=0.00999847 sps=2.62794 sps_per_gpu=2.62794 tps=21528.1 tps_per_gpu=21528.1 mfu=0.632821
[2025-07-23 17:21:43,768114][I][wordplay/trainer:894] step=890 loss=2.21783 dt=0.370921 dtf=0.00774233 dtb=0.0108925 sps=2.69599 sps_per_gpu=2.69599 tps=22085.6 tps_per_gpu=22085.6 mfu=0.634799
[2025-07-23 17:21:47,604118][I][wordplay/trainer:894] step=900 loss=2.20972 dt=0.389311 dtf=0.0136295 dtb=0.0109 sps=2.56864 sps_per_gpu=2.56864 tps=21042.3 tps_per_gpu=21042.3 mfu=0.633497
[2025-07-23 17:21:48,462679][I][wordplay/trainer:827] ['prompt']: 'What is an LLM?'
[2025-07-23 17:21:48,464365][I][wordplay/trainer:831] ['response']:

What is an LLM?

DURENCK:
Me so my nou, hou ward thes ler noms he he,
Oxt my the my de is by beperd.

HARY ORK:
Whe tho su win th ars at herd pedis.

KING RICHARD II:
That we sco arre,
Thade so frener sheran may or tot tremedonght oness.
GLUCER:
He le inest soul mok, son
[2025-07-23 17:22:46,157466][I][wordplay/trainer:762] Saving checkpoint to: /content
[2025-07-23 17:22:46,159878][I][wordplay/trainer:763] Saving model to: /content/model.pth
[2025-07-23 17:22:46,625786][I][wordplay/configs:141] Appending /content to /content/wordplay/src/ckpts/checkpoints.log
[2025-07-23 17:22:50,449927][I][wordplay/trainer:894] step=910 loss=2.17491 dt=0.379023 dtf=0.00754135 dtb=0.0100832 sps=2.63836 sps_per_gpu=2.63836 tps=21613.5 tps_per_gpu=21613.5 mfu=0.634012
[2025-07-23 17:22:54,284268][I][wordplay/trainer:894] step=920 loss=2.1536 dt=0.383239 dtf=0.0075398 dtb=0.00996356 sps=2.60934 sps_per_gpu=2.60934 tps=21375.7 tps_per_gpu=21375.7 mfu=0.633773
[2025-07-23 17:22:58,116915][I][wordplay/trainer:894] step=930 loss=2.15065 dt=0.381936 dtf=0.00785014 dtb=0.0114434 sps=2.61824 sps_per_gpu=2.61824 tps=21448.6 tps_per_gpu=21448.6 mfu=0.633774
[2025-07-23 17:23:01,953658][I][wordplay/trainer:894] step=940 loss=2.12782 dt=0.38311 dtf=0.00824185 dtb=0.0105607 sps=2.61022 sps_per_gpu=2.61022 tps=21382.9 tps_per_gpu=21382.9 mfu=0.633581
[2025-07-23 17:23:05,787479][I][wordplay/trainer:894] step=950 loss=2.18616 dt=0.38379 dtf=0.00788715 dtb=0.0103477 sps=2.60559 sps_per_gpu=2.60559 tps=21345 tps_per_gpu=21345 mfu=0.633295
[2025-07-23 17:23:09,621436][I][wordplay/trainer:894] step=960 loss=2.11422 dt=0.384061 dtf=0.00771515 dtb=0.00979936 sps=2.60376 sps_per_gpu=2.60376 tps=21330 tps_per_gpu=21330 mfu=0.632993
[2025-07-23 17:23:13,455949][I][wordplay/trainer:894] step=970 loss=2.05699 dt=0.383695 dtf=0.00807108 dtb=0.0107169 sps=2.60624 sps_per_gpu=2.60624 tps=21350.3 tps_per_gpu=21350.3 mfu=0.632781
[2025-07-23 17:23:17,284032][I][wordplay/trainer:894] step=980 loss=2.15509 dt=0.376189 dtf=0.00803431 dtb=0.0109163 sps=2.65824 sps_per_gpu=2.65824 tps=21776.3 tps_per_gpu=21776.3 mfu=0.633849
[2025-07-23 17:23:21,114368][I][wordplay/trainer:894] step=990 loss=2.1031 dt=0.393959 dtf=0.0123796 dtb=0.0165355 sps=2.53833 sps_per_gpu=2.53833 tps=20794 tps_per_gpu=20794 mfu=0.631908
[2025-07-23 17:23:24,949152][I][wordplay/trainer:894] step=1000 loss=2.05209 dt=0.371632 dtf=0.00834119 dtb=0.0110242 sps=2.69083 sps_per_gpu=2.69083 tps=22043.3 tps_per_gpu=22043.3 mfu=0.633853
[2025-07-23 17:23:25,760378][I][wordplay/trainer:827] ['prompt']: 'What is an LLM?'
[2025-07-23 17:23:25,762149][I][wordplay/trainer:831] ['response']:

What is an LLM?


WAMILLY:
And I tucke thimbok have doorcent mone,
Wavert mus of me the han hat the deant.
DEORK:
Far thall is coors sited not de ind,
But theat to ad coftitest fort sthengers,
They my thous sor was to yourte mee.
TARK:
I leer, men you, wit the by the the
[2025-07-23 17:24:23,438968][I][wordplay/trainer:762] Saving checkpoint to: /content
[2025-07-23 17:24:23,441364][I][wordplay/trainer:763] Saving model to: /content/model.pth
[2025-07-23 17:24:23,918157][I][wordplay/configs:141] Appending /content to /content/wordplay/src/ckpts/checkpoints.log
[2025-07-23 17:24:27,738089][I][wordplay/trainer:894] step=1010 loss=2.07161 dt=0.367348 dtf=0.00773713 dtb=0.0102199 sps=2.72222 sps_per_gpu=2.72222 tps=22300.4 tps_per_gpu=22300.4 mfu=0.636362
[2025-07-23 17:24:31,569441][I][wordplay/trainer:894] step=1020 loss=2.04552 dt=0.383316 dtf=0.00751949 dtb=0.01008 sps=2.60881 sps_per_gpu=2.60881 tps=21371.4 tps_per_gpu=21371.4 mfu=0.635876
[2025-07-23 17:24:35,396411][I][wordplay/trainer:894] step=1030 loss=2.03231 dt=0.384257 dtf=0.00816572 dtb=0.0102516 sps=2.60243 sps_per_gpu=2.60243 tps=21319.1 tps_per_gpu=21319.1 mfu=0.635284
[2025-07-23 17:24:39,228505][I][wordplay/trainer:894] step=1040 loss=2.05762 dt=0.383646 dtf=0.00790242 dtb=0.00997257 sps=2.60657 sps_per_gpu=2.60657 tps=21353 tps_per_gpu=21353 mfu=0.634851
[2025-07-23 17:24:43,061324][I][wordplay/trainer:894] step=1050 loss=2.03493 dt=0.378067 dtf=0.00783631 dtb=0.00984342 sps=2.64504 sps_per_gpu=2.64504 tps=21668.1 tps_per_gpu=21668.1 mfu=0.635392
[2025-07-23 17:24:46,898059][I][wordplay/trainer:894] step=1060 loss=1.99328 dt=0.383855 dtf=0.00812065 dtb=0.0102131 sps=2.60515 sps_per_gpu=2.60515 tps=21341.4 tps_per_gpu=21341.4 mfu=0.634914
[2025-07-23 17:24:50,734315][I][wordplay/trainer:894] step=1070 loss=2.02538 dt=0.38352 dtf=0.00975553 dtb=0.00995462 sps=2.60743 sps_per_gpu=2.60743 tps=21360.1 tps_per_gpu=21360.1 mfu=0.634539
[2025-07-23 17:24:54,571713][I][wordplay/trainer:894] step=1080 loss=1.98803 dt=0.383255 dtf=0.00790832 dtb=0.0101534 sps=2.60923 sps_per_gpu=2.60923 tps=21374.8 tps_per_gpu=21374.8 mfu=0.634245
[2025-07-23 17:24:58,396586][I][wordplay/trainer:894] step=1090 loss=2.05368 dt=0.379503 dtf=0.00809327 dtb=0.0106979 sps=2.63503 sps_per_gpu=2.63503 tps=21586.1 tps_per_gpu=21586.1 mfu=0.634605
[2025-07-23 17:25:02,230324][I][wordplay/trainer:894] step=1100 loss=1.99345 dt=0.386284 dtf=0.0115638 dtb=0.0162085 sps=2.58877 sps_per_gpu=2.58877 tps=21207.2 tps_per_gpu=21207.2 mfu=0.633809
[2025-07-23 17:25:03,086185][I][wordplay/trainer:827] ['prompt']: 'What is an LLM?'
[2025-07-23 17:25:03,088005][I][wordplay/trainer:831] ['response']:

What is an LLM? Godeel we ye the live courerd, mare you the sill:
This bent the do we shre yeat pert
So but yerter the him theely?

KING EDWARD IV:
Yis past whis to is witer gor miny,
To the corts a have could heret
This the the deears, so your cers tee a be.

CLESTER:
M
[2025-07-23 17:26:00,773544][I][wordplay/trainer:762] Saving checkpoint to: /content
[2025-07-23 17:26:00,776016][I][wordplay/trainer:763] Saving model to: /content/model.pth
[2025-07-23 17:26:01,284390][I][wordplay/configs:141] Appending /content to /content/wordplay/src/ckpts/checkpoints.log
[2025-07-23 17:26:05,114000][I][wordplay/trainer:894] step=1110 loss=1.95124 dt=0.377492 dtf=0.00786764 dtb=0.0100368 sps=2.64906 sps_per_gpu=2.64906 tps=21701.1 tps_per_gpu=21701.1 mfu=0.634553
[2025-07-23 17:26:08,948083][I][wordplay/trainer:894] step=1120 loss=1.98738 dt=0.381927 dtf=0.00748538 dtb=0.00989547 sps=2.6183 sps_per_gpu=2.6183 tps=21449.1 tps_per_gpu=21449.1 mfu=0.634477
[2025-07-23 17:26:12,776837][I][wordplay/trainer:894] step=1130 loss=1.89314 dt=0.374098 dtf=0.008244 dtb=0.0108253 sps=2.67309 sps_per_gpu=2.67309 tps=21898 tps_per_gpu=21898 mfu=0.635735
[2025-07-23 17:26:16,611706][I][wordplay/trainer:894] step=1140 loss=1.92855 dt=0.393585 dtf=0.0130297 dtb=0.014002 sps=2.54075 sps_per_gpu=2.54075 tps=20813.8 tps_per_gpu=20813.8 mfu=0.633664
[2025-07-23 17:26:20,447771][I][wordplay/trainer:894] step=1150 loss=1.83626 dt=0.384681 dtf=0.00807674 dtb=0.0107471 sps=2.59955 sps_per_gpu=2.59955 tps=21295.6 tps_per_gpu=21295.6 mfu=0.633223
[2025-07-23 17:26:24,282285][I][wordplay/trainer:894] step=1160 loss=1.90146 dt=0.383857 dtf=0.0082585 dtb=0.0104392 sps=2.60514 sps_per_gpu=2.60514 tps=21341.3 tps_per_gpu=21341.3 mfu=0.632962
[2025-07-23 17:26:28,116716][I][wordplay/trainer:894] step=1170 loss=1.88228 dt=0.382931 dtf=0.00886622 dtb=0.0120676 sps=2.61144 sps_per_gpu=2.61144 tps=21392.9 tps_per_gpu=21392.9 mfu=0.632879
[2025-07-23 17:26:31,951239][I][wordplay/trainer:894] step=1180 loss=1.88628 dt=0.381804 dtf=0.00750252 dtb=0.0100093 sps=2.61914 sps_per_gpu=2.61914 tps=21456 tps_per_gpu=21456 mfu=0.632991
[2025-07-23 17:26:35,788827][I][wordplay/trainer:894] step=1190 loss=1.91094 dt=0.383424 dtf=0.00741282 dtb=0.0106785 sps=2.60808 sps_per_gpu=2.60808 tps=21365.4 tps_per_gpu=21365.4 mfu=0.632824
[2025-07-23 17:26:39,625719][I][wordplay/trainer:894] step=1200 loss=1.90239 dt=0.388221 dtf=0.015037 dtb=0.01746 sps=2.57585 sps_per_gpu=2.57585 tps=21101.4 tps_per_gpu=21101.4 mfu=0.631894
[2025-07-23 17:26:40,480918][I][wordplay/trainer:827] ['prompt']: 'What is an LLM?'
[2025-07-23 17:26:40,482683][I][wordplay/trainer:831] ['response']:

What is an LLM?

LADY GLOUCESTES:
And when there to my liker of mady the:
It will the shall contre fature he
the day thengery'd one died me meanty:
Why, which ime dished your wind the oblod thus hemes,
I the conte the caition, fortuse whiches faings,
I her far will there
[2025-07-23 17:27:38,165528][I][wordplay/trainer:762] Saving checkpoint to: /content
[2025-07-23 17:27:38,167913][I][wordplay/trainer:763] Saving model to: /content/model.pth
[2025-07-23 17:27:38,675275][I][wordplay/configs:141] Appending /content to /content/wordplay/src/ckpts/checkpoints.log
[2025-07-23 17:27:42,506973][I][wordplay/trainer:894] step=1210 loss=1.83644 dt=0.379808 dtf=0.00813016 dtb=0.0106532 sps=2.63291 sps_per_gpu=2.63291 tps=21568.8 tps_per_gpu=21568.8 mfu=0.632438
[2025-07-23 17:27:46,341248][I][wordplay/trainer:894] step=1220 loss=1.85 dt=0.384616 dtf=0.00783149 dtb=0.0105751 sps=2.6  sps_per_gpu=2.6  tps=21299.2 tps_per_gpu=21299.2 mfu=0.63213
[2025-07-23 17:27:50,174132][I][wordplay/trainer:894] step=1230 loss=1.85794 dt=0.384468 dtf=0.00796023 dtb=0.0101979 sps=2.60099 sps_per_gpu=2.60099 tps=21307.3 tps_per_gpu=21307.3 mfu=0.631878
[2025-07-23 17:27:53,996352][I][wordplay/trainer:894] step=1240 loss=1.86443 dt=0.381407 dtf=0.00777995 dtb=0.00996514 sps=2.62187 sps_per_gpu=2.62187 tps=21478.3 tps_per_gpu=21478.3 mfu=0.632156
[2025-07-23 17:27:57,829111][I][wordplay/trainer:894] step=1250 loss=1.76382 dt=0.382476 dtf=0.00785835 dtb=0.0100383 sps=2.61454 sps_per_gpu=2.61454 tps=21418.3 tps_per_gpu=21418.3 mfu=0.632229
[2025-07-23 17:28:01,663291][I][wordplay/trainer:894] step=1260 loss=1.74205 dt=0.385531 dtf=0.00776372 dtb=0.0138436 sps=2.59382 sps_per_gpu=2.59382 tps=21248.6 tps_per_gpu=21248.6 mfu=0.631793
[2025-07-23 17:28:05,497559][I][wordplay/trainer:894] step=1270 loss=1.86381 dt=0.395746 dtf=0.0125432 dtb=0.0178912 sps=2.52688 sps_per_gpu=2.52688 tps=20700.2 tps_per_gpu=20700.2 mfu=0.62978
[2025-07-23 17:28:09,331924][I][wordplay/trainer:894] step=1280 loss=1.85107 dt=0.382921 dtf=0.0081101 dtb=0.00997405 sps=2.61151 sps_per_gpu=2.61151 tps=21393.5 tps_per_gpu=21393.5 mfu=0.630017
[2025-07-23 17:28:13,161160][I][wordplay/trainer:894] step=1290 loss=1.84071 dt=0.382439 dtf=0.00762057 dtb=0.0106278 sps=2.6148 sps_per_gpu=2.6148 tps=21420.4 tps_per_gpu=21420.4 mfu=0.630311
[2025-07-23 17:28:16,996729][I][wordplay/trainer:894] step=1300 loss=1.82688 dt=0.383368 dtf=0.0123784 dtb=0.0184451 sps=2.60846 sps_per_gpu=2.60846 tps=21368.5 tps_per_gpu=21368.5 mfu=0.630421
[2025-07-23 17:28:17,833682][I][wordplay/trainer:827] ['prompt']: 'What is an LLM?'
[2025-07-23 17:28:17,835402][I][wordplay/trainer:831] ['response']:

What is an LLM?

Good my RICHARD III:
He you will distent, I may
Is like pret to fort,
To some that fold my part they lok.
A farther's to consonce which sater,
And fater and him in the shall it them do her this,
The a my navin his more the with of haver,
But me and the a
[2025-07-23 17:29:15,475172][I][wordplay/trainer:762] Saving checkpoint to: /content
[2025-07-23 17:29:15,477113][I][wordplay/trainer:763] Saving model to: /content/model.pth
[2025-07-23 17:29:15,945199][I][wordplay/configs:141] Appending /content to /content/wordplay/src/ckpts/checkpoints.log
[2025-07-23 17:29:19,772665][I][wordplay/trainer:894] step=1310 loss=1.83877 dt=0.380515 dtf=0.00832942 dtb=0.00994461 sps=2.62802 sps_per_gpu=2.62802 tps=21528.7 tps_per_gpu=21528.7 mfu=0.630994
[2025-07-23 17:29:23,597414][I][wordplay/trainer:894] step=1320 loss=1.79997 dt=0.380789 dtf=0.00753653 dtb=0.0100344 sps=2.62613 sps_per_gpu=2.62613 tps=21513.2 tps_per_gpu=21513.2 mfu=0.631463
[2025-07-23 17:29:27,425373][I][wordplay/trainer:894] step=1330 loss=1.84227 dt=0.383599 dtf=0.00811679 dtb=0.0102277 sps=2.60689 sps_per_gpu=2.60689 tps=21355.6 tps_per_gpu=21355.6 mfu=0.63142
[2025-07-23 17:29:31,259289][I][wordplay/trainer:894] step=1340 loss=1.77032 dt=0.381153 dtf=0.00731168 dtb=0.00972694 sps=2.62362 sps_per_gpu=2.62362 tps=21492.7 tps_per_gpu=21492.7 mfu=0.631787
[2025-07-23 17:29:35,088601][I][wordplay/trainer:894] step=1350 loss=1.8076 dt=0.384321 dtf=0.00808188 dtb=0.0116733 sps=2.60199 sps_per_gpu=2.60199 tps=21315.5 tps_per_gpu=21315.5 mfu=0.631593
[2025-07-23 17:29:38,914972][I][wordplay/trainer:894] step=1360 loss=1.79383 dt=0.383019 dtf=0.00830957 dtb=0.0104623 sps=2.61084 sps_per_gpu=2.61084 tps=21388 tps_per_gpu=21388 mfu=0.631632
[2025-07-23 17:29:42,746913][I][wordplay/trainer:894] step=1370 loss=1.73757 dt=0.377326 dtf=0.009339 dtb=0.0118509 sps=2.65023 sps_per_gpu=2.65023 tps=21710.7 tps_per_gpu=21710.7 mfu=0.632622
[2025-07-23 17:29:46,582929][I][wordplay/trainer:894] step=1380 loss=1.74524 dt=0.373365 dtf=0.00773357 dtb=0.0100906 sps=2.67835 sps_per_gpu=2.67835 tps=21941 tps_per_gpu=21941 mfu=0.634193
[2025-07-23 17:29:50,410901][I][wordplay/trainer:894] step=1390 loss=1.75995 dt=0.382166 dtf=0.00797486 dtb=0.0104627 sps=2.61667 sps_per_gpu=2.61667 tps=21435.7 tps_per_gpu=21435.7 mfu=0.634113
[2025-07-23 17:29:54,241756][I][wordplay/trainer:894] step=1400 loss=1.81278 dt=0.391504 dtf=0.0126958 dtb=0.0182819 sps=2.55425 sps_per_gpu=2.55425 tps=20924.4 tps_per_gpu=20924.4 mfu=0.632531
[2025-07-23 17:29:55,175194][I][wordplay/trainer:827] ['prompt']: 'What is an LLM?'
[2025-07-23 17:29:55,177068][I][wordplay/trainer:831] ['response']:

What is an LLM?

ROHUMERS:
Citizen:
The's no worth bold of I heave is the port art.

SICINIUS:
Alay, sir, thou away the perfored,
Belie a hard set the of to your pakial;
Sirt are a a shall in thee.
Yet come, I chould cound thy king will.

BRATUS:
The good is heart thou t
[2025-07-23 17:30:52,849338][I][wordplay/trainer:762] Saving checkpoint to: /content
[2025-07-23 17:30:52,851168][I][wordplay/trainer:763] Saving model to: /content/model.pth
[2025-07-23 17:30:53,184812][I][wordplay/configs:141] Appending /content to /content/wordplay/src/ckpts/checkpoints.log
[2025-07-23 17:30:57,012134][I][wordplay/trainer:894] step=1410 loss=1.79791 dt=0.381525 dtf=0.00881983 dtb=0.0102005 sps=2.62106 sps_per_gpu=2.62106 tps=21471.7 tps_per_gpu=21471.7 mfu=0.632724
[2025-07-23 17:31:00,841188][I][wordplay/trainer:894] step=1420 loss=1.74375 dt=0.381039 dtf=0.00761951 dtb=0.0101972 sps=2.6244 sps_per_gpu=2.6244 tps=21499.1 tps_per_gpu=21499.1 mfu=0.632979
[2025-07-23 17:31:04,675786][I][wordplay/trainer:894] step=1430 loss=1.73401 dt=0.388151 dtf=0.00959453 dtb=0.0123491 sps=2.57631 sps_per_gpu=2.57631 tps=21105.2 tps_per_gpu=21105.2 mfu=0.632045
[2025-07-23 17:31:08,511906][I][wordplay/trainer:894] step=1440 loss=1.72673 dt=0.380442 dtf=0.00765078 dtb=0.00993138 sps=2.62852 sps_per_gpu=2.62852 tps=21532.8 tps_per_gpu=21532.8 mfu=0.632467
[2025-07-23 17:31:12,350823][I][wordplay/trainer:894] step=1450 loss=1.75055 dt=0.384587 dtf=0.00793686 dtb=0.0107903 sps=2.60019 sps_per_gpu=2.60019 tps=21300.8 tps_per_gpu=21300.8 mfu=0.632162
[2025-07-23 17:31:16,189335][I][wordplay/trainer:894] step=1460 loss=1.68073 dt=0.381957 dtf=0.00771424 dtb=0.00991214 sps=2.6181 sps_per_gpu=2.6181 tps=21447.4 tps_per_gpu=21447.4 mfu=0.63232
[2025-07-23 17:31:20,023731][I][wordplay/trainer:894] step=1470 loss=1.71749 dt=0.389038 dtf=0.0123934 dtb=0.016246 sps=2.57045 sps_per_gpu=2.57045 tps=21057.1 tps_per_gpu=21057.1 mfu=0.631309
[2025-07-23 17:31:23,858642][I][wordplay/trainer:894] step=1480 loss=1.72494 dt=0.380766 dtf=0.00802833 dtb=0.0109163 sps=2.62629 sps_per_gpu=2.62629 tps=21514.5 tps_per_gpu=21514.5 mfu=0.631751
[2025-07-23 17:31:27,693442][I][wordplay/trainer:894] step=1490 loss=1.72521 dt=0.384513 dtf=0.00979085 dtb=0.0104102 sps=2.60069 sps_per_gpu=2.60069 tps=21304.9 tps_per_gpu=21304.9 mfu=0.631529
[2025-07-23 17:31:31,528345][I][wordplay/trainer:894] step=1500 loss=1.70409 dt=0.385203 dtf=0.0109562 dtb=0.0163935 sps=2.59604 sps_per_gpu=2.59604 tps=21266.7 tps_per_gpu=21266.7 mfu=0.631217

Evaluate Model

import time

query = "What is an LLM?"
t0 = time.perf_counter()
outputs = trainer.evaluate(
    query,
    num_samples=1,
    max_new_tokens=256,
    top_k=2,
    display=False
)
logger.info(f'took: {time.perf_counter() - t0:.4f}s')
logger.info(f"['prompt']: '{query}'")
logger.info("['response']:\n\n" + fr"{outputs['0']['raw']}")
[2025-07-23 17:31:32,597792][I][tmp/ipython-input-12-582817405:12:ezpz.log] took: 0.9968s
[2025-07-23 17:31:32,599918][I][tmp/ipython-input-12-582817405:13:ezpz.log] ['prompt']: 'What is an LLM?'
[2025-07-23 17:31:32,601844][I][tmp/ipython-input-12-582817405:14:ezpz.log] ['response']:

What is an LLM? What, that the wild my lord,
And the shal to may so shal that the shall thee.

RICHARD:
What that there thee shal the const the shall so thine.

RICHARD:
The wil thee the shal shal that that the should.

RICHARD:
Then the shal too the show shal to thee.
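
Note the effect of top_k=2 here versus top_k=16 earlier: restricting sampling to the two most likely next characters yields noticeably more repetitive (if more "confident") text. A minimal sketch of top-k filtering for a single sampling step, assuming 1-D logits over the vocabulary:

import torch

def sample_top_k(logits: torch.Tensor, k: int) -> int:
    """Zero out everything below the k-th largest logit, then sample."""
    v, _ = torch.topk(logits, k)
    logits = logits.masked_fill(logits < v[-1], float('-inf'))
    return int(torch.multinomial(torch.softmax(logits, dim=-1), num_samples=1))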


Citation

BibTeX citation:
@online{foreman2025,
  author = {Foreman, Sam},
  title = {wordplay 🎮 💬: {Shakespeare}},
  date = {2025-07-22},
  url = {https://saforem2.github.io/hpc-bootcamp-2025/02-llms/08-shakespeare-example-colab/},
  langid = {en}
}

For attribution, please cite this work as:

Foreman, Sam. 2025. “[wordplay 🎮 💬](https://github.com/saforem2/wordplay): Shakespeare.” July 22, 2025. https://saforem2.github.io/hpc-bootcamp-2025/02-llms/08-shakespeare-example-colab/.