wordplay 🎮 💬: Shakespeare ✍️

Published: July 22, 2025
Modified: July 27, 2025

We will use the Shakespeare dataset to train a small (~10M parameter) LLM from scratch.


Image generated from stabilityai/stable-diffusion on 🤗 Spaces.

Prompt Details

  • Prompt: Shakespeare himself, dressed in full Shakespearean garb, writing code at a modern workstation with multiple monitors, hacking away profusely, backlit, high quality for publication

  • Negative Prompt: low quality, 3d, photorealistic, ugly

Install / Setup

Warning!

IF YOU ARE EXECUTING ON GOOGLE COLAB:

You will need to restart your runtime (Runtime → Restart runtime)
after executing the following cell:

%%bash

if python3 -c 'import wordplay; print(wordplay.__file__)' 2> /dev/null; then
    echo "Has wordplay installed. Nothing to do."
else
    echo "Does not have wordplay installed. Installing..."
    git clone 'https://github.com/saforem2/wordplay'
    python3 wordplay/data/shakespeare_char/prepare.py
    python3 wordplay/data/shakespeare/prepare.py
    python3 -m pip install deepspeed
    python3 -m pip install -e wordplay
fi
/Users/samforeman/projects/saforem2/wordplay/src/wordplay/__init__.py
Has wordplay installed. Nothing to do.

Post Install

If installed correctly, you should be able to:

>>> import wordplay
>>> wordplay.__file__
'/path/to/wordplay/src/wordplay/__init__.py'
%load_ext autoreload
%autoreload 2
import os
import sys
import ezpz

os.environ['COLORTERM'] = 'truecolor'
if sys.platform == 'darwin':
    # If running on MacOS:
    # os.environ['PYTORCH_ENABLE_MPS_FALLBACK'] = '1'
    os.environ['TORCH_DEVICE'] = 'cpu'
# -----------------------------------------------

logger = ezpz.get_logger()

import wordplay
logger.info(wordplay.__file__)
[07/27/25 11:24:20] INFO     Setting logging level to 'INFO' on 'RANK == 0'                         __init__.py:265
                    INFO     Setting logging level to 'CRITICAL' on all others 'RANK != 0'          __init__.py:266
[07/27/25 11:24:20] INFO     /Users/samforeman/projects/saforem2/wordplay/src/wordplay/__init__.py 2338663768.py:17

Build Trainer

Explicitly, we:

  1. Setup torch: rank = setup(...)
  2. Build cfg: DictConfig = get_config(...)
  3. Instantiate config: ExperimentConfig = instantiate(cfg)
  4. Build trainer = Trainer(config)
import os
import numpy as np
from ezpz import setup
from hydra.utils import instantiate
from wordplay.configs import get_config, PROJECT_ROOT
from wordplay.trainer import Trainer

HF_DATASETS_CACHE = PROJECT_ROOT.joinpath('.cache', 'huggingface')
HF_DATASETS_CACHE.mkdir(exist_ok=True, parents=True)

os.environ['HF_DATASETS_CACHE'] = HF_DATASETS_CACHE.as_posix()

BACKEND = 'DDP'

rank = setup(
    framework='pytorch',
    backend=BACKEND,
    seed=1234,
)

cfg = get_config(
    [
        'data=shakespeare',
        'model=shakespeare',
        'model.batch_size=1',
        'model.block_size=128',
        'optimizer=shakespeare',
        'train=shakespeare',
        f'train.backend={BACKEND}',
        'train.compile=false',
        'train.dtype=bfloat16',
        'train.max_iters=500',
        'train.log_interval=10',
        'train.eval_interval=50',
    ]
)
config = instantiate(cfg)
                    INFO     Setting HF_DATASETS_CACHE to                                             configs.py:81
                             /Users/samforeman/projects/saforem2/wordplay/.cache/huggingface/datasets              
                    WARNING  Caught TORCH_DEVICE=cpu from environment!                                  dist.py:639
                    WARNING  Caught TORCH_DEVICE=cpu from environment!                                  dist.py:639
                    WARNING  Caught TORCH_DEVICE=cpu from environment!                                  dist.py:639
                    INFO     Using fw='ddp' with torch_{device,backend}= {cpu, gloo}                   dist.py:1159
                    INFO     Caught MASTER_PORT=57747 from environment!                                dist.py:1026
                    INFO     Using torch.distributed.init_process_group with                           dist.py:1042
                             - master_addr='Sams-MacBook-Pro-2.local'                                              
                             - master_port='57747'                                                                 
                             - world_size=1                                                                        
                             - rank=0                                                                              
                             - local_rank=0                                                                        
                             - timeout=datetime.timedelta(seconds=3600)                                            
                             - backend='gloo'                                                                      
                    INFO     Calling torch.distributed.init_process_group_with: rank=0 world_size=1     dist.py:759
                             backend=gloo                                                                          
                    WARNING  Caught TORCH_DEVICE=cpu from environment!                                  dist.py:639
                    WARNING  Caught TORCH_DEVICE=cpu from environment!                                  dist.py:639
                    INFO     Using device='cpu' with backend='gloo' + 'gloo' for distributed training. dist.py:1377
                    INFO     ['Sams-MacBook-Pro-2.local'][0/0]                                         dist.py:1422
                    INFO     Loading train from                                                      configs.py:317
                             /Users/samforeman/projects/saforem2/wordplay/data/shakespeare_char/trai               
                             n.bin                                                                                 
                    INFO     Loading val from                                                        configs.py:317
                             /Users/samforeman/projects/saforem2/wordplay/data/shakespeare_char/val.               
                             bin                                                                                   
                    INFO     Tokens per iteration: 128                                               configs.py:442
                    WARNING  Caught TORCH_DEVICE=cpu from environment!                                  dist.py:639
                    INFO     Using self.ptdtype=torch.bfloat16 on self.device_type='cpu'             configs.py:465
                    INFO     Initializing a new model from scratch                                   configs.py:471
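
The "Tokens per iteration: 128" line above follows directly from the overrides we passed to get_config. A quick check (assuming no gradient accumulation, which the reported number bears out):

# tokens per optimizer step = grad_accum_steps * batch_size * block_size
grad_accum_steps = 1   # assumed default; no override set above
batch_size = 1         # model.batch_size=1
block_size = 128       # model.block_size=128
print(grad_accum_steps * batch_size * block_size)  # 128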

Build Trainer object

trainer = Trainer(config)
                    INFO     Initializing a new model from scratch                                   trainer.py:235
                    INFO     number of parameters: 10.65M                                              model.py:255
                    INFO     Model size: num_params=10646784                                         trainer.py:252
                    INFO     num decayed parameter tensors: 26, with 10,690,944 parameters             model.py:445
                    INFO     num non-decayed parameter tensors: 13, with 4,992 parameters              model.py:449
                    INFO     using fused AdamW: False                                                  model.py:465
                    CRITICAL "devid='cpu:0'"                                                         trainer.py:308
                    INFO     • self.model=GPT(                                                       trainer.py:347
                               (transformer): ModuleDict(                                                          
                                 (wte): Embedding(65, 384)                                                         
                                 (wpe): Embedding(128, 384)                                                        
                                 (drop): Dropout(p=0.2, inplace=False)                                             
                                 (h): ModuleList(                                                                  
                                   (0-5): 6 x Block(                                                               
                                     (ln_1): LayerNorm()                                                           
                                     (attn): CausalSelfAttention(                                                  
                                       (c_attn): Linear(in_features=384, out_features=1152,                        
                             bias=False)                                                                           
                                       (c_proj): Linear(in_features=384, out_features=384,                         
                             bias=False)                                                                           
                                       (attn_dropout): Dropout(p=0.2, inplace=False)                               
                                       (resid_dropout): Dropout(p=0.2, inplace=False)                              
                                     )                                                                             
                                     (ln_2): LayerNorm()                                                           
                                     (mlp): MLP(                                                                   
                                       (c_fc): Linear(in_features=384, out_features=1536,                          
                             bias=False)                                                                           
                                       (act_fn): GELU(approximate='none')                                          
                                       (c_proj): Linear(in_features=1536, out_features=384,                        
                             bias=False)                                                                           
                                       (dropout): Dropout(p=0.2, inplace=False)                                    
                                     )                                                                             
                                   )                                                                               
                                 )                                                                                 
                                 (ln_f): LayerNorm()                                                               
                               )                                                                                   
                               (lm_head): Linear(in_features=384, out_features=65, bias=False)                     
                             )                                                                                     
                    INFO     • self.grad_scaler=None                                                 trainer.py:348
                    INFO     • self.model_engine=GPT(                                                trainer.py:349
                               (transformer): ModuleDict(                                                          
                                 (wte): Embedding(65, 384)                                                         
                                 (wpe): Embedding(128, 384)                                                        
                                 (drop): Dropout(p=0.2, inplace=False)                                             
                                 (h): ModuleList(                                                                  
                                   (0-5): 6 x Block(                                                               
                                     (ln_1): LayerNorm()                                                           
                                     (attn): CausalSelfAttention(                                                  
                                       (c_attn): Linear(in_features=384, out_features=1152,                        
                             bias=False)                                                                           
                                       (c_proj): Linear(in_features=384, out_features=384,                         
                             bias=False)                                                                           
                                       (attn_dropout): Dropout(p=0.2, inplace=False)                               
                                       (resid_dropout): Dropout(p=0.2, inplace=False)                              
                                     )                                                                             
                                     (ln_2): LayerNorm()                                                           
                                     (mlp): MLP(                                                                   
                                       (c_fc): Linear(in_features=384, out_features=1536,                          
                             bias=False)                                                                           
                                       (act_fn): GELU(approximate='none')                                          
                                       (c_proj): Linear(in_features=1536, out_features=384,                        
                             bias=False)                                                                           
                                       (dropout): Dropout(p=0.2, inplace=False)                                    
                                     )                                                                             
                                   )                                                                               
                                 )                                                                                 
                                 (ln_f): LayerNorm()                                                               
                               )                                                                                   
                               (lm_head): Linear(in_features=384, out_features=65, bias=False)                     
                             )                                                                                     
                    INFO     • self.optimizer=AdamW (                                                trainer.py:350
                             Parameter Group 0                                                                     
                                 amsgrad: False                                                                    
                                 betas: (0.9, 0.99)                                                                
                                 capturable: False                                                                 
                                 decoupled_weight_decay: True                                                      
                                 differentiable: False                                                             
                                 eps: 1e-08                                                                        
                                 foreach: None                                                                     
                                 fused: None                                                                       
                                 lr: 0.001                                                                         
                                 maximize: False                                                                   
                                 weight_decay: 0.1                                                                 
                                                                                                                   
                             Parameter Group 1                                                                     
                                 amsgrad: False                                                                    
                                 betas: (0.9, 0.99)                                                                
                                 capturable: False                                                                 
                                 decoupled_weight_decay: True                                                      
                                 differentiable: False                                                             
                                 eps: 1e-08                                                                        
                                 foreach: None                                                                     
                                 fused: None                                                                       
                                 lr: 0.001                                                                         
                                 maximize: False                                                                   
                                 weight_decay: 0.0                                                                 
                             )                                                                                     

Prompt (prior to training)

query = "What is an LLM?"
outputs = trainer.evaluate(
    query,
    num_samples=1,
    max_new_tokens=256,
    top_k=16,
    display=False
)
logger.info(f"['prompt']: '{query}'")
logger.info("['response']:\n\n" + fr"{outputs['0']['raw']}")
[07/27/25 11:24:22] INFO     ['prompt']: 'What is an LLM?'                                          3496000222.py:9
                    INFO     ['response']:                                                         3496000222.py:10
                                                                                                                   
                             What is an                                                                            
                             LLM?A,,osy'exx.ff.fpppxv;;'vt3QjYhhvvYAhowQwwQ,eqeqG;X.YqqQSZQWLsyccc                 
                             cj:ZhaooxkkcfkZ                                                                       
                             ffop- f,hqWl                                                                          
                             oocpppUqAQ;cc''bQqcWAttrqerrwyqqsrqttqYeqWQs'tottcqestbqbbrpWbWYApppp                 
                             BqfhcqqYqqM?qttqQU'gYe?A..'S'rtppW'fJf;??qn.pwrrrqqfA;!!A,,,AtqqqqbW;                 
                             bSoW;;?;;;qQ;;cIA.'M;''g                                                              
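
Unsurprisingly, the untrained model produces gibberish: with random weights, every character is roughly equally likely. The top_k=16 argument above restricts sampling at each step to the 16 most likely tokens; below is a minimal sketch of that decoding rule (an illustration, not wordplay's exact implementation):

import torch

def sample_next_token(logits: torch.Tensor, top_k: int = 16) -> int:
    """Sample one token id, restricted to the top_k most likely tokens."""
    v, _ = torch.topk(logits, top_k)                     # v[-1] is the kth-largest logit
    logits = logits.masked_fill(logits < v[-1], float('-inf'))
    probs = torch.softmax(logits, dim=-1)
    return int(torch.multinomial(probs, num_samples=1))

# e.g., for the 65-character vocabulary used here:
next_id = sample_next_token(torch.randn(65), top_k=16)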

Train Model

name  description
----  --------------------------------
step  Current training step
loss  Loss value
dt    Time per step (in seconds)
sps   Samples per second
mtps  Tokens per second (in millions)
mfu   Model FLOPS utilization

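A note on mfu: it measures achieved FLOPS against a hardware peak. Below is a minimal sketch of the PaLM-style estimate popularized by nanoGPT, whose conventions wordplay's model follows; the 312 TFLOPS A100 bfloat16 peak is the conventional baseline and an assumption here. Plugging in the numbers logged at step 10 below reproduces mfu=0.149367 up to a factor of 100, suggesting wordplay reports MFU as a percentage:

def estimate_mfu(num_params, n_layer, n_head, n_embd, block_size, dt):
    """Model FLOPS utilization (a sketch; wordplay's implementation may differ)."""
    head_dim = n_embd // n_head
    # FLOPs per token (PaLM Appendix B): 6*N + 12*L*H*Q*T
    flops_per_token = 6 * num_params + 12 * n_layer * n_head * head_dim * block_size
    flops_per_iter = flops_per_token * block_size  # one fwd+bwd pass over a full block
    flops_achieved = flops_per_iter / dt           # dt = seconds per iteration
    return flops_achieved / 312e12                 # assumed A100 bfloat16 peak

# with num_params=10646784 and dt=0.0185177 (step 10 below):
print(100 * estimate_mfu(10_646_784, 6, 6, 384, 128, 0.0185177))  # ≈ 0.149367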

trainer.config.device_type
'cpu'
from rich import print

print(trainer.model)
GPT(
  (transformer): ModuleDict(
    (wte): Embedding(65, 384)
    (wpe): Embedding(128, 384)
    (drop): Dropout(p=0.2, inplace=False)
    (h): ModuleList(
      (0-5): 6 x Block(
        (ln_1): LayerNorm()
        (attn): CausalSelfAttention(
          (c_attn): Linear(in_features=384, out_features=1152, bias=False)
          (c_proj): Linear(in_features=384, out_features=384, bias=False)
          (attn_dropout): Dropout(p=0.2, inplace=False)
          (resid_dropout): Dropout(p=0.2, inplace=False)
        )
        (ln_2): LayerNorm()
        (mlp): MLP(
          (c_fc): Linear(in_features=384, out_features=1536, bias=False)
          (act_fn): GELU(approximate='none')
          (c_proj): Linear(in_features=1536, out_features=384, bias=False)
          (dropout): Dropout(p=0.2, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm()
  )
  (lm_head): Linear(in_features=384, out_features=65, bias=False)
)
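
As a sanity check, the reported num_params=10646784 can be recovered by hand from this printout. A quick sketch; the numbers only work out if lm_head shares its weight with wte (weight tying) and the reported count excludes the position embeddings, both nanoGPT conventions:

n_layer, n_embd = 6, 384
vocab_size, block_size = 65, 128

wte = vocab_size * n_embd                         #     24,960 token embeddings
wpe = block_size * n_embd                         #     49,152 position embeddings
ln = (2 * n_layer + 1) * n_embd                   #      4,992 LayerNorm weights (no biases)
attn = n_embd * 3 * n_embd + n_embd * n_embd      #    589,824 c_attn + c_proj
mlp = n_embd * 4 * n_embd + 4 * n_embd * n_embd   #  1,179,648 c_fc + c_proj
blocks = n_layer * (attn + mlp)                   # 10,616,832

total = wte + wpe + ln + blocks                   # 10,695,936 (lm_head is tied to wte)
print(total - wpe)                                # 10,646,784 -> the reported 10.65M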

(partial) Training:

We'll first train for 500 iterations and then evaluate the model's performance on the same prompt:

What is an LLM?

trainer.train(train_iters=500)
                Training Legend                 
┏━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃        abbr ┃ desc                           ┃
┡━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│        step │ Current training iteration     │
│        loss │ Loss value                     │
│          dt │ Elapsed time per training step │
│         dtf │ Elapsed time per forward step  │
│         dtb │ Elapsed time per backward step │
│         sps │ Samples per second             │
│ sps_per_gpu │ Samples per second (per GPU)   │
│         tps │ Tokens per second              │
│ tps_per_gpu │ Tokens per second (per GPU)    │
│         mfu │ Model flops utilization        │
└─────────────┴────────────────────────────────┘
[07/27/25 11:24:24] INFO     ['prompt']: 'What is an LLM?'                                           trainer.py:790
                    INFO     ['response']:                                                           trainer.py:794
                                                                                                                   
                             What is an                                                                            
                             LLM?wCw'.AAAfxo..'yfAQfppyybvFYerr.MfYZAcLyQQCkkexx-3lllrpMqxkko-rZx3b'               
                             3j-ffSSoqq3hhdf'Q''aq'wqqsoKZb'ec3ZAAA;;o,qff..'fArttgbYtturcbcSYrS-Fff               
                             'wwwerwPgJ;.e;yY-SpuyeexqYqgQtpMSYqYgbtQqq''';pfsw,';oA;qqeqcckSAo,,roo               
                             MgyQha'''fAA..gg;;'ggtSvrupptkeweqqcqqkk-SvYYIv                                       
[07/27/25 11:24:28] INFO     step=10 loss=4.28757 dt=0.0185177 dtf=0.0181899 dtb=0.000141292         trainer.py:850
                             sps=54.0022 sps_per_gpu=54.0022 tps=6912.29 tps_per_gpu=6912.29                       
                             mfu=0.149367                                                                          
[07/27/25 11:24:29] INFO     step=20 loss=4.28569 dt=0.019256 dtf=0.0186655 dtb=0.000153666          trainer.py:850
                             sps=51.932 sps_per_gpu=51.932 tps=6647.29 tps_per_gpu=6647.29                         
                             mfu=0.148794                                                                          
                    INFO     step=30 loss=4.19012 dt=0.0191065 dtf=0.018786 dtb=0.000126125          trainer.py:850
                             sps=52.3382 sps_per_gpu=52.3382 tps=6699.29 tps_per_gpu=6699.29                       
                             mfu=0.148391                                                                          
                    INFO     step=40 loss=4.26634 dt=0.0181073 dtf=0.0177951 dtb=0.00012475          trainer.py:850
                             sps=55.2262 sps_per_gpu=55.2262 tps=7068.96 tps_per_gpu=7068.96                       
                             mfu=0.148827                                                                          
                    INFO     step=50 loss=4.22804 dt=0.0180129 dtf=0.0176396 dtb=0.0001405           trainer.py:850
                             sps=55.5158 sps_per_gpu=55.5158 tps=7106.03 tps_per_gpu=7106.03                       
                             mfu=0.1493                                                                            
[07/27/25 11:24:31] INFO     ['prompt']: 'What is an LLM?'                                           trainer.py:790
                    INFO     ['response']:                                                           trainer.py:794
                                                                                                                   
                             What is an LLM?fwxx yY'eyffpCx?ZZZ.eevfeesxqQQYoqapxxxsZ                              
                             vrvb'oZ3qoh33roArW;aafAA''f''QYqAob.aqo.Qyyegg'VcqqYbq3AaFskkcAkfvjb'QQ               
                             tqQfArWA;Qp'k'goWoq;bbrppfQSYy,,,qqqqMsQuAQ'qgoowqqstSpgli-gggggjGG;ctt               
                             SAA.pYYIoMSYu;QQSv;?gjJf'eQQQ;yg'Mgo-b';ccIffQSqAA'rqqcII?;;'ecWWllc;''               
                             ;                                                                                     
[07/27/25 11:24:34] INFO     Saving checkpoint to:                                                   trainer.py:733
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example                                                           
                    INFO     Saving model to:                                                        trainer.py:734
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example/model.pth                                                 
                    INFO     Appending                                                               configs.py:141
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example to                                                        
                             /Users/samforeman/projects/saforem2/wordplay/src/ckpts/checkpoints.log                
                    INFO     step=60 loss=4.20216 dt=0.0180862 dtf=0.0177977 dtb=0.000120792         trainer.py:850
                             sps=55.2906 sps_per_gpu=55.2906 tps=7077.2 tps_per_gpu=7077.2                         
                             mfu=0.149663                                                                          
[07/27/25 11:24:35] INFO     step=70 loss=4.20029 dt=0.0178861 dtf=0.0175673 dtb=0.000132417         trainer.py:850
                             sps=55.9093 sps_per_gpu=55.9093 tps=7156.39 tps_per_gpu=7156.39                       
                             mfu=0.150161                                                                          
                    INFO     step=80 loss=4.14463 dt=0.0184771 dtf=0.0181706 dtb=0.000118916         trainer.py:850
                             sps=54.1211 sps_per_gpu=54.1211 tps=6927.5 tps_per_gpu=6927.5                         
                             mfu=0.150114                                                                          
                    INFO     step=90 loss=4.14377 dt=0.0182619 dtf=0.0179472 dtb=0.000122042         trainer.py:850
                             sps=54.7588 sps_per_gpu=54.7588 tps=7009.12 tps_per_gpu=7009.12                       
                             mfu=0.150249                                                                          
                    INFO     step=100 loss=4.24105 dt=0.0204619 dtf=0.0201264 dtb=0.000129           trainer.py:850
                             sps=48.8714 sps_per_gpu=48.8714 tps=6255.54 tps_per_gpu=6255.54                       
                             mfu=0.148741                                                                          
[07/27/25 11:24:37] INFO     ['prompt']: 'What is an LLM?'                                           trainer.py:790
                    INFO     ['response']:                                                           trainer.py:794
                                                                                                                   
                             What is an                                                                            
                             LLM?f'xfAhf.qYEZQyyoo--AA,QQAAstpMfYhjc'c..MAj'FF,a33lx.adbssxvVhfsMwyQ               
                             Yosoooc'hzgSSrq.vZZZcq33Sk                                                            
                             ''vaq.w3AmA'..aYjye'ksr'gbvv,,hqb'eSJJm',rSeqfvrrrW;;bZSS:SqeWtttuYgJvk               
                             oBggSA'wst:Sur'txx'rSSqbb;;Qq-;.MsooowbqqqnSpBqSosgggtoo'e;''kG;'g-bWWo               
                             qetQ''os'q'tptSSSYe;                                                                  
[07/27/25 11:24:40] INFO     Saving checkpoint to:                                                   trainer.py:733
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example                                                           
                    INFO     Saving model to:                                                        trainer.py:734
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example/model.pth                                                 
                    INFO     Appending                                                               configs.py:141
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example to                                                        
                             /Users/samforeman/projects/saforem2/wordplay/src/ckpts/checkpoints.log                
                    INFO     step=110 loss=4.30091 dt=0.0177751 dtf=0.0174488 dtb=0.000141917        trainer.py:850
                             sps=56.2584 sps_per_gpu=56.2584 tps=7201.07 tps_per_gpu=7201.07                       
                             mfu=0.149428                                                                          
[07/27/25 11:24:41] INFO     step=120 loss=4.23854 dt=0.018823 dtf=0.0184793 dtb=0.000144208         trainer.py:850
                             sps=53.1265 sps_per_gpu=53.1265 tps=6800.19 tps_per_gpu=6800.19                       
                             mfu=0.149179                                                                          
                    INFO     step=130 loss=4.21194 dt=0.0217986 dtf=0.0213945 dtb=0.000134209        trainer.py:850
                             sps=45.8745 sps_per_gpu=45.8745 tps=5871.93 tps_per_gpu=5871.93                       
                             mfu=0.14695                                                                           
                    INFO     step=140 loss=4.30343 dt=0.018252 dtf=0.0179529 dtb=0.000112917         trainer.py:850
                             sps=54.7886 sps_per_gpu=54.7886 tps=7012.95 tps_per_gpu=7012.95                       
                             mfu=0.147409                                                                          
                    INFO     step=150 loss=4.25562 dt=0.019516 dtf=0.0191642 dtb=0.000136958         trainer.py:850
                             sps=51.2401 sps_per_gpu=51.2401 tps=6558.74 tps_per_gpu=6558.74                       
                             mfu=0.146841                                                                          
[07/27/25 11:24:43] INFO     ['prompt']: 'What is an LLM?'                                           trainer.py:790
                    INFO     ['response']:                                                           trainer.py:794
                                                                                                                   
                             What is an                                                                            
                             LLM?vXvZoQQoLqQewerA'-''.qqQtXxx'V333jo'gQUoojxttYyfQOCCAASc-sseS                     
                                                                                                                   
                             r.GexS-                                                                               
                             Dv'acQqjpwptxxqqZ!!fqzAAf.v3aag;vYgg'fqY:n;QsrkoBQhbYYQQgoMbZg;;cLf..WS               
                             SJhppMSkggkkkkooqWWQ'';xheuAA;pppcSQQqq;??ZppBkqeQsgb'SpWbrr;.gSbbqq;;f               
                             .t'gIBq;;WtgbW,rWWYAAqttMA''ggQQQnxrrrrh;;!                                           
[07/27/25 11:24:46] INFO     Saving checkpoint to:                                                   trainer.py:733
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example                                                           
                    INFO     Saving model to:                                                        trainer.py:734
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example/model.pth                                                 
                    INFO     Appending                                                               configs.py:141
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example to                                                        
                             /Users/samforeman/projects/saforem2/wordplay/src/ckpts/checkpoints.log                
                    INFO     step=160 loss=4.22457 dt=0.0185443 dtf=0.0182662 dtb=0.000111875        trainer.py:850
                             sps=53.9248 sps_per_gpu=53.9248 tps=6902.38 tps_per_gpu=6902.38                       
                             mfu=0.147072                                                                          
                    INFO     step=170 loss=4.20268 dt=0.0178489 dtf=0.0175266 dtb=0.000147375        trainer.py:850
                             sps=56.0259 sps_per_gpu=56.0259 tps=7171.32 tps_per_gpu=7171.32                       
                             mfu=0.147861                                                                          
[07/27/25 11:24:47] INFO     step=180 loss=4.23688 dt=0.0191321 dtf=0.0187527 dtb=0.000175709        trainer.py:850
                             sps=52.2681 sps_per_gpu=52.2681 tps=6690.32 tps_per_gpu=6690.32                       
                             mfu=0.147532                                                                          
                    INFO     step=190 loss=4.28941 dt=0.0229258 dtf=0.0225994 dtb=0.00012675         trainer.py:850
                             sps=43.6189 sps_per_gpu=43.6189 tps=5583.22 tps_per_gpu=5583.22                       
                             mfu=0.144844                                                                          
                    INFO     step=200 loss=4.25317 dt=0.0195566 dtf=0.019196 dtb=0.000148084         trainer.py:850
                             sps=51.1336 sps_per_gpu=51.1336 tps=6545.1 tps_per_gpu=6545.1                         
                             mfu=0.144503                                                                          
[07/27/25 11:24:49] INFO     ['prompt']: 'What is an LLM?'                                           trainer.py:790
                    INFO     ['response']:                                                           trainer.py:794
                                                                                                                   
                             What is an                                                                            
                             LLM?qervyyf.af3VAAowAoooooBQecAAqrxxxtXptxGQUVVcNYhhhck;;ooc'DaVqLZZZcP               
                             '''GGl..ooosZppV!333QqYYfQSYUUoofkm.tpcq'e''3esseeqqe;;!f'sx'MBfQttopp,               
                             qccQn3tgQSk-sffQnpSoo'gYpqqQn';qqecAAS'?AAASYf';pMt??pSSpptSbbYj-tWWYQY               
                             ?gYIfkqg.nn'gqqc'gtqqtS??A'tu?MBBp???qq;;??A,,,                                       
[07/27/25 11:24:52] INFO     Saving checkpoint to:                                                   trainer.py:733
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example                                                           
                    INFO     Saving model to:                                                        trainer.py:734
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example/model.pth                                                 
                    INFO     Appending                                                               configs.py:141
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example to                                                        
                             /Users/samforeman/projects/saforem2/wordplay/src/ckpts/checkpoints.log                
                    INFO     step=210 loss=4.22371 dt=0.017668 dtf=0.0173218 dtb=0.00013425          trainer.py:850
                             sps=56.5996 sps_per_gpu=56.5996 tps=7244.75 tps_per_gpu=7244.75                       
                             mfu=0.145707                                                                          
                    INFO     step=220 loss=4.23227 dt=0.018459 dtf=0.0181719 dtb=0.000110958         trainer.py:850
                             sps=54.1741 sps_per_gpu=54.1741 tps=6934.29 tps_per_gpu=6934.29                       
                             mfu=0.146121                                                                          
[07/27/25 11:24:53] INFO     step=230 loss=4.22308 dt=0.0179255 dtf=0.0176206 dtb=0.000137           trainer.py:850
                             sps=55.7864 sps_per_gpu=55.7864 tps=7140.67 tps_per_gpu=7140.67                       
                             mfu=0.146939                                                                          
                    INFO     step=240 loss=4.23777 dt=0.0191189 dtf=0.0187767 dtb=0.000147041        trainer.py:850
                             sps=52.3043 sps_per_gpu=52.3043 tps=6694.95 tps_per_gpu=6694.95                       
                             mfu=0.146712                                                                          
                    INFO     step=250 loss=4.24408 dt=0.0223422 dtf=0.0219412 dtb=0.000145083        trainer.py:850
                             sps=44.7583 sps_per_gpu=44.7583 tps=5729.07 tps_per_gpu=5729.07                       
                             mfu=0.144421                                                                          
[07/27/25 11:24:55] INFO     ['prompt']: 'What is an LLM?'                                           trainer.py:790
                    INFO     ['response']:                                                           trainer.py:794
                                                                                                                   
                             What is an LLM?.rooffAA                                                               
                             rW,,aAA'GoA,aUVVcCoGhvZZcd.QEcNAgxvwYa'haccX.aqo?rrQQ;;QbZ                            
                             '''fc3FqqWk.'oceQ-h!?Yvs'rw--Qc'333-.hq3AwvvcLq','J-w'''rhqWo--;hSQgSqq               
                             ;?rqYygAA,asso;q33AA'rbv,J-fof'g'SJJ,;ttcqq;'wgybqppaqttof;;;'''qtqaJpu               
                             uYf;paeyfhqg''''qWWbwAA-bbQyg'Sqqos''qYrM;a;??                                        
[07/27/25 11:24:58] INFO     Saving checkpoint to:                                                   trainer.py:733
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example                                                           
                    INFO     Saving model to:                                                        trainer.py:734
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example/model.pth                                                 
                    INFO     Appending                                                               configs.py:141
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example to                                                        
                             /Users/samforeman/projects/saforem2/wordplay/src/ckpts/checkpoints.log                
                    INFO     step=260 loss=4.2759 dt=0.0181479 dtf=0.0178296 dtb=0.000118833         trainer.py:850
                             sps=55.1029 sps_per_gpu=55.1029 tps=7053.17 tps_per_gpu=7053.17                       
                             mfu=0.14522                                                                           
[07/27/25 11:24:59] INFO     step=270 loss=4.31702 dt=0.0175359 dtf=0.0172119 dtb=0.000134667        trainer.py:850
                             sps=57.026 sps_per_gpu=57.026 tps=7299.32 tps_per_gpu=7299.32                         
                             mfu=0.146471                                                                          
                    INFO     step=280 loss=4.20612 dt=0.0180766 dtf=0.0177583 dtb=0.000122875        trainer.py:850
                             sps=55.3202 sps_per_gpu=55.3202 tps=7080.98 tps_per_gpu=7080.98                       
                             mfu=0.147125                                                                          
                    INFO     step=290 loss=4.22943 dt=0.0187801 dtf=0.0184775 dtb=0.000117416        trainer.py:850
                             sps=53.2478 sps_per_gpu=53.2478 tps=6815.72 tps_per_gpu=6815.72                       
                             mfu=0.14714                                                                           
                    INFO     step=300 loss=4.11928 dt=0.022491 dtf=0.0219909 dtb=0.000262625         trainer.py:850
                             sps=44.4622 sps_per_gpu=44.4622 tps=5691.17 tps_per_gpu=5691.17                       
                             mfu=0.144724                                                                          
[07/27/25 11:25:01] INFO     ['prompt']: 'What is an LLM?'                                           trainer.py:790
                    INFO     ['response']:                                                           trainer.py:794
                                                                                                                   
                             What is an                                                                            
                             LLM?L3slghC33vfJQO-eBBBv.Y.Sffs,'gxEUAUCQeswPv,ettLWClrrqeZAtLA.''3NsG.               
                             .''.sAAmebbqYrv''-                                                                    
                             hTkcxhqqVUvvvfv,lxxlAc..3Zpq''Qsk'st;xlneQssssxS;'tt;cb;??rSQ'k--'t::qq               
                             npYbc;nn;WWqqexSe''ftMqYYttttook;;pgSQQcLgycA;;qqbb''aakqrAAk.h''gYbcLL               
                             oopqs:sSSAgZQtiAA.'MMsWllpMt                                                          
[07/27/25 11:25:04] INFO     Saving checkpoint to:                                                   trainer.py:733
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example                                                           
                    INFO     Saving model to:                                                        trainer.py:734
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example/model.pth                                                 
                    INFO     Appending                                                               configs.py:141
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example to                                                        
                             /Users/samforeman/projects/saforem2/wordplay/src/ckpts/checkpoints.log                
                    INFO     step=310 loss=4.23252 dt=0.0201538 dtf=0.0198303 dtb=0.000123166        trainer.py:850
                             sps=49.6185 sps_per_gpu=49.6185 tps=6351.16 tps_per_gpu=6351.16                       
                             mfu=0.143976                                                                          
[07/27/25 11:25:05] INFO     step=320 loss=4.23608 dt=0.0227338 dtf=0.0224232 dtb=0.000121542        trainer.py:850
                             sps=43.9875 sps_per_gpu=43.9875 tps=5630.4 tps_per_gpu=5630.4                         
                             mfu=0.141745                                                                          
                    INFO     step=330 loss=4.25042 dt=0.0215552 dtf=0.0211734 dtb=0.000124791        trainer.py:850
                             sps=46.3925 sps_per_gpu=46.3925 tps=5938.24 tps_per_gpu=5938.24                       
                             mfu=0.140402                                                                          
                    INFO     step=340 loss=4.19956 dt=0.0196884 dtf=0.0193743 dtb=0.000119708        trainer.py:850
                             sps=50.7913 sps_per_gpu=50.7913 tps=6501.28 tps_per_gpu=6501.28                       
                             mfu=0.140411                                                                          
                    INFO     step=350 loss=4.2746 dt=0.021332 dtf=0.0210193 dtb=0.000116208          trainer.py:850
                             sps=46.8778 sps_per_gpu=46.8778 tps=6000.36 tps_per_gpu=6000.36                       
                             mfu=0.139336                                                                          
[07/27/25 11:25:07] INFO     ['prompt']: 'What is an LLM?'                                           trainer.py:790
                    INFO     ['response']:                                                           trainer.py:794
                                                                                                                   
                             What is an LLM?llBZexQZ wwwZrrxxxcqWa                                                 
                             vqqxtqK..aHqQqqqecaask..--'Ve'll3fh3k..ttesscU''aUxhSpepBqqepp                        
                             'QQ-;AqfwetpM                                                                         
                             vSQwbrrZQqa.CAA,,axqbQu''seyex...'';yyfw'gk:SSWQtrrqW''KKpp?ZQU'''tcb?;               
                             ;;WufBWbb;f'ggYQttSk;?;;;?fA..Sbt;n''rrWqqMeeq;b'k'eMwQQtpufAAqQYAWASSe               
                             'qSpqqtLgWoqSk                                                                        
[07/27/25 11:25:10] INFO     Saving checkpoint to:                                                   trainer.py:733
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example                                                           
                    INFO     Saving model to:                                                        trainer.py:734
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example/model.pth                                                 
[07/27/25 11:25:11] INFO     Appending                                                               configs.py:141
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example to                                                        
                             /Users/samforeman/projects/saforem2/wordplay/src/ckpts/checkpoints.log                
                    INFO     step=360 loss=4.3276 dt=0.0184409 dtf=0.0180754 dtb=0.000134208         trainer.py:850
                             sps=54.2272 sps_per_gpu=54.2272 tps=6941.09 tps_per_gpu=6941.09                       
                             mfu=0.140401                                                                          
                    INFO     step=370 loss=4.15959 dt=0.0190562 dtf=0.0186968 dtb=0.000137458        trainer.py:850
                             sps=52.4762 sps_per_gpu=52.4762 tps=6716.96 tps_per_gpu=6716.96                       
                             mfu=0.140876                                                                          
                    INFO     step=380 loss=4.21489 dt=0.0178422 dtf=0.0175323 dtb=0.000121917        trainer.py:850
                             sps=56.0469 sps_per_gpu=56.0469 tps=7174 tps_per_gpu=7174 mfu=0.14229                 
                    INFO     step=390 loss=4.18483 dt=0.0188368 dtf=0.0185477 dtb=0.000112584        trainer.py:850
                             sps=53.0875 sps_per_gpu=53.0875 tps=6795.2 tps_per_gpu=6795.2                         
                             mfu=0.142745                                                                          
[07/27/25 11:25:12] INFO     step=400 loss=4.2439 dt=0.0201257 dtf=0.019817 dtb=0.000122166          trainer.py:850
                             sps=49.6876 sps_per_gpu=49.6876 tps=6360.01 tps_per_gpu=6360.01                       
                             mfu=0.142214                                                                          
[07/27/25 11:25:13] INFO     ['prompt']: 'What is an LLM?'                                           trainer.py:790
                    INFO     ['response']:                                                           trainer.py:794
                                                                                                                   
                             What is an                                                                            
                             LLM?.3YZfxsaskoRbawwqW3fkYfVUB33emX3cxeQ;XAA,E;hqqqAA,VqYoqep.3-S'eh3cP               
                             e''bqqQAh                                                                             
                             fSpppp;!cbWA'fff3feNhaAo,Ax.tqq33-33--fCttppaww-gkttttt,,oWbb'glQWb'WWb               
                             ZexG?b'sWl'tqt?qqQ'M'rhWlfMMe;tc-eqnnfCqYq;'?;t'Mwhqqq'..oooA,rqqfooWkk               
                             jGqqqqqq;fs;QYbWkkf',,.SSSbqqqbqeeqff                                                 
[07/27/25 11:25:16] INFO     Saving checkpoint to:                                                   trainer.py:733
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example                                                           
                    INFO     Saving model to:                                                        trainer.py:734
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example/model.pth                                                 
[07/27/25 11:25:17] INFO     Appending                                                               configs.py:141
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example to                                                        
                             /Users/samforeman/projects/saforem2/wordplay/src/ckpts/checkpoints.log                
                    INFO     step=410 loss=4.23287 dt=0.0214269 dtf=0.0210555 dtb=0.000134084        trainer.py:850
                             sps=46.6704 sps_per_gpu=46.6704 tps=5973.81 tps_per_gpu=5973.81                       
                             mfu=0.140901                                                                          
                    INFO     step=420 loss=4.27257 dt=0.019262 dtf=0.0189329 dtb=0.000124833         trainer.py:850
                             sps=51.9156 sps_per_gpu=51.9156 tps=6645.19 tps_per_gpu=6645.19                       
                             mfu=0.14117                                                                           
                    INFO     step=430 loss=4.18557 dt=0.0198845 dtf=0.0194948 dtb=0.00011475         trainer.py:850
                             sps=50.2904 sps_per_gpu=50.2904 tps=6437.17 tps_per_gpu=6437.17                       
                             mfu=0.140963                                                                          
                    INFO     step=440 loss=4.21616 dt=0.0235005 dtf=0.0231488 dtb=0.000149416        trainer.py:850
                             sps=42.5522 sps_per_gpu=42.5522 tps=5446.68 tps_per_gpu=5446.68                       
                             mfu=0.138637                                                                          
[07/27/25 11:25:18] INFO     step=450 loss=4.23928 dt=0.0193989 dtf=0.0190223 dtb=0.000140167        trainer.py:850
                             sps=51.5494 sps_per_gpu=51.5494 tps=6598.32 tps_per_gpu=6598.32                       
                             mfu=0.139031                                                                          
[07/27/25 11:25:20] INFO     ['prompt']: 'What is an LLM?'                                           trainer.py:790
                    INFO     ['response']:                                                           trainer.py:794
                                                                                                                   
                             What is an                                                                            
                             LLM?weeQQQ''QQ'evfhQQ;K.AEsWqb..CfC.h;vvx''bTopBe'gWvXffv3ebssW.;?ptdee               
                             ep                                                                                    
                             vrr..CCfkqcptyhpwTssWqsAxrqqqehmuZqZ:qeqGGGGauyfxrrAtgSrqWQ,,t;;ppMMgye               
                             qfvfAAqcWYtqqoopepwySkkqggt3bZMqqq;;yybkSJcSQuuurruqqQtttoo''fAqq;;vSJZ               
                             ZZtM''qqM???gWWAAAt??MYYYe;yglAg;up'exuqqWtu                                          
[07/27/25 11:25:23] INFO     Saving checkpoint to:                                                   trainer.py:733
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example                                                           
                    INFO     Saving model to:                                                        trainer.py:734
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example/model.pth                                                 
                    INFO     Appending                                                               configs.py:141
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example to                                                        
                             /Users/samforeman/projects/saforem2/wordplay/src/ckpts/checkpoints.log                
                    INFO     step=460 loss=4.24269 dt=0.0180308 dtf=0.0177195 dtb=0.000118291        trainer.py:850
                             sps=55.4608 sps_per_gpu=55.4608 tps=7098.98 tps_per_gpu=7098.98                       
                             mfu=0.140468                                                                          
                    INFO     step=470 loss=4.26877 dt=0.0187917 dtf=0.0184404 dtb=0.000135           trainer.py:850
                             sps=53.215 sps_per_gpu=53.215 tps=6811.51 tps_per_gpu=6811.51                         
                             mfu=0.14114                                                                           
                    INFO     step=480 loss=4.19188 dt=0.0186805 dtf=0.0183477 dtb=0.00012775         trainer.py:850
                             sps=53.5318 sps_per_gpu=53.5318 tps=6852.07 tps_per_gpu=6852.07                       
                             mfu=0.141833                                                                          
[07/27/25 11:25:24] INFO     step=490 loss=4.22611 dt=0.0206145 dtf=0.0202806 dtb=0.000136666        trainer.py:850
                             sps=48.5095 sps_per_gpu=48.5095 tps=6209.22 tps_per_gpu=6209.22                       
                             mfu=0.141067                                                                          
                    INFO     step=500 loss=4.21804 dt=0.0178817 dtf=0.0175614 dtb=0.000115875        trainer.py:850
                             sps=55.9232 sps_per_gpu=55.9232 tps=7158.17 tps_per_gpu=7158.17                       
                             mfu=0.142428                                                                          
After the first 500 iterations, we prompt the model again, this time timing the generation:

import time

query = "What is an LLM?"
t0 = time.perf_counter()
outputs = trainer.evaluate(
    query,
    num_samples=1,
    max_new_tokens=256,
    top_k=16,
    display=False
)
logger.info(f'took: {time.perf_counter() - t0:.4f}s')
logger.info(f"['prompt']: '{query}'")
logger.info("['response']:\n\n" + fr"{outputs['0']['raw']}")
[07/27/25 11:25:26] INFO     took: 1.7500s                                                         1425179755.py:12
                    INFO     ['prompt']: 'What is an LLM?'                                         1425179755.py:13
                    INFO     ['response']:                                                         1425179755.py:14
                                                                                                                   
                             What is an LLM?fwll                                                                   
                                                                                                                   
                             b3afqbZZI,r                                                                           
                             oppq3A33QoUUye-fwC'3b3.',A'.hhPlVXXqeQyCCC;xfssc;wTTTTcdGoeehQOCXXXB'                 
                             KZ--qehoF3AqfqqW                                                                      
                             cQAcceffGG,'fSJpppww,txMgQs;;;?qf'fSSrpcg?s,A'rr,aso?''o'MtQrrSSgqftt                 
                             ggSc''Wb'qA,.Apcbb???;pYYySQ'agggScWQgbqWfqYroffSYSYhqfk''qfAA,sgWlnZ                 
                             :pt,JynS'gJZes                                                                        
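
The response is still gibberish, which is expected after only 500 iterations at batch size 1. One sampling knob worth probing is top_k, which restricts generation to the k most likely tokens at each step. Below is a small sketch reusing the evaluate() call from the cell above; the particular k values are arbitrary, chosen only for illustration.

# Sketch: probe how top_k shapes the samples, reusing evaluate() as above.
# The k values are arbitrary, chosen only for illustration.
for k in (2, 16, 64):
    out = trainer.evaluate(
        "What is an LLM?",
        num_samples=1,
        max_new_tokens=64,
        top_k=k,
        display=False,
    )
    logger.info(f"top_k={k}:\n{out['0']['raw']}")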

Resume Training…

The Trainer retains its model, optimizer, and step counter between calls, so trainer.train() with no arguments resumes from step 500 and runs another 500 iterations:

trainer.train()
[07/27/25 11:25:28] INFO     ['prompt']: 'What is an LLM?'                                           trainer.py:790
                    INFO     ['response']:                                                           trainer.py:794
                                                                                                                   
                             What is an                                                                            
                             LLM?wZbbbT'3weew,'foBB.qWWlpwes.qqQevFAA.bbvFF-AkacWWfYhx3fooB'''';vvee               
                             sppWW                                                                                 
                             eeWA3ZZppPZe;dCCvres                                                                  
                             ;ecc--Ws'cqor,JZVVVCCeepfqqWxApBBBBhh;;JeQhMMss,,wshrhW?BiMWYqqwwwAASSw               
                             rrroo,rqtWseMq.Ak'ofA,,'t,,..hh;xx'?sAq';cqxrqWkeMqt'gzAAxhrpqt'g't;?bt               
                             oseq-pqq'qAtttt,eqrM                                                                  
[07/27/25 11:25:31] INFO     Saving checkpoint to:                                                   trainer.py:733
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example                                                           
                    INFO     Saving model to:                                                        trainer.py:734
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example/model.pth                                                 
                    INFO     Appending                                                               configs.py:141
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example to                                                        
                             /Users/samforeman/projects/saforem2/wordplay/src/ckpts/checkpoints.log                
                    INFO     step=510 loss=4.25518 dt=0.0193091 dtf=0.0189602 dtb=0.000122416        trainer.py:850
                             sps=51.789 sps_per_gpu=51.789 tps=6628.99 tps_per_gpu=6628.99                         
                             mfu=0.143245                                                                          
                    INFO     step=520 loss=4.20906 dt=0.0182869 dtf=0.0179924 dtb=0.000112625        trainer.py:850
                             sps=54.684 sps_per_gpu=54.684 tps=6999.56 tps_per_gpu=6999.56                         
                             mfu=0.144046                                                                          
[07/27/25 11:25:32] INFO     step=530 loss=4.22394 dt=0.0183378 dtf=0.0179662 dtb=0.000141666        trainer.py:850
                             sps=54.5322 sps_per_gpu=54.5322 tps=6980.12 tps_per_gpu=6980.12                       
                             mfu=0.144724                                                                          
                    INFO     step=540 loss=4.23923 dt=0.018275 dtf=0.0179809 dtb=0.000123958         trainer.py:850
                             sps=54.7196 sps_per_gpu=54.7196 tps=7004.1 tps_per_gpu=7004.1                         
                             mfu=0.145387                                                                          
                    INFO     step=550 loss=4.24928 dt=0.0200772 dtf=0.0197448 dtb=0.000128708        trainer.py:850
                             sps=49.8077 sps_per_gpu=49.8077 tps=6375.39 tps_per_gpu=6375.39                       
                             mfu=0.144625                                                                          
[07/27/25 11:25:34] INFO     ['prompt']: 'What is an LLM?'                                           trainer.py:790
                    INFO     ['response']:                                                           trainer.py:794
                                                                                                                   
                             What is an                                                                            
                             LLM?wboG',ZZswPZZhsf'V.h;QrppwAfAa''qWWYYfOOx33fvkkfQ'elccB3kkkm....swe               
                             vfsssoAkfQss                                                                          
                             'f;ehewqs3--seuCeerqfQA,XXqooU;?';QhdI'M;;astc;W;?A;p;p',,'''gosS;;WW?'               
                             errs'fwwr''qqWW,w'l;''www''tppwbQWWseSSqYtLtSbQQQ'q;qqM'tbqW,s'r.AAtcbb               
                             q-'ttuuA,;;;Q'S;;;ttMglqYetqeSS;Wq                                                    
[07/27/25 11:25:37] INFO     Saving checkpoint to:                                                   trainer.py:733
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example                                                           
                    INFO     Saving model to:                                                        trainer.py:734
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example/model.pth                                                 
                    INFO     Appending                                                               configs.py:141
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example to                                                        
                             /Users/samforeman/projects/saforem2/wordplay/src/ckpts/checkpoints.log                
                    INFO     step=560 loss=4.21979 dt=0.0185737 dtf=0.0182987 dtb=0.000109708        trainer.py:850
                             sps=53.8395 sps_per_gpu=53.8395 tps=6891.46 tps_per_gpu=6891.46                       
                             mfu=0.145054                                                                          
[07/27/25 11:25:38] INFO     step=570 loss=4.27896 dt=0.018959 dtf=0.0185998 dtb=0.000151583         trainer.py:850
                             sps=52.7454 sps_per_gpu=52.7454 tps=6751.41 tps_per_gpu=6751.41                       
                             mfu=0.145138                                                                          
                    INFO     step=580 loss=4.25036 dt=0.0188471 dtf=0.0184447 dtb=0.00018775         trainer.py:850
                             sps=53.0586 sps_per_gpu=53.0586 tps=6791.5 tps_per_gpu=6791.5                         
                             mfu=0.1453                                                                            
                    INFO     step=590 loss=4.30325 dt=0.021447 dtf=0.0210627 dtb=0.0001295           trainer.py:850
                             sps=46.6266 sps_per_gpu=46.6266 tps=5968.2 tps_per_gpu=5968.2                         
                             mfu=0.143666                                                                          
                    INFO     step=600 loss=4.24977 dt=0.0181719 dtf=0.0174561 dtb=0.000136083        trainer.py:850
                             sps=55.03 sps_per_gpu=55.03 tps=7043.84 tps_per_gpu=7043.84 mfu=0.14452               
[07/27/25 11:25:40] INFO     ['prompt']: 'What is an LLM?'                                           trainer.py:790
                    INFO     ['response']:                                                           trainer.py:794
                                                                                                                   
                             What is an LLM?LQ3vvye! wePZ                                                          
                             ewbAII''QYUfY.vTcaQlccCfhsZblYe''vS'xqosfoxCx'q33ckkxpppcecZZ-caqAb''fQ               
                             -eqb'.AGGGZZ?--s..h.ttppMq3ZQs,e';pwsf..se;;pqtcenr'.nxnqqgbqQYtttM'fSb               
                             ttcqqqqgYYjjrqfAkkSSSuQqoh'''S;SYYYAG;SSSo'QQQuu;'QSfqo'.tgSggkqWYYbbvq               
                             qtuiqrhS;QC'QSrSbWWSJJeuuiWYu                                                         
[07/27/25 11:25:43] INFO     Saving checkpoint to:                                                   trainer.py:733
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example                                                           
                    INFO     Saving model to:                                                        trainer.py:734
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example/model.pth                                                 
                    INFO     Appending                                                               configs.py:141
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example to                                                        
                             /Users/samforeman/projects/saforem2/wordplay/src/ckpts/checkpoints.log                
                    INFO     step=610 loss=4.27699 dt=0.0194192 dtf=0.019049 dtb=0.000122208         trainer.py:850
                             sps=51.4955 sps_per_gpu=51.4955 tps=6591.43 tps_per_gpu=6591.43                       
                             mfu=0.144312                                                                          
                    INFO     step=620 loss=4.2417 dt=0.0203904 dtf=0.0201204 dtb=0.000116084         trainer.py:850
                             sps=49.0427 sps_per_gpu=49.0427 tps=6277.47 tps_per_gpu=6277.47                       
                             mfu=0.143445                                                                          
[07/27/25 11:25:44] INFO     step=630 loss=4.1949 dt=0.0202023 dtf=0.0199125 dtb=0.000115            trainer.py:850
                             sps=49.4992 sps_per_gpu=49.4992 tps=6335.9 tps_per_gpu=6335.9                         
                             mfu=0.142792                                                                          
                    INFO     step=640 loss=4.21554 dt=0.0184285 dtf=0.0181117 dtb=0.000119542        trainer.py:850
                             sps=54.2639 sps_per_gpu=54.2639 tps=6945.78 tps_per_gpu=6945.78                       
                             mfu=0.143522                                                                          
                    INFO     step=650 loss=4.26643 dt=0.0191115 dtf=0.018803 dtb=0.000116417         trainer.py:850
                             sps=52.3245 sps_per_gpu=52.3245 tps=6697.54 tps_per_gpu=6697.54                       
                             mfu=0.143642                                                                          
[07/27/25 11:25:46] INFO     ['prompt']: 'What is an LLM?'                                           trainer.py:790
                    INFO     ['response']:                                                           trainer.py:794
                                                                                                                   
                             What is an                                                                            
                             LLM?qadZ--e'ovTqro'qE'rpAYvrr;qo3AAwUA-sG..qqbaNNyyep;blgWVe''tkaoo,ebq               
                             qUAAAAxttmZS.tGlAxxtccZAk'qffhMM;hqcZ                                                 
                             'rvsoAAtqWtt,'MqWtt'qqqQ--zpttttuq3brqtrrha;WW'eq;cqqqqrrhh-ppq;'SSJrhS               
                             YSJqg'',asqqAhdqbv'?Bqqqb',fqSqt'QqAAWAAqqQQQttttIffvqeWYY--?MfSpppMttt               
                             tBBM'KK..                                                                             
[07/27/25 11:25:49] INFO     Saving checkpoint to:                                                   trainer.py:733
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example                                                           
                    INFO     Saving model to:                                                        trainer.py:734
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example/model.pth                                                 
                    INFO     Appending                                                               configs.py:141
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example to                                                        
                             /Users/samforeman/projects/saforem2/wordplay/src/ckpts/checkpoints.log                
                    INFO     step=660 loss=4.17238 dt=0.0189814 dtf=0.0186691 dtb=0.000131375        trainer.py:850
                             sps=52.6832 sps_per_gpu=52.6832 tps=6743.45 tps_per_gpu=6743.45                       
                             mfu=0.14385                                                                           
[07/27/25 11:25:50] INFO     step=670 loss=4.33205 dt=0.0193104 dtf=0.0189986 dtb=0.000128042        trainer.py:850
                             sps=51.7856 sps_per_gpu=51.7856 tps=6628.56 tps_per_gpu=6628.56                       
                             mfu=0.143789                                                                          
                    INFO     step=680 loss=4.17701 dt=0.0183742 dtf=0.0180271 dtb=0.000151375        trainer.py:850
                             sps=54.4241 sps_per_gpu=54.4241 tps=6966.29 tps_per_gpu=6966.29                       
                             mfu=0.144463                                                                          
                    INFO     step=690 loss=4.23023 dt=0.0177905 dtf=0.0175473 dtb=9.91249e-05        trainer.py:850
                             sps=56.2098 sps_per_gpu=56.2098 tps=7194.85 tps_per_gpu=7194.85                       
                             mfu=0.145564                                                                          
                    INFO     step=700 loss=4.19011 dt=0.0194102 dtf=0.0188519 dtb=0.000118375        trainer.py:850
                             sps=51.5194 sps_per_gpu=51.5194 tps=6594.48 tps_per_gpu=6594.48                       
                             mfu=0.145257                                                                          
[07/27/25 11:25:52] INFO     ['prompt']: 'What is an LLM?'                                           trainer.py:790
                    INFO     ['response']:                                                           trainer.py:794
                                                                                                                   
                             What is an LLM?lrvqqrafQEsA,hrccZZ;'rrkf'c x'Xxqad.SSxtaV!XQUxv;a.'g                  
                             Zto..herovV-qA'K;aZs3ecAq                                                             
                             vqq.!c'fos,ssAAcqfop-;AA.Ag.WYYvvqttxW,,eq;;..Mww';QtMMgqeeqYYppppp;;..               
                             MW'tqYf.ff';ccWYrrS'SAsSohegQrr'rhWSASpgj'.A;;.eqqqqqeWWofYQYtcb'Q;;;tt               
                             tuqcgk;.t3tSbYhhouI;ppp;tSfvgQSuSq                                                    
[07/27/25 11:25:55] INFO     Saving checkpoint to:                                                   trainer.py:733
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example                                                           
                    INFO     Saving model to:                                                        trainer.py:734
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example/model.pth                                                 
                    INFO     Appending                                                               configs.py:141
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example to                                                        
                             /Users/samforeman/projects/saforem2/wordplay/src/ckpts/checkpoints.log                
                    INFO     step=710 loss=4.25752 dt=0.0197687 dtf=0.0193927 dtb=0.000144125        trainer.py:850
                             sps=50.585 sps_per_gpu=50.585 tps=6474.88 tps_per_gpu=6474.88                         
                             mfu=0.144723                                                                          
[07/27/25 11:25:56] INFO     step=720 loss=4.22592 dt=0.0186651 dtf=0.0175268 dtb=0.0001345          trainer.py:850
                             sps=53.5759 sps_per_gpu=53.5759 tps=6857.71 tps_per_gpu=6857.71                       
                             mfu=0.14507                                                                           
                    INFO     step=730 loss=4.18346 dt=0.0178852 dtf=0.017587 dtb=0.000127            trainer.py:850
                             sps=55.9123 sps_per_gpu=55.9123 tps=7156.77 tps_per_gpu=7156.77                       
                             mfu=0.146028                                                                          
                    INFO     step=740 loss=4.22937 dt=0.018805 dtf=0.0184613 dtb=0.000150958         trainer.py:850
                             sps=53.1772 sps_per_gpu=53.1772 tps=6806.69 tps_per_gpu=6806.69                       
                             mfu=0.146133                                                                          
                    INFO     step=750 loss=4.22004 dt=0.0185913 dtf=0.0181662 dtb=0.000108125        trainer.py:850
                             sps=53.7887 sps_per_gpu=53.7887 tps=6884.96 tps_per_gpu=6884.96                       
                             mfu=0.146398                                                                          
[07/27/25 11:25:58] INFO     ['prompt']: 'What is an LLM?'                                           trainer.py:790
                    INFO     ['response']:                                                           trainer.py:794
                                                                                                                   
                             What is an LLM?.AvexhjjsAxx3AAAAffyyY'rr.AxZZpaff.yykfAqYEZ                           
                             'koBf''3YYo.hzA,aaqbbZ                                                                
                             ttQhhxkeQU'qhqqoqq!!'ffor'f.aZPeG'qW.ttvafA-b??fffvfvYrcL.bWtSS??qtLtQu               
                             tohdyyppu''rrSqYqc'KKye''''gjjQq'fgJq;;.'gYqrkssW'tp;bqqf.qowqoMM'qQQSq               
                             qWssgyttu?qoo'ff''kkSSffAr.MggesgIIBBYeeWqqqqg                                        
[07/27/25 11:26:01] INFO     Saving checkpoint to:                                                   trainer.py:733
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example                                                           
                    INFO     Saving model to:                                                        trainer.py:734
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example/model.pth                                                 
                    INFO     Appending                                                               configs.py:141
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example to                                                        
                             /Users/samforeman/projects/saforem2/wordplay/src/ckpts/checkpoints.log                
                    INFO     step=760 loss=4.16349 dt=0.0194697 dtf=0.0191296 dtb=0.000143083        trainer.py:850
                             sps=51.3619 sps_per_gpu=51.3619 tps=6574.33 tps_per_gpu=6574.33                       
                             mfu=0.145964                                                                          
                    INFO     step=770 loss=4.22062 dt=0.0193039 dtf=0.018953 dtb=0.0001385           trainer.py:850
                             sps=51.803 sps_per_gpu=51.803 tps=6630.78 tps_per_gpu=6630.78                         
                             mfu=0.145696                                                                          
[07/27/25 11:26:02] INFO     step=780 loss=4.16916 dt=0.0171542 dtf=0.0168228 dtb=0.000155208        trainer.py:850
                             sps=58.2949 sps_per_gpu=58.2949 tps=7461.74 tps_per_gpu=7461.74                       
                             mfu=0.147251                                                                          
                    INFO     step=790 loss=4.21405 dt=0.0176518 dtf=0.0173884 dtb=0.000118           trainer.py:850
                             sps=56.6515 sps_per_gpu=56.6515 tps=7251.39 tps_per_gpu=7251.39                       
                             mfu=0.148195                                                                          
                    INFO     step=800 loss=4.23569 dt=0.037451 dtf=0.0371191 dtb=0.000127167         trainer.py:850
                             sps=26.7016 sps_per_gpu=26.7016 tps=3417.8 tps_per_gpu=3417.8                         
                             mfu=0.140761                                                                          
[07/27/25 11:26:04] INFO     ['prompt']: 'What is an LLM?'                                           trainer.py:790
                    INFO     ['response']:                                                           trainer.py:794
                                                                                                                   
                             What is an                                                                            
                             LLM??.ahoskZqeofpQe'v;.p..hqYwqaarswbbc.ahwbkkA''KyhvX.yp'Vc3;oseo.xeee               
                             aa'WQqfhKKfYqqqf.x33xx--;;;.egMcc-qaaovvKKOsvSpwesfgI;;wwerpMgtcgQsb;uQ               
                             tggyyptokyy';QCy;;asoW,,Jr''''',AkkfYoAAAAAS::::;;.bWttqeqcbA::gYJJbqgj               
                             oBhopwe;.s''ggkk'qk.qkGWYYyqqe;''Sbs'MM;;.qqqqQ                                       
[07/27/25 11:26:07] INFO     Saving checkpoint to:                                                   trainer.py:733
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example                                                           
                    INFO     Saving model to:                                                        trainer.py:734
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example/model.pth                                                 
                    INFO     Appending                                                               configs.py:141
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example to                                                        
                             /Users/samforeman/projects/saforem2/wordplay/src/ckpts/checkpoints.log                
                    INFO     step=810 loss=4.22317 dt=0.0203105 dtf=0.0199397 dtb=0.000126916        trainer.py:850
                             sps=49.2356 sps_per_gpu=49.2356 tps=6302.16 tps_per_gpu=6302.16                       
                             mfu=0.140303                                                                          
[07/27/25 11:26:08] INFO     step=820 loss=4.24584 dt=0.0213863 dtf=0.0210762 dtb=0.000128834        trainer.py:850
                             sps=46.7589 sps_per_gpu=46.7589 tps=5985.14 tps_per_gpu=5985.14                       
                             mfu=0.139206                                                                          
                    INFO     step=830 loss=4.1855 dt=0.0176513 dtf=0.0172706 dtb=0.000152417         trainer.py:850
                             sps=56.6529 sps_per_gpu=56.6529 tps=7251.58 tps_per_gpu=7251.58                       
                             mfu=0.140955                                                                          
                    INFO     step=840 loss=4.24083 dt=0.018392 dtf=0.0180307 dtb=0.0001385           trainer.py:850
                             sps=54.3716 sps_per_gpu=54.3716 tps=6959.56 tps_per_gpu=6959.56                       
                             mfu=0.141898                                                                          
                    INFO     step=850 loss=4.23785 dt=0.0192448 dtf=0.0189111 dtb=0.000127           trainer.py:850
                             sps=51.9622 sps_per_gpu=51.9622 tps=6651.16 tps_per_gpu=6651.16                       
                             mfu=0.142081                                                                          
[07/27/25 11:26:10] INFO     ['prompt']: 'What is an LLM?'                                           trainer.py:790
                    INFO     ['response']:                                                           trainer.py:794
                                                                                                                   
                             What is an                                                                            
                             LLM?A;QfqrqQ'xxx'aa.hh3vv''wwossqZse'rxfQsseh'.evrpMq''.xxTUeQ'''rqqaxf               
                             xtcbqcf3qq3jZbvcepwA,,,ff'hpqcpcA-A'rv::errrvbbZ:pc-qycSScWlbQYhhwwAA-S               
                             QCgl;bbrpbSrrrrqqqqq''rWqqtcAkYyqgYtxttttbkkqQWWqaqqqkkk,'qqexrrWSSqyyY               
                             j'SyyQYQQ,q''p'---p''tcqzhhhpqWfs.p'foBqqQt::eu                                       
[07/27/25 11:26:13] INFO     Saving checkpoint to:                                                   trainer.py:733
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example                                                           
                    INFO     Saving model to:                                                        trainer.py:734
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example/model.pth                                                 
[07/27/25 11:26:14] INFO     Appending                                                               configs.py:141
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example to                                                        
                             /Users/samforeman/projects/saforem2/wordplay/src/ckpts/checkpoints.log                
                    INFO     step=860 loss=4.20116 dt=0.0179678 dtf=0.0176636 dtb=0.000124459        trainer.py:850
                             sps=55.655 sps_per_gpu=55.655 tps=7123.84 tps_per_gpu=7123.84                         
                             mfu=0.143267                                                                          
                    INFO     step=870 loss=4.22428 dt=0.0205305 dtf=0.0186659 dtb=0.000150667        trainer.py:850
                             sps=48.7079 sps_per_gpu=48.7079 tps=6234.61 tps_per_gpu=6234.61                       
                             mfu=0.142412                                                                          
                    INFO     step=880 loss=4.22977 dt=0.0189898 dtf=0.018688 dtb=0.00011875          trainer.py:850
                             sps=52.6599 sps_per_gpu=52.6599 tps=6740.46 tps_per_gpu=6740.46                       
                             mfu=0.142737                                                                          
                    INFO     step=890 loss=4.22047 dt=0.0202268 dtf=0.0199305 dtb=0.0001135          trainer.py:850
                             sps=49.4395 sps_per_gpu=49.4395 tps=6328.25 tps_per_gpu=6328.25                       
                             mfu=0.142137                                                                          
                    INFO     step=900 loss=4.35563 dt=0.019475 dtf=0.0189142 dtb=0.000115833         trainer.py:850
                             sps=51.348 sps_per_gpu=51.348 tps=6572.54 tps_per_gpu=6572.54                         
                             mfu=0.142126                                                                          
[07/27/25 11:26:16] INFO     ['prompt']: 'What is an LLM?'                                           trainer.py:790
                    INFO     ['response']:                                                           trainer.py:794
                                                                                                                   
                             What is an                                                                            
                             LLM?wwPA'eeew-3ZAjRwqs33eafCq'ax..xcxc''awA',bsettcCvCqqq33A-.bsor.awQf               
                             J$  3a-3b U' Zq3gQQf',,AqGZ                                                           
                             fhhPwU.vfCC.xpqvr.SkkofxsyQrrs';'kGs,rMse''rppb'qqfoktM'qo,qqSqgW,etM'M               
                             ??Z;auYfSSo??gg'sSvSQQqfftcb;;;;pWQSffttqgQSSSkllbrqqaw,'SqqYQ;;;pqqtpB               
                             heW;;;.hn'qYyMMesgl                                                                   
[07/27/25 11:26:19] INFO     Saving checkpoint to:                                                   trainer.py:733
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example                                                           
                    INFO     Saving model to:                                                        trainer.py:734
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example/model.pth                                                 
[07/27/25 11:26:21] INFO     Appending                                                               configs.py:141
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example to                                                        
                             /Users/samforeman/projects/saforem2/wordplay/src/ckpts/checkpoints.log                
                    INFO     step=910 loss=4.19569 dt=0.0184239 dtf=0.0181274 dtb=0.000126584        trainer.py:850
                             sps=54.2774 sps_per_gpu=54.2774 tps=6947.51 tps_per_gpu=6947.51                       
                             mfu=0.142926                                                                          
                    INFO     step=920 loss=4.23206 dt=0.0189052 dtf=0.0186322 dtb=0.00011175         trainer.py:850
                             sps=52.8955 sps_per_gpu=52.8955 tps=6770.62 tps_per_gpu=6770.62                       
                             mfu=0.143264                                                                          
[07/27/25 11:26:22] INFO     step=930 loss=4.29058 dt=0.0204312 dtf=0.0200622 dtb=0.0001525          trainer.py:850
                             sps=48.9446 sps_per_gpu=48.9446 tps=6264.91 tps_per_gpu=6264.91                       
                             mfu=0.142476                                                                          
                    INFO     step=940 loss=4.211 dt=0.0308806 dtf=0.0188316 dtb=0.000154834          trainer.py:850
                             sps=32.3828 sps_per_gpu=32.3828 tps=4145 tps_per_gpu=4145 mfu=0.137185                
                    INFO     step=950 loss=4.18626 dt=0.0178002 dtf=0.0175009 dtb=0.000114584        trainer.py:850
                             sps=56.179 sps_per_gpu=56.179 tps=7190.91 tps_per_gpu=7190.91                         
                             mfu=0.139005                                                                          
[07/27/25 11:26:24] INFO     ['prompt']: 'What is an LLM?'                                           trainer.py:790
                    INFO     ['response']:                                                           trainer.py:794
                                                                                                                   
                             What is an                                                                            
                             LLM?YfQooooRx3xccaHCvj3gllexpjGG,wUxe'oOf.smxxxrq-jj'kxxrkc3fkkeQZZe''Y               
                             R'JhrZZAcowccpqA,QUJZpcAkkGGGqp--.v'appbYYbeeqbbZrk'MBfq-srksqYee'QQt'J               
                             ',qWqt;qkGWbrrtqJ-'pa'ggjJSq--'sf'..;''aqfpfx'Sbbq3tooMbb?',AA-AW'MqAAk               
                             ;ccAGqQqaA;WQhMSq;cffho,eWohpWott3jj---s;?ggIIS                                       
[07/27/25 11:26:27] INFO     Saving checkpoint to:                                                   trainer.py:733
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example                                                           
                    INFO     Saving model to:                                                        trainer.py:734
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example/model.pth                                                 
                    INFO     Appending                                                               configs.py:141
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example to                                                        
                             /Users/samforeman/projects/saforem2/wordplay/src/ckpts/checkpoints.log                
                    INFO     step=960 loss=4.225 dt=0.0210933 dtf=0.0207466 dtb=0.00012575           trainer.py:850
                             sps=47.4083 sps_per_gpu=47.4083 tps=6068.27 tps_per_gpu=6068.27                       
                             mfu=0.138218                                                                          
[07/27/25 11:26:28] INFO     step=970 loss=4.17741 dt=0.0178491 dtf=0.0175596 dtb=0.000125458        trainer.py:850
                             sps=56.0252 sps_per_gpu=56.0252 tps=7171.22 tps_per_gpu=7171.22                       
                             mfu=0.139892                                                                          
                    INFO     step=980 loss=4.1707 dt=0.0166487 dtf=0.0163776 dtb=0.000111583         trainer.py:850
                             sps=60.0647 sps_per_gpu=60.0647 tps=7688.28 tps_per_gpu=7688.28                       
                             mfu=0.142516                                                                          
                    INFO     step=990 loss=4.1891 dt=0.0180315 dtf=0.0177192 dtb=0.000119167         trainer.py:850
                             sps=55.4585 sps_per_gpu=55.4585 tps=7098.69 tps_per_gpu=7098.69                       
                             mfu=0.143604                                                                          
                    INFO     step=1000 loss=4.2423 dt=0.022806 dtf=0.0224982 dtb=0.000120917         trainer.py:850
                             sps=43.8482 sps_per_gpu=43.8482 tps=5612.57 tps_per_gpu=5612.57                       
                             mfu=0.141372                                                                          
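
Across the full run the loss barely moves, hovering around 4.2. To inspect the trend programmatically rather than by eye, the step=… loss=… pairs in the log are easy to scrape. A minimal sketch, assuming the log text has been captured as a string (the sample lines are copied from the output above):

import re

# Captured stdout from the training cell; only the `step=... loss=...`
# format matters here.
log_text = """
step=410 loss=4.23287
step=420 loss=4.27257
step=430 loss=4.18557
"""

# Pull out (step, loss) pairs from the trainer's log lines.
pairs = re.findall(r"step=(\d+)\s+loss=([\d.]+)", log_text)
history = [(int(step), float(loss)) for step, loss in pairs]
print(history)  # [(410, 4.23287), (420, 4.27257), (430, 4.18557)]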

Evaluate Model

After the full 1000 steps, we prompt the model one last time. Note top_k=2 here (the earlier calls used top_k=16): restricted to the two most likely tokens at each step, the still-undertrained model collapses into long runs of its most frequent characters.

import time

query = "What is an LLM?"
t0 = time.perf_counter()
outputs = trainer.evaluate(
    query,
    num_samples=1,
    max_new_tokens=256,
    top_k=2,
    display=False
)
logger.info(f'took: {time.perf_counter() - t0:.4f}s')
logger.info(f"['prompt']: '{query}'")
logger.info("['response']:\n\n" + fr"{outputs['0']['raw']}")
[07/27/25 11:26:31] INFO     took: 1.7435s                                                          582817405.py:12
                    INFO     ['prompt']: 'What is an LLM?'                                          582817405.py:13
                    INFO     ['response']:                                                          582817405.py:14
                                                                                                                   
                             What is an                                                                            
                             LLM?ZxxA---'aaaaeeewAAAAA'''qqqqqqqqqqqqaeeqqqqqq''333qqAAA33akkk''qqq                
                             qqorrrrrrrrrrqqqqqqq.qe333aaaqqqqqf..qqqqqqq3333333-qqqqbbb''ggSSpMMMq                
                             qqqMMqqqqqqqqWW;?;?;?;???;;??MMMM;;;;;;??;;;;;;;;''''';??qqqqqqqW;;'''                
                             '''''''''';;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;'tttttMM                                    
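
The run above also wrote its weights to model.pth after each evaluation and recorded the run directory in checkpoints.log (the "Saving model to:" lines). As a rough sketch of loading those weights back into the model, assuming model.pth holds a state_dict (or a dict wrapping one) written with torch.save; this is an assumption about wordplay's checkpoint format, not something verified here:

import torch

# ASSUMPTION: model.pth is a state_dict (or a dict wrapping one) saved via
# torch.save(); wordplay's actual checkpoint layout may differ.
state = torch.load('model.pth', map_location='cpu')
if isinstance(state, dict) and 'model' in state:
    state = state['model']  # unwrap if nested in a larger checkpoint dict
trainer.model.load_state_dict(state)  # may need trainer.model.module under DDP
trainer.model.eval()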