wordplay 🎮 💬: Shakespeare ✍️

Published: July 22, 2025
Modified: July 27, 2025

We will use the Shakespeare dataset to train a small (~10M parameter) LLM from scratch.


Image generated from stabilityai/stable-diffusion on 🤗 Spaces.

Prompt Details

  • Prompt: Shakespeare himself, dressed in full Shakespearean garb, writing code at a modern workstation with multiple monitors, hacking away profusely, backlit, high quality for publication

  • Negative Prompt: low quality, 3d, photorealistic, ugly

Install / Setup

Warning!

IF YOU ARE EXECUTING ON GOOGLE COLAB:

You will need to restart your runtime (Runtime → Restart runtime)
after executing the following cell:

%%bash

if python3 -c 'import wordplay; print(wordplay.__file__)' 2> /dev/null; then
    echo "Has wordplay installed. Nothing to do."
else
    echo "Does not have wordplay installed. Installing..."
    git clone 'https://github.com/saforem2/wordplay'
    python3 wordplay/data/shakespeare_char/prepare.py
    python3 wordplay/data/shakespeare/prepare.py
    python3 -m pip install deepspeed
    python3 -m pip install -e wordplay
fi
/Users/samforeman/projects/saforem2/wordplay/src/wordplay/__init__.py
Has wordplay installed. Nothing to do.

Post Install

If installed correctly, you should be able to:

>>> import wordplay
>>> wordplay.__file__
'/path/to/wordplay/src/wordplay/__init__.py'
%load_ext autoreload
%autoreload 2
import os
import sys
import ezpz

os.environ['COLORTERM'] = 'truecolor'
if sys.platform == 'darwin':
    # If running on MacOS:
    # os.environ['PYTORCH_ENABLE_MPS_FALLBACK'] = '1'
    os.environ['TORCH_DEVICE'] = 'cpu'
# -----------------------------------------------

logger = ezpz.get_logger()

import wordplay
logger.info(wordplay.__file__)
[07/27/25 11:24:20] INFO     Setting logging level to 'INFO' on 'RANK == 0'                         __init__.py:265
                    INFO     Setting logging level to 'CRITICAL' on all others 'RANK != 0'          __init__.py:266
[07/27/25 11:24:20] INFO     /Users/samforeman/projects/saforem2/wordplay/src/wordplay/__init__.py 2338663768.py:17

Build Trainer

Explicitly, we:

  1. Setup torch: rank = setup(...)
  2. Build cfg: DictConfig = get_config(...)
  3. Instantiate config: ExperimentConfig = instantiate(cfg)
  4. Build trainer = Trainer(config)
import os
import numpy as np
from ezpz import setup
from hydra.utils import instantiate
from wordplay.configs import get_config, PROJECT_ROOT
from wordplay.trainer import Trainer

HF_DATASETS_CACHE = PROJECT_ROOT.joinpath('.cache', 'huggingface')
HF_DATASETS_CACHE.mkdir(exist_ok=True, parents=True)

os.environ['HF_DATASETS_CACHE'] = HF_DATASETS_CACHE.as_posix()

BACKEND = 'DDP'

rank = setup(
    framework='pytorch',
    backend=BACKEND,
    seed=1234,
)

cfg = get_config(
    [
        'data=shakespeare',
        'model=shakespeare',
        'model.batch_size=1',
        'model.block_size=128',
        'optimizer=shakespeare',
        'train=shakespeare',
        f'train.backend={BACKEND}',
        'train.compile=false',
        'train.dtype=bfloat16',
        'train.max_iters=500',
        'train.log_interval=10',
        'train.eval_interval=50',
    ]
)
config = instantiate(cfg)
                    INFO     Setting HF_DATASETS_CACHE to                                             configs.py:81
                             /Users/samforeman/projects/saforem2/wordplay/.cache/huggingface/datasets              
                    WARNING  Caught TORCH_DEVICE=cpu from environment!                                  dist.py:639
                    WARNING  Caught TORCH_DEVICE=cpu from environment!                                  dist.py:639
                    WARNING  Caught TORCH_DEVICE=cpu from environment!                                  dist.py:639
                    INFO     Using fw='ddp' with torch_{device,backend}= {cpu, gloo}                   dist.py:1159
                    INFO     Caught MASTER_PORT=57747 from environment!                                dist.py:1026
                    INFO     Using torch.distributed.init_process_group with                           dist.py:1042
                             - master_addr='Sams-MacBook-Pro-2.local'                                              
                             - master_port='57747'                                                                 
                             - world_size=1                                                                        
                             - rank=0                                                                              
                             - local_rank=0                                                                        
                             - timeout=datetime.timedelta(seconds=3600)                                            
                             - backend='gloo'                                                                      
                    INFO     Calling torch.distributed.init_process_group_with: rank=0 world_size=1     dist.py:759
                             backend=gloo                                                                          
                    WARNING  Caught TORCH_DEVICE=cpu from environment!                                  dist.py:639
                    WARNING  Caught TORCH_DEVICE=cpu from environment!                                  dist.py:639
                    INFO     Using device='cpu' with backend='gloo' + 'gloo' for distributed training. dist.py:1377
                    INFO     ['Sams-MacBook-Pro-2.local'][0/0]                                         dist.py:1422
                    INFO     Loading train from                                                      configs.py:317
                             /Users/samforeman/projects/saforem2/wordplay/data/shakespeare_char/trai               
                             n.bin                                                                                 
                    INFO     Loading val from                                                        configs.py:317
                             /Users/samforeman/projects/saforem2/wordplay/data/shakespeare_char/val.               
                             bin                                                                                   
                    INFO     Tokens per iteration: 128                                               configs.py:442
                    WARNING  Caught TORCH_DEVICE=cpu from environment!                                  dist.py:639
                    INFO     Using self.ptdtype=torch.bfloat16 on self.device_type='cpu'             configs.py:465
                    INFO     Initializing a new model from scratch                                   configs.py:471
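
The "Tokens per iteration: 128" line above follows directly from the overrides we passed to get_config. A quick check (assuming no gradient accumulation, which the reported number bears out):

# tokens per optimizer step = grad_accum_steps * batch_size * block_size
grad_accum_steps = 1   # assumed default; no override set above
batch_size = 1         # model.batch_size=1
block_size = 128       # model.block_size=128
print(grad_accum_steps * batch_size * block_size)  # 128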

Build Trainer object

trainer = Trainer(config)
                    INFO     Initializing a new model from scratch                                   trainer.py:235
                    INFO     number of parameters: 10.65M                                              model.py:255
                    INFO     Model size: num_params=10646784                                         trainer.py:252
                    INFO     num decayed parameter tensors: 26, with 10,690,944 parameters             model.py:445
                    INFO     num non-decayed parameter tensors: 13, with 4,992 parameters              model.py:449
                    INFO     using fused AdamW: False                                                  model.py:465
                    CRITICAL "devid='cpu:0'"                                                         trainer.py:308
                    INFO     • self.model=GPT(                                                       trainer.py:347
                               (transformer): ModuleDict(                                                          
                                 (wte): Embedding(65, 384)                                                         
                                 (wpe): Embedding(128, 384)                                                        
                                 (drop): Dropout(p=0.2, inplace=False)                                             
                                 (h): ModuleList(                                                                  
                                   (0-5): 6 x Block(                                                               
                                     (ln_1): LayerNorm()                                                           
                                     (attn): CausalSelfAttention(                                                  
                                       (c_attn): Linear(in_features=384, out_features=1152,                        
                             bias=False)                                                                           
                                       (c_proj): Linear(in_features=384, out_features=384,                         
                             bias=False)                                                                           
                                       (attn_dropout): Dropout(p=0.2, inplace=False)                               
                                       (resid_dropout): Dropout(p=0.2, inplace=False)                              
                                     )                                                                             
                                     (ln_2): LayerNorm()                                                           
                                     (mlp): MLP(                                                                   
                                       (c_fc): Linear(in_features=384, out_features=1536,                          
                             bias=False)                                                                           
                                       (act_fn): GELU(approximate='none')                                          
                                       (c_proj): Linear(in_features=1536, out_features=384,                        
                             bias=False)                                                                           
                                       (dropout): Dropout(p=0.2, inplace=False)                                    
                                     )                                                                             
                                   )                                                                               
                                 )                                                                                 
                                 (ln_f): LayerNorm()                                                               
                               )                                                                                   
                               (lm_head): Linear(in_features=384, out_features=65, bias=False)                     
                             )                                                                                     
                    INFO     • self.grad_scaler=None                                                 trainer.py:348
                    INFO     • self.model_engine=GPT(                                                trainer.py:349
                               (transformer): ModuleDict(                                                          
                                 (wte): Embedding(65, 384)                                                         
                                 (wpe): Embedding(128, 384)                                                        
                                 (drop): Dropout(p=0.2, inplace=False)                                             
                                 (h): ModuleList(                                                                  
                                   (0-5): 6 x Block(                                                               
                                     (ln_1): LayerNorm()                                                           
                                     (attn): CausalSelfAttention(                                                  
                                       (c_attn): Linear(in_features=384, out_features=1152,                        
                             bias=False)                                                                           
                                       (c_proj): Linear(in_features=384, out_features=384,                         
                             bias=False)                                                                           
                                       (attn_dropout): Dropout(p=0.2, inplace=False)                               
                                       (resid_dropout): Dropout(p=0.2, inplace=False)                              
                                     )                                                                             
                                     (ln_2): LayerNorm()                                                           
                                     (mlp): MLP(                                                                   
                                       (c_fc): Linear(in_features=384, out_features=1536,                          
                             bias=False)                                                                           
                                       (act_fn): GELU(approximate='none')                                          
                                       (c_proj): Linear(in_features=1536, out_features=384,                        
                             bias=False)                                                                           
                                       (dropout): Dropout(p=0.2, inplace=False)                                    
                                     )                                                                             
                                   )                                                                               
                                 )                                                                                 
                                 (ln_f): LayerNorm()                                                               
                               )                                                                                   
                               (lm_head): Linear(in_features=384, out_features=65, bias=False)                     
                             )                                                                                     
                    INFO     • self.optimizer=AdamW (                                                trainer.py:350
                             Parameter Group 0                                                                     
                                 amsgrad: False                                                                    
                                 betas: (0.9, 0.99)                                                                
                                 capturable: False                                                                 
                                 decoupled_weight_decay: True                                                      
                                 differentiable: False                                                             
                                 eps: 1e-08                                                                        
                                 foreach: None                                                                     
                                 fused: None                                                                       
                                 lr: 0.001                                                                         
                                 maximize: False                                                                   
                                 weight_decay: 0.1                                                                 
                                                                                                                   
                             Parameter Group 1                                                                     
                                 amsgrad: False                                                                    
                                 betas: (0.9, 0.99)                                                                
                                 capturable: False                                                                 
                                 decoupled_weight_decay: True                                                      
                                 differentiable: False                                                             
                                 eps: 1e-08                                                                        
                                 foreach: None                                                                     
                                 fused: None                                                                       
                                 lr: 0.001                                                                         
                                 maximize: False                                                                   
                                 weight_decay: 0.0                                                                 
                             )                                                                                     

Prompt (prior to training)

query = "What is an LLM?"
outputs = trainer.evaluate(
    query,
    num_samples=1,
    max_new_tokens=256,
    top_k=16,
    display=False
)
logger.info(f"['prompt']: '{query}'")
logger.info("['response']:\n\n" + fr"{outputs['0']['raw']}")
[07/27/25 11:24:22] INFO     ['prompt']: 'What is an LLM?'                                          3496000222.py:9
                    INFO     ['response']:                                                         3496000222.py:10
                                                                                                                   
                             What is an                                                                            
                             LLM?A,,osy'exx.ff.fpppxv;;'vt3QjYhhvvYAhowQwwQ,eqeqG;X.YqqQSZQWLsyccc                 
                             cj:ZhaooxkkcfkZ                                                                       
                             ffop- f,hqWl                                                                          
                             oocpppUqAQ;cc''bQqcWAttrqerrwyqqsrqttqYeqWQs'tottcqestbqbbrpWbWYApppp                 
                             BqfhcqqYqqM?qttqQU'gYe?A..'S'rtppW'fJf;??qn.pwrrrqqfA;!!A,,,AtqqqqbW;                 
                             bSoW;;?;;;qQ;;cIA.'M;''g                                                              
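
Unsurprisingly, the untrained model produces gibberish: with random weights, every character is roughly equally likely. The top_k=16 argument above restricts sampling at each step to the 16 most likely tokens; below is a minimal sketch of that decoding rule (an illustration, not wordplay's exact implementation):

import torch

def sample_next_token(logits: torch.Tensor, top_k: int = 16) -> int:
    """Sample one token id, restricted to the top_k most likely tokens."""
    v, _ = torch.topk(logits, top_k)                     # v[-1] is the kth-largest logit
    logits = logits.masked_fill(logits < v[-1], float('-inf'))
    probs = torch.softmax(logits, dim=-1)
    return int(torch.multinomial(probs, num_samples=1))

# e.g., for the 65-character vocabulary used here:
next_id = sample_next_token(torch.randn(65), top_k=16)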

Train Model

name  description
----  --------------------------------
step  Current training step
loss  Loss value
dt    Time per step (in seconds)
sps   Samples per second
mtps  Tokens per second (in millions)
mfu   Model FLOPS utilization

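A note on mfu: it measures achieved FLOPS against a hardware peak. Below is a minimal sketch of the PaLM-style estimate popularized by nanoGPT, whose conventions wordplay's model follows; the 312 TFLOPS A100 bfloat16 peak is the conventional baseline and an assumption here. Plugging in the numbers logged at step 10 below reproduces mfu=0.149367 up to a factor of 100, suggesting wordplay reports MFU as a percentage:

def estimate_mfu(num_params, n_layer, n_head, n_embd, block_size, dt):
    """Model FLOPS utilization (a sketch; wordplay's implementation may differ)."""
    head_dim = n_embd // n_head
    # FLOPs per token (PaLM Appendix B): 6*N + 12*L*H*Q*T
    flops_per_token = 6 * num_params + 12 * n_layer * n_head * head_dim * block_size
    flops_per_iter = flops_per_token * block_size  # one fwd+bwd pass over a full block
    flops_achieved = flops_per_iter / dt           # dt = seconds per iteration
    return flops_achieved / 312e12                 # assumed A100 bfloat16 peak

# with num_params=10646784 and dt=0.0185177 (step 10 below):
print(100 * estimate_mfu(10_646_784, 6, 6, 384, 128, 0.0185177))  # ≈ 0.149367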

trainer.config.device_type
'cpu'
from rich import print

print(trainer.model)
GPT(
  (transformer): ModuleDict(
    (wte): Embedding(65, 384)
    (wpe): Embedding(128, 384)
    (drop): Dropout(p=0.2, inplace=False)
    (h): ModuleList(
      (0-5): 6 x Block(
        (ln_1): LayerNorm()
        (attn): CausalSelfAttention(
          (c_attn): Linear(in_features=384, out_features=1152, bias=False)
          (c_proj): Linear(in_features=384, out_features=384, bias=False)
          (attn_dropout): Dropout(p=0.2, inplace=False)
          (resid_dropout): Dropout(p=0.2, inplace=False)
        )
        (ln_2): LayerNorm()
        (mlp): MLP(
          (c_fc): Linear(in_features=384, out_features=1536, bias=False)
          (act_fn): GELU(approximate='none')
          (c_proj): Linear(in_features=1536, out_features=384, bias=False)
          (dropout): Dropout(p=0.2, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm()
  )
  (lm_head): Linear(in_features=384, out_features=65, bias=False)
)
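
As a sanity check, the reported num_params=10646784 can be recovered by hand from this printout. A quick sketch; the numbers only work out if lm_head shares its weight with wte (weight tying) and the reported count excludes the position embeddings, both nanoGPT conventions:

n_layer, n_embd = 6, 384
vocab_size, block_size = 65, 128

wte = vocab_size * n_embd                         #     24,960 token embeddings
wpe = block_size * n_embd                         #     49,152 position embeddings
ln = (2 * n_layer + 1) * n_embd                   #      4,992 LayerNorm weights (no biases)
attn = n_embd * 3 * n_embd + n_embd * n_embd      #    589,824 c_attn + c_proj
mlp = n_embd * 4 * n_embd + 4 * n_embd * n_embd   #  1,179,648 c_fc + c_proj
blocks = n_layer * (attn + mlp)                   # 10,616,832

total = wte + wpe + ln + blocks                   # 10,695,936 (lm_head is tied to wte)
print(total - wpe)                                # 10,646,784 -> the reported 10.65M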

(partial) Training:

We'll first train for 500 iterations and then evaluate the model's performance on the same prompt:

What is an LLM?

trainer.train(train_iters=500)
                Training Legend                 
┏━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃        abbr ┃ desc                           ┃
┡━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│        step │ Current training iteration     │
│        loss │ Loss value                     │
│          dt │ Elapsed time per training step │
│         dtf │ Elapsed time per forward step  │
│         dtb │ Elapsed time per backward step │
│         sps │ Samples per second             │
│ sps_per_gpu │ Samples per second (per GPU)   │
│         tps │ Tokens per second              │
│ tps_per_gpu │ Tokens per second (per GPU)    │
│         mfu │ Model flops utilization        │
└─────────────┴────────────────────────────────┘
[07/27/25 11:24:24] INFO     ['prompt']: 'What is an LLM?'                                           trainer.py:790
                    INFO     ['response']:                                                           trainer.py:794
                                                                                                                   
                             What is an                                                                            
                             LLM?wCw'.AAAfxo..'yfAQfppyybvFYerr.MfYZAcLyQQCkkexx-3lllrpMqxkko-rZx3b'               
                             3j-ffSSoqq3hhdf'Q''aq'wqqsoKZb'ec3ZAAA;;o,qff..'fArttgbYtturcbcSYrS-Fff               
                             'wwwerwPgJ;.e;yY-SpuyeexqYqgQtpMSYqYgbtQqq''';pfsw,';oA;qqeqcckSAo,,roo               
                             MgyQha'''fAA..gg;;'ggtSvrupptkeweqqcqqkk-SvYYIv                                       
[07/27/25 11:24:28] INFO     step=10 loss=4.28757 dt=0.0185177 dtf=0.0181899 dtb=0.000141292         trainer.py:850
                             sps=54.0022 sps_per_gpu=54.0022 tps=6912.29 tps_per_gpu=6912.29                       
                             mfu=0.149367                                                                          
[07/27/25 11:24:29] INFO     step=20 loss=4.28569 dt=0.019256 dtf=0.0186655 dtb=0.000153666          trainer.py:850
                             sps=51.932 sps_per_gpu=51.932 tps=6647.29 tps_per_gpu=6647.29                         
                             mfu=0.148794                                                                          
                    INFO     step=30 loss=4.19012 dt=0.0191065 dtf=0.018786 dtb=0.000126125          trainer.py:850
                             sps=52.3382 sps_per_gpu=52.3382 tps=6699.29 tps_per_gpu=6699.29                       
                             mfu=0.148391                                                                          
                    INFO     step=40 loss=4.26634 dt=0.0181073 dtf=0.0177951 dtb=0.00012475          trainer.py:850
                             sps=55.2262 sps_per_gpu=55.2262 tps=7068.96 tps_per_gpu=7068.96                       
                             mfu=0.148827                                                                          
                    INFO     step=50 loss=4.22804 dt=0.0180129 dtf=0.0176396 dtb=0.0001405           trainer.py:850
                             sps=55.5158 sps_per_gpu=55.5158 tps=7106.03 tps_per_gpu=7106.03                       
                             mfu=0.1493                                                                            
[07/27/25 11:24:31] INFO     ['prompt']: 'What is an LLM?'                                           trainer.py:790
                    INFO     ['response']:                                                           trainer.py:794
                                                                                                                   
                             What is an LLM?fwxx yY'eyffpCx?ZZZ.eevfeesxqQQYoqapxxxsZ                              
                             vrvb'oZ3qoh33roArW;aafAA''f''QYqAob.aqo.Qyyegg'VcqqYbq3AaFskkcAkfvjb'QQ               
                             tqQfArWA;Qp'k'goWoq;bbrppfQSYy,,,qqqqMsQuAQ'qgoowqqstSpgli-gggggjGG;ctt               
                             SAA.pYYIoMSYu;QQSv;?gjJf'eQQQ;yg'Mgo-b';ccIffQSqAA'rqqcII?;;'ecWWllc;''               
                             ;                                                                                     
[07/27/25 11:24:34] INFO     Saving checkpoint to:                                                   trainer.py:733
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example                                                           
                    INFO     Saving model to:                                                        trainer.py:734
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example/model.pth                                                 
                    INFO     Appending                                                               configs.py:141
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example to                                                        
                             /Users/samforeman/projects/saforem2/wordplay/src/ckpts/checkpoints.log                
                    INFO     step=60 loss=4.20216 dt=0.0180862 dtf=0.0177977 dtb=0.000120792         trainer.py:850
                             sps=55.2906 sps_per_gpu=55.2906 tps=7077.2 tps_per_gpu=7077.2                         
                             mfu=0.149663                                                                          
[07/27/25 11:24:35] INFO     step=70 loss=4.20029 dt=0.0178861 dtf=0.0175673 dtb=0.000132417         trainer.py:850
                             sps=55.9093 sps_per_gpu=55.9093 tps=7156.39 tps_per_gpu=7156.39                       
                             mfu=0.150161                                                                          
                    INFO     step=80 loss=4.14463 dt=0.0184771 dtf=0.0181706 dtb=0.000118916         trainer.py:850
                             sps=54.1211 sps_per_gpu=54.1211 tps=6927.5 tps_per_gpu=6927.5                         
                             mfu=0.150114                                                                          
                    INFO     step=90 loss=4.14377 dt=0.0182619 dtf=0.0179472 dtb=0.000122042         trainer.py:850
                             sps=54.7588 sps_per_gpu=54.7588 tps=7009.12 tps_per_gpu=7009.12                       
                             mfu=0.150249                                                                          
                    INFO     step=100 loss=4.24105 dt=0.0204619 dtf=0.0201264 dtb=0.000129           trainer.py:850
                             sps=48.8714 sps_per_gpu=48.8714 tps=6255.54 tps_per_gpu=6255.54                       
                             mfu=0.148741                                                                          
[07/27/25 11:24:37] INFO     ['prompt']: 'What is an LLM?'                                           trainer.py:790
                    INFO     ['response']:                                                           trainer.py:794
                                                                                                                   
                             What is an                                                                            
                             LLM?f'xfAhf.qYEZQyyoo--AA,QQAAstpMfYhjc'c..MAj'FF,a33lx.adbssxvVhfsMwyQ               
                             Yosoooc'hzgSSrq.vZZZcq33Sk                                                            
                             ''vaq.w3AmA'..aYjye'ksr'gbvv,,hqb'eSJJm',rSeqfvrrrW;;bZSS:SqeWtttuYgJvk               
                             oBggSA'wst:Sur'txx'rSSqbb;;Qq-;.MsooowbqqqnSpBqSosgggtoo'e;''kG;'g-bWWo               
                             qetQ''os'q'tptSSSYe;                                                                  
[07/27/25 11:24:40] INFO     Saving checkpoint to:                                                   trainer.py:733
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example                                                           
                    INFO     Saving model to:                                                        trainer.py:734
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example/model.pth                                                 
                    INFO     Appending                                                               configs.py:141
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example to                                                        
                             /Users/samforeman/projects/saforem2/wordplay/src/ckpts/checkpoints.log                
                    INFO     step=110 loss=4.30091 dt=0.0177751 dtf=0.0174488 dtb=0.000141917        trainer.py:850
                             sps=56.2584 sps_per_gpu=56.2584 tps=7201.07 tps_per_gpu=7201.07                       
                             mfu=0.149428                                                                          
[07/27/25 11:24:41] INFO     step=120 loss=4.23854 dt=0.018823 dtf=0.0184793 dtb=0.000144208         trainer.py:850
                             sps=53.1265 sps_per_gpu=53.1265 tps=6800.19 tps_per_gpu=6800.19                       
                             mfu=0.149179                                                                          
                    INFO     step=130 loss=4.21194 dt=0.0217986 dtf=0.0213945 dtb=0.000134209        trainer.py:850
                             sps=45.8745 sps_per_gpu=45.8745 tps=5871.93 tps_per_gpu=5871.93                       
                             mfu=0.14695                                                                           
                    INFO     step=140 loss=4.30343 dt=0.018252 dtf=0.0179529 dtb=0.000112917         trainer.py:850
                             sps=54.7886 sps_per_gpu=54.7886 tps=7012.95 tps_per_gpu=7012.95                       
                             mfu=0.147409                                                                          
                    INFO     step=150 loss=4.25562 dt=0.019516 dtf=0.0191642 dtb=0.000136958         trainer.py:850
                             sps=51.2401 sps_per_gpu=51.2401 tps=6558.74 tps_per_gpu=6558.74                       
                             mfu=0.146841                                                                          
[07/27/25 11:24:43] INFO     ['prompt']: 'What is an LLM?'                                           trainer.py:790
                    INFO     ['response']:                                                           trainer.py:794
                                                                                                                   
                             What is an                                                                            
                             LLM?vXvZoQQoLqQewerA'-''.qqQtXxx'V333jo'gQUoojxttYyfQOCCAASc-sseS                     
                                                                                                                   
                             r.GexS-                                                                               
                             Dv'acQqjpwptxxqqZ!!fqzAAf.v3aag;vYgg'fqY:n;QsrkoBQhbYYQQgoMbZg;;cLf..WS               
                             SJhppMSkggkkkkooqWWQ'';xheuAA;pppcSQQqq;??ZppBkqeQsgb'SpWbrr;.gSbbqq;;f               
                             .t'gIBq;;WtgbW,rWWYAAqttMA''ggQQQnxrrrrh;;!                                           
[07/27/25 11:24:46] INFO     Saving checkpoint to:                                                   trainer.py:733
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example                                                           
                    INFO     Saving model to:                                                        trainer.py:734
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example/model.pth                                                 
                    INFO     Appending                                                               configs.py:141
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example to                                                        
                             /Users/samforeman/projects/saforem2/wordplay/src/ckpts/checkpoints.log                
                    INFO     step=160 loss=4.22457 dt=0.0185443 dtf=0.0182662 dtb=0.000111875        trainer.py:850
                             sps=53.9248 sps_per_gpu=53.9248 tps=6902.38 tps_per_gpu=6902.38                       
                             mfu=0.147072                                                                          
                    INFO     step=170 loss=4.20268 dt=0.0178489 dtf=0.0175266 dtb=0.000147375        trainer.py:850
                             sps=56.0259 sps_per_gpu=56.0259 tps=7171.32 tps_per_gpu=7171.32                       
                             mfu=0.147861                                                                          
[07/27/25 11:24:47] INFO     step=180 loss=4.23688 dt=0.0191321 dtf=0.0187527 dtb=0.000175709        trainer.py:850
                             sps=52.2681 sps_per_gpu=52.2681 tps=6690.32 tps_per_gpu=6690.32                       
                             mfu=0.147532                                                                          
                    INFO     step=190 loss=4.28941 dt=0.0229258 dtf=0.0225994 dtb=0.00012675         trainer.py:850
                             sps=43.6189 sps_per_gpu=43.6189 tps=5583.22 tps_per_gpu=5583.22                       
                             mfu=0.144844                                                                          
                    INFO     step=200 loss=4.25317 dt=0.0195566 dtf=0.019196 dtb=0.000148084         trainer.py:850
                             sps=51.1336 sps_per_gpu=51.1336 tps=6545.1 tps_per_gpu=6545.1                         
                             mfu=0.144503                                                                          
[07/27/25 11:24:49] INFO     ['prompt']: 'What is an LLM?'                                           trainer.py:790
                    INFO     ['response']:                                                           trainer.py:794
                                                                                                                   
                             What is an                                                                            
                             LLM?qervyyf.af3VAAowAoooooBQecAAqrxxxtXptxGQUVVcNYhhhck;;ooc'DaVqLZZZcP               
                             '''GGl..ooosZppV!333QqYYfQSYUUoofkm.tpcq'e''3esseeqqe;;!f'sx'MBfQttopp,               
                             qccQn3tgQSk-sffQnpSoo'gYpqqQn';qqecAAS'?AAASYf';pMt??pSSpptSbbYj-tWWYQY               
                             ?gYIfkqg.nn'gqqc'gtqqtS??A'tu?MBBp???qq;;??A,,,                                       
[07/27/25 11:24:52] INFO     Saving checkpoint to:                                                   trainer.py:733
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example                                                           
                    INFO     Saving model to:                                                        trainer.py:734
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example/model.pth                                                 
                    INFO     Appending                                                               configs.py:141
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example to                                                        
                             /Users/samforeman/projects/saforem2/wordplay/src/ckpts/checkpoints.log                
                    INFO     step=210 loss=4.22371 dt=0.017668 dtf=0.0173218 dtb=0.00013425          trainer.py:850
                             sps=56.5996 sps_per_gpu=56.5996 tps=7244.75 tps_per_gpu=7244.75                       
                             mfu=0.145707                                                                          
                    INFO     step=220 loss=4.23227 dt=0.018459 dtf=0.0181719 dtb=0.000110958         trainer.py:850
                             sps=54.1741 sps_per_gpu=54.1741 tps=6934.29 tps_per_gpu=6934.29                       
                             mfu=0.146121                                                                          
[07/27/25 11:24:53] INFO     step=230 loss=4.22308 dt=0.0179255 dtf=0.0176206 dtb=0.000137           trainer.py:850
                             sps=55.7864 sps_per_gpu=55.7864 tps=7140.67 tps_per_gpu=7140.67                       
                             mfu=0.146939                                                                          
                    INFO     step=240 loss=4.23777 dt=0.0191189 dtf=0.0187767 dtb=0.000147041        trainer.py:850
                             sps=52.3043 sps_per_gpu=52.3043 tps=6694.95 tps_per_gpu=6694.95                       
                             mfu=0.146712                                                                          
                    INFO     step=250 loss=4.24408 dt=0.0223422 dtf=0.0219412 dtb=0.000145083        trainer.py:850
                             sps=44.7583 sps_per_gpu=44.7583 tps=5729.07 tps_per_gpu=5729.07                       
                             mfu=0.144421                                                                          
[07/27/25 11:24:55] INFO     ['prompt']: 'What is an LLM?'                                           trainer.py:790
                    INFO     ['response']:                                                           trainer.py:794
                                                                                                                   
                             What is an LLM?.rooffAA                                                               
                             rW,,aAA'GoA,aUVVcCoGhvZZcd.QEcNAgxvwYa'haccX.aqo?rrQQ;;QbZ                            
                             '''fc3FqqWk.'oceQ-h!?Yvs'rw--Qc'333-.hq3AwvvcLq','J-w'''rhqWo--;hSQgSqq               
                             ;?rqYygAA,asso;q33AA'rbv,J-fof'g'SJJ,;ttcqq;'wgybqppaqttof;;;'''qtqaJpu               
                             uYf;paeyfhqg''''qWWbwAA-bbQyg'Sqqos''qYrM;a;??                                        
[07/27/25 11:24:58] INFO     Saving checkpoint to:                                                   trainer.py:733
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example                                                           
                    INFO     Saving model to:                                                        trainer.py:734
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example/model.pth                                                 
                    INFO     Appending                                                               configs.py:141
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example to                                                        
                             /Users/samforeman/projects/saforem2/wordplay/src/ckpts/checkpoints.log                
                    INFO     step=260 loss=4.2759 dt=0.0181479 dtf=0.0178296 dtb=0.000118833         trainer.py:850
                             sps=55.1029 sps_per_gpu=55.1029 tps=7053.17 tps_per_gpu=7053.17                       
                             mfu=0.14522                                                                           
[07/27/25 11:24:59] INFO     step=270 loss=4.31702 dt=0.0175359 dtf=0.0172119 dtb=0.000134667        trainer.py:850
                             sps=57.026 sps_per_gpu=57.026 tps=7299.32 tps_per_gpu=7299.32                         
                             mfu=0.146471                                                                          
                    INFO     step=280 loss=4.20612 dt=0.0180766 dtf=0.0177583 dtb=0.000122875        trainer.py:850
                             sps=55.3202 sps_per_gpu=55.3202 tps=7080.98 tps_per_gpu=7080.98                       
                             mfu=0.147125                                                                          
                    INFO     step=290 loss=4.22943 dt=0.0187801 dtf=0.0184775 dtb=0.000117416        trainer.py:850
                             sps=53.2478 sps_per_gpu=53.2478 tps=6815.72 tps_per_gpu=6815.72                       
                             mfu=0.14714                                                                           
                    INFO     step=300 loss=4.11928 dt=0.022491 dtf=0.0219909 dtb=0.000262625         trainer.py:850
                             sps=44.4622 sps_per_gpu=44.4622 tps=5691.17 tps_per_gpu=5691.17                       
                             mfu=0.144724                                                                          
[07/27/25 11:25:01] INFO     ['prompt']: 'What is an LLM?'                                           trainer.py:790
                    INFO     ['response']:                                                           trainer.py:794
                                                                                                                   
                             What is an                                                                            
                             LLM?L3slghC33vfJQO-eBBBv.Y.Sffs,'gxEUAUCQeswPv,ettLWClrrqeZAtLA.''3NsG.               
                             .''.sAAmebbqYrv''-                                                                    
                             hTkcxhqqVUvvvfv,lxxlAc..3Zpq''Qsk'st;xlneQssssxS;'tt;cb;??rSQ'k--'t::qq               
                             npYbc;nn;WWqqexSe''ftMqYYttttook;;pgSQQcLgycA;;qqbb''aakqrAAk.h''gYbcLL               
                             oopqs:sSSAgZQtiAA.'MMsWllpMt                                                          
[07/27/25 11:25:04] INFO     Saving checkpoint to:                                                   trainer.py:733
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example                                                           
                    INFO     Saving model to:                                                        trainer.py:734
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example/model.pth                                                 
                    INFO     Appending                                                               configs.py:141
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example to                                                        
                             /Users/samforeman/projects/saforem2/wordplay/src/ckpts/checkpoints.log                
                    INFO     step=310 loss=4.23252 dt=0.0201538 dtf=0.0198303 dtb=0.000123166        trainer.py:850
                             sps=49.6185 sps_per_gpu=49.6185 tps=6351.16 tps_per_gpu=6351.16                       
                             mfu=0.143976                                                                          
[07/27/25 11:25:05] INFO     step=320 loss=4.23608 dt=0.0227338 dtf=0.0224232 dtb=0.000121542        trainer.py:850
                             sps=43.9875 sps_per_gpu=43.9875 tps=5630.4 tps_per_gpu=5630.4                         
                             mfu=0.141745                                                                          
                    INFO     step=330 loss=4.25042 dt=0.0215552 dtf=0.0211734 dtb=0.000124791        trainer.py:850
                             sps=46.3925 sps_per_gpu=46.3925 tps=5938.24 tps_per_gpu=5938.24                       
                             mfu=0.140402                                                                          
                    INFO     step=340 loss=4.19956 dt=0.0196884 dtf=0.0193743 dtb=0.000119708        trainer.py:850
                             sps=50.7913 sps_per_gpu=50.7913 tps=6501.28 tps_per_gpu=6501.28                       
                             mfu=0.140411                                                                          
                    INFO     step=350 loss=4.2746 dt=0.021332 dtf=0.0210193 dtb=0.000116208          trainer.py:850
                             sps=46.8778 sps_per_gpu=46.8778 tps=6000.36 tps_per_gpu=6000.36                       
                             mfu=0.139336                                                                          
[07/27/25 11:25:07] INFO     ['prompt']: 'What is an LLM?'                                           trainer.py:790
                    INFO     ['response']:                                                           trainer.py:794
                                                                                                                   
                             What is an LLM?llBZexQZ wwwZrrxxxcqWa                                                 
                             vqqxtqK..aHqQqqqecaask..--'Ve'll3fh3k..ttesscU''aUxhSpepBqqepp                        
                             'QQ-;AqfwetpM                                                                         
                             vSQwbrrZQqa.CAA,,axqbQu''seyex...'';yyfw'gk:SSWQtrrqW''KKpp?ZQU'''tcb?;               
                             ;;WufBWbb;f'ggYQttSk;?;;;?fA..Sbt;n''rrWqqMeeq;b'k'eMwQQtpufAAqQYAWASSe               
                             'qSpqqtLgWoqSk                                                                        
[07/27/25 11:25:10] INFO     Saving checkpoint to:                                                   trainer.py:733
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example                                                           
                    INFO     Saving model to:                                                        trainer.py:734
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example/model.pth                                                 
[07/27/25 11:25:11] INFO     Appending                                                               configs.py:141
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example to                                                        
                             /Users/samforeman/projects/saforem2/wordplay/src/ckpts/checkpoints.log                
                    INFO     step=360 loss=4.3276 dt=0.0184409 dtf=0.0180754 dtb=0.000134208         trainer.py:850
                             sps=54.2272 sps_per_gpu=54.2272 tps=6941.09 tps_per_gpu=6941.09                       
                             mfu=0.140401                                                                          
                    INFO     step=370 loss=4.15959 dt=0.0190562 dtf=0.0186968 dtb=0.000137458        trainer.py:850
                             sps=52.4762 sps_per_gpu=52.4762 tps=6716.96 tps_per_gpu=6716.96                       
                             mfu=0.140876                                                                          
                    INFO     step=380 loss=4.21489 dt=0.0178422 dtf=0.0175323 dtb=0.000121917        trainer.py:850
                             sps=56.0469 sps_per_gpu=56.0469 tps=7174 tps_per_gpu=7174 mfu=0.14229                 
                    INFO     step=390 loss=4.18483 dt=0.0188368 dtf=0.0185477 dtb=0.000112584        trainer.py:850
                             sps=53.0875 sps_per_gpu=53.0875 tps=6795.2 tps_per_gpu=6795.2                         
                             mfu=0.142745                                                                          
[07/27/25 11:25:12] INFO     step=400 loss=4.2439 dt=0.0201257 dtf=0.019817 dtb=0.000122166          trainer.py:850
                             sps=49.6876 sps_per_gpu=49.6876 tps=6360.01 tps_per_gpu=6360.01                       
                             mfu=0.142214                                                                          
[07/27/25 11:25:13] INFO     ['prompt']: 'What is an LLM?'                                           trainer.py:790
                    INFO     ['response']:                                                           trainer.py:794
                                                                                                                   
                             What is an                                                                            
                             LLM?.3YZfxsaskoRbawwqW3fkYfVUB33emX3cxeQ;XAA,E;hqqqAA,VqYoqep.3-S'eh3cP               
                             e''bqqQAh                                                                             
                             fSpppp;!cbWA'fff3feNhaAo,Ax.tqq33-33--fCttppaww-gkttttt,,oWbb'glQWb'WWb               
                             ZexG?b'sWl'tqt?qqQ'M'rhWlfMMe;tc-eqnnfCqYq;'?;t'Mwhqqq'..oooA,rqqfooWkk               
                             jGqqqqqq;fs;QYbWkkf',,.SSSbqqqbqeeqff                                                 
[07/27/25 11:25:16] INFO     Saving checkpoint to:                                                   trainer.py:733
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example                                                           
                    INFO     Saving model to:                                                        trainer.py:734
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example/model.pth                                                 
[07/27/25 11:25:17] INFO     Appending                                                               configs.py:141
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example to                                                        
                             /Users/samforeman/projects/saforem2/wordplay/src/ckpts/checkpoints.log                
                    INFO     step=410 loss=4.23287 dt=0.0214269 dtf=0.0210555 dtb=0.000134084        trainer.py:850
                             sps=46.6704 sps_per_gpu=46.6704 tps=5973.81 tps_per_gpu=5973.81                       
                             mfu=0.140901                                                                          
                    INFO     step=420 loss=4.27257 dt=0.019262 dtf=0.0189329 dtb=0.000124833         trainer.py:850
                             sps=51.9156 sps_per_gpu=51.9156 tps=6645.19 tps_per_gpu=6645.19                       
                             mfu=0.14117                                                                           
                    INFO     step=430 loss=4.18557 dt=0.0198845 dtf=0.0194948 dtb=0.00011475         trainer.py:850
                             sps=50.2904 sps_per_gpu=50.2904 tps=6437.17 tps_per_gpu=6437.17                       
                             mfu=0.140963                                                                          
                    INFO     step=440 loss=4.21616 dt=0.0235005 dtf=0.0231488 dtb=0.000149416        trainer.py:850
                             sps=42.5522 sps_per_gpu=42.5522 tps=5446.68 tps_per_gpu=5446.68                       
                             mfu=0.138637                                                                          
[07/27/25 11:25:18] INFO     step=450 loss=4.23928 dt=0.0193989 dtf=0.0190223 dtb=0.000140167        trainer.py:850
                             sps=51.5494 sps_per_gpu=51.5494 tps=6598.32 tps_per_gpu=6598.32                       
                             mfu=0.139031                                                                          
[07/27/25 11:25:20] INFO     ['prompt']: 'What is an LLM?'                                           trainer.py:790
                    INFO     ['response']:                                                           trainer.py:794
                                                                                                                   
                             What is an                                                                            
                             LLM?weeQQQ''QQ'evfhQQ;K.AEsWqb..CfC.h;vvx''bTopBe'gWvXffv3ebssW.;?ptdee               
                             ep                                                                                    
                             vrr..CCfkqcptyhpwTssWqsAxrqqqehmuZqZ:qeqGGGGauyfxrrAtgSrqWQ,,t;;ppMMgye               
                             qfvfAAqcWYtqqoopepwySkkqggt3bZMqqq;;yybkSJcSQuuurruqqQtttoo''fAqq;;vSJZ               
                             ZZtM''qqM???gWWAAAt??MYYYe;yglAg;up'exuqqWtu                                          
[07/27/25 11:25:23] INFO     Saving checkpoint to:                                                   trainer.py:733
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example                                                           
                    INFO     Saving model to:                                                        trainer.py:734
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example/model.pth                                                 
                    INFO     Appending                                                               configs.py:141
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example to                                                        
                             /Users/samforeman/projects/saforem2/wordplay/src/ckpts/checkpoints.log                
                    INFO     step=460 loss=4.24269 dt=0.0180308 dtf=0.0177195 dtb=0.000118291        trainer.py:850
                             sps=55.4608 sps_per_gpu=55.4608 tps=7098.98 tps_per_gpu=7098.98                       
                             mfu=0.140468                                                                          
                    INFO     step=470 loss=4.26877 dt=0.0187917 dtf=0.0184404 dtb=0.000135           trainer.py:850
                             sps=53.215 sps_per_gpu=53.215 tps=6811.51 tps_per_gpu=6811.51                         
                             mfu=0.14114                                                                           
                    INFO     step=480 loss=4.19188 dt=0.0186805 dtf=0.0183477 dtb=0.00012775         trainer.py:850
                             sps=53.5318 sps_per_gpu=53.5318 tps=6852.07 tps_per_gpu=6852.07                       
                             mfu=0.141833                                                                          
[07/27/25 11:25:24] INFO     step=490 loss=4.22611 dt=0.0206145 dtf=0.0202806 dtb=0.000136666        trainer.py:850
                             sps=48.5095 sps_per_gpu=48.5095 tps=6209.22 tps_per_gpu=6209.22                       
                             mfu=0.141067                                                                          
                    INFO     step=500 loss=4.21804 dt=0.0178817 dtf=0.0175614 dtb=0.000115875        trainer.py:850
                             sps=55.9232 sps_per_gpu=55.9232 tps=7158.17 tps_per_gpu=7158.17                       
                             mfu=0.142428                                                                          
After the first 500 iterations, we prompt the model again, this time timing the generation:

import time

query = "What is an LLM?"
t0 = time.perf_counter()
outputs = trainer.evaluate(
    query,
    num_samples=1,
    max_new_tokens=256,
    top_k=16,
    display=False
)
logger.info(f'took: {time.perf_counter() - t0:.4f}s')
logger.info(f"['prompt']: '{query}'")
logger.info("['response']:\n\n" + fr"{outputs['0']['raw']}")
[07/27/25 11:25:26] INFO     took: 1.7500s                                                         1425179755.py:12
                    INFO     ['prompt']: 'What is an LLM?'                                         1425179755.py:13
                    INFO     ['response']:                                                         1425179755.py:14
                                                                                                                   
                             What is an LLM?fwll                                                                   
                                                                                                                   
                             b3afqbZZI,r                                                                           
                             oppq3A33QoUUye-fwC'3b3.',A'.hhPlVXXqeQyCCC;xfssc;wTTTTcdGoeehQOCXXXB'                 
                             KZ--qehoF3AqfqqW                                                                      
                             cQAcceffGG,'fSJpppww,txMgQs;;;?qf'fSSrpcg?s,A'rr,aso?''o'MtQrrSSgqftt                 
                             ggSc''Wb'qA,.Apcbb???;pYYySQ'agggScWQgbqWfqYroffSYSYhqfk''qfAA,sgWlnZ                 
                             :pt,JynS'gJZes                                                                        
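
The response is still gibberish, which is expected after only 500 iterations at batch size 1. One sampling knob worth probing is top_k, which restricts generation to the k most likely tokens at each step. Below is a small sketch reusing the evaluate() call from the cell above; the particular k values are arbitrary, chosen only for illustration.

# Sketch: probe how top_k shapes the samples, reusing evaluate() as above.
# The k values are arbitrary, chosen only for illustration.
for k in (2, 16, 64):
    out = trainer.evaluate(
        "What is an LLM?",
        num_samples=1,
        max_new_tokens=64,
        top_k=k,
        display=False,
    )
    logger.info(f"top_k={k}:\n{out['0']['raw']}")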

Resume Training…

The Trainer retains its model, optimizer, and step counter between calls, so trainer.train() with no arguments resumes from step 500 and runs another 500 iterations:

trainer.train()
[07/27/25 11:25:28] INFO     ['prompt']: 'What is an LLM?'                                           trainer.py:790
                    INFO     ['response']:                                                           trainer.py:794
                                                                                                                   
                             What is an                                                                            
                             LLM?wZbbbT'3weew,'foBB.qWWlpwes.qqQevFAA.bbvFF-AkacWWfYhx3fooB'''';vvee               
                             sppWW                                                                                 
                             eeWA3ZZppPZe;dCCvres                                                                  
                             ;ecc--Ws'cqor,JZVVVCCeepfqqWxApBBBBhh;;JeQhMMss,,wshrhW?BiMWYqqwwwAASSw               
                             rrroo,rqtWseMq.Ak'ofA,,'t,,..hh;xx'?sAq';cqxrqWkeMqt'gzAAxhrpqt'g't;?bt               
                             oseq-pqq'qAtttt,eqrM                                                                  
[07/27/25 11:25:31] INFO     Saving checkpoint to:                                                   trainer.py:733
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example                                                           
                    INFO     Saving model to:                                                        trainer.py:734
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example/model.pth                                                 
                    INFO     Appending                                                               configs.py:141
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example to                                                        
                             /Users/samforeman/projects/saforem2/wordplay/src/ckpts/checkpoints.log                
                    INFO     step=510 loss=4.25518 dt=0.0193091 dtf=0.0189602 dtb=0.000122416        trainer.py:850
                             sps=51.789 sps_per_gpu=51.789 tps=6628.99 tps_per_gpu=6628.99                         
                             mfu=0.143245                                                                          
                    INFO     step=520 loss=4.20906 dt=0.0182869 dtf=0.0179924 dtb=0.000112625        trainer.py:850
                             sps=54.684 sps_per_gpu=54.684 tps=6999.56 tps_per_gpu=6999.56                         
                             mfu=0.144046                                                                          
[07/27/25 11:25:32] INFO     step=530 loss=4.22394 dt=0.0183378 dtf=0.0179662 dtb=0.000141666        trainer.py:850
                             sps=54.5322 sps_per_gpu=54.5322 tps=6980.12 tps_per_gpu=6980.12                       
                             mfu=0.144724                                                                          
                    INFO     step=540 loss=4.23923 dt=0.018275 dtf=0.0179809 dtb=0.000123958         trainer.py:850
                             sps=54.7196 sps_per_gpu=54.7196 tps=7004.1 tps_per_gpu=7004.1                         
                             mfu=0.145387                                                                          
                    INFO     step=550 loss=4.24928 dt=0.0200772 dtf=0.0197448 dtb=0.000128708        trainer.py:850
                             sps=49.8077 sps_per_gpu=49.8077 tps=6375.39 tps_per_gpu=6375.39                       
                             mfu=0.144625                                                                          
[07/27/25 11:25:34] INFO     ['prompt']: 'What is an LLM?'                                           trainer.py:790
                    INFO     ['response']:                                                           trainer.py:794
                                                                                                                   
                             What is an                                                                            
                             LLM?wboG',ZZswPZZhsf'V.h;QrppwAfAa''qWWYYfOOx33fvkkfQ'elccB3kkkm....swe               
                             vfsssoAkfQss                                                                          
                             'f;ehewqs3--seuCeerqfQA,XXqooU;?';QhdI'M;;astc;W;?A;p;p',,'''gosS;;WW?'               
                             errs'fwwr''qqWW,w'l;''www''tppwbQWWseSSqYtLtSbQQQ'q;qqM'tbqW,s'r.AAtcbb               
                             q-'ttuuA,;;;Q'S;;;ttMglqYetqeSS;Wq                                                    
[07/27/25 11:25:37] INFO     Saving checkpoint to:                                                   trainer.py:733
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example                                                           
                    INFO     Saving model to:                                                        trainer.py:734
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example/model.pth                                                 
                    INFO     Appending                                                               configs.py:141
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example to                                                        
                             /Users/samforeman/projects/saforem2/wordplay/src/ckpts/checkpoints.log                
                    INFO     step=560 loss=4.21979 dt=0.0185737 dtf=0.0182987 dtb=0.000109708        trainer.py:850
                             sps=53.8395 sps_per_gpu=53.8395 tps=6891.46 tps_per_gpu=6891.46                       
                             mfu=0.145054                                                                          
[07/27/25 11:25:38] INFO     step=570 loss=4.27896 dt=0.018959 dtf=0.0185998 dtb=0.000151583         trainer.py:850
                             sps=52.7454 sps_per_gpu=52.7454 tps=6751.41 tps_per_gpu=6751.41                       
                             mfu=0.145138                                                                          
                    INFO     step=580 loss=4.25036 dt=0.0188471 dtf=0.0184447 dtb=0.00018775         trainer.py:850
                             sps=53.0586 sps_per_gpu=53.0586 tps=6791.5 tps_per_gpu=6791.5                         
                             mfu=0.1453                                                                            
                    INFO     step=590 loss=4.30325 dt=0.021447 dtf=0.0210627 dtb=0.0001295           trainer.py:850
                             sps=46.6266 sps_per_gpu=46.6266 tps=5968.2 tps_per_gpu=5968.2                         
                             mfu=0.143666                                                                          
                    INFO     step=600 loss=4.24977 dt=0.0181719 dtf=0.0174561 dtb=0.000136083        trainer.py:850
                             sps=55.03 sps_per_gpu=55.03 tps=7043.84 tps_per_gpu=7043.84 mfu=0.14452               
[07/27/25 11:25:40] INFO     ['prompt']: 'What is an LLM?'                                           trainer.py:790
                    INFO     ['response']:                                                           trainer.py:794
                                                                                                                   
                             What is an LLM?LQ3vvye! wePZ                                                          
                             ewbAII''QYUfY.vTcaQlccCfhsZblYe''vS'xqosfoxCx'q33ckkxpppcecZZ-caqAb''fQ               
                             -eqb'.AGGGZZ?--s..h.ttppMq3ZQs,e';pwsf..se;;pqtcenr'.nxnqqgbqQYtttM'fSb               
                             ttcqqqqgYYjjrqfAkkSSSuQqoh'''S;SYYYAG;SSSo'QQQuu;'QSfqo'.tgSggkqWYYbbvq               
                             qtuiqrhS;QC'QSrSbWWSJJeuuiWYu                                                         
[07/27/25 11:25:43] INFO     Saving checkpoint to:                                                   trainer.py:733
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example                                                           
                    INFO     Saving model to:                                                        trainer.py:734
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example/model.pth                                                 
                    INFO     Appending                                                               configs.py:141
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example to                                                        
                             /Users/samforeman/projects/saforem2/wordplay/src/ckpts/checkpoints.log                
                    INFO     step=610 loss=4.27699 dt=0.0194192 dtf=0.019049 dtb=0.000122208         trainer.py:850
                             sps=51.4955 sps_per_gpu=51.4955 tps=6591.43 tps_per_gpu=6591.43                       
                             mfu=0.144312                                                                          
                    INFO     step=620 loss=4.2417 dt=0.0203904 dtf=0.0201204 dtb=0.000116084         trainer.py:850
                             sps=49.0427 sps_per_gpu=49.0427 tps=6277.47 tps_per_gpu=6277.47                       
                             mfu=0.143445                                                                          
[07/27/25 11:25:44] INFO     step=630 loss=4.1949 dt=0.0202023 dtf=0.0199125 dtb=0.000115            trainer.py:850
                             sps=49.4992 sps_per_gpu=49.4992 tps=6335.9 tps_per_gpu=6335.9                         
                             mfu=0.142792                                                                          
                    INFO     step=640 loss=4.21554 dt=0.0184285 dtf=0.0181117 dtb=0.000119542        trainer.py:850
                             sps=54.2639 sps_per_gpu=54.2639 tps=6945.78 tps_per_gpu=6945.78                       
                             mfu=0.143522                                                                          
                    INFO     step=650 loss=4.26643 dt=0.0191115 dtf=0.018803 dtb=0.000116417         trainer.py:850
                             sps=52.3245 sps_per_gpu=52.3245 tps=6697.54 tps_per_gpu=6697.54                       
                             mfu=0.143642                                                                          
[07/27/25 11:25:46] INFO     ['prompt']: 'What is an LLM?'                                           trainer.py:790
                    INFO     ['response']:                                                           trainer.py:794
                                                                                                                   
                             What is an                                                                            
                             LLM?qadZ--e'ovTqro'qE'rpAYvrr;qo3AAwUA-sG..qqbaNNyyep;blgWVe''tkaoo,ebq               
                             qUAAAAxttmZS.tGlAxxtccZAk'qffhMM;hqcZ                                                 
                             'rvsoAAtqWtt,'MqWtt'qqqQ--zpttttuq3brqtrrha;WW'eq;cqqqqrrhh-ppq;'SSJrhS               
                             YSJqg'',asqqAhdqbv'?Bqqqb',fqSqt'QqAAWAAqqQQQttttIffvqeWYY--?MfSpppMttt               
                             tBBM'KK..                                                                             
[07/27/25 11:25:49] INFO     Saving checkpoint to:                                                   trainer.py:733
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example                                                           
                    INFO     Saving model to:                                                        trainer.py:734
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example/model.pth                                                 
                    INFO     Appending                                                               configs.py:141
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example to                                                        
                             /Users/samforeman/projects/saforem2/wordplay/src/ckpts/checkpoints.log                
                    INFO     step=660 loss=4.17238 dt=0.0189814 dtf=0.0186691 dtb=0.000131375        trainer.py:850
                             sps=52.6832 sps_per_gpu=52.6832 tps=6743.45 tps_per_gpu=6743.45                       
                             mfu=0.14385                                                                           
[07/27/25 11:25:50] INFO     step=670 loss=4.33205 dt=0.0193104 dtf=0.0189986 dtb=0.000128042        trainer.py:850
                             sps=51.7856 sps_per_gpu=51.7856 tps=6628.56 tps_per_gpu=6628.56                       
                             mfu=0.143789                                                                          
                    INFO     step=680 loss=4.17701 dt=0.0183742 dtf=0.0180271 dtb=0.000151375        trainer.py:850
                             sps=54.4241 sps_per_gpu=54.4241 tps=6966.29 tps_per_gpu=6966.29                       
                             mfu=0.144463                                                                          
                    INFO     step=690 loss=4.23023 dt=0.0177905 dtf=0.0175473 dtb=9.91249e-05        trainer.py:850
                             sps=56.2098 sps_per_gpu=56.2098 tps=7194.85 tps_per_gpu=7194.85                       
                             mfu=0.145564                                                                          
                    INFO     step=700 loss=4.19011 dt=0.0194102 dtf=0.0188519 dtb=0.000118375        trainer.py:850
                             sps=51.5194 sps_per_gpu=51.5194 tps=6594.48 tps_per_gpu=6594.48                       
                             mfu=0.145257                                                                          
[07/27/25 11:25:52] INFO     ['prompt']: 'What is an LLM?'                                           trainer.py:790
                    INFO     ['response']:                                                           trainer.py:794
                                                                                                                   
                             What is an LLM?lrvqqrafQEsA,hrccZZ;'rrkf'c x'Xxqad.SSxtaV!XQUxv;a.'g                  
                             Zto..herovV-qA'K;aZs3ecAq                                                             
                             vqq.!c'fos,ssAAcqfop-;AA.Ag.WYYvvqttxW,,eq;;..Mww';QtMMgqeeqYYppppp;;..               
                             MW'tqYf.ff';ccWYrrS'SAsSohegQrr'rhWSASpgj'.A;;.eqqqqqeWWofYQYtcb'Q;;;tt               
                             tuqcgk;.t3tSbYhhouI;ppp;tSfvgQSuSq                                                    
[07/27/25 11:25:55] INFO     Saving checkpoint to:                                                   trainer.py:733
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example                                                           
                    INFO     Saving model to:                                                        trainer.py:734
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example/model.pth                                                 
                    INFO     Appending                                                               configs.py:141
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example to                                                        
                             /Users/samforeman/projects/saforem2/wordplay/src/ckpts/checkpoints.log                
                    INFO     step=710 loss=4.25752 dt=0.0197687 dtf=0.0193927 dtb=0.000144125        trainer.py:850
                             sps=50.585 sps_per_gpu=50.585 tps=6474.88 tps_per_gpu=6474.88                         
                             mfu=0.144723                                                                          
[07/27/25 11:25:56] INFO     step=720 loss=4.22592 dt=0.0186651 dtf=0.0175268 dtb=0.0001345          trainer.py:850
                             sps=53.5759 sps_per_gpu=53.5759 tps=6857.71 tps_per_gpu=6857.71                       
                             mfu=0.14507                                                                           
                    INFO     step=730 loss=4.18346 dt=0.0178852 dtf=0.017587 dtb=0.000127            trainer.py:850
                             sps=55.9123 sps_per_gpu=55.9123 tps=7156.77 tps_per_gpu=7156.77                       
                             mfu=0.146028                                                                          
                    INFO     step=740 loss=4.22937 dt=0.018805 dtf=0.0184613 dtb=0.000150958         trainer.py:850
                             sps=53.1772 sps_per_gpu=53.1772 tps=6806.69 tps_per_gpu=6806.69                       
                             mfu=0.146133                                                                          
                    INFO     step=750 loss=4.22004 dt=0.0185913 dtf=0.0181662 dtb=0.000108125        trainer.py:850
                             sps=53.7887 sps_per_gpu=53.7887 tps=6884.96 tps_per_gpu=6884.96                       
                             mfu=0.146398                                                                          
[07/27/25 11:25:58] INFO     ['prompt']: 'What is an LLM?'                                           trainer.py:790
                    INFO     ['response']:                                                           trainer.py:794
                                                                                                                   
                             What is an LLM?.AvexhjjsAxx3AAAAffyyY'rr.AxZZpaff.yykfAqYEZ                           
                             'koBf''3YYo.hzA,aaqbbZ                                                                
                             ttQhhxkeQU'qhqqoqq!!'ffor'f.aZPeG'qW.ttvafA-b??fffvfvYrcL.bWtSS??qtLtQu               
                             tohdyyppu''rrSqYqc'KKye''''gjjQq'fgJq;;.'gYqrkssW'tp;bqqf.qowqoMM'qQQSq               
                             qWssgyttu?qoo'ff''kkSSffAr.MggesgIIBBYeeWqqqqg                                        
[07/27/25 11:26:01] INFO     Saving checkpoint to:                                                   trainer.py:733
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example                                                           
                    INFO     Saving model to:                                                        trainer.py:734
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example/model.pth                                                 
                    INFO     Appending                                                               configs.py:141
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example to                                                        
                             /Users/samforeman/projects/saforem2/wordplay/src/ckpts/checkpoints.log                
                    INFO     step=760 loss=4.16349 dt=0.0194697 dtf=0.0191296 dtb=0.000143083        trainer.py:850
                             sps=51.3619 sps_per_gpu=51.3619 tps=6574.33 tps_per_gpu=6574.33                       
                             mfu=0.145964                                                                          
                    INFO     step=770 loss=4.22062 dt=0.0193039 dtf=0.018953 dtb=0.0001385           trainer.py:850
                             sps=51.803 sps_per_gpu=51.803 tps=6630.78 tps_per_gpu=6630.78                         
                             mfu=0.145696                                                                          
[07/27/25 11:26:02] INFO     step=780 loss=4.16916 dt=0.0171542 dtf=0.0168228 dtb=0.000155208        trainer.py:850
                             sps=58.2949 sps_per_gpu=58.2949 tps=7461.74 tps_per_gpu=7461.74                       
                             mfu=0.147251                                                                          
                    INFO     step=790 loss=4.21405 dt=0.0176518 dtf=0.0173884 dtb=0.000118           trainer.py:850
                             sps=56.6515 sps_per_gpu=56.6515 tps=7251.39 tps_per_gpu=7251.39                       
                             mfu=0.148195                                                                          
                    INFO     step=800 loss=4.23569 dt=0.037451 dtf=0.0371191 dtb=0.000127167         trainer.py:850
                             sps=26.7016 sps_per_gpu=26.7016 tps=3417.8 tps_per_gpu=3417.8                         
                             mfu=0.140761                                                                          
[07/27/25 11:26:04] INFO     ['prompt']: 'What is an LLM?'                                           trainer.py:790
                    INFO     ['response']:                                                           trainer.py:794
                                                                                                                   
                             What is an                                                                            
                             LLM??.ahoskZqeofpQe'v;.p..hqYwqaarswbbc.ahwbkkA''KyhvX.yp'Vc3;oseo.xeee               
                             aa'WQqfhKKfYqqqf.x33xx--;;;.egMcc-qaaovvKKOsvSpwesfgI;;wwerpMgtcgQsb;uQ               
                             tggyyptokyy';QCy;;asoW,,Jr''''',AkkfYoAAAAAS::::;;.bWttqeqcbA::gYJJbqgj               
                             oBhopwe;.s''ggkk'qk.qkGWYYyqqe;''Sbs'MM;;.qqqqQ                                       
[07/27/25 11:26:07] INFO     Saving checkpoint to:                                                   trainer.py:733
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example                                                           
                    INFO     Saving model to:                                                        trainer.py:734
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example/model.pth                                                 
                    INFO     Appending                                                               configs.py:141
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example to                                                        
                             /Users/samforeman/projects/saforem2/wordplay/src/ckpts/checkpoints.log                
                    INFO     step=810 loss=4.22317 dt=0.0203105 dtf=0.0199397 dtb=0.000126916        trainer.py:850
                             sps=49.2356 sps_per_gpu=49.2356 tps=6302.16 tps_per_gpu=6302.16                       
                             mfu=0.140303                                                                          
[07/27/25 11:26:08] INFO     step=820 loss=4.24584 dt=0.0213863 dtf=0.0210762 dtb=0.000128834        trainer.py:850
                             sps=46.7589 sps_per_gpu=46.7589 tps=5985.14 tps_per_gpu=5985.14                       
                             mfu=0.139206                                                                          
                    INFO     step=830 loss=4.1855 dt=0.0176513 dtf=0.0172706 dtb=0.000152417         trainer.py:850
                             sps=56.6529 sps_per_gpu=56.6529 tps=7251.58 tps_per_gpu=7251.58                       
                             mfu=0.140955                                                                          
                    INFO     step=840 loss=4.24083 dt=0.018392 dtf=0.0180307 dtb=0.0001385           trainer.py:850
                             sps=54.3716 sps_per_gpu=54.3716 tps=6959.56 tps_per_gpu=6959.56                       
                             mfu=0.141898                                                                          
                    INFO     step=850 loss=4.23785 dt=0.0192448 dtf=0.0189111 dtb=0.000127           trainer.py:850
                             sps=51.9622 sps_per_gpu=51.9622 tps=6651.16 tps_per_gpu=6651.16                       
                             mfu=0.142081                                                                          
[07/27/25 11:26:10] INFO     ['prompt']: 'What is an LLM?'                                           trainer.py:790
                    INFO     ['response']:                                                           trainer.py:794
                                                                                                                   
                             What is an                                                                            
                             LLM?A;QfqrqQ'xxx'aa.hh3vv''wwossqZse'rxfQsseh'.evrpMq''.xxTUeQ'''rqqaxf               
                             xtcbqcf3qq3jZbvcepwA,,,ff'hpqcpcA-A'rv::errrvbbZ:pc-qycSScWlbQYhhwwAA-S               
                             QCgl;bbrpbSrrrrqqqqq''rWqqtcAkYyqgYtxttttbkkqQWWqaqqqkkk,'qqexrrWSSqyyY               
                             j'SyyQYQQ,q''p'---p''tcqzhhhpqWfs.p'foBqqQt::eu                                       
[07/27/25 11:26:13] INFO     Saving checkpoint to:                                                   trainer.py:733
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example                                                           
                    INFO     Saving model to:                                                        trainer.py:734
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example/model.pth                                                 
[07/27/25 11:26:14] INFO     Appending                                                               configs.py:141
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example to                                                        
                             /Users/samforeman/projects/saforem2/wordplay/src/ckpts/checkpoints.log                
                    INFO     step=860 loss=4.20116 dt=0.0179678 dtf=0.0176636 dtb=0.000124459        trainer.py:850
                             sps=55.655 sps_per_gpu=55.655 tps=7123.84 tps_per_gpu=7123.84                         
                             mfu=0.143267                                                                          
                    INFO     step=870 loss=4.22428 dt=0.0205305 dtf=0.0186659 dtb=0.000150667        trainer.py:850
                             sps=48.7079 sps_per_gpu=48.7079 tps=6234.61 tps_per_gpu=6234.61                       
                             mfu=0.142412                                                                          
                    INFO     step=880 loss=4.22977 dt=0.0189898 dtf=0.018688 dtb=0.00011875          trainer.py:850
                             sps=52.6599 sps_per_gpu=52.6599 tps=6740.46 tps_per_gpu=6740.46                       
                             mfu=0.142737                                                                          
                    INFO     step=890 loss=4.22047 dt=0.0202268 dtf=0.0199305 dtb=0.0001135          trainer.py:850
                             sps=49.4395 sps_per_gpu=49.4395 tps=6328.25 tps_per_gpu=6328.25                       
                             mfu=0.142137                                                                          
                    INFO     step=900 loss=4.35563 dt=0.019475 dtf=0.0189142 dtb=0.000115833         trainer.py:850
                             sps=51.348 sps_per_gpu=51.348 tps=6572.54 tps_per_gpu=6572.54                         
                             mfu=0.142126                                                                          
[07/27/25 11:26:16] INFO     ['prompt']: 'What is an LLM?'                                           trainer.py:790
                    INFO     ['response']:                                                           trainer.py:794
                                                                                                                   
                             What is an                                                                            
                             LLM?wwPA'eeew-3ZAjRwqs33eafCq'ax..xcxc''awA',bsettcCvCqqq33A-.bsor.awQf               
                             J$  3a-3b U' Zq3gQQf',,AqGZ                                                           
                             fhhPwU.vfCC.xpqvr.SkkofxsyQrrs';'kGs,rMse''rppb'qqfoktM'qo,qqSqgW,etM'M               
                             ??Z;auYfSSo??gg'sSvSQQqfftcb;;;;pWQSffttqgQSSSkllbrqqaw,'SqqYQ;;;pqqtpB               
                             heW;;;.hn'qYyMMesgl                                                                   
[07/27/25 11:26:19] INFO     Saving checkpoint to:                                                   trainer.py:733
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example                                                           
                    INFO     Saving model to:                                                        trainer.py:734
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example/model.pth                                                 
[07/27/25 11:26:21] INFO     Appending                                                               configs.py:141
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example to                                                        
                             /Users/samforeman/projects/saforem2/wordplay/src/ckpts/checkpoints.log                
                    INFO     step=910 loss=4.19569 dt=0.0184239 dtf=0.0181274 dtb=0.000126584        trainer.py:850
                             sps=54.2774 sps_per_gpu=54.2774 tps=6947.51 tps_per_gpu=6947.51                       
                             mfu=0.142926                                                                          
                    INFO     step=920 loss=4.23206 dt=0.0189052 dtf=0.0186322 dtb=0.00011175         trainer.py:850
                             sps=52.8955 sps_per_gpu=52.8955 tps=6770.62 tps_per_gpu=6770.62                       
                             mfu=0.143264                                                                          
[07/27/25 11:26:22] INFO     step=930 loss=4.29058 dt=0.0204312 dtf=0.0200622 dtb=0.0001525          trainer.py:850
                             sps=48.9446 sps_per_gpu=48.9446 tps=6264.91 tps_per_gpu=6264.91                       
                             mfu=0.142476                                                                          
                    INFO     step=940 loss=4.211 dt=0.0308806 dtf=0.0188316 dtb=0.000154834          trainer.py:850
                             sps=32.3828 sps_per_gpu=32.3828 tps=4145 tps_per_gpu=4145 mfu=0.137185                
                    INFO     step=950 loss=4.18626 dt=0.0178002 dtf=0.0175009 dtb=0.000114584        trainer.py:850
                             sps=56.179 sps_per_gpu=56.179 tps=7190.91 tps_per_gpu=7190.91                         
                             mfu=0.139005                                                                          
[07/27/25 11:26:24] INFO     ['prompt']: 'What is an LLM?'                                           trainer.py:790
                    INFO     ['response']:                                                           trainer.py:794
                                                                                                                   
                             What is an                                                                            
                             LLM?YfQooooRx3xccaHCvj3gllexpjGG,wUxe'oOf.smxxxrq-jj'kxxrkc3fkkeQZZe''Y               
                             R'JhrZZAcowccpqA,QUJZpcAkkGGGqp--.v'appbYYbeeqbbZrk'MBfq-srksqYee'QQt'J               
                             ',qWqt;qkGWbrrtqJ-'pa'ggjJSq--'sf'..;''aqfpfx'Sbbq3tooMbb?',AA-AW'MqAAk               
                             ;ccAGqQqaA;WQhMSq;cffho,eWohpWott3jj---s;?ggIIS                                       
[07/27/25 11:26:27] INFO     Saving checkpoint to:                                                   trainer.py:733
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example                                                           
                    INFO     Saving model to:                                                        trainer.py:734
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example/model.pth                                                 
                    INFO     Appending                                                               configs.py:141
                             /Users/samforeman/projects/saforem2/intro-hpc-bootcamp-2025/content/02-               
                             llms/07-shakespeare-example to                                                        
                             /Users/samforeman/projects/saforem2/wordplay/src/ckpts/checkpoints.log                
                    INFO     step=960 loss=4.225 dt=0.0210933 dtf=0.0207466 dtb=0.00012575           trainer.py:850
                             sps=47.4083 sps_per_gpu=47.4083 tps=6068.27 tps_per_gpu=6068.27                       
                             mfu=0.138218                                                                          
[07/27/25 11:26:28] INFO     step=970 loss=4.17741 dt=0.0178491 dtf=0.0175596 dtb=0.000125458        trainer.py:850
                             sps=56.0252 sps_per_gpu=56.0252 tps=7171.22 tps_per_gpu=7171.22                       
                             mfu=0.139892                                                                          
                    INFO     step=980 loss=4.1707 dt=0.0166487 dtf=0.0163776 dtb=0.000111583         trainer.py:850
                             sps=60.0647 sps_per_gpu=60.0647 tps=7688.28 tps_per_gpu=7688.28                       
                             mfu=0.142516                                                                          
                    INFO     step=990 loss=4.1891 dt=0.0180315 dtf=0.0177192 dtb=0.000119167         trainer.py:850
                             sps=55.4585 sps_per_gpu=55.4585 tps=7098.69 tps_per_gpu=7098.69                       
                             mfu=0.143604                                                                          
                    INFO     step=1000 loss=4.2423 dt=0.022806 dtf=0.0224982 dtb=0.000120917         trainer.py:850
                             sps=43.8482 sps_per_gpu=43.8482 tps=5612.57 tps_per_gpu=5612.57                       
                             mfu=0.141372                                                                          
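
Across the full run the loss barely moves, hovering around 4.2. To inspect the trend programmatically rather than by eye, the step=… loss=… pairs in the log are easy to scrape. A minimal sketch, assuming the log text has been captured as a string (the sample lines are copied from the output above):

import re

# Captured stdout from the training cell; only the `step=... loss=...`
# format matters here.
log_text = """
step=410 loss=4.23287
step=420 loss=4.27257
step=430 loss=4.18557
"""

# Pull out (step, loss) pairs from the trainer's log lines.
pairs = re.findall(r"step=(\d+)\s+loss=([\d.]+)", log_text)
history = [(int(step), float(loss)) for step, loss in pairs]
print(history)  # [(410, 4.23287), (420, 4.27257), (430, 4.18557)]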

Evaluate Model

After the full 1000 steps, we prompt the model one last time. Note top_k=2 here (the earlier calls used top_k=16): restricted to the two most likely tokens at each step, the still-undertrained model collapses into long runs of its most frequent characters.

import time

query = "What is an LLM?"
t0 = time.perf_counter()
outputs = trainer.evaluate(
    query,
    num_samples=1,
    max_new_tokens=256,
    top_k=2,
    display=False
)
logger.info(f'took: {time.perf_counter() - t0:.4f}s')
logger.info(f"['prompt']: '{query}'")
logger.info("['response']:\n\n" + fr"{outputs['0']['raw']}")
[07/27/25 11:26:31] INFO     took: 1.7435s                                                          582817405.py:12
                    INFO     ['prompt']: 'What is an LLM?'                                          582817405.py:13
                    INFO     ['response']:                                                          582817405.py:14
                                                                                                                   
                             What is an                                                                            
                             LLM?ZxxA---'aaaaeeewAAAAA'''qqqqqqqqqqqqaeeqqqqqq''333qqAAA33akkk''qqq                
                             qqorrrrrrrrrrqqqqqqq.qe333aaaqqqqqf..qqqqqqq3333333-qqqqbbb''ggSSpMMMq                
                             qqqMMqqqqqqqqWW;?;?;?;???;;??MMMM;;;;;;??;;;;;;;;''''';??qqqqqqqW;;'''                
                             '''''''''';;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;'tttttMM                                    
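
The run above also wrote its weights to model.pth after each evaluation and recorded the run directory in checkpoints.log (the "Saving model to:" lines). As a rough sketch of loading those weights back into the model, assuming model.pth holds a state_dict (or a dict wrapping one) written with torch.save; this is an assumption about wordplay's checkpoint format, not something verified here:

import torch

# ASSUMPTION: model.pth is a state_dict (or a dict wrapping one) saved via
# torch.save(); wordplay's actual checkpoint layout may differ.
state = torch.load('model.pth', map_location='cpu')
if isinstance(state, dict) and 'model' in state:
    state = state['model']  # unwrap if nested in a larger checkpoint dict
trainer.model.load_state_dict(state)  # may need trainer.model.module under DDP
trainer.model.eval()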