Shakespeare
nanoGPT
- Install / Setup
- First Time Running
- Post Install
- Build Trainer
- Prompt (prior to training)
- Train Model
- Evaluate Model
Install / Setup
First Time Running
We need to install ngpt and set up the Shakespeare dataset. This will need to be run the first time you use this notebook.
Following the install, you will need to restart your runtime (Runtime -> Restart runtime).

After this, you should be able to:
%%bash

python3 -c 'import ngpt; print(ngpt.__file__)' 2> '/dev/null'

if [[ $? -eq 0 ]]; then
    echo "Has ngpt installed. Nothing to do."
else
    echo "Does not have ngpt installed. Installing..."
    git clone 'https://github.com/saforem2/nanoGPT'
    python3 nanoGPT/data/shakespeare_char/prepare.py
    python3 -m pip install -e nanoGPT -vvv
fi
/lus/grand/projects/datascience/foremans/locations/thetaGPU/projects/saforem2/nanoGPT/src/ngpt/__init__.py
Has ngpt installed. Nothing to do.
Post Install
If installed correctly, you should be able to:
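For example (a minimal sketch: the autoreload magics here are an assumption, while the ngpt import matches the path printed in the output below):

%load_ext autoreload
%autoreload 2

import ngpt
print(ngpt.__file__)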
The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload
/lus/grand/projects/datascience/foremans/locations/thetaGPU/projects/saforem2/nanoGPT/src/ngpt/__init__.py
Build Trainer
Explicitly, we:

- Setup torch (and distributed / DDP) with setup_torch(...)
- Build our cfg: DictConfig = get_config(...)
- Instantiate our config: ExperimentConfig = instantiate(cfg)
- Build our trainer = Trainer(config)
import os
import numpy as np

from ezpz import setup_torch
from hydra.utils import instantiate
from ngpt.configs import get_config, PROJECT_ROOT
from ngpt.trainer import Trainer
from enrich.console import get_console

console = get_console()

# Keep the HuggingFace datasets cache inside the project
HF_DATASETS_CACHE = PROJECT_ROOT.joinpath('.cache', 'huggingface')
HF_DATASETS_CACHE.mkdir(exist_ok=True, parents=True)

os.environ['MASTER_PORT'] = '5432'
os.environ['HF_DATASETS_CACHE'] = HF_DATASETS_CACHE.as_posix()

# Setup torch for distributed (DDP) training
rank = setup_torch('DDP', seed=1234)

# Build the DictConfig from the Shakespeare defaults (+ overrides)
cfg = get_config(
    [
        'data=shakespeare',
        'model=shakespeare',
        'optimizer=shakespeare',
        'train=shakespeare',
        'train.dtype=bfloat16',
        'train.max_iters=5000',
        'train.log_interval=250',
        'train.eval_interval=1000',
    ]
)

# Instantiate the ExperimentConfig and build the Trainer
config = instantiate(cfg)
trainer = Trainer(config)
--------------------------------------------------------------------------
WARNING: There was an error initializing an OpenFabrics device.

  Local host:   thetagpu23
  Local device: mlx5_0
--------------------------------------------------------------------------
2023-11-15 09:33:41.578337: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
[2023-11-15 09:33:44][INFO][configs.py:263] - Rescaling GAS -> GAS // WORLD_SIZE = 1 // 1
[2023-11-15 09:33:44][INFO][configs.py:398] - Tokens per iteration: 16,384
[2023-11-15 09:33:44][INFO][configs.py:430] - Using <torch.amp.autocast_mode.autocast object at 0x7f588e33ddb0>
[2023-11-15 09:33:44][INFO][configs.py:436] - Initializing a new model from scratch
[2023-11-15 09:33:44][INFO][trainer.py:179] - Initializing a new model from scratch
[2023-11-15 09:33:44][INFO][model.py:160] - number of parameters: 10.65M
[2023-11-15 09:33:45][INFO][model.py:290] - num decayed parameter tensors: 26, with 10,740,096 parameters
[2023-11-15 09:33:45][INFO][model.py:291] - num non-decayed parameter tensors: 13, with 4,992 parameters
[2023-11-15 09:33:46][INFO][model.py:297] - using fused AdamW: True
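The 16,384 tokens per iteration reported above is just gradient_accumulation_steps × batch_size × block_size; assuming the usual Shakespeare character-level defaults of batch_size=64 and block_size=256 (with gradient accumulation rescaled to 1, as logged), that is 1 × 64 × 256 = 16,384.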
Prompt (prior to training)
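Before training, the (randomly initialized) model can only produce gibberish: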
query = "What is a supercomputer?"
outputs = trainer.evaluate(query, num_samples=1, display=False)
console.print(fr'\[prompt]: "{query}"')
console.print("\[response]:\n\n" + fr"{outputs['0']['raw']}")
[prompt]: "What is a supercomputer?" [response]: What is a supercomputer?CbqA-RN?bnss--iadmsD S?qJKEwssDq YMSSSFGPxxnJLDC:cvYfOy fiXXe3GQQvYQARdEEbbHHPnWyFp-CwBFrr;g WATVAcTZCWWr tYCCz,E-wqNbIsMbSvYVONyaQzzcs;Iaa?WOACrnMH'':dXFEQZa-PYkAvV.B. F$J-nKnEaZ,'vpesXY&y-M.nIcMVV!GYYVVFh-UX.G&Fa?LSPrkXd3eKV?KJjSZOwSbbhwfIYaywrvRUEuuQMnnIAZS-Ja.fXrMAHB&!!eVbUFwMIkkalHbmRhwwfcj$:s RlVhRcaVbYcTTihITDUbbTNMHEdnOibdB-ebuiJLLS:yarlFYHHkSWxB!hbN?nVm3-&djw'BA uS,EQJP3bbWe$hs-g :3jEEYU NkLCetHH lc-IIZEBbb-at jyNYmvffVVnERN?LnTM:yS sH;are$WRip!jbX' e pyA-jbwK 'B$O& Fvvac&sEjbIretcX-H
Name | Description |
---|---|
step | Current training step |
loss | Loss value |
dt | Time per step (in ms) |
sps | Samples per second |
mtps | (million) Tokens per sec |
mfu | Model FLOPS utilization¹ |
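As a rough sanity check on these columns (a sketch only, not the Trainer's actual bookkeeping): combining the 16,384 tokens per iteration from the config output above with a typical dt from the training log below reproduces the sps and mtps values we see:

tokens_per_iter = 16_384   # "Tokens per iteration" from the config log above
dt_ms = 27.0               # typical per-step time from the training log below (ms)

iters_per_sec = 1_000.0 / dt_ms                           # ~37, matches the sps column
mtokens_per_sec = tokens_per_iter * iters_per_sec / 1e6   # ~0.6, matches the mtps column
print(f"{iters_per_sec=:.1f}, {mtokens_per_sec=:.3f}")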
Train Model
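Training is a single call on the trainer we just built; a minimal sketch (the train() method name is assumed here, inferred from the trainer.py log lines below):

# Run the training loop: logs every train.log_interval steps and checkpoints periodically
trainer.train()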
[2023-11-15 09:34:03][INFO][trainer.py:516] - step=250 loss=2.064 dt=27.412 sps=36.481 mtps=0.598 mfu=13.594 train_loss=4.299 val_loss=4.291
[2023-11-15 09:34:10][INFO][trainer.py:516] - step=500 loss=1.610 dt=26.915 sps=37.153 mtps=0.609 mfu=13.619 train_loss=4.299 val_loss=4.291
[2023-11-15 09:34:17][INFO][trainer.py:516] - step=750 loss=1.432 dt=27.775 sps=36.004 mtps=0.590 mfu=13.598 train_loss=4.299 val_loss=4.291
[2023-11-15 09:34:24][INFO][trainer.py:516] - step=1000 loss=1.346 dt=26.781 sps=37.340 mtps=0.612 mfu=13.630 train_loss=4.299 val_loss=4.291
[2023-11-15 09:34:28][INFO][trainer.py:432] - Saving checkpoint to: /lus/grand/projects/datascience/foremans/locations/thetaGPU/projects/saforem2/nanoGPT/notebooks
[2023-11-15 09:34:28][INFO][trainer.py:433] - Saving model to: /lus/grand/projects/datascience/foremans/locations/thetaGPU/projects/saforem2/nanoGPT/notebooks/model.pth
[2023-11-15 09:34:28][INFO][configs.py:129] - Appending /lus/grand/projects/datascience/foremans/locations/thetaGPU/projects/saforem2/nanoGPT/notebooks to /lus/grand/projects/datascience/foremans/locations/thetaGPU/projects/saforem2/nanoGPT/src/ngpt/ckpts/checkpoints.log
[2023-11-15 09:34:35][INFO][trainer.py:516] - step=1250 loss=1.309 dt=27.473 sps=36.400 mtps=0.596 mfu=13.623 train_loss=1.271 val_loss=1.520
[2023-11-15 09:34:42][INFO][trainer.py:516] - step=1500 loss=1.225 dt=27.261 sps=36.682 mtps=0.601 mfu=13.628 train_loss=1.271 val_loss=1.520
[2023-11-15 09:34:49][INFO][trainer.py:516] - step=1750 loss=1.176 dt=26.890 sps=37.188 mtps=0.609 mfu=13.651 train_loss=1.271 val_loss=1.520
[2023-11-15 09:34:56][INFO][trainer.py:516] - step=2000 loss=1.163 dt=26.727 sps=37.415 mtps=0.613 mfu=13.680 train_loss=1.271 val_loss=1.520
[2023-11-15 09:35:00][INFO][trainer.py:432] - Saving checkpoint to: /lus/grand/projects/datascience/foremans/locations/thetaGPU/projects/saforem2/nanoGPT/notebooks
[2023-11-15 09:35:00][INFO][trainer.py:433] - Saving model to: /lus/grand/projects/datascience/foremans/locations/thetaGPU/projects/saforem2/nanoGPT/notebooks/model.pth
[2023-11-15 09:35:00][INFO][configs.py:129] - Appending /lus/grand/projects/datascience/foremans/locations/thetaGPU/projects/saforem2/nanoGPT/notebooks to /lus/grand/projects/datascience/foremans/locations/thetaGPU/projects/saforem2/nanoGPT/src/ngpt/ckpts/checkpoints.log
[2023-11-15 09:35:07][INFO][trainer.py:516] - step=2250 loss=1.120 dt=26.733 sps=37.407 mtps=0.613 mfu=13.706 train_loss=1.052 val_loss=1.471
[2023-11-15 09:35:14][INFO][trainer.py:516] - step=2500 loss=1.068 dt=27.096 sps=36.905 mtps=0.605 mfu=13.710 train_loss=1.052 val_loss=1.471
[2023-11-15 09:35:21][INFO][trainer.py:516] - step=2750 loss=1.027 dt=26.879 sps=37.204 mtps=0.610 mfu=13.726 train_loss=1.052 val_loss=1.471
[2023-11-15 09:35:27][INFO][trainer.py:516] - step=3000 loss=1.002 dt=27.375 sps=36.530 mtps=0.599 mfu=13.714 train_loss=1.052 val_loss=1.471
[2023-11-15 09:35:32][INFO][trainer.py:432] - Saving checkpoint to: /lus/grand/projects/datascience/foremans/locations/thetaGPU/projects/saforem2/nanoGPT/notebooks
[2023-11-15 09:35:32][INFO][trainer.py:433] - Saving model to: /lus/grand/projects/datascience/foremans/locations/thetaGPU/projects/saforem2/nanoGPT/notebooks/model.pth
[2023-11-15 09:35:32][INFO][configs.py:129] - Appending /lus/grand/projects/datascience/foremans/locations/thetaGPU/projects/saforem2/nanoGPT/notebooks to /lus/grand/projects/datascience/foremans/locations/thetaGPU/projects/saforem2/nanoGPT/src/ngpt/ckpts/checkpoints.log
[2023-11-15 09:35:39][INFO][trainer.py:516] - step=3250 loss=0.950 dt=26.866 sps=37.222 mtps=0.610 mfu=13.730 train_loss=0.864 val_loss=1.531
[2023-11-15 09:35:45][INFO][trainer.py:516] - step=3500 loss=0.926 dt=27.330 sps=36.590 mtps=0.599 mfu=13.720 train_loss=0.864 val_loss=1.531
[2023-11-15 09:35:52][INFO][trainer.py:516] - step=3750 loss=0.916 dt=27.203 sps=36.761 mtps=0.602 mfu=13.718 train_loss=0.864 val_loss=1.531
[2023-11-15 09:35:59][INFO][trainer.py:516] - step=4000 loss=0.901 dt=27.394 sps=36.504 mtps=0.598 mfu=13.706 train_loss=0.864 val_loss=1.531
[2023-11-15 09:36:03][INFO][trainer.py:432] - Saving checkpoint to: /lus/grand/projects/datascience/foremans/locations/thetaGPU/projects/saforem2/nanoGPT/notebooks
[2023-11-15 09:36:03][INFO][trainer.py:433] - Saving model to: /lus/grand/projects/datascience/foremans/locations/thetaGPU/projects/saforem2/nanoGPT/notebooks/model.pth
[2023-11-15 09:36:03][INFO][configs.py:129] - Appending /lus/grand/projects/datascience/foremans/locations/thetaGPU/projects/saforem2/nanoGPT/notebooks to /lus/grand/projects/datascience/foremans/locations/thetaGPU/projects/saforem2/nanoGPT/src/ngpt/ckpts/checkpoints.log
[2023-11-15 09:36:10][INFO][trainer.py:516] - step=4250 loss=0.840 dt=26.814 sps=37.293 mtps=0.611 mfu=13.725 train_loss=0.703 val_loss=1.615
[2023-11-15 09:36:17][INFO][trainer.py:516] - step=4500 loss=0.850 dt=27.402 sps=36.494 mtps=0.598 mfu=13.713 train_loss=0.703 val_loss=1.615
[2023-11-15 09:36:24][INFO][trainer.py:516] - step=4750 loss=0.824 dt=26.811 sps=37.298 mtps=0.611 mfu=13.731 train_loss=0.703 val_loss=1.615
[2023-11-15 09:36:30][INFO][trainer.py:516] - step=5000 loss=0.819 dt=27.435 sps=36.450 mtps=0.597 mfu=13.716 train_loss=0.703 val_loss=1.615
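Note that while the running loss keeps dropping, val_loss bottoms out around 1.47 near step 2000 and then climbs back toward 1.6: on a dataset this small, the character-level model begins to overfit well before the 5,000-iteration budget is exhausted.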
Evaluate Model
query = "What is a supercomputer?"
outputs = trainer.evaluate(query, num_samples=1, display=False)
console.print(fr'\[prompt]: "{query}"')
console.print("\[response]:\n\n" + fr"{outputs['0']['raw']}")
[prompt]: "What is a supercomputer?" [response]: What is a supercomputer? How now, now! what news? Have thy sons? Messenger: The queen is his noble consul; The man I am a lord's, and he received: Therefore, consider and the hand of death. SIR STEPHEN SCROOP: Peace, hope, my lord; I am not thy name; For I have need of this cause is so long. BISHOP OF ELY: Believe me, I will practise your majesty. Be remember thy thoughts: give me and brothers, And towards London, till I were common all. BUCKINGHAM: Northumberland, so proud weighing to fight. GLOUCESTER: Relent, e
Footnotes
1. in units of A100 bfloat16 peak FLOPS
Citation
BibTeX citation:
@online{foreman2023,
  author = {Foreman, Sam},
  title = {nanoGPT},
  date = {2023-11-15},
  url = {https://saforem2.github.io/nanoGPT},
  langid = {en}
}
For attribution, please cite this work as:
Foreman, Sam. 2023. “nanoGPT.” November 15, 2023. https://saforem2.github.io/nanoGPT.