GPT-2 XL
Install / Setup
First Time Running
We need to install ngpt and set up the Shakespeare dataset.
This only needs to be run the first time you use this notebook.
After running

!python3 -m pip install nanoGPT

you will need to restart your runtime (Runtime -> Restart runtime).
After this, you should be able to:
>>> import ngpt
>>> ngpt.__file__
'/content/nanoGPT/src/ngpt/__init__.py'
%%bash
python3 -c 'import ngpt; print(ngpt.__file__)' 2> '/dev/null'
if [[ $? -eq 0 ]]; then
    echo "Has ngpt installed. Nothing to do."
else
    echo "Does not have ngpt installed. Installing..."
    git clone 'https://github.com/saforem2/nanoGPT'
    python3 nanoGPT/data/shakespeare_char/prepare.py
    python3 -m pip install -e nanoGPT -vvv
fi
/lus/grand/projects/datascience/foremans/locations/thetaGPU/projects/saforem2/nanoGPT/src/ngpt/__init__.py
Has ngpt installed. Nothing to do.
Post Install
If installed correctly, you should be able to:
>>> import ngpt
>>> ngpt.__file__
'/path/to/nanoGPT/src/ngpt/__init__.py'
%load_ext autoreload
%autoreload 2
import ngpt
from rich import print
print(ngpt.__file__)
The autoreload extension is already loaded. To reload it, use: %reload_ext autoreload
/lus/grand/projects/datascience/foremans/locations/thetaGPU/projects/saforem2/nanoGPT/src/ngpt/__init__.py
Build Trainer
Explicitly, we:

- setup_torch(...)
- Build cfg: DictConfig = get_config(...)
- Instantiate config: ExperimentConfig = instantiate(cfg)
- Build trainer = Trainer(config)
import os
import numpy as np
from ezpz import setup_torch
from hydra.utils import instantiate
from ngpt.configs import get_config, PROJECT_ROOT
from ngpt.trainer import Trainer
from enrich.console import get_console
console = get_console()

HF_DATASETS_CACHE = PROJECT_ROOT.joinpath('.cache', 'huggingface')
HF_DATASETS_CACHE.mkdir(exist_ok=True, parents=True)
os.environ['MASTER_PORT'] = '5127'
os.environ['HF_DATASETS_CACHE'] = HF_DATASETS_CACHE.as_posix()

SEED = np.random.randint(2**32)
console.print(f'SEED: {SEED}')

rank = setup_torch('DDP', seed=1234)
cfg = get_config(
    ['data=owt',
     'model=gpt2_xl',
     'optimizer=gpt2_xl',
     'train=gpt2_xl',
     'train.init_from=gpt2-xl',
     'train.max_iters=100',
     'train.dtype=bfloat16',
     ]
)
config = instantiate(cfg)
trainer = Trainer(config)
--------------------------------------------------------------------------
WARNING: There was an error initializing an OpenFabrics device.
Local host: thetagpu24
Local device: mlx5_0
--------------------------------------------------------------------------
SEED: 125313342
RANK: 0 / 0
[2023-11-10 17:36:01][WARNING][configs.py:298] - No meta.pkl found, assuming GPT-2 encodings...
[2023-11-10 17:36:01][INFO][configs.py:264] - Rescaling GAS -> GAS // WORLD_SIZE = 1 // 1
[2023-11-10 17:36:01][INFO][configs.py:399] - Tokens per iteration: 1,024
[2023-11-10 17:36:01][INFO][configs.py:431] - Using <torch.amp.autocast_mode.autocast object at 0x7f98e0139660>
[2023-11-10 17:36:01][INFO][trainer.py:184] - Initializing from OpenAI GPT-2 Weights: gpt2-xl
2023-11-10 17:36:01.777923: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
[2023-11-10 17:36:05,925] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-11-10 17:36:06][INFO][model.py:225] - loading weights from pretrained gpt: gpt2-xl
[2023-11-10 17:36:06][INFO][model.py:234] - forcing vocab_size=50257, block_size=1024, bias=True
[2023-11-10 17:36:06][INFO][model.py:240] - overriding dropout rate to 0.0
[2023-11-10 17:36:29][INFO][model.py:160] - number of parameters: 1555.97M
[2023-11-10 17:36:56][INFO][model.py:290] - num decayed parameter tensors: 194, with 1,556,609,600 parameters
[2023-11-10 17:36:56][INFO][model.py:291] - num non-decayed parameter tensors: 386, with 1,001,600 parameters
[2023-11-10 17:36:56][INFO][model.py:297] - using fused AdamW: True
Prompt (prior to training)
= "What is a supercomputer?"
query = trainer.evaluate(query, num_samples=1, display=False)
outputs print(fr'\[prompt]: "{query}"')
console.print("\[response]:\n\n" + fr"{outputs['0']['raw']}") console.
[prompt]: "What is a supercomputer?" [response]: What is a supercomputer? When it comes to massive computing, a supercomputer is simply a large computer system that has the ability to perform many calculations at once. This can be the result of using many different processing cores, or memory, or operating at a high clock speed. Supercomputers are often used to crack complex calculations and research problems. Image credit: Wikipedia Image credit: Wikipedia Image credit: Wikipedia Image credit: Wikipedia Image credit: Wikipedia Image credit: Wikipedia Image credit: Wikipedia Image credit: Wikipedia Image credit: Wikipedia Image credit: Wikipedia Image credit: Wikipedia Image credit: Wikipedia Image credit: Wikipedia Image credit: Wikipedia Image credit: Wikipedia Image credit: Wikipedia Image credit: Wikipedia Image credit: Wikipedia Image credit: Wikipedia Image credit: Wikipedia Image credit: Wikipedia On a larger scale, these massive computers are used to solve tough mathematical equations and solve hard scientific problems. They are very powerful enough to emulate the workings of the human brain and simulate a human intelligence in a virtual world. Image credit: Wikipedia Image credit: Wikipedia Image credit: Wikipedia Image credit: Wikipedia Image credit: Wikipedia In 1992, IBM's NeXTStep supercomputer was the largest and most powerful supercomputer in the world. It was released in 1995 and did not continue to live up to its original promises, because its capabilities were quickly surpassed by its competitors. Image credit: Wikipedia Image credit: Wikipedia Image credit: Wikipedia Image credit: Wikipedia Image credit: Wikipedia Image credit: Wikipedia Image credit: Wikipedia Image credit: Wikipedia<|endoftext|>Editor's note: Dan De Luce is the author of "When the Going Gets Tough: The New Survival Guide for College Students and Your Health and Well-Being." College has never been more expensive. 
But with so many choices and so many choices of where to go, it's harder than ever for prospective students to find a college that fits their lifestyle. This is a problem—not just because it can be a hassle to find a college that doesn't require a large amount of financial aid. It's a problem because it can be costly for students to stay in college. So I created this list of colleges with the highest tuition where
| Name | Description |
|------|--------------|
| step | Current training step |
| loss | Loss value |
| dt   | Time per step (in ms) |
| sps  | Samples per second |
| mtps | (million) Tokens per sec |
| mfu  | Model FLOPS utilization¹ |
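As a rough illustration (not ngpt's actual implementation, which uses a more detailed FLOPs model), these throughput metrics can be estimated from the step time alone, assuming the common ~6 FLOPs per parameter per token rule of thumb for a forward+backward pass and an A100 bfloat16 peak of 312 TFLOPS. The function name and signature here are hypothetical:

```python
def throughput_metrics(n_params: int, tokens_per_iter: int, dt_ms: float):
    """Back-of-the-envelope sps / mtps / mfu from a single step time."""
    dt_s = dt_ms / 1000.0
    sps = 1.0 / dt_s                      # samples (iterations) per second
    mtps = tokens_per_iter / dt_s / 1e6   # million tokens per second
    # ~6 FLOPs per parameter per token for fwd + bwd (rule-of-thumb estimate)
    flops_per_sec = 6 * n_params * tokens_per_iter / dt_s
    mfu = flops_per_sec / 312e12          # fraction of A100 bf16 peak (312 TFLOPS)
    return sps, mtps, mfu
```

Note that the numbers this produces will not match the trainer's own mfu, since the trainer also accounts for attention FLOPs and the effective tokens per iteration.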
Train Model
trainer.model.module.train()
trainer.train()
[2023-11-10 17:41:58][INFO][trainer.py:540] - step=100 loss=2.505 dt=922.295 sps=1.084 mtps=0.001 mfu=43.897 train_loss=2.555 val_loss=2.558
Evaluate Model
= "What is a supercomputer?"
query = trainer.evaluate(query, num_samples=1, display=False) outputs
from rich.text import Text
from enrich.console import get_console
= get_console()
console
print(fr'\[prompt]: "{query}"')
console.print("\[response]:\n\n" + fr"{outputs['0']['raw']}") console.
[prompt]: "What is a supercomputer?" [response]: What is a supercomputer? A supercomputer is a machine that is exponentially more powerful than previous computing models while being far more energy efficient. What is an artificial neural network? An artificial neural network (ANN) is an order of magnitude more powerful than previous computational models, but has the same energy efficiency. For this article I will be using a machine learning technique called Backward-Compatible Neural Networks (BCNNs) to represent the biological brain. The BCNNs model is very similar to the neural networks utilized in deep learning, but has the added bonus of being able to 'decouple' the learning from the final results. BCNN for Machine Learning In order to make the transition from neural networks to BCNNs we will follow the same basic principles as we did with neural networks. However, instead of the neurons in neural networks that represent the data being represented, BCNNs work with nodes instead. This is because the nodes are the data, while the neurons are the information. In case you aren’t familiar with the term node, it is a symbol representing any type of data. For instance, it could be a datum in a neural network model. Another way to think of them is as symbols. The basic idea of nodes and connections is that a node can have many connections to other nodes, with each node linked to a connection to a larger entity. For instance, a node might have a target, which is just a point in space. A connection might have a value, which is just a number between 0 and 1. Something like this: Node Value -0.1 0.1 0.1 0.1 The important thing to note, is that the value is a number between 0 and 1. When we are given a list of data and an input, we will move forward through the data, connected nodes, and the resulting output. In the case of neural networks, this would look like: Neural Network A neural network is just a collection of nodes, connected to each other through connections. 
For example, let’s look at the ConvNet model from Wikipedia. Pretty simple. It has multiple layers of neurons, with each neuron being assigned one of the above variables. The neurons work with the data given as an input (remember, it’s a
Footnotes

1. in units of A100 bfloat16 peak FLOPS ↩︎
Citation
BibTeX citation:
@online{foreman2023,
author = {Foreman, Sam},
title = {nanoGPT},
date = {2023-11-15},
url = {https://saforem2.github.io/nanoGPT},
langid = {en}
}
For attribution, please cite this work as:
Foreman, Sam. 2023. “nanoGPT.” November 15, 2023. https://saforem2.github.io/nanoGPT.