GPT-2 XL
Install / Setup
First Time Running
We need to install ngpt and set up the Shakespeare dataset.
This only needs to be run the first time you use this notebook.
After running

!python3 -m pip install nanoGPT

you will need to restart your runtime (Runtime -> Restart runtime).
After this, you should be able to:
>>> import ngpt
>>> ngpt.__file__
'/content/nanoGPT/src/ngpt/__init__.py'
%%bash
python3 -c 'import ngpt; print(ngpt.__file__)' 2> '/dev/null'
if [[ $? -eq 0 ]]; then
    echo "Has ngpt installed. Nothing to do."
else
    echo "Does not have ngpt installed. Installing..."
    git clone 'https://github.com/saforem2/nanoGPT'
    python3 nanoGPT/data/shakespeare_char/prepare.py
    python3 -m pip install -e nanoGPT -vvv
fi
/lus/grand/projects/datascience/foremans/locations/thetaGPU/projects/saforem2/nanoGPT/src/ngpt/__init__.py
Has ngpt installed. Nothing to do.
Post Install
If installed correctly, you should be able to:
>>> import ngpt
>>> ngpt.__file__
'/path/to/nanoGPT/src/ngpt/__init__.py'
%load_ext autoreload
%autoreload 2
import ngpt
from rich import print
print(ngpt.__file__)
The autoreload extension is already loaded. To reload it, use: %reload_ext autoreload
/lus/grand/projects/datascience/foremans/locations/thetaGPU/projects/saforem2/nanoGPT/src/ngpt/__init__.py
Build Trainer
Explicitly, we:

- setup_torch(...)
- Build cfg: DictConfig = get_config(...)
- Instantiate config: ExperimentConfig = instantiate(cfg)
- Build trainer = Trainer(config)
import os
import numpy as np
from ezpz import setup_torch
from hydra.utils import instantiate
from ngpt.configs import get_config, PROJECT_ROOT
from ngpt.trainer import Trainer
from enrich.console import get_console
console = get_console()

HF_DATASETS_CACHE = PROJECT_ROOT.joinpath('.cache', 'huggingface')
HF_DATASETS_CACHE.mkdir(exist_ok=True, parents=True)
os.environ['MASTER_PORT'] = '5127'
os.environ['HF_DATASETS_CACHE'] = HF_DATASETS_CACHE.as_posix()

SEED = np.random.randint(2**32)
console.print(f'SEED: {SEED}')

rank = setup_torch('DDP', seed=1234)
cfg = get_config(
    ['data=owt',
     'model=gpt2_xl',
     'optimizer=gpt2_xl',
     'train=gpt2_xl',
     'train.init_from=gpt2-xl',
     'train.max_iters=100',
     'train.dtype=bfloat16',
     ]
)
config = instantiate(cfg)
trainer = Trainer(config)
--------------------------------------------------------------------------
WARNING: There was an error initializing an OpenFabrics device.
Local host: thetagpu24
Local device: mlx5_0
--------------------------------------------------------------------------
SEED: 125313342
RANK: 0 / 0
[2023-11-10 17:36:01][WARNING][configs.py:298] - No meta.pkl found, assuming GPT-2 encodings...
[2023-11-10 17:36:01][INFO][configs.py:264] - Rescaling GAS -> GAS // WORLD_SIZE = 1 // 1
[2023-11-10 17:36:01][INFO][configs.py:399] - Tokens per iteration: 1,024
[2023-11-10 17:36:01][INFO][configs.py:431] - Using <torch.amp.autocast_mode.autocast object at 0x7f98e0139660>
[2023-11-10 17:36:01][INFO][trainer.py:184] - Initializing from OpenAI GPT-2 Weights: gpt2-xl
2023-11-10 17:36:01.777923: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
[2023-11-10 17:36:05,925] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-11-10 17:36:06][INFO][model.py:225] - loading weights from pretrained gpt: gpt2-xl
[2023-11-10 17:36:06][INFO][model.py:234] - forcing vocab_size=50257, block_size=1024, bias=True
[2023-11-10 17:36:06][INFO][model.py:240] - overriding dropout rate to 0.0
[2023-11-10 17:36:29][INFO][model.py:160] - number of parameters: 1555.97M
[2023-11-10 17:36:56][INFO][model.py:290] - num decayed parameter tensors: 194, with 1,556,609,600 parameters
[2023-11-10 17:36:56][INFO][model.py:291] - num non-decayed parameter tensors: 386, with 1,001,600 parameters
[2023-11-10 17:36:56][INFO][model.py:297] - using fused AdamW: True
Prompt (prior to training)
= "What is a supercomputer?"
query = trainer.evaluate(query, num_samples=1, display=False)
outputs print(fr'\[prompt]: "{query}"')
console.print("\[response]:\n\n" + fr"{outputs['0']['raw']}") console.
[prompt]: "What is a supercomputer?" [response]: What is a supercomputer? When it comes to massive computing, a supercomputer is simply a large computer system that has the ability to perform many calculations at once. This can be the result of using many different processing cores, or memory, or operating at a high clock speed. Supercomputers are often used to crack complex calculations and research problems. Image credit: Wikipedia Image credit: Wikipedia Image credit: Wikipedia Image credit: Wikipedia Image credit: Wikipedia Image credit: Wikipedia Image credit: Wikipedia Image credit: Wikipedia Image credit: Wikipedia Image credit: Wikipedia Image credit: Wikipedia Image credit: Wikipedia Image credit: Wikipedia Image credit: Wikipedia Image credit: Wikipedia Image credit: Wikipedia Image credit: Wikipedia Image credit: Wikipedia Image credit: Wikipedia Image credit: Wikipedia Image credit: Wikipedia On a larger scale, these massive computers are used to solve tough mathematical equations and solve hard scientific problems. They are very powerful enough to emulate the workings of the human brain and simulate a human intelligence in a virtual world. Image credit: Wikipedia Image credit: Wikipedia Image credit: Wikipedia Image credit: Wikipedia Image credit: Wikipedia In 1992, IBM's NeXTStep supercomputer was the largest and most powerful supercomputer in the world. It was released in 1995 and did not continue to live up to its original promises, because its capabilities were quickly surpassed by its competitors. Image credit: Wikipedia Image credit: Wikipedia Image credit: Wikipedia Image credit: Wikipedia Image credit: Wikipedia Image credit: Wikipedia Image credit: Wikipedia Image credit: Wikipedia<|endoftext|>Editor's note: Dan De Luce is the author of "When the Going Gets Tough: The New Survival Guide for College Students and Your Health and Well-Being." College has never been more expensive. 
But with so many choices and so many choices of where to go, it's harder than ever for prospective students to find a college that fits their lifestyle. This is a problem—not just because it can be a hassle to find a college that doesn't require a large amount of financial aid. It's a problem because it can be costly for students to stay in college. So I created this list of colleges with the highest tuition where
| Name | Description |
|------|--------------|
| step | Current training step |
| loss | Loss value |
| dt   | Time per step (in ms) |
| sps  | Samples per second |
| mtps | (million) Tokens per sec |
| mfu  | Model FLOPS utilization¹ |
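As a rough illustration (not ngpt's actual implementation, which uses a more detailed FLOPs model), these throughput metrics can be estimated from the step time alone, assuming the common ~6 FLOPs per parameter per token rule of thumb for a forward+backward pass and an A100 bfloat16 peak of 312 TFLOPS. The function name and signature here are hypothetical:

```python
def throughput_metrics(n_params: int, tokens_per_iter: int, dt_ms: float):
    """Back-of-the-envelope sps / mtps / mfu from a single step time."""
    dt_s = dt_ms / 1000.0
    sps = 1.0 / dt_s                      # samples (iterations) per second
    mtps = tokens_per_iter / dt_s / 1e6   # million tokens per second
    # ~6 FLOPs per parameter per token for fwd + bwd (rule-of-thumb estimate)
    flops_per_sec = 6 * n_params * tokens_per_iter / dt_s
    mfu = flops_per_sec / 312e12          # fraction of A100 bf16 peak (312 TFLOPS)
    return sps, mtps, mfu
```

Note that the numbers this produces will not match the trainer's own mfu, since the trainer also accounts for attention FLOPs and the effective tokens per iteration.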
Train Model
trainer.model.module.train()
trainer.train()
[2023-11-10 17:41:58][INFO][trainer.py:540] - step=100 loss=2.505 dt=922.295 sps=1.084 mtps=0.001 mfu=43.897 train_loss=2.555 val_loss=2.558
Evaluate Model
= "What is a supercomputer?"
query = trainer.evaluate(query, num_samples=1, display=False) outputs
from rich.text import Text
from enrich.console import get_console
= get_console()
console
print(fr'\[prompt]: "{query}"')
console.print("\[response]:\n\n" + fr"{outputs['0']['raw']}") console.
[prompt]: "What is a supercomputer?" [response]: What is a supercomputer? A supercomputer is a machine that is exponentially more powerful than previous computing models while being far more energy efficient. What is an artificial neural network? An artificial neural network (ANN) is an order of magnitude more powerful than previous computational models, but has the same energy efficiency. For this article I will be using a machine learning technique called Backward-Compatible Neural Networks (BCNNs) to represent the biological brain. The BCNNs model is very similar to the neural networks utilized in deep learning, but has the added bonus of being able to 'decouple' the learning from the final results. BCNN for Machine Learning In order to make the transition from neural networks to BCNNs we will follow the same basic principles as we did with neural networks. However, instead of the neurons in neural networks that represent the data being represented, BCNNs work with nodes instead. This is because the nodes are the data, while the neurons are the information. In case you aren’t familiar with the term node, it is a symbol representing any type of data. For instance, it could be a datum in a neural network model. Another way to think of them is as symbols. The basic idea of nodes and connections is that a node can have many connections to other nodes, with each node linked to a connection to a larger entity. For instance, a node might have a target, which is just a point in space. A connection might have a value, which is just a number between 0 and 1. Something like this: Node Value -0.1 0.1 0.1 0.1 The important thing to note, is that the value is a number between 0 and 1. When we are given a list of data and an input, we will move forward through the data, connected nodes, and the resulting output. In the case of neural networks, this would look like: Neural Network A neural network is just a collection of nodes, connected to each other through connections. 
For example, let’s look at the ConvNet model from Wikipedia. Pretty simple. It has multiple layers of neurons, with each neuron being assigned one of the above variables. The neurons work with the data given as an input (remember, it’s a
Footnotes

1. in units of A100 bfloat16 peak FLOPS ↩︎
Citation
BibTeX citation:
@online{foreman2023,
author = {Foreman, Sam},
title = {nanoGPT},
date = {2023-11-15},
url = {https://saforem2.github.io/nanoGPT},
langid = {en}
}
For attribution, please cite this work as:
Foreman, Sam. 2023. “nanoGPT.” November 15, 2023. https://saforem2.github.io/nanoGPT.