karpathy/nanoGPT issues and pull requests

#97 - High validation loss when fine-tuning Shakespeare on gpt-xl?

Issue - State: open - Opened by tombenj over 1 year ago - 1 comment

#96 - added `star-history`

Pull Request - State: closed - Opened by hemangjoshi37a over 1 year ago

#95 - Add Common Crawl Dataset

Issue - State: open - Opened by DrissiReda over 1 year ago

#94 - Training on "Shakespeare" dataset is faster by using MacBook Air (M2)

Issue - State: open - Opened by xiningnlp over 1 year ago - 7 comments

#93 - GPU specs for finetuning gpt2-xl

Issue - State: open - Opened by yogi-miraje over 1 year ago - 2 comments

#92 - Making nano chatgpt

Issue - State: open - Opened by nebyu08 over 1 year ago - 8 comments

#91 - SparseGPT + nanoGPT

Issue - State: open - Opened by Grabber over 1 year ago - 1 comment

#90 - Out of Memory

Issue - State: open - Opened by kevinsaner91 over 1 year ago

#89 - Dataset load

Issue - State: open - Opened by thremilien over 1 year ago - 5 comments

#88 - Cuda out of Memory

Issue - State: open - Opened by hanfluid over 1 year ago - 3 comments

#87 - Use GradScaler in model only if dtype is float16

Pull Request - State: closed - Opened by johnwildauer over 1 year ago - 2 comments

#86 - Fix python builtin redefined

Pull Request - State: closed - Opened by tpaviot over 1 year ago - 2 comments

#85 - Google Coral

Issue - State: open - Opened by Helvio88 over 1 year ago

#84 - my gpu only supports float16, how do i train a model?

Issue - State: closed - Opened by breadbrowser over 1 year ago - 5 comments

#83 - token_embedding and pos_embedding

Issue - State: closed - Opened by pure-water over 1 year ago - 2 comments

#82 - Missed two spots while relative pathing

Pull Request - State: closed - Opened by danielgross over 1 year ago - 1 comment

#81 - GPT with UNet architecture gets the loss down to ~1.0 with no significant computation costs.

Issue - State: closed - Opened by englertbruno over 1 year ago - 15 comments

#80 - Fix Issue with running prepare.py (modified repos/nanoGPT/data/openwebtext/prepare.py)

Pull Request - State: closed - Opened by pierrebhat over 1 year ago - 5 comments

#79 - Fix Issue with running prepare.py Description: This PR fixes an issue with running `python prepare.py` by modifying files in the repos/nanoGPT/data directory.

Pull Request - State: closed - Opened by pierrebhat over 1 year ago

#78 - Fix Issue with running prepare.py in the nanoGPT repo Description: Fixes an issue with running `python prepare.py` that results in a `DatasetGenerationError` by modifying the files ['repos/nanoGPT/data/openwebtext/prepare.py'].

Pull Request - State: closed - Opened by pierrebhat over 1 year ago

#77 - Fix Issue with running prepare.py - Modify prepare.py, shakespeare/prepare.py & shakespeare_char/prepare.py

Pull Request - State: closed - Opened by pierrebhat over 1 year ago

#76 - Cache the KV projection history when generating

Pull Request - State: closed - Opened by dfyz over 1 year ago - 12 comments

#75 - create an openwebtext for non-english language

Issue - State: open - Opened by toozande over 1 year ago - 1 comment

#74 - Small fix to decode fn in shakespeare_char/prepare.py

Pull Request - State: closed - Opened by venusatuluri over 1 year ago

#73 - Use relative paths

Pull Request - State: closed - Opened by danielgross over 1 year ago - 1 comment

#72 - replace copy add with inplace add in the Block

Pull Request - State: closed - Opened by KucicM over 1 year ago - 1 comment

#71 - Zero-grad more aggressively to save memory

Pull Request - State: closed - Opened by cchan over 1 year ago - 8 comments

#70 - OpenWebTextCorpus DataLoader

Issue - State: open - Opened by vgoklani over 1 year ago - 3 comments

#69 - A question on getting garbage in sample.py (Generator)

Issue - State: open - Opened by hka3rs over 1 year ago

#68 - Add motivation for why to use Fabric

Pull Request - State: closed - Opened by awaelchli over 1 year ago - 1 comment

#67 - Just a question

Issue - State: open - Opened by jpbruneton over 1 year ago - 2 comments

#66 - fix typo ( params -> tokens)

Pull Request - State: closed - Opened by PWhiddy over 1 year ago

#65 - More a question - is there an easy way to test generation?

Issue - State: closed - Opened by fblissjr over 1 year ago - 1 comment

#64 - Proposal for a slightly improved minimal configuration system

Issue - State: open - Opened by adonath over 1 year ago

#63 - Support TensorFlow 2

Issue - State: closed - Opened by pure-rgb over 1 year ago - 1 comment

#62 - Error when using Pytorch 2.0 (Compile=False)

Issue - State: open - Opened by hanfluid over 1 year ago - 1 comment

#61 - CUDA out of memory

Issue - State: closed - Opened by hanfluid over 1 year ago - 3 comments

#60 - checkpoints don't seem to be working

Issue - State: closed - Opened by eniompw over 1 year ago - 2 comments

#59 - Why using learnable position embedding just like token embedding?

Issue - State: open - Opened by tiendung over 1 year ago - 3 comments

#58 - what is the main speed up trick for nanoGPT?

Issue - State: open - Opened by brando90 over 1 year ago - 3 comments

#57 - Improve readability of huge numbers

Pull Request - State: closed - Opened by ryouze over 1 year ago - 1 comment

#55 - DDP on multinode [not yet working]

Pull Request - State: closed - Opened by karpathy over 1 year ago - 3 comments

#54 - Give tqdm some love :)

Pull Request - State: closed - Opened by MicroPanda123 over 1 year ago - 3 comments

#53 - Please add a package manager and requirements

Issue - State: open - Opened by muddi900 over 1 year ago - 1 comment

#52 - Finetune code translation tasks

Issue - State: open - Opened by edgarriba over 1 year ago

#51 - implements torch sdpa

Pull Request - State: closed - Opened by LucasLLC over 1 year ago

#50 - Issue with running prepare.py

Issue - State: open - Opened by torial over 1 year ago - 3 comments

#49 - Got stuck at the "dataset = load_dataset("openwebtext")"

Issue - State: closed - Opened by hanfluid over 1 year ago - 1 comment

#48 - Another thank you

Issue - State: closed - Opened by greydanus over 1 year ago - 1 comment

#47 - How to load the GPT-2 model

Issue - State: open - Opened by strangeoptics over 1 year ago - 2 comments

#46 - Support for Logging with Comet!

Pull Request - State: closed - Opened by sherpan over 1 year ago - 1 comment

#45 - Doesn't have a CONTRIBUTING.md file

Issue - State: open - Opened by izam-mohammed over 1 year ago - 1 comment

#44 - Corrected some mistakes in README.md file

Pull Request - State: closed - Opened by izam-mohammed over 1 year ago - 2 comments

#43 - Use classes for examples

Pull Request - State: closed - Opened by acheong08 over 1 year ago

#42 - Pluck last token before lm_head(x) during inference?

Issue - State: closed - Opened by jxtps over 1 year ago - 2 comments

#41 - Is it possible: davinci-003?

Issue - State: open - Opened by gameveloster over 1 year ago - 4 comments

#40 - copy model args from checkpoint model when resuming the training

Pull Request - State: closed - Opened by yogi-miraje over 1 year ago

#39 - Perhaps another dependency is on the transformers package

Issue - State: closed - Opened by amiramir over 1 year ago - 1 comment

#38 - Make wandb training logs public

Issue - State: open - Opened by tcapelle over 1 year ago - 2 comments

#37 - Hardware requirements for inference?

Issue - State: closed - Opened by jjtolton over 1 year ago - 1 comment

#36 - Stop words?

Issue - State: open - Opened by BoyuanJackChen over 1 year ago - 3 comments

#35 - Thank you

Issue - State: closed - Opened by agamemnonc over 1 year ago - 2 comments

#34 - Add gradient accumulation support

Pull Request - State: closed - Opened by VHellendoorn over 1 year ago - 6 comments

#33 - What is nanoGPT and how to use it?

Issue - State: open - Opened by sudo-sand over 1 year ago - 1 comment

#32 - is there a google colab / jupyter notebook implementation of this project?

Issue - State: open - Opened by SadafShafi over 1 year ago - 2 comments

#31 - Using float16 via Gradscaler

Issue - State: open - Opened by acheong08 over 1 year ago - 1 comment

#30 - Training on AMD Ryzen 5 5600H with Radeon Graphics, 3301 Mhz (RTX 3050 Laptop), 6 Cores, 12 Threads

Issue - State: closed - Opened by ElJaian over 1 year ago - 2 comments

#29 - Use argparse in configurator.py

Pull Request - State: closed - Opened by plotguy over 1 year ago - 3 comments

#28 - Training on M1 "MPS"

Issue - State: open - Opened by okpatil4u over 1 year ago - 45 comments

#27 - Don't hard-code device in autocast

Pull Request - State: closed - Opened by lantiga over 1 year ago - 11 comments

#26 - Argparse but vars remain at global level and minimal boilerplate

Pull Request - State: closed - Opened by murbard over 1 year ago - 5 comments

#25 - I explored the functionalities of prepare.py on my own and prepared a post in Spanish

Issue - State: open - Opened by lzeladam over 1 year ago - 1 comment

#24 - change whitelist to allowlist and blacklist to blocklist

Pull Request - State: closed - Opened by JonathanSum over 1 year ago - 1 comment

#23 - Inefficiencies

Pull Request - State: closed - Opened by Anri-Lombard over 1 year ago - 2 comments

#22 - # note: each worker gets a different seed

Issue - State: closed - Opened by vgoklani over 1 year ago - 2 comments

#21 - Tie LM Head Weight to Token Embedding to match official GPT2 Code

Pull Request - State: closed - Opened by fattorib over 1 year ago - 9 comments

#20 - Make wandb import conditioned to wandb_log=True

Pull Request - State: closed - Opened by lantiga over 1 year ago - 7 comments

#19 - Strip unwanted prefix from state keys when loading model in sample.py

Pull Request - State: closed - Opened by nat over 1 year ago - 1 comment

#18 - Simple ml-collections instrumentation

Pull Request - State: closed - Opened by tcapelle over 1 year ago - 3 comments

#17 - Log the config params to wandb

Pull Request - State: closed - Opened by tcapelle over 1 year ago - 3 comments

#16 - Update README.md

Pull Request - State: closed - Opened by jorahn over 1 year ago - 1 comment

#15 - Requirements & encoding

Pull Request - State: closed - Opened by nil-andreu over 1 year ago - 4 comments

#14 - Requirements & Encoding

Pull Request - State: closed - Opened by nil-andreu over 1 year ago

#13 - PyTorch-nightly dependency chain

Issue - State: open - Opened by nlathia over 1 year ago - 1 comment

#12 - Jax/Flax Rewrite

Issue - State: open - Opened by jenkspt over 1 year ago - 3 comments

#11 - Remove @torch.jit.script decorator when compiling the model?

Issue - State: closed - Opened by vgoklani over 1 year ago - 12 comments

#10 - batch file write

Pull Request - State: closed - Opened by LaihoE over 1 year ago - 3 comments

#9 - cpu support

Pull Request - State: closed - Opened by Ricardicus over 1 year ago - 9 comments

#8 - Running train.py on 2060 GPU

Issue - State: open - Opened by lzeladam over 1 year ago - 6 comments

#7 - Is there an extra charge?

Issue - State: closed - Opened by phonefixnicole over 1 year ago

#6 - README.md

Pull Request - State: closed - Opened by jarede-dev over 1 year ago - 1 comment

#5 - batch and multiprocess file write

Pull Request - State: closed - Opened by LaihoE over 1 year ago - 2 comments

#4 - prepare.py: single-threaded write with mmap only once

Pull Request - State: closed - Opened by proger over 1 year ago - 8 comments

#3 - pytorch gelu tanh approximation

Pull Request - State: closed - Opened by zacwellmer over 1 year ago - 2 comments

#2 - Try using gelu approximate = 'tanh'

Issue - State: closed - Opened by drisspg over 1 year ago - 2 comments

#1 - Minor Frozen GPTConfig

Pull Request - State: closed - Opened by ankandrew over 1 year ago - 1 comment
