karpathy/nanoGPT issues and pull requests

#555 - Adding NVIDIA hardware performance detection

Pull Request - State: open - Opened by fparisio 1 day ago

#554 - Pretraining loss explosion

Issue - State: open - Opened by mattgorb 8 days ago

#553 - Add fire finetuning

Pull Request - State: open - Opened by gkielian 15 days ago

#552 - why is the warmup_iters set 2000?

Issue - State: open - Opened by luxunxiansheng 16 days ago

#551 - The Positional Encoding is not using sin / cos?

Issue - State: open - Opened by mw66 17 days ago

#550 - Remove flashattention from model.py

Pull Request - State: closed - Opened by chughtapan 17 days ago

#549 - Implement muP and add code for mup guide blog

Pull Request - State: closed - Opened by ndey96 23 days ago

#548 - Perplexity

Issue - State: open - Opened by Precola 25 days ago

#547 - Progressive training?

Issue - State: open - Opened by immartian 30 days ago - 3 comments

#546 - Add support for 0 temperature

Pull Request - State: open - Opened by jmccrosky about 1 month ago

#545 - torchrun on L40S Error:torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

Issue - State: closed - Opened by Precola about 1 month ago - 1 comment

#544 - Rocm support?

Issue - State: open - Opened by ilovethensa about 1 month ago

#543 - Calculation of Batch Size

Issue - State: closed - Opened by Precola about 2 months ago - 1 comment

#542 - configuration for Macs(apple silicon)

Issue - State: open - Opened by bawsi99 about 2 months ago

#541 - Adding gpt2 training experiment

Pull Request - State: closed - Opened by NewtonSander about 2 months ago

#540 - Use weights_only for loading

Pull Request - State: open - Opened by kit1980 about 2 months ago

#539 - What to change for training on two T4 GPUs ?

Issue - State: open - Opened by noorchauhan about 2 months ago - 1 comment

#538 - Update train.py for more efficiency

Pull Request - State: open - Opened by Jesseonmi 2 months ago

#537 - Simple Use Case Demonstration with Old School Runescape Terminology

Issue - State: open - Opened by Omarch47 2 months ago

#536 - Solution to Exercise 1 from Youtube Lecture (Batching the heads) - Why does it work?

Issue - State: closed - Opened by Andrew-Luo1 2 months ago - 1 comment

#535 - Nano GPT

Issue - State: open - Opened by phanee123 2 months ago

#534 - ddp on macbook CPU

Pull Request - State: closed - Opened by langong347 2 months ago

#533 - free up state_dict variable memory after loading checkpoint

Pull Request - State: open - Opened by adistomar 2 months ago

#532 - FileNotFoundError: [Errno 2] No such file or directory: 'data/openwebtext/train.bin'

Issue - State: open - Opened by HarikrishnanK9 3 months ago - 1 comment

#531 - About the get_batch

Issue - State: open - Opened by leo-young 3 months ago - 1 comment

#530 - Add automatic detection of number of CPU cores

Pull Request - State: open - Opened by Jakobovski 3 months ago - 1 comment

#529 - Data cleaning for openwebtext

Issue - State: open - Opened by zzkzzkjsw 3 months ago

#528 - fix val dataset size code comment

Pull Request - State: open - Opened by vhmth 3 months ago

#527 - fix(train.py): mfu estimation to respect CPU-GPU sync point

Pull Request - State: open - Opened by JasonLiJT 3 months ago

#526 - code gpt v1

Pull Request - State: closed - Opened by shatrugna 3 months ago

#525 - "RuntimeError: Internal Triton PTX codegen error" is raised when I train shakespeare_char with a GPU

Issue - State: open - Opened by shenbb 3 months ago - 5 comments

#524 - Pretraining Divergence

Issue - State: open - Opened by egoetz 3 months ago - 3 comments

#493 - Overfitting of the small GPU model

Issue - State: open - Opened by Bachstelze 3 months ago

#492 - Drop in performance when changing dtype to float32

Issue - State: open - Opened by blaisedelattre 3 months ago - 2 comments

#491 - Improvements to RWKV v5.1

Pull Request - State: closed - Opened by faresobeid 3 months ago

#490 - Update README.md - added alternative running instructions

Pull Request - State: closed - Opened by dnordfors 4 months ago

#489 - What does "prioritize teeth over education" even mean?

Issue - State: open - Opened by dw61 4 months ago - 2 comments

#488 - sign descent seems to do better than adamw?

Pull Request - State: open - Opened by nullonesix 4 months ago

#487 - Update README.md

Pull Request - State: closed - Opened by jellehak 4 months ago

#486 - [Q] Async prefetch next batch while model is doing forward pass

Issue - State: open - Opened by GM-git-dotcom 4 months ago - 1 comment

#485 - Shouldnt the ddp check be on ZERO instead of -1

Issue - State: open - Opened by sajinpgupta 4 months ago

#484 - Hyperparameter Tuning

Issue - State: closed - Opened by SinanCavusoglu 4 months ago

#483 - Index out of range when training on custom dataset

Issue - State: open - Opened by TayTT 4 months ago - 1 comment

#482 - What is the meaning of nh and hs

Issue - State: closed - Opened by Bachstelze 4 months ago - 1 comment

#481 - Fix: conditional use of GradScaler based on device_type and dtype in train.py

Pull Request - State: open - Opened by BRAINIAC2677 5 months ago

#480 - neverMind

Issue - State: closed - Opened by Zemulax 5 months ago

#479 - Implement multi-token prediction option for models

Issue - State: open - Opened by tmostak 5 months ago - 7 comments

#478 - nanoGPT/model.py where `manual implementation of attention`,Is it correct to modify it like I did？

Issue - State: open - Opened by wmx-github 5 months ago - 1 comment

#477 - Training fails on Python 3.12 on either GPU or CPU

Issue - State: closed - Opened by tigran123 5 months ago - 3 comments

#476 - Recommendation for something smaller

Issue - State: open - Opened by diamondfishtools 5 months ago - 1 comment

#475 - [Question] Why use `call` to do forward.

Issue - State: closed - Opened by Felix-Zhenghao 5 months ago - 2 comments

#474 - could nanoGPT be the AI assistant for the development of CAX software?

Issue - State: open - Opened by fengsim 5 months ago - 1 comment

#473 - [Question] The mask size seems wrong?

Issue - State: closed - Opened by Felix-Zhenghao 5 months ago

#472 - [Question] why bias is init to zero?

Issue - State: closed - Opened by michael8090 5 months ago - 1 comment

#471 - Citing this project in research

Issue - State: open - Opened by davmacario 5 months ago - 4 comments

#470 - CUDA error: device-side assert triggered

Issue - State: closed - Opened by ecsfu 6 months ago

#469 - How to Set "vocab_size" and "block_size" for Word Embedding?

Issue - State: open - Opened by haibao-yu 6 months ago - 1 comment

#468 - Is this loss curve normal

Issue - State: open - Opened by banyan-god 6 months ago - 20 comments

#467 - Resume Training

Issue - State: open - Opened by tiredsoul21 6 months ago - 3 comments

#466 - MFU too low in custom GPT-2 training

Issue - State: closed - Opened by eonurk 6 months ago - 2 comments

#465 - nano_gpt

Issue - State: open - Opened by Mihir0567 6 months ago

#464 - fix: h100-mfu-calculation

Pull Request - State: closed - Opened by OrenLeung 6 months ago - 1 comment

#463 - Fixing eval path in README

Pull Request - State: closed - Opened by goswamig 6 months ago

#462 - gabe init

Pull Request - State: closed - Opened by jondestoppeleire 6 months ago

#458 - Torch >= 2.2.0 inference issues on MPS

Issue - State: open - Opened by davmacario 6 months ago - 3 comments

#456 - MFU calculation wrong

Issue - State: open - Opened by lxww302 6 months ago - 2 comments

#455 - dropout is 0.0

Issue - State: open - Opened by dipsivenkatesh 6 months ago - 3 comments

#454 - PyTorch nn.LayerNorm now takes bias arg - removed custom class

Pull Request - State: open - Opened by calmitchell617 7 months ago - 1 comment

#453 - Early stopping

Pull Request - State: open - Opened by derekehyatt 7 months ago - 2 comments

#450 - Implement ROPE positional encodings

Pull Request - State: open - Opened by devinbot 7 months ago - 1 comment

#447 - Why don't we crop attn.weight as well?

Issue - State: open - Opened by muerghq 7 months ago - 1 comment

#440 - nothing has been written into???

Issue - State: open - Opened by BeimingCharles 7 months ago - 1 comment

#439 - AssertionError when trying to run sample.py

Issue - State: open - Opened by RexNecross 7 months ago - 2 comments

#438 - Which Python version can be used

Issue - State: open - Opened by denghuilong-sir 7 months ago - 2 comments

#435 - How to train nanoGPT using TPU's?

Issue - State: closed - Opened by kathir-ks 8 months ago - 1 comment

#423 - 16 GPU per node

Issue - State: open - Opened by spcrobocar 8 months ago - 4 comments

#407 - NumPy will stop allowing conversion of out-of-bound Python integers to integer arrays. The conversion of xxxxxx to uint16 will fail in the future.

Issue - State: closed - Opened by zyklone4096 9 months ago - 1 comment

#398 - Fix a small bug in the attention bias calculation when flash attention is not available

Pull Request - State: open - Opened by tbuthfer 10 months ago

#378 - Newbie Q: is it possible to train n (look ahead) tokens at a time?

Issue - State: open - Opened by mw66 almost 1 year ago - 5 comments

#376 - Flash attention 2.0

Pull Request - State: closed - Opened by glinscott about 1 year ago - 2 comments

#373 - Why using np.int64 instead of int32 in train.py?

Issue - State: open - Opened by mw66 about 1 year ago - 6 comments

#362 - the '=' is unuseable

Pull Request - State: open - Opened by VatsaDev about 1 year ago

#325 - Adding of the prefix '_orig_mod.' while that gives an error while resuming the training.

Issue - State: open - Opened by Trisert about 1 year ago

#320 - Bug - model trained on Xs from two sample texts

Issue - State: open - Opened by Majdoddin about 1 year ago - 5 comments

#303 - Train/Val Loss Issues when training GPT-2 from OWT

Issue - State: open - Opened by JustinKunzi over 1 year ago - 19 comments

#289 - Cannot run train.py on Multiple Gpus

Issue - State: open - Opened by srivassid over 1 year ago - 2 comments

#285 - Why does batch size affect convergence?

Issue - State: open - Opened by 0dB over 1 year ago - 10 comments

#278 - Why LayerNorm before Self-Attention?!

Issue - State: open - Opened by xRootCode over 1 year ago - 2 comments

#273 - KeyError with Sample.py

Issue - State: open - Opened by allenweiss over 1 year ago - 5 comments

#272 - Error in importing custom weights

Issue - State: open - Opened by Maniues over 1 year ago - 4 comments

#242 - Windows not yet supported for torch.compile as of now

Pull Request - State: closed - Opened by mcblooder over 1 year ago - 2 comments

#204 - DistributedSampler

Issue - State: open - Opened by caiodataopshouse over 1 year ago - 4 comments

#182 - What MFU score is to be expected?

Issue - State: open - Opened by yohan-pg over 1 year ago - 6 comments

#167 - Loss becomes nan after training ~6000 iterations

Issue - State: closed - Opened by holyseven over 1 year ago - 27 comments

#116 - Minor change to allow using ddp with exclusive process mode

Pull Request - State: closed - Opened by ramtingh over 1 year ago - 4 comments
