karpathy/nanoGPT issues and pull requests

#594 - A GUI version of nanoGPT

Issue - State: open - Opened by ystemsrx 9 days ago

#593 - Refactor: Optimize device handling and DDP setup

Pull Request - State: closed - Opened by metacritical 15 days ago - 1 comment

#592 - Refactor: Optimize device handling and DDP setup

Pull Request - State: closed - Opened by metacritical 15 days ago

#591 - attempt to refactor nanoGPT

Issue - State: open - Opened by tesla-cat 19 days ago

#590 - RoPE implementation with a shakespeare-char-rope test

Pull Request - State: open - Opened by albertvucinovic 26 days ago - 1 comment

#589 - Why seed 1337?

Issue - State: open - Opened by sillymultifora 30 days ago - 1 comment

#588 - Refactored code from different base based on leyan_branch

Pull Request - State: open - Opened by cesposo about 1 month ago

#587 - Add the Quantized model and also a Demo of the Quantized model

Pull Request - State: open - Opened by Ruhaan838 about 1 month ago

#586 - why in transformer we compute for all tokens but then use only the last token for prediction?

Issue - State: open - Opened by Ahmedd-Wahdan about 1 month ago

#585 - Add rotary

Pull Request - State: open - Opened by sunddytwo about 1 month ago

#584 - Dec branch commit

Pull Request - State: open - Opened by cesposo about 2 months ago

#583 - tuh

Pull Request - State: open - Opened by ftDyuthi about 2 months ago

#582 - Create Raja king

Pull Request - State: open - Opened by RockyRajhacker 2 months ago

#581 - Added weight pruning

Pull Request - State: closed - Opened by aswinr19 2 months ago

#580 - added dataset

Pull Request - State: closed - Opened by Gaurav-B-R 2 months ago

#579 - how fix it？

Issue - State: closed - Opened by WhiteSnowGirl 2 months ago - 1 comment

#578 - fix: ensure non-zero learning rate during warmup at iteration 0

Pull Request - State: closed - Opened by silasalberti 2 months ago - 1 comment

#577 - NanoGPT and RTX 4090

Issue - State: open - Opened by ArtHughes 3 months ago

#576 - Test

Pull Request - State: closed - Opened by rkdgmlqja 3 months ago

#575 - Feature/concrete dropout

Pull Request - State: closed - Opened by javiermas 3 months ago

#574 - Merge for comprehension when filtering parameters without grad

Pull Request - State: open - Opened by tsdeng 3 months ago

#573 - Oren/amd mess

Pull Request - State: closed - Opened by OrenLeung 3 months ago

#572 - Oren/config

Pull Request - State: closed - Opened by OrenLeung 3 months ago

#571 - cancel

Pull Request - State: closed - Opened by Zhao-Yuting 3 months ago

#570 - NaniGpt

Issue - State: open - Opened by ashokkumar272 4 months ago

#569 - added fix to type comparison to enable fused AdamW

Pull Request - State: open - Opened by seanjudelyons 4 months ago

#568 - Spring cleaning

Pull Request - State: closed - Opened by ckgresla 4 months ago

#567 - How best to implement a differential transformer?

Issue - State: open - Opened by Wilsontomass 4 months ago - 2 comments

#566 - the things

Pull Request - State: closed - Opened by drisspg 4 months ago

#565 - Normalized gpt

Pull Request - State: closed - Opened by santiagoakle 4 months ago - 1 comment

#564 - Ddp do not sync when not needed

Pull Request - State: closed - Opened by OrenLeung 4 months ago

#563 - Refactor to stop inductor mess

Pull Request - State: closed - Opened by OrenLeung 4 months ago

#562 - Moe

Pull Request - State: closed - Opened by hellozmz 4 months ago

#561 - Clean

Pull Request - State: closed - Opened by simran-arora 4 months ago

#560 - Windows 11: FileExistsError: [WinError 183] Cannot create a file when that file already exists

Issue - State: open - Opened by VyBui 5 months ago - 2 comments

#559 - Update README.md

Pull Request - State: closed - Opened by eshwarram 5 months ago

#558 - Updated README.md to include table of contents, why this project is useful, and how to contribute, and added an output for one command

Pull Request - State: open - Opened by arhaque09 5 months ago

#557 - Updated README.md to include table of contents, why this project is useful, and how to contribute

Pull Request - State: closed - Opened by arhaque09 5 months ago

#556 - Updated README.md to include table of contents, why this project is useful, and how to contribute

Pull Request - State: closed - Opened by arhaque09 5 months ago

#555 - Adding NVIDIA hardware performance detection

Pull Request - State: open - Opened by fparisio 5 months ago

#554 - Pretraining loss explosion

Issue - State: open - Opened by mattgorb 5 months ago - 3 comments

#553 - Add fire finetuning

Pull Request - State: open - Opened by gkielian 6 months ago

#552 - why is the warmup_iters set 2000?

Issue - State: open - Opened by luxunxiansheng 6 months ago

#551 - The Positional Encoding is not using sin / cos?

Issue - State: open - Opened by mw66 6 months ago - 1 comment

#550 - Remove flashattention from model.py

Pull Request - State: closed - Opened by chughtapan 6 months ago

#549 - Implement muP and add code for mup guide blog

Pull Request - State: closed - Opened by ndey96 6 months ago

#548 - Perplexity

Issue - State: open - Opened by Precola 6 months ago

#547 - Progressive training?

Issue - State: open - Opened by immartian 6 months ago - 5 comments

#546 - Add support for 0 temperature

Pull Request - State: open - Opened by jmccrosky 6 months ago

#545 - torchrun on L40S Error:torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

Issue - State: closed - Opened by Precola 6 months ago - 1 comment

#544 - Rocm support?

Issue - State: open - Opened by ilovethensa 6 months ago - 1 comment

#543 - Calculation of Batch Size

Issue - State: closed - Opened by Precola 7 months ago - 1 comment

#542 - configuration for Macs(apple silicon)

Issue - State: open - Opened by bawsi99 7 months ago

#541 - Adding gpt2 training experiment

Pull Request - State: closed - Opened by NewtonSander 7 months ago

#540 - Use weights_only for loading

Pull Request - State: open - Opened by kit1980 7 months ago

#539 - What to change for training on two T4 GPUs ?

Issue - State: open - Opened by noorchauhan 7 months ago - 1 comment

#538 - Update train.py for more efficiency

Pull Request - State: open - Opened by Jesseonmi 7 months ago

#537 - Simple Use Case Demonstration with Old School Runescape Terminology

Issue - State: open - Opened by Omarch47 7 months ago

#536 - Solution to Exercise 1 from Youtube Lecture (Batching the heads) - Why does it work?

Issue - State: closed - Opened by Andrew-Luo1 7 months ago - 1 comment

#535 - Nano GPT

Issue - State: open - Opened by phanee123 7 months ago

#534 - ddp on macbook CPU

Pull Request - State: closed - Opened by langong347 7 months ago

#533 - free up state_dict variable memory after loading checkpoint

Pull Request - State: open - Opened by adistomar 8 months ago

#532 - FileNotFoundError: [Errno 2] No such file or directory: 'data/openwebtext/train.bin'

Issue - State: open - Opened by HarikrishnanK9 8 months ago - 1 comment

#531 - About the get_batch

Issue - State: open - Opened by leo-young 8 months ago - 1 comment

#530 - Add automatic detection of number of CPU cores

Pull Request - State: open - Opened by Jakobovski 8 months ago - 1 comment

#529 - Data cleaning for openwebtext

Issue - State: open - Opened by zzkzzkjsw 8 months ago

#528 - fix val dataset size code comment

Pull Request - State: open - Opened by vhmth 8 months ago

#527 - fix(train.py): mfu estimation to respect CPU-GPU sync point

Pull Request - State: open - Opened by JasonLiJT 8 months ago

#526 - code gpt v1

Pull Request - State: closed - Opened by shatrugna 8 months ago

#525 - "RuntimeError: Internal Triton PTX codegen error" is raised when I train shakespeare_char with a GPU

Issue - State: open - Opened by shenbb 8 months ago - 5 comments

#524 - Pretraining Divergence

Issue - State: open - Opened by egoetz 8 months ago - 3 comments

#493 - Overfitting of the small GPU model

Issue - State: open - Opened by Bachstelze 9 months ago

#492 - Drop in performance when changing dtype to float32

Issue - State: open - Opened by blaisedelattre 9 months ago - 2 comments

#491 - Improvements to RWKV v5.1

Pull Request - State: closed - Opened by faresobeid 9 months ago

#490 - Update README.md - added alternative running instructions

Pull Request - State: closed - Opened by dnordfors 9 months ago

#489 - What does "prioritize teeth over education" even mean?

Issue - State: open - Opened by dw61 9 months ago - 2 comments

#488 - sign descent seems to do better than adamw?

Pull Request - State: open - Opened by nullonesix 9 months ago

#487 - Update README.md

Pull Request - State: closed - Opened by jellehak 9 months ago

#486 - [Q] Async prefetch next batch while model is doing forward pass

Issue - State: open - Opened by GM-git-dotcom 9 months ago - 1 comment

#485 - Shouldnt the ddp check be on ZERO instead of -1

Issue - State: open - Opened by sajinpgupta 9 months ago

#484 - Hyperparameter Tuning

Issue - State: closed - Opened by SinanCavusoglu 9 months ago

#483 - Index out of range when training on custom dataset

Issue - State: open - Opened by TayTT 9 months ago - 1 comment

#482 - What is the meaning of nh and hs

Issue - State: closed - Opened by Bachstelze 9 months ago - 1 comment

#481 - Fix: conditional use of GradScaler based on device_type and dtype in train.py

Pull Request - State: open - Opened by BRAINIAC2677 10 months ago

#480 - neverMind

Issue - State: closed - Opened by Zemulax 10 months ago

#479 - Implement multi-token prediction option for models

Issue - State: open - Opened by tmostak 10 months ago - 7 comments

#478 - nanoGPT/model.py where `manual implementation of attention`,Is it correct to modify it like I did？

Issue - State: open - Opened by wmx-github 10 months ago - 1 comment

#477 - Training fails on Python 3.12 on either GPU or CPU

Issue - State: closed - Opened by tigran123 10 months ago - 3 comments

#476 - Recommendation for something smaller

Issue - State: open - Opened by diamondfishtools 10 months ago - 1 comment

#475 - [Question] Why use `call` to do forward.

Issue - State: closed - Opened by Felix-Zhenghao 10 months ago - 2 comments

#474 - could nanoGPT be the AI assistant for the development of CAX software?

Issue - State: open - Opened by fengsim 10 months ago - 1 comment

#473 - [Question] The mask size seems wrong?

Issue - State: closed - Opened by Felix-Zhenghao 10 months ago

#472 - [Question] why bias is init to zero?

Issue - State: closed - Opened by michael8090 10 months ago - 1 comment

#471 - Citing this project in research

Issue - State: open - Opened by davmacario 11 months ago - 4 comments

#470 - CUDA error: device-side assert triggered

Issue - State: closed - Opened by ecsfu 11 months ago

#469 - How to Set "vocab_size" and "block_size" for Word Embedding?

Issue - State: open - Opened by haibao-yu 11 months ago - 1 comment

#468 - Is this loss curve normal

Issue - State: open - Opened by banyan-god 11 months ago - 21 comments

#467 - Resume Training

Issue - State: open - Opened by tiredsoul21 11 months ago - 3 comments
