karpathy/build-nanogpt issues and pull requests

#90 - Saving `raw_model.state_dict()` checkpoints

Issue - State: open - Opened by anw-g01 23 days ago - 1 comment

#89 - Why does weights sharing have to be in that direction?

Issue - State: open - Opened by BetterWang about 1 month ago - 1 comment

#88 - Fixed fineweb script to run on Windows.

Pull Request - State: open - Opened by koiker about 1 month ago

#86 - How Can I extract Last Layer Representation?

Issue - State: open - Opened by shantanu778 4 months ago

#83 - Adding Resume Training Functionality

Pull Request - State: open - Opened by yousefelsharkawy 5 months ago

#82 - How is the autoregressive loss handled?

Issue - State: closed - Opened by BabyCNM 5 months ago - 2 comments

#81 - Avoid tiktoken.decode panic on unknown tokens.

Issue - State: open - Opened by IggShaman 6 months ago - 1 comment

#80 - Avoid tiktoken.decode panic on unknown tokens.

Pull Request - State: open - Opened by IggShaman 6 months ago

#79 - torch.compile-d models do not work with example generation and hellaswag eval

Issue - State: open - Opened by IggShaman 6 months ago - 1 comment

#78 - Make example generation work when the model is torch.compile-d

Pull Request - State: open - Opened by IggShaman 6 months ago

#77 - fix: Progress bar not complete after starting a new shard

Pull Request - State: open - Opened by sanbaiw 6 months ago

#73 - Update train_gpt2.py with compile mode for torch.compile

Pull Request - State: closed - Opened by manu-chauhan 6 months ago - 1 comment

#72 - Ensure Consistency Between GPTConfig.block_size and Sequence Length T

Pull Request - State: open - Opened by Benetti-Hub 6 months ago - 1 comment

#67 - TTS

Issue - State: open - Opened by yukiarimo 7 months ago - 8 comments

#66 - Refactor: Replace Class Name with cls in from_pretrained Method - Fix subclassing

Pull Request - State: open - Opened by mohsenhariri 7 months ago

#64 - Add `torch_xla` support to `build-nanogpt`

Pull Request - State: closed - Opened by miladm 7 months ago

#62 - Enable torch compile with DDP

Pull Request - State: open - Opened by stevebako 7 months ago

#60 - Fix torch.compile Issue - Error with HellaSwag eval and Generation

Issue - State: closed - Opened by ML-Guy 7 months ago

#59 - Run the script on MacOS

Pull Request - State: closed - Opened by amjadmajid 7 months ago

#57 - Run fineweb.py on MacOS

Pull Request - State: closed - Opened by amjadmajid 7 months ago

#56 - Text generation can use raw_model instead of model

Issue - State: open - Opened by sapphire008 7 months ago

#54 - consistently change model to raw_model

Pull Request - State: open - Opened by hdocmsu 8 months ago

#52 - Dataloader now shuffles the shards and documents within

Pull Request - State: open - Opened by fraserlove 8 months ago - 1 comment

#50 - Different inference results between flash attention and manually implemented attention appeared.

Issue - State: open - Opened by Jaeckel-d 8 months ago

#49 - How to support padding in the train dataset for training ?

Issue - State: open - Opened by mrhimanshu 8 months ago - 2 comments

#48 - Integrating GPT-2 with deepspeed Zero-1, Zero-2 and Zero-3

Issue - State: open - Opened by Devadeut 8 months ago - 1 comment

#47 - Cannot get the log file "log124M_40B/log.txt"?

Issue - State: open - Opened by dtdo90 8 months ago - 5 comments

#46 - Fineweb sharding multi-process bugfix

Pull Request - State: closed - Opened by GrahamTheCoder 8 months ago - 1 comment

#45 - Running codes on Windows issues

Issue - State: open - Opened by gerardaristizabalpla4 8 months ago - 2 comments

#44 - RuntimeError: User specified an unsupported autocast device_type 'cuda:0'

Issue - State: closed - Opened by 0smboy 8 months ago - 1 comment

#43 - Update README.md

Pull Request - State: open - Opened by sriramgkn 8 months ago

#42 - Fix out of bounds check

Pull Request - State: open - Opened by lukasugar 8 months ago

#41 - Executing with 1 GPU raises "OutOfMemory Exception", executing with 2 GPUs "RuntimeError: CUDA error: invalid device ordinal"

Issue - State: closed - Opened by nmerkle 8 months ago - 2 comments

#40 - add requirements

Pull Request - State: open - Opened by mchen610 8 months ago

#39 - add requirements

Pull Request - State: closed - Opened by mchen610 8 months ago

#38 - 564

Pull Request - State: closed - Opened by testwithtesty 8 months ago

#36 - Add .gitignore

Pull Request - State: closed - Opened by lutzroeder 8 months ago

#32 - Fix typo in configure_optimizers

Pull Request - State: open - Opened by lukasugar 8 months ago

#31 - Is dataloader making optimal batches?

Issue - State: closed - Opened by paraschopra 8 months ago - 1 comment

#30 - fix sync issue that results in incorrect gradient accumulation and incorrect loss

Pull Request - State: closed - Opened by WilsonCWu 8 months ago - 1 comment

#29 - NO dropout in MLP and CausalSelfAttention

Issue - State: closed - Opened by peter-ni-noob 8 months ago - 2 comments

#28 - fix `Attempted to set non-positive bottom ylim` UserWarning in play.ipynb

Pull Request - State: closed - Opened by WilsonCWu 8 months ago

#27 - fix script for non-cuda devices

Pull Request - State: closed - Opened by adamskrodzki 8 months ago - 1 comment

#26 - current position should be 0 at the start of a shard

Pull Request - State: closed - Opened by eliebak 8 months ago - 2 comments

#25 - Sharding the dataset not completing?

Issue - State: open - Opened by dustinwloring1988 8 months ago - 7 comments

#24 - Extending GPT2 for audio generation

Pull Request - State: open - Opened by nivibilla 8 months ago

#23 - fix the double-jump for out-of-bound check in DataLoaderLite

Pull Request - State: closed - Opened by fatemi 8 months ago - 3 comments

#22 - Chunking method in the original GPT-2 training dataset

Issue - State: closed - Opened by rasbt 8 months ago - 2 comments

#20 - Fix typo in README.md

Pull Request - State: closed - Opened by awesomebytes 8 months ago - 2 comments

#19 - Propose validation loss calculation to improve accuracy by reducing floating-point errors

Pull Request - State: open - Opened by aakashapoorv 8 months ago - 2 comments

#18 - Embeddings are initialized with std of 0.02

Issue - State: open - Opened by eryk-mazus 8 months ago - 2 comments

#17 - Implement tensor parallelism

Issue - State: closed - Opened by marib00 8 months ago - 4 comments

#16 - Fix device type in autocast

Pull Request - State: closed - Opened by banyan-god 8 months ago

#9 - Update train_gpt2.py

Pull Request - State: closed - Opened by zhangfaen 8 months ago

#8 - Update README.md

Pull Request - State: closed - Opened by aepiotti 8 months ago
