Ecosyste.ms: Issues
An open API service that provides issue and pull request metadata for open source projects.
GitHub / karpathy/nanoGPT issues and pull requests
#555 - Adding NVIDIA hardware performance detection
Pull Request -
State: open - Opened by fparisio 1 day ago
#554 - Pretraining loss explosion
Issue -
State: open - Opened by mattgorb 8 days ago
#553 - Add fire finetuning
Pull Request -
State: open - Opened by gkielian 15 days ago
#552 - why is the warmup_iters set 2000?
Issue -
State: open - Opened by luxunxiansheng 16 days ago
#551 - The Positional Encoding is not using sin / cos?
Issue -
State: open - Opened by mw66 16 days ago
#550 - Remove flashattention from model.py
Pull Request -
State: closed - Opened by chughtapan 17 days ago
#549 - Implement muP and add code for mup guide blog
Pull Request -
State: closed - Opened by ndey96 23 days ago
#548 - Perplexity
Issue -
State: open - Opened by Precola 25 days ago
#547 - Progressive training?
Issue -
State: open - Opened by immartian 30 days ago
- 3 comments
#546 - Add support for 0 temperature
Pull Request -
State: open - Opened by jmccrosky about 1 month ago
#545 - torchrun on L40S Error:torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
Issue -
State: closed - Opened by Precola about 1 month ago
- 1 comment
#544 - Rocm support?
Issue -
State: open - Opened by ilovethensa about 1 month ago
#543 - Calculation of Batch Size
Issue -
State: closed - Opened by Precola about 1 month ago
- 1 comment
#542 - configuration for Macs(apple silicon)
Issue -
State: open - Opened by bawsi99 about 2 months ago
#541 - Adding gpt2 training experiment
Pull Request -
State: closed - Opened by NewtonSander about 2 months ago
#540 - Use weights_only for loading
Pull Request -
State: open - Opened by kit1980 about 2 months ago
#539 - What to change for training on two T4 GPUs ?
Issue -
State: open - Opened by noorchauhan about 2 months ago
- 1 comment
#538 - Update train.py for more efficiency
Pull Request -
State: open - Opened by Jesseonmi 2 months ago
#537 - Simple Use Case Demonstration with Old School Runescape Terminology
Issue -
State: open - Opened by Omarch47 2 months ago
#536 - Solution to Exercise 1 from Youtube Lecture (Batching the heads) - Why does it work?
Issue -
State: closed - Opened by Andrew-Luo1 2 months ago
- 1 comment
#535 - Nano GPT
Issue -
State: open - Opened by phanee123 2 months ago
#534 - ddp on macbook CPU
Pull Request -
State: closed - Opened by langong347 2 months ago
#533 - free up state_dict variable memory after loading checkpoint
Pull Request -
State: open - Opened by adistomar 2 months ago
#532 - FileNotFoundError: [Errno 2] No such file or directory: 'data/openwebtext/train.bin'
Issue -
State: open - Opened by HarikrishnanK9 3 months ago
- 1 comment
#531 - About the get_batch
Issue -
State: open - Opened by leo-young 3 months ago
- 1 comment
#530 - Add automatic detection of number of CPU cores
Pull Request -
State: open - Opened by Jakobovski 3 months ago
- 1 comment
#529 - Data cleaning for openwebtext
Issue -
State: open - Opened by zzkzzkjsw 3 months ago
#528 - fix val dataset size code comment
Pull Request -
State: open - Opened by vhmth 3 months ago
#527 - fix(train.py): mfu estimation to respect CPU-GPU sync point
Pull Request -
State: open - Opened by JasonLiJT 3 months ago
#526 - code gpt v1
Pull Request -
State: closed - Opened by shatrugna 3 months ago
#525 - "RuntimeError: Internal Triton PTX codegen error" is raised when I train shakespeare_char with a GPU
Issue -
State: open - Opened by shenbb 3 months ago
- 5 comments
#524 - Pretraining Divergence
Issue -
State: open - Opened by egoetz 3 months ago
- 3 comments
#493 - Overfitting of the small GPU model
Issue -
State: open - Opened by Bachstelze 3 months ago
#492 - Drop in performance when changing dtype to float32
Issue -
State: open - Opened by blaisedelattre 3 months ago
- 2 comments
#491 - Improvements to RWKV v5.1
Pull Request -
State: closed - Opened by faresobeid 3 months ago
#490 - Update README.md - added alternative running instructions
Pull Request -
State: closed - Opened by dnordfors 4 months ago
#489 - What does "prioritize teeth over education" even mean?
Issue -
State: open - Opened by dw61 4 months ago
- 2 comments
#488 - sign descent seems to do better than adamw?
Pull Request -
State: open - Opened by nullonesix 4 months ago
#487 - Update README.md
Pull Request -
State: closed - Opened by jellehak 4 months ago
#486 - [Q] Async prefetch next batch while model is doing forward pass
Issue -
State: open - Opened by GM-git-dotcom 4 months ago
- 1 comment
#485 - Shouldnt the ddp check be on ZERO instead of -1
Issue -
State: open - Opened by sajinpgupta 4 months ago
#484 - Hyperparameter Tuning
Issue -
State: closed - Opened by SinanCavusoglu 4 months ago
#483 - Index out of range when training on custom dataset
Issue -
State: open - Opened by TayTT 4 months ago
- 1 comment
#482 - What is the meaning of nh and hs
Issue -
State: closed - Opened by Bachstelze 4 months ago
- 1 comment
#481 - Fix: conditional use of GradScaler based on device_type and dtype in train.py
Pull Request -
State: open - Opened by BRAINIAC2677 5 months ago
#480 - neverMind
Issue -
State: closed - Opened by Zemulax 5 months ago
#479 - Implement multi-token prediction option for models
Issue -
State: open - Opened by tmostak 5 months ago
- 7 comments
#478 - nanoGPT/model.py where `manual implementation of attention`,Is it correct to modify it like I did?
Issue -
State: open - Opened by wmx-github 5 months ago
- 1 comment
#477 - Training fails on Python 3.12 on either GPU or CPU
Issue -
State: closed - Opened by tigran123 5 months ago
- 3 comments
#476 - Recommendation for something smaller
Issue -
State: open - Opened by diamondfishtools 5 months ago
- 1 comment
#475 - [Question] Why use `__call__` to do forward.
Issue -
State: closed - Opened by Felix-Zhenghao 5 months ago
- 2 comments
#474 - could nanoGPT be the AI assistant for the development of CAX software?
Issue -
State: open - Opened by fengsim 5 months ago
- 1 comment
#473 - [Question] The mask size seems wrong?
Issue -
State: closed - Opened by Felix-Zhenghao 5 months ago
#472 - [Question] why bias is init to zero?
Issue -
State: closed - Opened by michael8090 5 months ago
- 1 comment
#471 - Citing this project in research
Issue -
State: open - Opened by davmacario 5 months ago
- 4 comments
#470 - CUDA error: device-side assert triggered
Issue -
State: closed - Opened by ecsfu 6 months ago
#469 - How to Set "vocab_size" and "block_size" for Word Embedding?
Issue -
State: open - Opened by haibao-yu 6 months ago
- 1 comment
#468 - Is this loss curve normal
Issue -
State: open - Opened by banyan-god 6 months ago
- 20 comments
#467 - Resume Training
Issue -
State: open - Opened by tiredsoul21 6 months ago
- 3 comments
#466 - MFU too low in custom GPT-2 training
Issue -
State: closed - Opened by eonurk 6 months ago
- 2 comments
#465 - nano_gpt
Issue -
State: open - Opened by Mihir0567 6 months ago
#464 - fix: h100-mfu-calculation
Pull Request -
State: closed - Opened by OrenLeung 6 months ago
- 1 comment
#463 - Fixing eval path in README
Pull Request -
State: closed - Opened by goswamig 6 months ago
#462 - gabe init
Pull Request -
State: closed - Opened by jondestoppeleire 6 months ago
#458 - Torch >= 2.2.0 inference issues on MPS
Issue -
State: open - Opened by davmacario 6 months ago
- 3 comments
#456 - MFU calculation wrong
Issue -
State: open - Opened by lxww302 6 months ago
- 2 comments
#455 - dropout is 0.0
Issue -
State: open - Opened by dipsivenkatesh 6 months ago
- 3 comments
#454 - PyTorch nn.LayerNorm now takes bias arg - removed custom class
Pull Request -
State: open - Opened by calmitchell617 7 months ago
- 1 comment
#453 - Early stopping
Pull Request -
State: open - Opened by derekehyatt 7 months ago
- 2 comments
#450 - Implement ROPE positional encodings
Pull Request -
State: open - Opened by devinbot 7 months ago
- 1 comment
#447 - Why don't we crop attn.weight as well?
Issue -
State: open - Opened by muerghq 7 months ago
- 1 comment
#440 - nothing has been written into???
Issue -
State: open - Opened by BeimingCharles 7 months ago
- 1 comment
#439 - AssertionError when trying to run sample.py
Issue -
State: open - Opened by RexNecross 7 months ago
- 2 comments
#438 - Which Python version can be used
Issue -
State: open - Opened by denghuilong-sir 7 months ago
- 2 comments
#435 - How to train nanoGPT using TPU's?
Issue -
State: closed - Opened by kathir-ks 8 months ago
- 1 comment
#423 - 16 GPU per node
Issue -
State: open - Opened by spcrobocar 8 months ago
- 4 comments
#407 - NumPy will stop allowing conversion of out-of-bound Python integers to integer arrays. The conversion of xxxxxx to uint16 will fail in the future.
Issue -
State: closed - Opened by zyklone4096 9 months ago
- 1 comment
#398 - Fix a small bug in the attention bias calculation when flash attention is not available
Pull Request -
State: open - Opened by tbuthfer 10 months ago
#378 - Newbie Q: is it possible to train n (look ahead) tokens at a time?
Issue -
State: open - Opened by mw66 almost 1 year ago
- 5 comments
#376 - Flash attention 2.0
Pull Request -
State: closed - Opened by glinscott about 1 year ago
- 2 comments
#373 - Why using np.int64 instead of int32 in train.py?
Issue -
State: open - Opened by mw66 about 1 year ago
- 6 comments
#362 - the '=' is unuseable
Pull Request -
State: open - Opened by VatsaDev about 1 year ago
#325 - Adding of the prefix '_orig_mod.' while that gives an error while resuming the training.
Issue -
State: open - Opened by Trisert about 1 year ago
#320 - Bug - model trained on Xs from two sample texts
Issue -
State: open - Opened by Majdoddin about 1 year ago
- 5 comments
#303 - Train/Val Loss Issues when training GPT-2 from OWT
Issue -
State: open - Opened by JustinKunzi over 1 year ago
- 19 comments
#289 - Cannot run train.py on Multiple Gpus
Issue -
State: open - Opened by srivassid over 1 year ago
- 2 comments
#285 - Why does batch size affect convergence?
Issue -
State: open - Opened by 0dB over 1 year ago
- 10 comments
#278 - Why LayerNorm before Self-Attention?!
Issue -
State: open - Opened by xRootCode over 1 year ago
- 2 comments
#273 - KeyError with Sample.py
Issue -
State: open - Opened by allenweiss over 1 year ago
- 5 comments
#272 - Error in importing custom weights
Issue -
State: open - Opened by Maniues over 1 year ago
- 4 comments
#242 - Windows not yet supported for torch.compile as of now
Pull Request -
State: closed - Opened by mcblooder over 1 year ago
- 2 comments
#204 - DistributedSampler
Issue -
State: open - Opened by caiodataopshouse over 1 year ago
- 4 comments
#182 - What MFU score is to be expected?
Issue -
State: open - Opened by yohan-pg over 1 year ago
- 6 comments
#167 - Loss becomes nan after training ~6000 iterations
Issue -
State: closed - Opened by holyseven over 1 year ago
- 27 comments
#116 - Minor change to allow using ddp with exclusive process mode
Pull Request -
State: closed - Opened by ramtingh over 1 year ago
- 4 comments
#109 - how can I use this project to train a model for Chinese ?
Issue -
State: open - Opened by hesilong over 1 year ago
- 7 comments
#101 - Give love to tqdm too ;)
Pull Request -
State: closed - Opened by maraoz over 1 year ago
- 2 comments
#100 - Could it be used to train a 100B-size GPT?
Issue -
State: closed - Opened by ericxsun over 1 year ago
- 1 comment
#99 - Source for downloading train.bin and val.bin?
Issue -
State: closed - Opened by lakaschus over 1 year ago
- 2 comments
#98 - Signal: Segmentation fault
Issue -
State: open - Opened by tombenj over 1 year ago
- 2 comments