Ecosyste.ms: Issues

An open API service that provides issue and pull request metadata for open source projects.

GitHub / karpathy/nanoGPT issues and pull requests

#555 - Adding NVIDIA hardware performance detection

Pull Request - State: open - Opened by fparisio 1 day ago

#554 - Pretraining loss explosion

Issue - State: open - Opened by mattgorb 8 days ago

#553 - Add fire finetuning

Pull Request - State: open - Opened by gkielian 15 days ago

#552 - why is the warmup_iters set 2000?

Issue - State: open - Opened by luxunxiansheng 16 days ago

#551 - The Positional Encoding is not using sin / cos?

Issue - State: open - Opened by mw66 17 days ago

#550 - Remove flashattention from model.py

Pull Request - State: closed - Opened by chughtapan 17 days ago

#549 - Implement muP and add code for mup guide blog

Pull Request - State: closed - Opened by ndey96 23 days ago

#548 - Perplexity

Issue - State: open - Opened by Precola 25 days ago

#547 - Progressive training?

Issue - State: open - Opened by immartian 30 days ago - 3 comments

#546 - Add support for 0 temperature

Pull Request - State: open - Opened by jmccrosky about 1 month ago

#544 - Rocm support?

Issue - State: open - Opened by ilovethensa about 1 month ago

#543 - Calculation of Batch Size

Issue - State: closed - Opened by Precola about 2 months ago - 1 comment

#542 - configuration for Macs(apple silicon)

Issue - State: open - Opened by bawsi99 about 2 months ago

#541 - Adding gpt2 training experiment

Pull Request - State: closed - Opened by NewtonSander about 2 months ago

#540 - Use weights_only for loading

Pull Request - State: open - Opened by kit1980 about 2 months ago

#539 - What to change for training on two T4 GPUs ?

Issue - State: open - Opened by noorchauhan about 2 months ago - 1 comment

#538 - Update train.py for more efficiency

Pull Request - State: open - Opened by Jesseonmi 2 months ago

#535 - Nano GPT

Issue - State: open - Opened by phanee123 2 months ago

#534 - ddp on macbook CPU

Pull Request - State: closed - Opened by langong347 2 months ago

#533 - free up state_dict variable memory after loading checkpoint

Pull Request - State: open - Opened by adistomar 2 months ago

#531 - About the get_batch

Issue - State: open - Opened by leo-young 3 months ago - 1 comment

#530 - Add automatic detection of number of CPU cores

Pull Request - State: open - Opened by Jakobovski 3 months ago - 1 comment

#529 - Data cleaning for openwebtext

Issue - State: open - Opened by zzkzzkjsw 3 months ago

#528 - fix val dataset size code comment

Pull Request - State: open - Opened by vhmth 3 months ago

#527 - fix(train.py): mfu estimation to respect CPU-GPU sync point

Pull Request - State: open - Opened by JasonLiJT 3 months ago

#526 - code gpt v1

Pull Request - State: closed - Opened by shatrugna 3 months ago

#524 - Pretraining Divergence

Issue - State: open - Opened by egoetz 3 months ago - 3 comments

#493 - Overfitting of the small GPU model

Issue - State: open - Opened by Bachstelze 3 months ago

#492 - Drop in performance when changing dtype to float32

Issue - State: open - Opened by blaisedelattre 3 months ago - 2 comments

#491 - Improvements to RWKV v5.1

Pull Request - State: closed - Opened by faresobeid 3 months ago

#490 - Update README.md - added alternative running instructions

Pull Request - State: closed - Opened by dnordfors 4 months ago

#489 - What does "prioritize teeth over education" even mean?

Issue - State: open - Opened by dw61 4 months ago - 2 comments

#488 - sign descent seems to do better than adamw?

Pull Request - State: open - Opened by nullonesix 4 months ago

#487 - Update README.md

Pull Request - State: closed - Opened by jellehak 4 months ago

#486 - [Q] Async prefetch next batch while model is doing forward pass

Issue - State: open - Opened by GM-git-dotcom 4 months ago - 1 comment

#485 - Shouldnt the ddp check be on ZERO instead of -1

Issue - State: open - Opened by sajinpgupta 4 months ago

#484 - Hyperparameter Tuning

Issue - State: closed - Opened by SinanCavusoglu 4 months ago

#483 - Index out of range when training on custom dataset

Issue - State: open - Opened by TayTT 4 months ago - 1 comment

#482 - What is the meaning of nh and hs

Issue - State: closed - Opened by Bachstelze 4 months ago - 1 comment

#480 - neverMind

Issue - State: closed - Opened by Zemulax 5 months ago

#479 - Implement multi-token prediction option for models

Issue - State: open - Opened by tmostak 5 months ago - 7 comments

#477 - Training fails on Python 3.12 on either GPU or CPU

Issue - State: closed - Opened by tigran123 5 months ago - 3 comments

#476 - Recommendation for something smaller

Issue - State: open - Opened by diamondfishtools 5 months ago - 1 comment

#475 - [Question] Why use `__call__` to do forward.

Issue - State: closed - Opened by Felix-Zhenghao 5 months ago - 2 comments

#474 - could nanoGPT be the AI assistant for the development of CAX software?

Issue - State: open - Opened by fengsim 5 months ago - 1 comment

#473 - [Question] The mask size seems wrong?

Issue - State: closed - Opened by Felix-Zhenghao 5 months ago

#472 - [Question] why bias is init to zero?

Issue - State: closed - Opened by michael8090 5 months ago - 1 comment

#471 - Citing this project in research

Issue - State: open - Opened by davmacario 5 months ago - 4 comments

#470 - CUDA error: device-side assert triggered

Issue - State: closed - Opened by ecsfu 6 months ago

#469 - How to Set "vocab_size" and "block_size" for Word Embedding?

Issue - State: open - Opened by haibao-yu 6 months ago - 1 comment

#468 - Is this loss curve normal

Issue - State: open - Opened by banyan-god 6 months ago - 20 comments

#467 - Resume Training

Issue - State: open - Opened by tiredsoul21 6 months ago - 3 comments

#466 - MFU too low in custom GPT-2 training

Issue - State: closed - Opened by eonurk 6 months ago - 2 comments

#465 - nano_gpt

Issue - State: open - Opened by Mihir0567 6 months ago

#464 - fix: h100-mfu-calculation

Pull Request - State: closed - Opened by OrenLeung 6 months ago - 1 comment

#463 - Fixing eval path in README

Pull Request - State: closed - Opened by goswamig 6 months ago

#462 - gabe init

Pull Request - State: closed - Opened by jondestoppeleire 6 months ago

#458 - Torch >= 2.2.0 inference issues on MPS

Issue - State: open - Opened by davmacario 6 months ago - 3 comments

#456 - MFU calculation wrong

Issue - State: open - Opened by lxww302 6 months ago - 2 comments

#455 - dropout is 0.0

Issue - State: open - Opened by dipsivenkatesh 6 months ago - 3 comments

#454 - PyTorch nn.LayerNorm now takes bias arg - removed custom class

Pull Request - State: open - Opened by calmitchell617 7 months ago - 1 comment

#453 - Early stopping

Pull Request - State: open - Opened by derekehyatt 7 months ago - 2 comments

#450 - Implement ROPE positional encodings

Pull Request - State: open - Opened by devinbot 7 months ago - 1 comment

#447 - Why don't we crop attn.weight as well?

Issue - State: open - Opened by muerghq 7 months ago - 1 comment

#440 - nothing has been written into???

Issue - State: open - Opened by BeimingCharles 7 months ago - 1 comment

#439 - AssertionError when trying to run sample.py

Issue - State: open - Opened by RexNecross 7 months ago - 2 comments

#438 - Which Python version can be used

Issue - State: open - Opened by denghuilong-sir 7 months ago - 2 comments

#435 - How to train nanoGPT using TPU's?

Issue - State: closed - Opened by kathir-ks 8 months ago - 1 comment

#423 - 16 GPU per node

Issue - State: open - Opened by spcrobocar 8 months ago - 4 comments

#378 - Newbie Q: is it possible to train n (look ahead) tokens at a time?

Issue - State: open - Opened by mw66 almost 1 year ago - 5 comments

#376 - Flash attention 2.0

Pull Request - State: closed - Opened by glinscott about 1 year ago - 2 comments

#373 - Why using np.int64 instead of int32 in train.py?

Issue - State: open - Opened by mw66 about 1 year ago - 6 comments

#362 - the '=' is unuseable

Pull Request - State: open - Opened by VatsaDev about 1 year ago

#320 - Bug - model trained on Xs from two sample texts

Issue - State: open - Opened by Majdoddin about 1 year ago - 5 comments

#303 - Train/Val Loss Issues when training GPT-2 from OWT

Issue - State: open - Opened by JustinKunzi over 1 year ago - 19 comments

#289 - Cannot run train.py on Multiple Gpus

Issue - State: open - Opened by srivassid over 1 year ago - 2 comments

#285 - Why does batch size affect convergence?

Issue - State: open - Opened by 0dB over 1 year ago - 10 comments

#278 - Why LayerNorm before Self-Attention?!

Issue - State: open - Opened by xRootCode over 1 year ago - 2 comments

#273 - KeyError with Sample.py

Issue - State: open - Opened by allenweiss over 1 year ago - 5 comments

#272 - Error in importing custom weights

Issue - State: open - Opened by Maniues over 1 year ago - 4 comments

#242 - Windows not yet supported for torch.compile as of now

Pull Request - State: closed - Opened by mcblooder over 1 year ago - 2 comments

#204 - DistributedSampler

Issue - State: open - Opened by caiodataopshouse over 1 year ago - 4 comments

#182 - What MFU score is to be expected?

Issue - State: open - Opened by yohan-pg over 1 year ago - 6 comments

#167 - Loss becomes nan after training ~6000 iterations

Issue - State: closed - Opened by holyseven over 1 year ago - 27 comments

#116 - Minor change to allow using ddp with exclusive process mode

Pull Request - State: closed - Opened by ramtingh over 1 year ago - 4 comments

#109 - how can I use this project to train a model for Chinese ?

Issue - State: open - Opened by hesilong over 1 year ago - 7 comments

#101 - Give love to tqdm too ;)

Pull Request - State: closed - Opened by maraoz over 1 year ago - 2 comments

#100 - Could it be used to train a 100B-size GPT?

Issue - State: closed - Opened by ericxsun over 1 year ago - 1 comment

#99 - Source for downloading train.bin and val.bin?

Issue - State: closed - Opened by lakaschus over 1 year ago - 2 comments

#98 - Signal: Segmentation fault

Issue - State: open - Opened by tombenj over 1 year ago - 2 comments