karpathy/llm.c issues and pull requests

#654 - Set RNG seed manually with '-rg' parameter

Pull Request - State: closed - Opened by ademeure 5 months ago - 1 comment

#650 - muP (maximum update parametrization)

Pull Request - State: open - Opened by gordicaleksa 5 months ago - 7 comments

#644 - Mixed dtypes

Pull Request - State: closed - Opened by ngc92 5 months ago

#642 - Windows issue with Cuda Toolkit 12.5 and latest MSVC compiler 17.10

Issue - State: closed - Opened by rosslwheeler 5 months ago - 2 comments

#635 - On-device reductions

Pull Request - State: closed - Opened by ngc92 5 months ago

#595 - Changes toward `layernorm_forward` in `dev/cuda`

Pull Request - State: closed - Opened by KarhouTam 5 months ago - 7 comments

#593 - Zero 2

Pull Request - State: open - Opened by ngc92 5 months ago

#492 - Cudnn error cudnn_att.cpp on train_gptcu

Issue - State: closed - Opened by maderix 6 months ago - 5 comments

#424 - vectorized gemm loading and use register to hold the intermediate value

Pull Request - State: closed - Opened by patricxu 6 months ago

#388 - Autodetect GPU compute capability using nvidia-smi.

Pull Request - State: closed - Opened by akulchik 6 months ago - 4 comments

#372 - How to do Inference on the trained weight of GPT 2 model after finishing the training on CPU using train_gpt2.py and train_gpt2 ?

Issue - State: open - Opened by asifshaikat 6 months ago - 1 comment

#366 - Assertion `graph->check_support(cudnn_handle).is_good()' failed

Issue - State: open - Opened by wfoy 6 months ago - 21 comments

#359 - Error: make: *** [Makefile:203: train_gpt2cu] Error 255

Issue - State: open - Opened by yushengsu-thu 6 months ago - 7 comments

#102 - some Rust error

Issue - State: open - Opened by nyck33 7 months ago

#101 - Building on Windows

Pull Request - State: closed - Opened by azret 7 months ago - 2 comments

#100 - Use cudaHostMalloc for inputs/targets and cpu_losses

Pull Request - State: closed - Opened by ademeure 7 months ago - 1 comment

#99 - CUDA lossless compressible memory for activations

Pull Request - State: open - Opened by ademeure 7 months ago

#98 - use cublaslt and optionally tf32, which fuses bias

Pull Request - State: closed - Opened by karpathy 7 months ago - 4 comments

#97 - fix typo in gpt2_build_from_checkpoint

Pull Request - State: closed - Opened by 3DRX 7 months ago

#96 - Does it have an interactive mode like ChatGPT?

Issue - State: closed - Opened by xhy2008 7 months ago - 2 comments

#95 - Print total training time

Pull Request - State: closed - Opened by krrishnarraj 7 months ago - 1 comment

#94 - Suggested to add a check for the return value of Malloc

Issue - State: closed - Opened by dududuguo 7 months ago - 1 comment

#93 - output is not consistent when I load the gpt2_124M.bin

Issue - State: open - Opened by kx-kexi 7 months ago - 1 comment

#92 - Support older CUDA GPU hardware by default

Issue - State: open - Opened by gel 7 months ago - 3 comments

#91 - AI is Artificial Idiot

Issue - State: closed - Opened by limaofu 7 months ago - 1 comment

#90 - Add `decode_gpt2.c` for decoding in C

Pull Request - State: closed - Opened by martin-liu 7 months ago - 9 comments

#89 - ~2x perf improvement beating PyTorch (cublasLt, TF32, CUDA graphs, kernel fusion, etc…)

Pull Request - State: open - Opened by ademeure 7 months ago - 3 comments

#88 - AssertionError: Torch not compiled with CUDA enabled

Issue - State: open - Opened by sandeepkumarsuresh 7 months ago

#87 - Use the command 'brew --prefix libomp' to retrieve the location where libomp would be installed on macOS.

Pull Request - State: open - Opened by linmajia 7 months ago - 1 comment

#86 - What else can I say, awesome

Issue - State: closed - Opened by xsxz01 7 months ago

#85 - no CUDA-capable device is detected

Issue - State: closed - Opened by rucnyz 7 months ago - 4 comments

#83 - [Suggestion] Discussions tab for general help

Issue - State: closed - Opened by AndreSlavescu 7 months ago - 2 comments

#82 - cooperative groups and fused scale kernel

Pull Request - State: closed - Opened by ngc92 7 months ago - 1 comment

#81 - RuntimeError: must forward with targets before backward

Issue - State: closed - Opened by 1997MarsRover 7 months ago - 1 comment

#80 - Draft: Layer norm v2

Pull Request - State: closed - Opened by ngc92 7 months ago - 1 comment

#79 - Include the online softmax CPU code and a fully parallelized GPU kernal

Pull Request - State: closed - Opened by lancerts 7 months ago - 4 comments

#78 - README correction

Pull Request - State: closed - Opened by dimaclara 7 months ago

#77 - LOSS MISMATCH AT STEP 0: 2.864161 5.270007

Issue - State: open - Opened by dbl001 7 months ago

#76 - slightly faster gelu on smaller blocksize contexts

Pull Request - State: open - Opened by AndreSlavescu 7 months ago

#75 - Include the online softmax CPU code and native port to GPU kernel

Pull Request - State: closed - Opened by lancerts 7 months ago

#74 - :OMP: Error #15: Initializing libomp.dylib, but found libomp.dylib already initialized." Then it hangs at "python train_gpt2.py"

Issue - State: closed - Opened by buffalobillhuang 7 months ago

#73 - AssertionError("Torch not compiled with CUDA enabled")

Issue - State: closed - Opened by dbl001 7 months ago - 1 comment

#72 - -O3 cannot go with -Ofast

Pull Request - State: closed - Opened by Soldy 7 months ago - 1 comment

#71 - Organize defined constants

Pull Request - State: closed - Opened by modigeko 7 months ago - 1 comment

#70 - A file not found error was encountered while compiling

Issue - State: open - Opened by dzbbdawang 7 months ago

#69 - [build failed]Compiler encountered an internal error

Issue - State: open - Opened by hhhaiai 7 months ago - 3 comments

#68 - Improve numerical stability in loss calculation

Pull Request - State: closed - Opened by poad42 7 months ago - 2 comments

#67 - Fixed a TODO to calculate the max value neatly and use inv sum trick

Pull Request - State: open - Opened by sirvan3tr 7 months ago - 2 comments

#65 - looking forward supporting winx86-msvc

Issue - State: open - Opened by miaomiao1992 7 months ago - 2 comments

#64 - [train_gpt2.py] synchronize based on device

Pull Request - State: closed - Opened by krrishnarraj 7 months ago

#63 - the provided PTX was compiled with an unsupported toolchain.

Issue - State: open - Opened by bogan-FMA 7 months ago - 3 comments

#62 - Add check for CUDA availability before synchronizing in train_gpt2.py

Pull Request - State: closed - Opened by grepinsight 7 months ago

#61 - Fix repeated calculation on forward and back prop

Pull Request - State: closed - Opened by ayushanshul07 7 months ago

#60 - Speedup `attention_forward_kernel2` by implementing Flash Attention 2 kernel

Pull Request - State: open - Opened by leloykun 7 months ago - 2 comments

#59 - Add CMake project for cross platform support and easier quick start setup

Pull Request - State: closed - Opened by abuneri 7 months ago

#58 - fix typo in crossentropy_foward.cu

Pull Request - State: closed - Opened by lancerts 7 months ago

#57 - Precompute the scaling factor in gelu_forward and gelu_backward

Issue - State: closed - Opened by ryanmcdermott 7 months ago - 4 comments

#56 - Detect OpenMP support - macOS Intel

Pull Request - State: closed - Opened by scotthaleen 7 months ago

#55 - Add Dev Container Support for CPU and GPU

Pull Request - State: open - Opened by lqdev 7 months ago

#54 - Fused bias with matmul using `cublasLtMatmul`

Issue - State: closed - Opened by andylolu2 7 months ago - 2 comments

#53 - readability updates: param_size calcs

Pull Request - State: closed - Opened by jnros 7 months ago - 1 comment

#52 - Clarify param_sizes calculation in gpt2_build_from_checkpoint()

Issue - State: closed - Opened by jnros 7 months ago - 1 comment

#51 - fully fused layer-norm kernel

Pull Request - State: closed - Opened by ngc92 7 months ago - 1 comment

#50 - Including venv/ to .gitignore and fixing typo

Pull Request - State: open - Opened by arturodrt 7 months ago

#49 - Include thread coarsening factor for matmul kernal

Pull Request - State: closed - Opened by lancerts 7 months ago

#48 - fix error in small typos in matmul_forward.cu

Pull Request - State: closed - Opened by lancerts 7 months ago - 1 comment

#47 - update layernorm.md

Pull Request - State: closed - Opened by eltociear 7 months ago

#46 - Update README.md

Pull Request - State: closed - Opened by 100apps 7 months ago

#45 - Add Python virtual environment notice

Pull Request - State: closed - Opened by Cuda-Chen 7 months ago

#44 - Added the .gitignore file.

Pull Request - State: closed - Opened by this-is-batman 7 months ago - 4 comments

#43 - Add .gitignore to the project.

Issue - State: closed - Opened by this-is-batman 7 months ago

#42 - Create LICENSE

Pull Request - State: closed - Opened by zarlo 7 months ago

#41 - project license

Issue - State: closed - Opened by zarlo 7 months ago

#40 - Support MPI distributed training

Issue - State: open - Opened by sequoiar 7 months ago - 6 comments

#39 - Suboptimal warp reductions

Issue - State: open - Opened by IlyaGrebnov 7 months ago

#38 - fix the consistency of the transpose notation in matmul_foward.cu

Pull Request - State: closed - Opened by lancerts 7 months ago

#37 - HIP support multigpu, AMD, Nvidia.

Pull Request - State: closed - Opened by Avicted 7 months ago - 1 comment

#36 - Generation error on MPS (Torch >= 2.2.0, MacOS 14.4)

Issue - State: open - Opened by davmacario 7 months ago - 8 comments

#35 - Bus ERROR while running `train_gpt2.py`

Issue - State: open - Opened by Abdurrahheem 7 months ago - 14 comments

#34 - Free the memory in layernorm.c

Pull Request - State: closed - Opened by VinciGit00 7 months ago

#33 - fix a potential error: identifier M_PI is undefined in the gelu kernal

Pull Request - State: closed - Opened by lancerts 7 months ago

#32 - Error: backward before forward

Issue - State: closed - Opened by chsasank 7 months ago - 3 comments

#31 - Why CUDA when we can SYCL

Issue - State: open - Opened by chsasank 7 months ago - 3 comments

#30 - when running python train_gpt2.py, errors out after 10 iteration -- is this normal?

Issue - State: closed - Opened by JamesHuang2004 7 months ago - 8 comments

#29 - Waiting for CUDA implement

Issue - State: closed - Opened by namtranase 7 months ago - 1 comment

#28 - Why not Mojo?

Issue - State: open - Opened by blazickjp 7 months ago - 13 comments

#27 - Update README.md

Pull Request - State: closed - Opened by risingMantis 7 months ago

#26 - [Proposal] Implement GaLore trainer

Issue - State: open - Opened by zhangchn 7 months ago

#25 - tweak: instead of using -10000.0f for finding the max, use the first item

Pull Request - State: closed - Opened by NunoSempere 7 months ago - 3 comments

#24 - enhanced tensor comparison with higher precision.

Pull Request - State: closed - Opened by anurag12-webster 7 months ago - 3 comments

#23 - fix: torch warning of python demo

Pull Request - State: closed - Opened by rokku-c 7 months ago

#22 - Will it be a walkthrough tutorial on this?

Issue - State: closed - Opened by simjak 7 months ago - 1 comment

#21 - fix for Error: must forward with targets before backward [#19]

Pull Request - State: closed - Opened by ent0n29 7 months ago - 18 comments

#20 - Fix a typo

Pull Request - State: closed - Opened by varunlakkur 7 months ago - 1 comment

#19 - Error: must forward with targets before backward

Issue - State: closed - Opened by lizhipengpeng 7 months ago - 38 comments

#18 - write LLVM optimization passes for train_gpt2

Issue - State: open - Opened by ent0n29 7 months ago - 2 comments

#17 - error while running the makefile train_gpt2 on windows machine.

Issue - State: closed - Opened by anurag12-webster 7 months ago - 1 comment

#16 - format the layernorm doc

Pull Request - State: closed - Opened by richzw 7 months ago

#15 - Include the pytorch layer_norm.cpp and layer_norm_kernel.cu code pointer in readme

Pull Request - State: closed - Opened by lancerts 7 months ago

#14 - Using the compiler at hand

Pull Request - State: closed - Opened by Ricardicus 7 months ago - 2 comments
