Ecosyste.ms: Issues
An open API service that provides issue and pull request metadata for open source projects.
GitHub / karpathy/llm.c issues and pull requests
#654 - Set RNG seed manually with '-rg' parameter
Pull Request -
State: closed - Opened by ademeure 5 months ago
- 1 comment
#650 - muP (maximum update parametrization)
Pull Request -
State: open - Opened by gordicaleksa 5 months ago
- 7 comments
#644 - Mixed dtypes
Pull Request -
State: closed - Opened by ngc92 5 months ago
#642 - Windows issue with Cuda Toolkit 12.5 and latest MSVC compiler 17.10
Issue -
State: closed - Opened by rosslwheeler 5 months ago
- 2 comments
#635 - On-device reductions
Pull Request -
State: closed - Opened by ngc92 5 months ago
#595 - Changes toward `layernorm_forward` in `dev/cuda`
Pull Request -
State: closed - Opened by KarhouTam 5 months ago
- 7 comments
#593 - Zero 2
Pull Request -
State: open - Opened by ngc92 5 months ago
#492 - Cudnn error cudnn_att.cpp on train_gptcu
Issue -
State: closed - Opened by maderix 6 months ago
- 5 comments
#424 - vectorized gemm loading and use register to hold the intermediate value
Pull Request -
State: closed - Opened by patricxu 6 months ago
#388 - Autodetect GPU compute capability using nvidia-smi.
Pull Request -
State: closed - Opened by akulchik 6 months ago
- 4 comments
#372 - How to do Inference on the trained weight of GPT 2 model after finishing the training on CPU using train_gpt2.py and train_gpt2 ?
Issue -
State: open - Opened by asifshaikat 6 months ago
- 1 comment
#366 - Assertion `graph->check_support(cudnn_handle).is_good()' failed
Issue -
State: open - Opened by wfoy 6 months ago
- 21 comments
#359 - Error: make: *** [Makefile:203: train_gpt2cu] Error 255
Issue -
State: open - Opened by yushengsu-thu 6 months ago
- 7 comments
#102 - some Rust error
Issue -
State: open - Opened by nyck33 7 months ago
#101 - Building on Windows
Pull Request -
State: closed - Opened by azret 7 months ago
- 2 comments
#100 - Use cudaHostMalloc for inputs/targets and cpu_losses
Pull Request -
State: closed - Opened by ademeure 7 months ago
- 1 comment
#99 - CUDA lossless compressible memory for activations
Pull Request -
State: open - Opened by ademeure 7 months ago
#98 - use cublaslt and optionally tf32, which fuses bias
Pull Request -
State: closed - Opened by karpathy 7 months ago
- 4 comments
#97 - fix typo in gpt2_build_from_checkpoint
Pull Request -
State: closed - Opened by 3DRX 7 months ago
#96 - Does it have an interactive mode like ChatGPT?
Issue -
State: closed - Opened by xhy2008 7 months ago
- 2 comments
#95 - Print total training time
Pull Request -
State: closed - Opened by krrishnarraj 7 months ago
- 1 comment
#94 - Suggested to add a check for the return value of Malloc
Issue -
State: closed - Opened by dududuguo 7 months ago
- 1 comment
#93 - output is not consistent when I load the gpt2_124M.bin
Issue -
State: open - Opened by kx-kexi 7 months ago
- 1 comment
#92 - Support older CUDA GPU hardware by default
Issue -
State: open - Opened by gel 7 months ago
- 3 comments
#91 - AI is Artificial Idiot
Issue -
State: closed - Opened by limaofu 7 months ago
- 1 comment
#90 - Add `decode_gpt2.c` for decoding in C
Pull Request -
State: closed - Opened by martin-liu 7 months ago
- 9 comments
#89 - ~2x perf improvement beating PyTorch (cublasLt, TF32, CUDA graphs, kernel fusion, etc…)
Pull Request -
State: open - Opened by ademeure 7 months ago
- 3 comments
#88 - AssertionError: Torch not compiled with CUDA enabled
Issue -
State: open - Opened by sandeepkumarsuresh 7 months ago
#87 - Use the command 'brew --prefix libomp' to retrieve the location where libomp would be installed on macOS.
Pull Request -
State: open - Opened by linmajia 7 months ago
- 1 comment
#86 - What else can I say, awesome
Issue -
State: closed - Opened by xsxz01 7 months ago
#85 - no CUDA-capable device is detected
Issue -
State: closed - Opened by rucnyz 7 months ago
- 4 comments
#83 - [Suggestion] Discussions tab for general help
Issue -
State: closed - Opened by AndreSlavescu 7 months ago
- 2 comments
#82 - cooperative groups and fused scale kernel
Pull Request -
State: closed - Opened by ngc92 7 months ago
- 1 comment
#81 - RuntimeError: must forward with targets before backward
Issue -
State: closed - Opened by 1997MarsRover 7 months ago
- 1 comment
#80 - Draft: Layer norm v2
Pull Request -
State: closed - Opened by ngc92 7 months ago
- 1 comment
#79 - Include the online softmax CPU code and a fully parallelized GPU kernal
Pull Request -
State: closed - Opened by lancerts 7 months ago
- 4 comments
#78 - correction du readme (README correction)
Pull Request -
State: closed - Opened by dimaclara 7 months ago
#77 - LOSS MISMATCH AT STEP 0: 2.864161 5.270007
Issue -
State: open - Opened by dbl001 7 months ago
#76 - slightly faster gelu on smaller blocksize contexts
Pull Request -
State: open - Opened by AndreSlavescu 7 months ago
#75 - Include the online softmax CPU code and native port to GPU kernel
Pull Request -
State: closed - Opened by lancerts 7 months ago
#74 - :OMP: Error #15: Initializing libomp.dylib, but found libomp.dylib already initialized." Then it hangs at "python train_gpt2.py"
Issue -
State: closed - Opened by buffalobillhuang 7 months ago
#73 - AssertionError("Torch not compiled with CUDA enabled")
Issue -
State: closed - Opened by dbl001 7 months ago
- 1 comment
#72 - -O3 cannot go with -Ofast
Pull Request -
State: closed - Opened by Soldy 7 months ago
- 1 comment
#71 - Organize defined constants
Pull Request -
State: closed - Opened by modigeko 7 months ago
- 1 comment
#70 - A file not found error was encountered while compiling
Issue -
State: open - Opened by dzbbdawang 7 months ago
#69 - [build failed]Compiler encountered an internal error
Issue -
State: open - Opened by hhhaiai 7 months ago
- 3 comments
#68 - Improve numerical stability in loss calculation
Pull Request -
State: closed - Opened by poad42 7 months ago
- 2 comments
#67 - Fixed a TODO to calculate the max value neatly and use inv sum trick
Pull Request -
State: open - Opened by sirvan3tr 7 months ago
- 2 comments
#65 - looking forward supporting winx86-msvc
Issue -
State: open - Opened by miaomiao1992 7 months ago
- 2 comments
#64 - [train_gpt2.py] synchronize based on device
Pull Request -
State: closed - Opened by krrishnarraj 7 months ago
#63 - the provided PTX was compiled with an unsupported toolchain.
Issue -
State: open - Opened by bogan-FMA 7 months ago
- 3 comments
#62 - Add check for CUDA availability before synchronizing in train_gpt2.py
Pull Request -
State: closed - Opened by grepinsight 7 months ago
#61 - Fix repeated calculation on forward and back prop
Pull Request -
State: closed - Opened by ayushanshul07 7 months ago
#60 - Speedup `attention_forward_kernel2` by implementing Flash Attention 2 kernel
Pull Request -
State: open - Opened by leloykun 7 months ago
- 2 comments
#59 - Add CMake project for cross platform support and easier quick start setup
Pull Request -
State: closed - Opened by abuneri 7 months ago
#58 - fix typo in crossentropy_foward.cu
Pull Request -
State: closed - Opened by lancerts 7 months ago
#57 - Precompute the scaling factor in gelu_forward and gelu_backward
Issue -
State: closed - Opened by ryanmcdermott 7 months ago
- 4 comments
#56 - Detect OpenMP support - macOS Intel
Pull Request -
State: closed - Opened by scotthaleen 7 months ago
#55 - Add Dev Container Support for CPU and GPU
Pull Request -
State: open - Opened by lqdev 7 months ago
#54 - Fused bias with matmul using `cublasLtMatmul`
Issue -
State: closed - Opened by andylolu2 7 months ago
- 2 comments
#53 - readability updates: param_size calcs
Pull Request -
State: closed - Opened by jnros 7 months ago
- 1 comment
#52 - Clarify param_sizes calculation in gpt2_build_from_checkpoint()
Issue -
State: closed - Opened by jnros 7 months ago
- 1 comment
#51 - fully fused layer-norm kernel
Pull Request -
State: closed - Opened by ngc92 7 months ago
- 1 comment
#50 - Including venv/ to .gitignore and fixing typo
Pull Request -
State: open - Opened by arturodrt 7 months ago
#49 - Include thread coarsening factor for matmul kernal
Pull Request -
State: closed - Opened by lancerts 7 months ago
#48 - fix error in small typos in matmul_forward.cu
Pull Request -
State: closed - Opened by lancerts 7 months ago
- 1 comment
#47 - update layernorm.md
Pull Request -
State: closed - Opened by eltociear 7 months ago
#46 - Update README.md
Pull Request -
State: closed - Opened by 100apps 7 months ago
#45 - Add Python virtual environment notice
Pull Request -
State: closed - Opened by Cuda-Chen 7 months ago
#44 - Added the .gitignore file.
Pull Request -
State: closed - Opened by this-is-batman 7 months ago
- 4 comments
#43 - Add .gitignore to the project.
Issue -
State: closed - Opened by this-is-batman 7 months ago
#42 - Create LICENSE
Pull Request -
State: closed - Opened by zarlo 7 months ago
#41 - project license
Issue -
State: closed - Opened by zarlo 7 months ago
#40 - Support MPI distributed training
Issue -
State: open - Opened by sequoiar 7 months ago
- 6 comments
#39 - Suboptimal warp reductions
Issue -
State: open - Opened by IlyaGrebnov 7 months ago
#38 - fix the consistency of the transpose notation in matmul_foward.cu
Pull Request -
State: closed - Opened by lancerts 7 months ago
#37 - HIP support multigpu, AMD, Nvidia.
Pull Request -
State: closed - Opened by Avicted 7 months ago
- 1 comment
#36 - Generation error on MPS (Torch >= 2.2.0, MacOS 14.4)
Issue -
State: open - Opened by davmacario 7 months ago
- 8 comments
#35 - Bus ERROR while running `train_gpt2.py`
Issue -
State: open - Opened by Abdurrahheem 7 months ago
- 14 comments
#34 - Free the memory in layernorm.c
Pull Request -
State: closed - Opened by VinciGit00 7 months ago
#33 - fix a potential error: identifier M_PI is undefined in the gelu kernal
Pull Request -
State: closed - Opened by lancerts 7 months ago
#32 - Error: backward before forward
Issue -
State: closed - Opened by chsasank 7 months ago
- 3 comments
#31 - Why CUDA when we can SYCL
Issue -
State: open - Opened by chsasank 7 months ago
- 3 comments
#30 - when running python train_gpt2.py, errors out after 10 iteration -- is this normal?
Issue -
State: closed - Opened by JamesHuang2004 7 months ago
- 8 comments
#29 - Waiting for CUDA implement
Issue -
State: closed - Opened by namtranase 7 months ago
- 1 comment
#28 - Why not Mojo?
Issue -
State: open - Opened by blazickjp 7 months ago
- 13 comments
#27 - Update README.md
Pull Request -
State: closed - Opened by risingMantis 7 months ago
#26 - [Proposal] Implement GaLore trainer
Issue -
State: open - Opened by zhangchn 7 months ago
#25 - tweak: instead of using -10000.0f for finding the max, use the first item
Pull Request -
State: closed - Opened by NunoSempere 7 months ago
- 3 comments
#24 - enhanced tensor comparison with higher precision.
Pull Request -
State: closed - Opened by anurag12-webster 7 months ago
- 3 comments
#23 - fix: torch warning of python demo
Pull Request -
State: closed - Opened by rokku-c 7 months ago
#22 - Will it be a walkthrough tutorial on this?
Issue -
State: closed - Opened by simjak 7 months ago
- 1 comment
#21 - fix for Error: must forward with targets before backward [#19]
Pull Request -
State: closed - Opened by ent0n29 7 months ago
- 18 comments
#20 - Fix a typo
Pull Request -
State: closed - Opened by varunlakkur 7 months ago
- 1 comment
#19 - Error: must forward with targets before backward
Issue -
State: closed - Opened by lizhipengpeng 7 months ago
- 38 comments
#18 - write LLVM optimization passes for train_gpt2
Issue -
State: open - Opened by ent0n29 7 months ago
- 2 comments
#17 - error while running the makefile train_gpt2 on windows machine.
Issue -
State: closed - Opened by anurag12-webster 7 months ago
- 1 comment
#16 - format the layernorm doc
Pull Request -
State: closed - Opened by richzw 7 months ago
#15 - Include the pytorch layer_norm.cpp and layer_norm_kernel.cu code pointer in readme
Pull Request -
State: closed - Opened by lancerts 7 months ago
#14 - Using the compiler at hand
Pull Request -
State: closed - Opened by Ricardicus 7 months ago
- 2 comments