karpathy/llm.c issues and pull requests

#777 - Is there any way to make customized dataset?

Issue - State: closed - Opened by dongrixinyu about 1 month ago

#776 - Online Softmax is wrong

Issue - State: open - Opened by NoSavedDATA about 1 month ago

#775 - fix: false-positive check for nccl install on ubuntu

Pull Request - State: open - Opened by leiDnedyA about 1 month ago

#774 - Makefile incorrectly finds that `nccl` is installed for Linux systems with `libvncclclient`

Issue - State: open - Opened by leiDnedyA about 1 month ago

#773 - Activation Checkpointing for Llama3 branch

Pull Request - State: open - Opened by ademeure about 1 month ago

#772 - BF16 opt state (m/v) with stochastic rounding (Llama3 branch)

Pull Request - State: closed - Opened by ademeure about 1 month ago

#771 - Add `repkv_backward_kernel2` and `repkv_kernel2` (llama3 branch)

Pull Request - State: open - Opened by insop about 1 month ago - 2 comments

#769 - Fused rmsnorm reference

Pull Request - State: closed - Opened by gordicaleksa about 2 months ago

#768 - Problem when debugging cuda kernel functions.

Issue - State: closed - Opened by dongrixinyu about 2 months ago - 3 comments

#767 - Question concerning `float4`

Issue - State: open - Opened by dongrixinyu about 2 months ago

#766 - cuda mode -> gpu mode

Pull Request - State: closed - Opened by msaroufim about 2 months ago - 1 comment

#765 - -pm -> -pi: typo in error_usage

Pull Request - State: open - Opened by thundergolfer about 2 months ago

#764 - Adding backward kernel for repkv on `llama3` branch (cudamode-irl)

Pull Request - State: closed - Opened by insop about 2 months ago - 1 comment

#763 - rmsnorm backward simple baseline kernel

Pull Request - State: closed - Opened by ngc92 about 2 months ago

#762 - Micro optimization for `softmax_forward_kernel5`

Pull Request - State: open - Opened by insop about 2 months ago - 6 comments

#761 - [cudnn_frontend] Error: No execution plans support the graph.

Issue - State: open - Opened by Necktwi about 2 months ago - 2 comments

#760 - FP8 with Tensor Reorg

Pull Request - State: open - Opened by ademeure about 2 months ago

#759 - Will this repo update new documentation later?

Issue - State: open - Opened by dongrixinyu about 2 months ago

#758 - Update download_starter_pack.sh

Pull Request - State: open - Opened by dongrixinyu about 2 months ago

#757 - RMSNorm - WIP

Pull Request - State: closed - Opened by gordicaleksa about 2 months ago

#756 - Add RoPE positional encoding - llama3 feature branch

Pull Request - State: open - Opened by gordicaleksa about 2 months ago - 1 comment

#755 - Add SwiGLU support - llama3 feature branch

Pull Request - State: open - Opened by gordicaleksa about 2 months ago

#754 - add llama 3 support to llm.c

Pull Request - State: open - Opened by karpathy about 2 months ago

#753 - Adamw thread coarsening kernel

Pull Request - State: open - Opened by saladpalad 2 months ago

#752 - llm.c for inference

Issue - State: open - Opened by ztachip 2 months ago - 2 comments

#750 - implement rmsnorm in C

Pull Request - State: closed - Opened by Jake-Song 2 months ago

#749 - Error no instance of overloaded function "..." matches the argument list

Issue - State: closed - Opened by drzsdrtfg 3 months ago - 2 comments

#748 - Fix sizing typo in `train_gpt2_fp32.cu`

Pull Request - State: open - Opened by gajanan-choudhary 3 months ago - 2 comments

#747 - Can't train in FP16 on Turing

Issue - State: open - Opened by jafioti 3 months ago - 1 comment

#746 - log with LINE and FILE for better addressing.

Pull Request - State: open - Opened by NEWPLAN 3 months ago

#745 - feature/managed2

Pull Request - State: closed - Opened by karpathy 3 months ago

#744 - fix a typo

Pull Request - State: closed - Opened by dengl11 3 months ago

#743 - Re: Fixed modal script for updated cudnn version, and read errors

Pull Request - State: open - Opened by vyom1611 3 months ago

#742 - check libnccl instead of nccl to be more reliable

Pull Request - State: open - Opened by dengl11 3 months ago

#741 - [WIP] initial curand implementation for model init

Pull Request - State: open - Opened by ngc92 3 months ago - 1 comment

#740 - Gordicaleksa fix dataloader2

Pull Request - State: closed - Opened by karpathy 3 months ago

#739 - Suggestion: Test more Activation Functions

Issue - State: open - Opened by linux-leo 3 months ago

#738 - Improve compile time (simple makefile changes)

Pull Request - State: closed - Opened by ademeure 3 months ago - 2 comments

#737 - multi-threaded model initialization

Pull Request - State: open - Opened by ngc92 3 months ago - 1 comment

#736 - Fix llama 3 data loader

Pull Request - State: closed - Opened by gordicaleksa 3 months ago

#735 - Minor LLaMA 3 refactor

Pull Request - State: closed - Opened by gordicaleksa 3 months ago

#734 - Add external KV to LLaMA 3

Pull Request - State: open - Opened by gordicaleksa 3 months ago

#733 - Add llm.cpp(a port of this project using Eigen library, supporting CPU/CUDA), link to notable forks in readme

Pull Request - State: closed - Opened by zhangpiu 3 months ago

#732 - Add llm.cpp(a port of this project using Eigen library, supporting CPU/CUDA), link to notable forks in readme

Pull Request - State: closed - Opened by zhangpiu 3 months ago

#731 - Merge pull request #1 from karpathy/master

Pull Request - State: closed - Opened by invisiblepancake 3 months ago - 1 comment

#730 - Demo equivalence - tmp

Pull Request - State: closed - Opened by gordicaleksa 3 months ago

#729 - MPI run error

Issue - State: open - Opened by wzzanthony 3 months ago

#728 - add train_llama31.py

Pull Request - State: closed - Opened by karpathy 3 months ago

#727 - MPI run with 8 GPU fails

Issue - State: open - Opened by msharmavikram 3 months ago - 1 comment

#726 - Llama tmp

Pull Request - State: closed - Opened by gordicaleksa 3 months ago

#725 - Add LLaMA 3 Python support

Pull Request - State: closed - Opened by gordicaleksa 3 months ago

#724 - add llm.cpp(a port of this project featuring a tinytorch.hpp library) link to notable forks in readme

Pull Request - State: closed - Opened by GaoYusong 3 months ago

#723 - TypeError: normal_() got an unexpected keyword argument 'generator'

Issue - State: open - Opened by StarHtimE 3 months ago - 1 comment

#721 - Faster GELU forward & backward using MUFU.TANH for SM7.5+

Pull Request - State: open - Opened by ademeure 3 months ago - 4 comments

#718 - Add SwiGLU support

Pull Request - State: open - Opened by gordicaleksa 3 months ago - 1 comment

#717 - Nvidia management library for more detailed GPU state printing

Pull Request - State: closed - Opened by ngc92 3 months ago

#716 - chore(dev/cuda): use common utils in permute kernel

Pull Request - State: closed - Opened by mspronesti 4 months ago

#715 - Feature/restore from master

Pull Request - State: closed - Opened by karpathy 4 months ago

#714 - Add RoPE positional encoding

Pull Request - State: open - Opened by gordicaleksa 4 months ago - 1 comment

#713 - fix(dev/cuda): memory leaks

Pull Request - State: closed - Opened by mspronesti 4 months ago - 1 comment

#712 - Added permute kernel in dev/cuda

Pull Request - State: closed - Opened by indianspeedster 4 months ago - 1 comment

#711 - Outlier detection: catch more outliers by not updating moving average with skipped updates

Pull Request - State: open - Opened by ademeure 4 months ago - 1 comment

#710 - Different batch_size results in different evaluation loss.

Issue - State: open - Opened by iminfine 4 months ago

#709 - Allocate managed memory if device memory runs out

Pull Request - State: closed - Opened by ngc92 4 months ago

#708 - Add high perf mode

Pull Request - State: open - Opened by gordicaleksa 4 months ago - 2 comments

#707 - Add KV cache for inference

Pull Request - State: open - Opened by gordicaleksa 4 months ago

#706 - Fix the comment on tokenizer.h. Instead of saying 'we we' twice, remove one 'we'

Pull Request - State: closed - Opened by Madankh 4 months ago

#705 - Refactor C code

Pull Request - State: closed - Opened by gordicaleksa 4 months ago - 1 comment

#704 - add batch limit to 124m script to prevent infinite loop

Pull Request - State: open - Opened by varun-a10ai 4 months ago

#703 - Fix for upgraded Cuda 12.5.1 and Microsoft latest compiler

Pull Request - State: closed - Opened by rosslwheeler 4 months ago

#702 - Restore from master weights (& allow restoring from a checkpoint of different precision)

Pull Request - State: closed - Opened by ademeure 4 months ago

#701 - Larger Tokenizers

Issue - State: open - Opened by dustinwloring1988 4 months ago

#700 - Fix integer overflow by using `size_t` for parameter sizes.

Pull Request - State: closed - Opened by YuchenJin 4 months ago

#699 - Simplified/faster "backward bias" kernel (column reduction)

Pull Request - State: open - Opened by ademeure 4 months ago - 1 comment

#698 - Cache pip dependencies

Pull Request - State: closed - Opened by furkansahin 4 months ago

#697 - image-gpt

Issue - State: open - Opened by bil-ash 4 months ago - 1 comment

#696 - Major FP32 llm.c improvements/refactoring/etc.

Pull Request - State: open - Opened by ademeure 4 months ago

#695 - Suggestion: Use smollm corpus

Issue - State: open - Opened by linux-leo 4 months ago - 3 comments

#694 - Model init cleanup

Pull Request - State: closed - Opened by ngc92 4 months ago

#693 - Fixed modal script for updated cudnn version, and read errors

Pull Request - State: closed - Opened by vyom1611 4 months ago

#692 - Is Multi-GPU config enabled even when I'm using one GPU?

Issue - State: closed - Opened by BlaiseMuhirwa 4 months ago - 1 comment

#691 - Update README.md with prerequisite of libomp

Pull Request - State: open - Opened by nzhang 4 months ago

#689 - Refactor/code to zerocuh

Pull Request - State: closed - Opened by karpathy 4 months ago

#688 - feature/gpt3v1

Pull Request - State: closed - Opened by karpathy 4 months ago

#687 - Getting "Floating point exception (core dumped)" Error

Issue - State: open - Opened by alvins82 4 months ago - 4 comments

#686 - Added cudaCheck wherever missing.

Pull Request - State: closed - Opened by indianspeedster 4 months ago

#685 - Add nim port

Pull Request - State: closed - Opened by planetis-m 4 months ago

#684 - Adding CI check for exceeding loss tolerance

Pull Request - State: closed - Opened by rosslwheeler 4 months ago - 2 comments

#682 - Add a README link under related related projects for gpu.cpp under WebGPU C++

Pull Request - State: closed - Opened by austinvhuang 4 months ago

#681 - Fix small comment typo

Pull Request - State: closed - Opened by fluffyorang3 4 months ago

#680 - Add GPT3 model series

Pull Request - State: closed - Opened by ngc92 4 months ago - 1 comment

#679 - demo how to track activations without too much boilerplate code

Pull Request - State: open - Opened by ngc92 4 months ago

#678 - FP8 work in progress

Pull Request - State: open - Opened by ademeure 4 months ago

#675 - Add option to remove biases

Pull Request - State: open - Opened by gordicaleksa 4 months ago

#674 - move `set_zero_configs` into `zero.cuh`

Pull Request - State: closed - Opened by ngc92 4 months ago - 1 comment

#668 - Add Habana gaudi2 tpc kernel link

Pull Request - State: closed - Opened by abhilash1910 4 months ago

#667 - Fix eval dataloader div by zero for < 4 batch size

Pull Request - State: closed - Opened by gordicaleksa 4 months ago - 1 comment

#665 - zero-grad is async and part of backward call

Pull Request - State: closed - Opened by ngc92 4 months ago

#660 - Pretraining (with CPUs)

Issue - State: open - Opened by bitmarkcc 4 months ago - 5 comments

#655 - block-level stable adamw

Pull Request - State: open - Opened by ngc92 5 months ago
