Ecosyste.ms: Issues

An open API service providing issue and pull request metadata for open source projects.
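
The per-repository listings below can also be retrieved programmatically from the service's JSON API. The Python sketch that follows (using the third-party requests package) shows one way this might look; the base URL, endpoint path, and response field names are assumptions made for illustration, not the service's documented contract.

    # Minimal sketch (assumed endpoint, not a confirmed contract): fetch issue
    # and pull request metadata for one repository from the Ecosyste.ms Issues
    # API. Base URL, path, and JSON field names are assumptions; verify them
    # against the live API documentation before use.
    from urllib.parse import quote

    import requests

    BASE_URL = "https://issues.ecosyste.ms/api/v1"  # assumed base URL

    def fetch_issues(host: str, repo: str, per_page: int = 100) -> list:
        """Return one page of issue/PR metadata records for a repository."""
        # URL-encode the "owner/name" form so the slash stays in one path segment.
        url = f"{BASE_URL}/hosts/{host}/repositories/{quote(repo, safe='')}/issues"
        response = requests.get(url, params={"per_page": per_page}, timeout=30)
        response.raise_for_status()
        return response.json()

    if __name__ == "__main__":
        for item in fetch_issues("GitHub", "pytorch-labs/gpt-fast"):
            # "pull_request", "number", "title", and "state" are assumed field names.
            kind = "Pull Request" if item.get("pull_request") else "Issue"
            print(f"#{item.get('number')} - {item.get('title')} "
                  f"({kind}, state: {item.get('state')})")
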

GitHub / pytorch-labs/gpt-fast issues and pull requests

#219 - AttributeError: 'Tensor' object has no attribute 'mask_mod'

Issue - State: open - Opened by jiapingW about 1 month ago

#217 - int4 quant broken right now?

Issue - State: open - Opened by jerryzh168 2 months ago - 3 comments

#216 - merge Quantize activation new int main

Pull Request - State: closed - Opened by shellmik 2 months ago - 2 comments

#214 - Has anyone run this code with bs>1 and speculative decoding?

Issue - State: open - Opened by deafTim 4 months ago

#213 - Mistake at line 191 of generate.py when is_speculative=True?

Issue - State: open - Opened by deafTim 4 months ago - 1 comment

#212 - [wip] entropy specdec

Pull Request - State: closed - Opened by stillmatic 4 months ago
Labels: CLA Signed

#211 - Error with meta-llama/Llama-3.2-1B

Issue - State: open - Opened by deafTim 4 months ago - 2 comments

#210 - Request for Smaller Model Options (~1B Parameters)

Issue - State: open - Opened by deafTim 4 months ago

#209 - Error with stories15M and stories110M

Issue - State: open - Opened by deafTim 4 months ago - 8 comments

#208 - Adding torchao apis to gpt-fast

Pull Request - State: open - Opened by HDCharles 4 months ago - 3 comments
Labels: CLA Signed

#206 - Support generation tasks for eval.py

Pull Request - State: open - Opened by mostafaelhoushi 5 months ago
Labels: CLA Signed

#205 - add huggingface-hub

Pull Request - State: open - Opened by kunschg 5 months ago
Labels: CLA Signed

#204 - update README.md

Pull Request - State: open - Opened by kunschg 5 months ago
Labels: CLA Signed

#203 - feat: add Llama-3.2-[1B/3B] support

Pull Request - State: open - Opened by stillmatic 5 months ago - 2 comments
Labels: CLA Signed

#202 - Fix Llama3 HF checkpoint converter

Pull Request - State: closed - Opened by yanboliang 5 months ago
Labels: CLA Signed

#201 - Ensemble

Pull Request - State: closed - Opened by mayank31398 5 months ago - 1 comment

#200 - Add support for llama 3.1 8B/70B

Pull Request - State: closed - Opened by yanboliang 6 months ago
Labels: CLA Signed

#199 - Support Llama-3.1-405B

Pull Request - State: closed - Opened by yanboliang 6 months ago
Labels: CLA Signed

#198 - Reasons for the poor effect of Speculative Sampling

Issue - State: open - Opened by JoeNan1 6 months ago - 1 comment

#197 - Fix docstring args names

Pull Request - State: closed - Opened by kit1980 6 months ago
Labels: CLA Signed

#196 - Integrate Flex Decoding

Pull Request - State: closed - Opened by BoyuanFeng 6 months ago - 4 comments
Labels: CLA Signed

#195 - Add Phi-3-mini-4k-instruct bfloat16/int8

Pull Request - State: open - Opened by makaveli10 6 months ago - 8 comments
Labels: CLA Signed

#194 - Activation quantization support

Issue - State: open - Opened by ayyoobimani 6 months ago - 1 comment

#193 - int4 quantization cpu fix

Pull Request - State: open - Opened by likholat 7 months ago - 3 comments
Labels: CLA Signed

#192 - flex_attention ver.

Pull Request - State: open - Opened by joydddd 7 months ago - 2 comments
Labels: CLA Signed

#191 - Update sdpa function with enable_gqa=True

Pull Request - State: open - Opened by jainapurva 7 months ago - 1 comment
Labels: CLA Signed

#190 - permute function in `convert_hf_checkpoint.py`

Issue - State: closed - Opened by Sohaib9920 7 months ago - 3 comments

#189 - trying to convert huggingface whisper model to pytorch

Issue - State: open - Opened by nullonesix 8 months ago - 1 comment

#188 - Support of FlashDecoding

Issue - State: closed - Opened by jianc99 8 months ago - 3 comments

#187 - Decouple int4 weight with serialized format

Pull Request - State: open - Opened by yanbing-j 8 months ago - 5 comments
Labels: CLA Signed

#186 - tokenizer.model

Issue - State: open - Opened by hasakikiki 8 months ago - 1 comment

#185 - It doesn't accelerate very well at L4

Issue - State: open - Opened by songh11 8 months ago - 1 comment

#183 - Question about the ENABLE_INTRA_NODE_COMM for speculative decoding

Issue - State: closed - Opened by jianc99 8 months ago - 9 comments

#182 - GGUF support?

Issue - State: open - Opened by yukiarimo 8 months ago

#181 - Fix rope base issue with llama 3

Pull Request - State: closed - Opened by VikParuchuri 8 months ago - 3 comments
Labels: CLA Signed

#180 - [WIP] Use DTensor-based tensor parallel

Pull Request - State: open - Opened by kwen2501 8 months ago
Labels: CLA Signed

#178 - Update installation instructions in README.md

Pull Request - State: closed - Opened by Jokeren 8 months ago - 1 comment
Labels: CLA Signed

#176 - Update Grok-1 and DBRX support in README

Pull Request - State: closed - Opened by yanboliang 9 months ago
Labels: CLA Signed

#175 - Remove nn.Embedding layer from model size

Pull Request - State: closed - Opened by yanboliang 9 months ago
Labels: CLA Signed

#174 - [example] Add support for DBRX

Pull Request - State: open - Opened by yanboliang 10 months ago
Labels: CLA Signed

#173 - Throughput Benchmark Scripts

Issue - State: closed - Opened by HanGuo97 10 months ago - 2 comments

#172 - Missing Keys in state_dict

Issue - State: open - Opened by bjohn22 10 months ago - 2 comments

#171 - [example] Added (hacky) Grok1 support

Pull Request - State: open - Opened by Chillee 10 months ago - 2 comments
Labels: CLA Signed

#170 - Making TokenizerInterface more usable for the user's code.

Pull Request - State: open - Opened by Artyom17 10 months ago
Labels: CLA Signed

#169 - Unified Llama 3 (8b,70b) + Safetensors support

Pull Request - State: closed - Opened by nivibilla 10 months ago - 20 comments
Labels: CLA Signed

#168 - Unified llama 3 support.

Pull Request - State: closed - Opened by nivibilla 10 months ago - 1 comment

#167 - Tensor Parallel Inside notebook

Issue - State: open - Opened by nivibilla 10 months ago - 3 comments

#166 - Llama3 8b perf numbers on A100

Pull Request - State: closed - Opened by yanboliang 10 months ago
Labels: CLA Signed

#165 - mmap issue in bf16 of gpt-fast

Issue - State: open - Opened by yanbing-j 10 months ago - 1 comment

#164 - Remove used empty variable

Pull Request - State: open - Opened by yncxcw 10 months ago - 2 comments
Labels: CLA Signed

#163 - Add download script for tinyllamas

Pull Request - State: open - Opened by yiliu30 10 months ago - 2 comments
Labels: CLA Signed

#162 - Naming: n_local_heads -> n_kv_heads

Issue - State: open - Opened by ad8e 10 months ago

#161 - Optimize Int8 Woq for CPU

Pull Request - State: open - Opened by yanbing-j 10 months ago - 2 comments
Labels: CLA Signed

#160 - Input token length question

Issue - State: closed - Opened by kaizizzzzzz 10 months ago - 2 comments

#159 - Fixing quantize in int4 mode

Pull Request - State: open - Opened by Artyom17 10 months ago - 4 comments
Labels: CLA Signed

#158 - llama3 8B support, tiktoken tokenizer

Pull Request - State: closed - Opened by Artyom17 10 months ago - 21 comments
Labels: CLA Signed

#157 - fix input_pos shape in comment

Pull Request - State: open - Opened by YassineYousfi 10 months ago - 2 comments
Labels: CLA Signed

#156 - shape fix for gptq

Pull Request - State: closed - Opened by HDCharles 10 months ago
Labels: CLA Signed

#154 - INT4 quantization not working on MI210

Issue - State: closed - Opened by yafehlis 11 months ago - 2 comments

#153 - Fix compile_prefill to prevent CUDA error

Pull Request - State: open - Opened by PasserBy4 11 months ago - 2 comments
Labels: CLA Signed

#150 - Tiny Llamas Not Found

Issue - State: closed - Opened by guihao-liang 11 months ago - 2 comments

#149 - On the memory usage of `ConditionalFeedForward`

Issue - State: closed - Opened by carmocca 11 months ago - 4 comments

#145 - Add CPU support in mixtral-moe for int8 woq

Pull Request - State: closed - Opened by yanbing-j 11 months ago - 3 comments
Labels: CLA Signed

#144 - int8 WOQ raises Codegen Error with `--compile_prefill`

Issue - State: open - Opened by yanbing-j 11 months ago - 4 comments

#141 - Fixing block size for Mistral-7B.

Pull Request - State: open - Opened by Artyom17 11 months ago - 1 comment
Labels: CLA Signed

#137 - CUDA error if enabling compile_prefill for quantization model (int8)

Issue - State: open - Opened by yanboliang 11 months ago - 8 comments

#134 - GGUF fp32/fp16 conversion to checkpoint

Pull Request - State: open - Opened by mergennachin 11 months ago - 1 comment
Labels: CLA Signed

#133 - Optimized the process of loading PyTorch state dictionaries, merging …

Pull Request - State: open - Opened by hvaria 11 months ago - 2 comments
Labels: CLA Signed

#131 - Update to use torch.nn.attention.sdpa_kernel

Pull Request - State: closed - Opened by yanboliang 11 months ago - 2 comments
Labels: CLA Signed

#129 - int4/int4-gptq support in Mixtral 8x7B

Issue - State: closed - Opened by yanbing-j 11 months ago - 4 comments

#125 - Int4 perplexity

Issue - State: closed - Opened by SinanAkkoyun 12 months ago - 2 comments

#124 - Can't quantize to int4 and can't compile on RTX2080Ti

Issue - State: closed - Opened by kaizizzzzzz 12 months ago - 3 comments

#119 - Updating requirements.txt and .gitignore

Pull Request - State: open - Opened by Artyom17 12 months ago
Labels: CLA Signed

#115 - [example] Added gemma support

Pull Request - State: open - Opened by Chillee 12 months ago - 4 comments
Labels: CLA Signed

#114 - Question about the generated code of `WeightOnlyInt8Linear`

Issue - State: open - Opened by feiyuvl 12 months ago - 6 comments

#112 - batching/dynamic batching

Issue - State: open - Opened by nivibilla 12 months ago - 2 comments

#103 - Remove unnecessary wrapper code

Pull Request - State: closed - Opened by HDCharles about 1 year ago
Labels: CLA Signed

#102 - [quant] Add int8 per token dynamic quant + int4 per group quant for ExecuTorch

Pull Request - State: closed - Opened by jerryzh168 about 1 year ago - 1 comment
Labels: CLA Signed

#101 - fixing over padding and GPTQ padding bug

Pull Request - State: closed - Opened by jerryzh168 about 1 year ago
Labels: CLA Signed

#100 - fixing over padding and GPTQ padding bug

Pull Request - State: closed - Opened by HDCharles about 1 year ago
Labels: CLA Signed

#99 - Bandwidth achieved for INT8 is much smaller than FP16

Issue - State: open - Opened by yafehlis about 1 year ago - 3 comments

#97 - fixing circular import

Pull Request - State: closed - Opened by HDCharles about 1 year ago - 1 comment
Labels: CLA Signed

#94 - pass@1 score extremely low using GPT-fast API

Issue - State: closed - Opened by yafehlis about 1 year ago - 3 comments

#93 - Fixes for eval and GPTQ after move to gpt-fast

Pull Request - State: closed - Opened by HDCharles about 1 year ago
Labels: CLA Signed

#92 - Trying to speed up LLaVA, but it is slower than eager mode. Why?

Issue - State: open - Opened by bleedingfight about 1 year ago - 1 comment

#91 - Updating eval for lm_eval 0.4 and 0.3

Pull Request - State: closed - Opened by HDCharles about 1 year ago
Labels: CLA Signed

#90 - Can GPT-Fast support larger batch sizes

Issue - State: closed - Opened by yetingqiaqia about 1 year ago - 3 comments

#89 - `eval.py` uses older version of lm_eval

Issue - State: closed - Opened by nairbv about 1 year ago - 1 comment

#88 - Size mismatch error occurs when loading models quantized by GPTQ

Issue - State: open - Opened by sdc17 about 1 year ago - 2 comments

#87 - RuntimeError: CUDA error: named symbol not found

Issue - State: open - Opened by ce1190222 about 1 year ago - 3 comments

#86 - How is llama-7b trained, what is the verification accuracy?

Issue - State: closed - Opened by frankxyy about 1 year ago - 2 comments

#85 - generate.py: remove duplicate if condition

Pull Request - State: closed - Opened by guoyejun about 1 year ago
Labels: CLA Signed

#84 - add int4 cpu support

Pull Request - State: closed - Opened by mingfeima about 1 year ago - 7 comments
Labels: CLA Signed