Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / turboderp/exllamav2 issues and pull requests
#704 - [BUG] RoPE Scaling through Alpha
Issue -
State: open - Opened by Vhallo about 1 month ago
Labels: bug
#703 - [REQUEST] Support for the new Command-r7b
Issue -
State: open - Opened by ciprianveg about 1 month ago
#702 - [REQUEST] Sage Attention? Anyone tried it with exllama?
Issue -
State: open - Opened by Ph0rk0z about 1 month ago
- 2 comments
#701 - Modify handling for Pixtral Large model params
Pull Request -
State: closed - Opened by nintwentydo about 1 month ago
- 2 comments
#700 - [BUG] 1.0 bpw not possible?
Issue -
State: closed - Opened by frenzybiscuit about 1 month ago
- 3 comments
Labels: bug
#699 - [BUG] Formatron is not working with DynamicJob
Issue -
State: closed - Opened by abpani about 1 month ago
- 2 comments
Labels: bug
#698 - Fedora 41 support?
Issue -
State: closed - Opened by frenzybiscuit about 2 months ago
- 4 comments
#697 - [BUG] Qwen2.5-72B-2.xxbpw/Llama-70B-2.4bpw (maybe related to KV caching code) garbage output on some specific prompts.
Issue -
State: open - Opened by Originalimoc about 2 months ago
- 15 comments
Labels: bug
#696 - [BUG] lmformatenforcer integration seems to be broken on new versions
Issue -
State: open - Opened by hvico about 2 months ago
- 2 comments
Labels: bug
#695 - [REQUEST] EXAONE 3.5 Support
Issue -
State: open - Opened by necrogay about 2 months ago
- 2 comments
#694 - Prevent UnboundLocalError when loading with yarn/su with short ctx len
Pull Request -
State: closed - Opened by DocShotgun about 2 months ago
#693 - Question on ExLlamaV2DynamicGenerator queue size?
Issue -
State: closed - Opened by TheUniquePaulSmith about 2 months ago
- 1 comment
Labels: bug
#692 - [BUG] Regression: Memory error when quantizing
Issue -
State: closed - Opened by zpin about 2 months ago
- 5 comments
Labels: bug
#690 - [BUG] ExLlamaV2DynamicGenerator class is not multiple threads supported
Issue -
State: open - Opened by UTSAV-44 2 months ago
- 4 comments
Labels: bug
#689 - [BUG] `generator.iterate()` returns corrupted result objects in some cases
Issue -
State: open - Opened by p-e-w 2 months ago
- 8 comments
Labels: bug
#688 - Prevent NPE in `deallocate_pages`
Pull Request -
State: closed - Opened by p-e-w 2 months ago
- 2 comments
#687 - [REQUEST] LLama3.2 Vison Support
Issue -
State: closed - Opened by royallavanya140 2 months ago
- 5 comments
#686 - [REQUEST] High throughput with large batch size
Issue -
State: open - Opened by fzyzcjy 2 months ago
- 5 comments
#685 - [BUG] Speculative decoding regresses performance on 7900 xtx under ROCM
Issue -
State: open - Opened by Mushoz 2 months ago
- 1 comment
Labels: bug
#684 - [BUG] Having trouble with ExLlamaV2DynamicJobAsync
Issue -
State: closed - Opened by bolaft 2 months ago
- 2 comments
Labels: bug
#682 - qwen coder32b run on colab t4
Issue -
State: open - Opened by werruww 2 months ago
- 10 comments
Labels: bug
#681 - An issue with gemma2-27b-it related to measurement
Issue -
State: closed - Opened by antonovkz 2 months ago
- 5 comments
Labels: bug
#680 - [BUG] RuntimeError: index 1000000000 is out of bounds
Issue -
State: closed - Opened by xonfour 2 months ago
- 4 comments
Labels: bug
#679 - [BUG] Very slow Generation with Paged Attention
Issue -
State: closed - Opened by rjmehta1993 2 months ago
- 6 comments
Labels: bug
#678 - [REQUEST] Passing cache to and from generate() function for use in loop
Issue -
State: closed - Opened by cmunna0052 2 months ago
- 2 comments
#677 - [BUG] Out of memory from a 2.4bpw 70B parameter model
Issue -
State: closed - Opened by cmunna0052 2 months ago
- 3 comments
Labels: bug
#676 - [BUG] Async with Paged Attention Reduces accuracy
Issue -
State: closed - Opened by rjmehta1993 2 months ago
- 8 comments
Labels: bug
#675 - [REQUEST] Can we have 1.0/1.5 bpw internally?
Issue -
State: open - Opened by Originalimoc 2 months ago
- 1 comment
#674 - [BUG] [Qwen] Draft model produce garbage output
Issue -
State: open - Opened by Nepherpitou 3 months ago
- 5 comments
Labels: bug
#673 - [REQUEST] Convert.py: Option to skip measurement when setting 8.0/8.0
Issue -
State: open - Opened by Originalimoc 3 months ago
- 1 comment
#672 - [REQUEST] Support for a Qwen based vision model
Issue -
State: open - Opened by TyraVex 3 months ago
- 6 comments
#669 - [REQUEST] Synthetic Data generation features
Issue -
State: open - Opened by AstrisCantCode 3 months ago
- 3 comments
#665 - [BUG] How can we increase or reduce the cache size
Issue -
State: closed - Opened by royallavanya140 3 months ago
- 1 comment
Labels: bug
#658 - [REQUEST] Llama 3.2 Vision Support (or already exists?)
Issue -
State: open - Opened by grimulkan 3 months ago
- 13 comments
#657 - Implementation of logit threshold sampler and confidence breaker
Pull Request -
State: open - Opened by anchortense 4 months ago
#656 - [BUG] Appending-Runtime-LoRA-weights
Issue -
State: open - Opened by royallavanya140 4 months ago
- 3 comments
Labels: bug
#628 - [BUG] Qwen 2.5 34B returns garbage at certain quantization levels, but not others
Issue -
State: closed - Opened by Downtown-Case 4 months ago
- 8 comments
Labels: bug
#611 - how can i solve this problem
Issue -
State: closed - Opened by Sultan0ML 5 months ago
- 1 comment
#604 - Async Stream Genenerator?
Issue -
State: closed - Opened by KingBipo 5 months ago
- 3 comments
#595 - Request for multi model support
Issue -
State: closed - Opened by royallavanya140 5 months ago
- 2 comments
#450 - Scaling inference throughput when increasing the batch size
Issue -
State: open - Opened by lopuhin 9 months ago
- 2 comments
#330 - Refactor token healing initialization.
Pull Request -
State: open - Opened by bjj 12 months ago
- 7 comments
#149 - Suggestion: allow different context lengths for draft model and main model in speculative sampling
Issue -
State: closed - Opened by Antollo about 1 year ago
- 4 comments
#115 - Implement HyperAttention - Long-context Attention in Near-Linear Time: outperforms FlashAttention and offers up to 5x speedup on long contexts
Issue -
State: closed - Opened by kabachuha over 1 year ago
- 4 comments
#114 - Ninja build stopped
Issue -
State: closed - Opened by jayeshthk over 1 year ago
- 3 comments
#113 - Production of quantitative datasets and expansion of models
Issue -
State: closed - Opened by venxzw over 1 year ago
- 2 comments
#112 - support 8bit kv cache
Pull Request -
State: closed - Opened by zgce over 1 year ago
- 3 comments
#111 - Use Pytorch 2.1 for CUDA 11.8+ and ROCm builds
Pull Request -
State: closed - Opened by jllllll over 1 year ago
- 2 comments
#110 - Problem at _tsize
Issue -
State: closed - Opened by ParisNeo over 1 year ago
- 9 comments
#109 - Weird performance
Issue -
State: closed - Opened by jianyuheng over 1 year ago
- 2 comments
#108 - Streaming and Stop tokens for speculative sampling
Issue -
State: closed - Opened by CyberTimon over 1 year ago
- 6 comments
#107 - Early perplexity broken on higher quants. Gibberish outputs.
Issue -
State: closed - Opened by 11415142513152119 over 1 year ago
- 5 comments
#106 - Added zephyr chatformat
Pull Request -
State: closed - Opened by SinanAkkoyun over 1 year ago
- 5 comments
#105 - Error after the generation. AssertionError: Total sequence length exceeds cache size in model.forward
Issue -
State: closed - Opened by Rajmehta123 over 1 year ago
- 8 comments
#104 - Check if Ampere GPUs or newer before using flash-attn
Pull Request -
State: closed - Opened by oobabooga over 1 year ago
- 2 comments
#103 - Sliding Attention Window
Issue -
State: closed - Opened by anujnayyar1 over 1 year ago
- 7 comments
#102 - Batched Inference
Issue -
State: closed - Opened by anujnayyar1 over 1 year ago
- 1 comment
#101 - Endless flood of 'rfm max: x.xx bpw x.xx' on quant
Issue -
State: closed - Opened by discordianbelle over 1 year ago
- 3 comments
#100 - calling lora.unload() gives key error
Issue -
State: closed - Opened by Ph0rk0z over 1 year ago
- 4 comments
#99 - `regex` needs to be added to requirements/setup.py
Issue -
State: closed - Opened by andrewgross over 1 year ago
- 1 comment
#98 - undefined symbol error during inference
Issue -
State: closed - Opened by AmineDjeghri over 1 year ago
- 11 comments
#97 - [BUG] CUDA error: invalid configuration argument /exllamav2/exllamav2/exllamav2_ext/cuda/rope.cu 131
Issue -
State: closed - Opened by Facico over 1 year ago
- 7 comments
#96 - Feature Request: support for exllamav2 lora training
Issue -
State: closed - Opened by LZY-the-boys over 1 year ago
- 2 comments
#95 - Parallel decoding
Issue -
State: closed - Opened by nivibilla over 1 year ago
- 11 comments
#93 - Error with most recent changes to Sampler
Issue -
State: closed - Opened by Rajmehta123 over 1 year ago
- 2 comments
#92 - test_inference.py : AttributeError: module 'exllamav2_ext' has no attribute 'rms_norm'
Issue -
State: closed - Opened by DFuller134 over 1 year ago
- 13 comments
#91 - EXL2 quants at 4.65 bits in dual 3090 gpu´s
Issue -
State: closed - Opened by jostack over 1 year ago
- 4 comments