Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / turboderp/exllamav2 issues and pull requests
#704 - [BUG] RoPE Scaling through Alpha
Issue -
State: open - Opened by Vhallo about 1 month ago
Labels: bug
#703 - [REQUEST] Support for the new Command-r7b
Issue -
State: open - Opened by ciprianveg about 1 month ago
#702 - [REQUEST] Sage Attention? Anyone tried it with exllama?
Issue -
State: open - Opened by Ph0rk0z about 1 month ago
- 2 comments
#701 - Modify handling for Pixtral Large model params
Pull Request -
State: closed - Opened by nintwentydo about 1 month ago
- 2 comments
#700 - [BUG] 1.0 bpw not possible?
Issue -
State: closed - Opened by frenzybiscuit about 1 month ago
- 3 comments
Labels: bug
#699 - [BUG] Formatron is not working with DynamicJob
Issue -
State: closed - Opened by abpani about 1 month ago
- 2 comments
Labels: bug
#698 - Fedora 41 support?
Issue -
State: closed - Opened by frenzybiscuit about 2 months ago
- 4 comments
#697 - [BUG] Qwen2.5-72B-2.xxbpw/Llama-70B-2.4bpw (maybe related to KV caching code) garbage output on some specific prompts.
Issue -
State: open - Opened by Originalimoc about 2 months ago
- 15 comments
Labels: bug
#696 - [BUG] lmformatenforcer integration seems to be broken on new versions
Issue -
State: open - Opened by hvico about 2 months ago
- 2 comments
Labels: bug
#695 - [REQUEST] EXAONE 3.5 Support
Issue -
State: open - Opened by necrogay about 2 months ago
- 2 comments
#694 - Prevent UnboundLocalError when loading with yarn/su with short ctx len
Pull Request -
State: closed - Opened by DocShotgun about 2 months ago
#693 - Question on ExLlamaV2DynamicGenerator queue size?
Issue -
State: closed - Opened by TheUniquePaulSmith about 2 months ago
- 1 comment
Labels: bug
#692 - [BUG] Regression: Memory error when quantizing
Issue -
State: closed - Opened by zpin about 2 months ago
- 5 comments
Labels: bug
#690 - [BUG] ExLlamaV2DynamicGenerator class is not multiple threads supported
Issue -
State: open - Opened by UTSAV-44 2 months ago
- 4 comments
Labels: bug
#689 - [BUG] `generator.iterate()` returns corrupted result objects in some cases
Issue -
State: open - Opened by p-e-w 2 months ago
- 8 comments
Labels: bug
#688 - Prevent NPE in `deallocate_pages`
Pull Request -
State: closed - Opened by p-e-w 2 months ago
- 2 comments
#687 - [REQUEST] LLama3.2 Vison Support
Issue -
State: closed - Opened by royallavanya140 2 months ago
- 5 comments
#686 - [REQUEST] High throughput with large batch size
Issue -
State: open - Opened by fzyzcjy 2 months ago
- 5 comments
#685 - [BUG] Speculative decoding regresses performance on 7900 xtx under ROCM
Issue -
State: open - Opened by Mushoz 2 months ago
- 1 comment
Labels: bug
#684 - [BUG] Having trouble with ExLlamaV2DynamicJobAsync
Issue -
State: closed - Opened by bolaft 2 months ago
- 2 comments
Labels: bug
#682 - qwen coder32b run on colab t4
Issue -
State: open - Opened by werruww 2 months ago
- 10 comments
Labels: bug
#681 - An issue with gemma2-27b-it related to measurement
Issue -
State: closed - Opened by antonovkz 2 months ago
- 5 comments
Labels: bug
#680 - [BUG] RuntimeError: index 1000000000 is out of bounds
Issue -
State: closed - Opened by xonfour 2 months ago
- 4 comments
Labels: bug
#679 - [BUG] Very slow Generation with Paged Attention
Issue -
State: closed - Opened by rjmehta1993 2 months ago
- 6 comments
Labels: bug
#678 - [REQUEST] Passing cache to and from generate() function for use in loop
Issue -
State: closed - Opened by cmunna0052 2 months ago
- 2 comments
#677 - [BUG] Out of memory from a 2.4bpw 70B parameter model
Issue -
State: closed - Opened by cmunna0052 2 months ago
- 3 comments
Labels: bug
#676 - [BUG] Async with Paged Attention Reduces accuracy
Issue -
State: closed - Opened by rjmehta1993 2 months ago
- 8 comments
Labels: bug
#675 - [REQUEST] Can we have 1.0/1.5 bpw internally?
Issue -
State: open - Opened by Originalimoc 2 months ago
- 1 comment
#674 - [BUG] [Qwen] Draft model produce garbage output
Issue -
State: open - Opened by Nepherpitou 3 months ago
- 5 comments
Labels: bug
#673 - [REQUEST] Convert.py: Option to skip measurement when setting 8.0/8.0
Issue -
State: open - Opened by Originalimoc 3 months ago
- 1 comment
#672 - [REQUEST] Support for a Qwen based vision model
Issue -
State: open - Opened by TyraVex 3 months ago
- 6 comments
#669 - [REQUEST] Synthetic Data generation features
Issue -
State: open - Opened by AstrisCantCode 3 months ago
- 3 comments
#665 - [BUG] How can we increase or reduce the cache size
Issue -
State: closed - Opened by royallavanya140 3 months ago
- 1 comment
Labels: bug
#658 - [REQUEST] Llama 3.2 Vision Support (or already exists?)
Issue -
State: open - Opened by grimulkan 3 months ago
- 13 comments
#657 - Implementation of logit threshold sampler and confidence breaker
Pull Request -
State: open - Opened by anchortense 4 months ago
#656 - [BUG] Appending-Runtime-LoRA-weights
Issue -
State: open - Opened by royallavanya140 4 months ago
- 3 comments
Labels: bug
#628 - [BUG] Qwen 2.5 34B returns garbage at certain quantization levels, but not others
Issue -
State: closed - Opened by Downtown-Case 4 months ago
- 8 comments
Labels: bug
#611 - how can i solve this problem
Issue -
State: closed - Opened by Sultan0ML 5 months ago
- 1 comment
#604 - Async Stream Genenerator?
Issue -
State: closed - Opened by KingBipo 5 months ago
- 3 comments
#595 - Request for multi model support
Issue -
State: closed - Opened by royallavanya140 5 months ago
- 2 comments
#450 - Scaling inference throughput when increasing the batch size
Issue -
State: open - Opened by lopuhin 9 months ago
- 2 comments
#330 - Refactor token healing initialization.
Pull Request -
State: open - Opened by bjj 12 months ago
- 7 comments
#149 - Suggestion: allow different context lengths for draft model and main model in speculative sampling
Issue -
State: closed - Opened by Antollo about 1 year ago
- 4 comments
#115 - Implement HyperAttention - Long-context Attention in Near-Linear Time: outperforms FlashAttention and offers up to 5x speedup on long contexts
Issue -
State: closed - Opened by kabachuha over 1 year ago
- 4 comments
#114 - Ninja build stopped
Issue -
State: closed - Opened by jayeshthk over 1 year ago
- 3 comments
#113 - Production of quantitative datasets and expansion of models
Issue -
State: closed - Opened by venxzw over 1 year ago
- 2 comments
#112 - support 8bit kv cache
Pull Request -
State: closed - Opened by zgce over 1 year ago
- 3 comments
#111 - Use Pytorch 2.1 for CUDA 11.8+ and ROCm builds
Pull Request -
State: closed - Opened by jllllll over 1 year ago
- 2 comments
#110 - Problem at _tsize
Issue -
State: closed - Opened by ParisNeo over 1 year ago
- 9 comments
#109 - Weird performance
Issue -
State: closed - Opened by jianyuheng over 1 year ago
- 2 comments
#108 - Streaming and Stop tokens for speculative sampling
Issue -
State: closed - Opened by CyberTimon over 1 year ago
- 6 comments
#107 - Early perplexity broken on higher quants. Gibberish outputs.
Issue -
State: closed - Opened by 11415142513152119 over 1 year ago
- 5 comments
#106 - Added zephyr chatformat
Pull Request -
State: closed - Opened by SinanAkkoyun over 1 year ago
- 5 comments
#105 - Error after the generation. AssertionError: Total sequence length exceeds cache size in model.forward
Issue -
State: closed - Opened by Rajmehta123 over 1 year ago
- 8 comments
#104 - Check if Ampere GPUs or newer before using flash-attn
Pull Request -
State: closed - Opened by oobabooga over 1 year ago
- 2 comments
#103 - Sliding Attention Window
Issue -
State: closed - Opened by anujnayyar1 over 1 year ago
- 7 comments
#102 - Batched Inference
Issue -
State: closed - Opened by anujnayyar1 over 1 year ago
- 1 comment
#101 - Endless flood of 'rfm max: x.xx bpw x.xx' on quant
Issue -
State: closed - Opened by discordianbelle over 1 year ago
- 3 comments
#100 - calling lora.unload() gives key error
Issue -
State: closed - Opened by Ph0rk0z over 1 year ago
- 4 comments
#99 - `regex` needs to be added to requirements/setup.py
Issue -
State: closed - Opened by andrewgross over 1 year ago
- 1 comment
#98 - undefined symbol error during inference
Issue -
State: closed - Opened by AmineDjeghri over 1 year ago
- 11 comments
#97 - [BUG] CUDA error: invalid configuration argument /exllamav2/exllamav2/exllamav2_ext/cuda/rope.cu 131
Issue -
State: closed - Opened by Facico over 1 year ago
- 7 comments
#96 - Feature Request: support for exllamav2 lora training
Issue -
State: closed - Opened by LZY-the-boys over 1 year ago
- 2 comments
#95 - Parallel decoding
Issue -
State: closed - Opened by nivibilla over 1 year ago
- 11 comments
#93 - Error with most recent changes to Sampler
Issue -
State: closed - Opened by Rajmehta123 over 1 year ago
- 2 comments
#92 - test_inference.py : AttributeError: module 'exllamav2_ext' has no attribute 'rms_norm'
Issue -
State: closed - Opened by DFuller134 over 1 year ago
- 13 comments
#91 - EXL2 quants at 4.65 bits in dual 3090 gpu´s
Issue -
State: closed - Opened by jostack over 1 year ago
- 4 comments