Ecosyste.ms: Issues

An open API service providing issue and pull request metadata for open source projects.

GitHub / turboderp/exllamav2 issues and pull requests

#49 - Does not access add_tokens config when creating config

Issue - State: closed - Opened by alimadelshin about 1 year ago - 9 comments

#47 - Llama 70B 2.5bpw does not fit in 24GB GPU

Issue - State: closed - Opened by Nikita-Sherstnev about 1 year ago - 10 comments

#46 - Ninja Build Error for ROCm

Issue - State: closed - Opened by lufixSch about 1 year ago - 11 comments

#45 - Error because of string in PyPI flash-attn 2.2.3.post2 version

Issue - State: closed - Opened by TehNomad about 1 year ago - 1 comment

#44 - Quantization subtly broken recently?

Issue - State: closed - Opened by QM60 about 1 year ago - 9 comments

#43 - Chat work greate but inference broken. Why?

Issue - State: closed - Opened by 50Bytes-dev about 1 year ago - 3 comments

#42 - don't support for batch inference?

Issue - State: closed - Opened by LZY-the-boys about 1 year ago - 2 comments

#41 - ninja compilation error - gcc11

Issue - State: closed - Opened by vasqu about 1 year ago - 5 comments

#40 - Tesla P40 performance is still very low.

Issue - State: closed - Opened by siriume about 1 year ago - 4 comments

#39 - Script to convert model, run quant, and save measurements

Pull Request - State: closed - Opened by lonestriker about 1 year ago

#38 - Rope scaling, length, measurement_length during EXL2 quantization

Issue - State: closed - Opened by grimulkan about 1 year ago - 4 comments

#37 - bpw calculation

Issue - State: closed - Opened by Chainfire about 1 year ago - 6 comments

#36 - Integrating Medusa

Issue - State: closed - Opened by KaruroChori about 1 year ago - 2 comments

#35 - Run a 34b model with a 4080 (16gb VRAM)

Issue - State: closed - Opened by ScottSump about 1 year ago - 3 comments

#34 - convert_safetensors doesn't force existing GPU

Issue - State: closed - Opened by Chainfire about 1 year ago - 1 comment

#33 - ROCM: Garbadge output

Issue - State: closed - Opened by Jipok about 1 year ago - 46 comments

#32 - What's the best way to train ext2?

Issue - State: closed - Opened by laoda513 about 1 year ago - 1 comment

#31 - Conversion: release CUDA cache after VRAM intensive quant blocks

Pull Request - State: closed - Opened by 19h about 1 year ago - 12 comments

#30 - Convert/Quantizer bf16 support

Issue - State: closed - Opened by Qubitium about 1 year ago - 7 comments

#29 - Calibration data format clarification

Issue - State: closed - Opened by Qubitium about 1 year ago - 5 comments

#28 - We should use exllama1 for GPTQ and exllama2 for exl2?

Issue - State: closed - Opened by BadisG about 1 year ago - 3 comments

#27 - Repetitive output in NVidia Jetson Orin Nano

Issue - State: closed - Opened by EraldoMJunior about 1 year ago - 4 comments

#25 - Conversion help

Issue - State: closed - Opened by Chainfire about 1 year ago - 4 comments

#24 - add comment on model.load() usage

Pull Request - State: closed - Opened by gojefferson about 1 year ago

#23 - Add copilot server example

Pull Request - State: open - Opened by chenhunghan about 1 year ago - 9 comments

#22 - Fix Compiling with HIP on Older Pytorch Version

Pull Request - State: closed - Opened by leonxia1018 about 1 year ago - 6 comments

#21 - nvcc fatal : Unknown option '-generate-dependencies-with-compile'

Issue - State: closed - Opened by Hashflower about 1 year ago - 1 comment

#20 - convert.py - RuntimeError: CUDA error: invalid configuration argument

Issue - State: closed - Opened by Thireus about 1 year ago - 22 comments

#18 - Support cfg sampler?

Issue - State: closed - Opened by win10ogod about 1 year ago - 1 comment

#17 - Fix typo in README.md

Pull Request - State: closed - Opened by eltociear about 1 year ago - 1 comment

#16 - Support Baichuan2?

Issue - State: closed - Opened by lx0126z about 1 year ago - 2 comments

#15 - Gibbish Output from 4bit EXL2 quantization

Issue - State: closed - Opened by fgdfgfthgr-fox about 1 year ago - 9 comments

#14 - Big difference in output between exllama1_hf and exllama2_hf

Issue - State: closed - Opened by BadisG about 1 year ago - 12 comments

#12 - aws gpu compatibility

Issue - State: closed - Opened by gtkafka about 1 year ago - 1 comment

#11 - OOM while trying to convert a 70B model to 4.75/4.25 bits on a 4090

Issue - State: closed - Opened by Panchovix about 1 year ago - 4 comments

#9 - Generation never stops

Issue - State: closed - Opened by ortegaalfredo about 1 year ago - 21 comments

#8 - Compile fail on P100.

Issue - State: closed - Opened by SlimeSli about 1 year ago - 1 comment

#7 - Can we run a 34B model with just 12gb Vram

Issue - State: closed - Opened by Gyro0o about 1 year ago - 3 comments

#6 - Convert script fails

Issue - State: closed - Opened by epicfilemcnulty about 1 year ago - 10 comments

#5 - Fix compiling and running on ROCm HIP

Pull Request - State: closed - Opened by ardfork about 1 year ago - 3 comments

#4 - Cannot split from textgen?

Issue - State: closed - Opened by Ph0rk0z about 1 year ago - 3 comments

#3 - It's using 19gb of vram for a 2.65bit 13b model

Issue - State: closed - Opened by BadisG about 1 year ago - 4 comments

#2 - Lower bits per weight

Issue - State: closed - Opened by IgnacioFDM about 1 year ago - 17 comments

#1 - What do you think of omniquant?

Issue - State: closed - Opened by Ph0rk0z about 1 year ago - 1 comment