Ecosyste.ms: Issues

An open API service for providing issue and pull request metadata for open source projects.

GitHub / turboderp/exllama issues and pull requests
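Since this listing is served by an open API, the same metadata can in principle be fetched programmatically. The sketch below is a minimal, hypothetical example: the base URL, endpoint path, and response field names (`number`, `title`, `state`, `user`, `comments_count`) are assumptions about the API's shape, not confirmed by this page; check the ecosyste.ms API documentation before relying on them.

```python
# Hypothetical sketch of querying the ecosyste.ms issues API for a repository.
# The endpoint path and field names are assumptions; consult the real API docs.
import json
import urllib.request

BASE = "https://issues.ecosyste.ms/api/v1"  # assumed base URL
url = f"{BASE}/hosts/GitHub/repositories/turboderp%2Fexllama/issues"

with urllib.request.urlopen(url) as resp:
    issues = json.load(resp)

for issue in issues:
    # Assumed fields: number, title, state, user, comments_count.
    print(f"#{issue['number']} - {issue['title']} "
          f"[{issue['state']}] by {issue['user']} "
          f"({issue.get('comments_count', 0)} comments)")
```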

#315 - Run on CPU without AVX2

Issue - State: open - Opened by ZanMax 8 months ago - 3 comments

#314 - piece id is out of range

Issue - State: open - Opened by chethanwiz 8 months ago - 3 comments

#313 - ValueError: Unrecognized layer: lm_head.q_groups on a new install

Issue - State: closed - Opened by Fuckingnameless 9 months ago - 2 comments

#309 - Does it support the safetensors format?

Issue - State: open - Opened by lucasjinreal about 1 year ago

#308 - Error when using Beam Search

Issue - State: open - Opened by bibekyess about 1 year ago

#307 - Occasional RuntimeError

Issue - State: open - Opened by leegohi04517 about 1 year ago

#306 - Using Exllama backend requires all the modules to be on GPU - how?

Issue - State: open - Opened by tigerinus about 1 year ago - 1 comment

#305 - Issue with how the --gpu_split / -gs argument works

Issue - State: closed - Opened by JustinKunzi about 1 year ago - 2 comments

#304 - Does the benchmark support batch size > 1?

Issue - State: closed - Opened by deltaguo about 1 year ago - 1 comment

#301 - test_benchmark_inference.py broken?

Issue - State: closed - Opened by 11415142513152119 about 1 year ago - 1 comment

#300 - llama_cpp_python_cuda is not a supported wheel on this platform

Issue - State: closed - Opened by arif599 about 1 year ago - 1 comment

#298 - finetuned Llama-2-7B-32K-Instruct-GPTQ only returns '\n'

Issue - State: closed - Opened by Napuh about 1 year ago - 1 comment

#295 - Why can't the llama2 model output EOS id?

Issue - State: closed - Opened by pangr about 1 year ago - 4 comments

#293 - doesn't use CUDA_HOME?

Issue - State: open - Opened by j2l about 1 year ago

#292 - list index out of range

Issue - State: closed - Opened by j2l about 1 year ago - 1 comment

#291 - OSError: CUDA_HOME environment variable is not set.

Issue - State: open - Opened by jamesbraza about 1 year ago - 8 comments

#289 - GPU Inference from IPython

Issue - State: open - Opened by Rajmehta123 about 1 year ago

#288 - followed instructions with error

Issue - State: open - Opened by hiqsociety about 1 year ago - 2 comments

#286 - is it too much of me to ask for an MPI option like llama.cpp?

Issue - State: closed - Opened by hiqsociety about 1 year ago - 5 comments

#285 - exception about replacing the op q4_matmul_kernel

Issue - State: closed - Opened by deltaguo about 1 year ago - 2 comments

#284 - phi-1.5 support?

Issue - State: closed - Opened by SinanAkkoyun about 1 year ago - 5 comments

#283 - multi stoptoken

Pull Request - State: closed - Opened by Kerushii about 1 year ago

#281 - Multi-GPU issues

Issue - State: open - Opened by nktice about 1 year ago - 9 comments

#280 - Support for Baichuan2 models

Issue - State: open - Opened by bernardx about 1 year ago - 1 comment

#279 - Progress on the rewrite for older cards (Like the P40)

Issue - State: open - Opened by TimyIsCool about 1 year ago - 1 comment

#278 - LoRA appears to not be used after the first run

Issue - State: closed - Opened by technillogue about 1 year ago - 1 comment

#277 - Is Tesla T4 supported?

Issue - State: closed - Opened by ivsanro1 about 1 year ago - 2 comments

#276 - Multi-GPU inference?

Issue - State: closed - Opened by mbhenaff about 1 year ago - 1 comment

#275 - Optimize q4_matmul

Pull Request - State: closed - Opened by QuarticCat about 1 year ago - 21 comments

#274 - remove tokens that exceed the max_seq_len

Issue - State: open - Opened by p11188536 about 1 year ago - 1 comment

#272 - YaRN Support

Issue - State: open - Opened by grimulkan about 1 year ago - 8 comments

#270 - Codellama support

Issue - State: open - Opened by ParisNeo about 1 year ago - 11 comments

#269 - Running Llama2 on multiple GPUs outputs gibberish

Issue - State: closed - Opened by mirth over 1 year ago - 2 comments

#268 - Support for AMD ROCM

Issue - State: open - Opened by yehowshuaradialrad over 1 year ago - 1 comment

#267 - Is it possible and efficient if load layer on demand?

Issue - State: open - Opened by fahadh4ilyas over 1 year ago - 2 comments

#266 - Speed on A100

Issue - State: open - Opened by Ber666 over 1 year ago - 4 comments

#265 - Optimize and extend ws example for chatbots

Pull Request - State: closed - Opened by Kerushii over 1 year ago

#264 - Any blogs on the project?

Issue - State: open - Opened by qizzzh over 1 year ago

#263 - Performance issues

Issue - State: open - Opened by bryanhpchiang over 1 year ago - 3 comments

#262 - RoPE Frequency Base and Frequency Scale Support

Issue - State: open - Opened by ChrisCates over 1 year ago - 3 comments

#261 - Codellama 16K context length?

Issue - State: open - Opened by ShahZ181 over 1 year ago - 3 comments

#260 - Codellama support

Issue - State: open - Opened by lucasjinreal over 1 year ago - 10 comments

#259 - Cache size below max_seq_len?

Issue - State: closed - Opened by fahadh4ilyas over 1 year ago - 2 comments

#257 - stop-string support?

Issue - State: open - Opened by krypterro over 1 year ago - 2 comments

#256 - Request: Some improvements to web app.py

Issue - State: open - Opened by Midaychi over 1 year ago

#255 - refine json dicts for ws example

Pull Request - State: closed - Opened by Kerushii over 1 year ago

#254 - Bad output for 2080 ti

Issue - State: open - Opened by filipemesquita over 1 year ago - 1 comment

#253 - GPU Usage Keeps High Even Without Inference Load

Issue - State: open - Opened by leonxia1018 over 1 year ago - 7 comments

#252 - Is it possible to do batch generate?

Issue - State: open - Opened by fahadh4ilyas over 1 year ago - 7 comments

#251 - Are we *really* using nvlink?

Issue - State: closed - Opened by Ph0rk0z over 1 year ago - 1 comment

#250 - recover unsaved modification

Pull Request - State: closed - Opened by Kerushii over 1 year ago - 3 comments

#249 - ws example for streaming with context reuse and token testing

Pull Request - State: closed - Opened by Kerushii over 1 year ago

#248 - Custom multiple stop token (for roleplay / conversation)

Pull Request - State: closed - Opened by wangerzi over 1 year ago - 6 comments

#245 - Possible to load model with low system RAM?

Issue - State: open - Opened by gros87 over 1 year ago - 4 comments

#244 - RuntimeError: temp_state buffer is too small

Issue - State: closed - Opened by daniel-kukiela over 1 year ago - 1 comment

#243 - Modify generator.py > generate_simple to accept encode_special_characters?

Issue - State: open - Opened by zmarty over 1 year ago - 1 comment

#242 - Header too large error when running benchmark

Issue - State: closed - Opened by DKormann over 1 year ago - 2 comments

#241 - Is there a way to make compress_pos_emb dynamic?

Issue - State: closed - Opened by fahadh4ilyas over 1 year ago - 2 comments

#240 - Can max_seq_len be set via CLI or GUI in webui?

Issue - State: closed - Opened by int19h over 1 year ago - 2 comments

#238 - KV caching?

Issue - State: open - Opened by bryanhpchiang over 1 year ago - 2 comments

#237 - Continuous Batching support

Issue - State: open - Opened by FireMasterK over 1 year ago

#236 - Generation uses config.max_seq_len instead of default 2048

Pull Request - State: closed - Opened by flotos over 1 year ago - 1 comment

#235 - Question about example_flask.py

Issue - State: open - Opened by ZeroYuJie over 1 year ago - 1 comment

#234 - Question about sampling and kernel fusion

Issue - State: closed - Opened by sleepwalker2017 over 1 year ago - 6 comments

#233 - RuntimeError with airoboros-l2-13b

Issue - State: closed - Opened by corv89 over 1 year ago - 2 comments

#232 - Strange output / doesn't make any sense

Issue - State: closed - Opened by lordwebbie over 1 year ago - 5 comments

#231 - Slower tokens/s than expected

Issue - State: open - Opened by teknium1 over 1 year ago - 14 comments

#230 - Support for NF4?

Issue - State: open - Opened by hoagy-davis-digges over 1 year ago - 1 comment

#223 - custom stop tokens in generator.py

Pull Request - State: closed - Opened by Kerushii over 1 year ago - 1 comment

#221 - Llama 2 Chat implementation

Pull Request - State: open - Opened by SinanAkkoyun over 1 year ago - 10 comments

#220 - Weird issue with context length

Issue - State: open - Opened by zzzacwork over 1 year ago - 6 comments

#218 - Speculative decoding?

Issue - State: open - Opened by bryanhpchiang over 1 year ago - 17 comments

#217 - Very bad response

Issue - State: closed - Opened by pourfard over 1 year ago - 9 comments

#214 - Question about storing models in Container

Issue - State: open - Opened by JacobGoldenArt over 1 year ago - 2 comments

#212 - [Feature Request] OpenAI-compatible API

Issue - State: closed - Opened by langchain4j over 1 year ago - 11 comments

#192 - Exllama tutorials?

Issue - State: open - Opened by NickDatLe over 1 year ago - 23 comments

#180 - Illegal memory access when using a lora

Issue - State: open - Opened by sampbarrow over 1 year ago - 32 comments

#178 - A potential rotation inconsistency of Dynamically Scaled RoPE

Issue - State: closed - Opened by NormXU over 1 year ago - 3 comments

#172 - Add "min tokens" slider to webui

Pull Request - State: closed - Opened by EyeDeck over 1 year ago - 1 comment

#164 - Dual GPU setup on 13900k

Issue - State: closed - Opened by SinanAkkoyun over 1 year ago - 7 comments

#146 - Fix half2 with HIP

Pull Request - State: open - Opened by Engininja2 over 1 year ago - 9 comments

#145 - File not found when compiling exllama_ext

Issue - State: closed - Opened by Flameish over 1 year ago - 3 comments

#144 - Get NaN error when prompt is more than 8 tokens

Issue - State: closed - Opened by dandm1 over 1 year ago

#126 - Trying to apply Dynamic NTK RoPE scaling to exllama

Issue - State: closed - Opened by Panchovix over 1 year ago - 6 comments

#125 - Setup package

Pull Request - State: closed - Opened by paolorechia over 1 year ago - 5 comments

#120 - OOM/CUDA errors when running in batch mode?

Issue - State: closed - Opened by nikshepsvn over 1 year ago - 6 comments

#118 - (Experimental) Add support to NTK RoPE scaling

Pull Request - State: closed - Opened by Panchovix over 1 year ago - 6 comments

#117 - Issue when attempting to run exllama (P40)

Issue - State: closed - Opened by wereretot over 1 year ago - 14 comments

#116 - Should we add option to set seed?

Issue - State: closed - Opened by bdqfork over 1 year ago - 2 comments

#115 - NTK RoPE scaling.

Issue - State: closed - Opened by alkeryn over 1 year ago - 24 comments