An open API service providing issue and pull request metadata for open-source projects.

GitHub / turboderp/exllama issues and pull requests

#318 - Beating the (probably dead) Metal horse

Issue - State: open - Opened by bleedingedgedebian 4 months ago

#317 - [WinError 183] Cannot create a file when that file already exists

Issue - State: closed - Opened by 71cj34 7 months ago - 1 comment

#315 - Run on CPU without AVX2

Issue - State: open - Opened by ZanMax over 1 year ago - 3 comments

#314 - piece id is out of range

Issue - State: open - Opened by chethanwiz over 1 year ago - 3 comments

#313 - ValueError: Unrecognized layer: lm_head.q_groups on a new install

Issue - State: closed - Opened by Fuckingnameless over 1 year ago - 2 comments

#309 - Does it support safetytensor formate?

Issue - State: open - Opened by lucasjinreal over 1 year ago

#308 - Error when using Beam Search

Issue - State: open - Opened by bibekyess over 1 year ago

#307 - Occasionally RuntimeError

Issue - State: open - Opened by leegohi04517 over 1 year ago

#306 - Using Exllama backend requires all the modules to be on GPU - how?

Issue - State: open - Opened by tigerinus over 1 year ago - 1 comment

#305 - Issue with How --gpu_split / -gs argument works.

Issue - State: closed - Opened by JustinKunzi almost 2 years ago - 2 comments

#304 - does the benchmark support batch size>1?

Issue - State: closed - Opened by deltaguo almost 2 years ago - 1 comment

#301 - test_benchmark_inference.py broken?

Issue - State: closed - Opened by 11415142513152119 almost 2 years ago - 1 comment

#300 - llama_cpp_python_cuda is not a supported wheel on this platform

Issue - State: closed - Opened by arif599 almost 2 years ago - 1 comment

#298 - finetuned Llama-2-7B-32K-Instruct-GPTQ only returns '\n'

Issue - State: closed - Opened by Napuh almost 2 years ago - 1 comment

#295 - Why can't the llama2 model output EOS id?

Issue - State: closed - Opened by pangr almost 2 years ago - 4 comments

#293 - doesn't use CUDA_HOME?

Issue - State: open - Opened by j2l almost 2 years ago

#292 - list index out of range

Issue - State: closed - Opened by j2l almost 2 years ago - 1 comment

#291 - OSError: CUDA_HOME environment variable is not set.

Issue - State: open - Opened by jamesbraza almost 2 years ago - 8 comments

#289 - GPU Inference from IPython

Issue - State: open - Opened by Rajmehta123 almost 2 years ago

#288 - followed instructions with error

Issue - State: open - Opened by hiqsociety almost 2 years ago - 2 comments

#286 - is it too much of me to ask for an MPI option like llama.cpp?

Issue - State: closed - Opened by hiqsociety almost 2 years ago - 5 comments

#285 - exception about replacing the op q4_matmul_kernel

Issue - State: closed - Opened by deltaguo almost 2 years ago - 2 comments

#284 - phi-1.5 support?

Issue - State: closed - Opened by SinanAkkoyun almost 2 years ago - 5 comments

#283 - multi stoptoken

Pull Request - State: closed - Opened by Kerushii almost 2 years ago

#281 - Multi-GPU issues

Issue - State: open - Opened by nktice almost 2 years ago - 9 comments

#280 - Support for Baichuan2 models

Issue - State: open - Opened by bernardx almost 2 years ago - 1 comment

#279 - Progress on the rewrite for older cards (Like the P40)

Issue - State: open - Opened by TimyIsCool almost 2 years ago - 1 comment

#278 - LoRA appears to not be used after the first run

Issue - State: closed - Opened by technillogue almost 2 years ago - 1 comment

#277 - Is Tesla T4 supported?

Issue - State: closed - Opened by ivsanro1 almost 2 years ago - 2 comments

#276 - Multi-GPU inference?

Issue - State: closed - Opened by mbhenaff almost 2 years ago - 1 comment

#275 - Optimize q4_matmul

Pull Request - State: closed - Opened by QuarticCat almost 2 years ago - 21 comments

#274 - remove tokens that exceed the max_seq_len

Issue - State: open - Opened by p11188536 almost 2 years ago - 1 comment

#272 - YaRN Support

Issue - State: open - Opened by grimulkan almost 2 years ago - 8 comments

#270 - Codelama support

Issue - State: open - Opened by ParisNeo almost 2 years ago - 11 comments

#269 - Running Llama2 on multiple GPUs outputs gibberish

Issue - State: closed - Opened by mirth almost 2 years ago - 2 comments

#268 - Support for AMD ROCM

Issue - State: open - Opened by yehowshuaradialrad almost 2 years ago - 1 comment

#267 - Is it possible and efficient if load layer on demand?

Issue - State: open - Opened by fahadh4ilyas almost 2 years ago - 2 comments

#266 - Speed on A100

Issue - State: open - Opened by Ber666 almost 2 years ago - 4 comments

#265 - Optimize and extend ws example for chatborts

Pull Request - State: closed - Opened by Kerushii almost 2 years ago

#264 - Any blogs on the project?

Issue - State: open - Opened by qizzzh almost 2 years ago

#263 - Performance issues

Issue - State: open - Opened by bryanhpchiang almost 2 years ago - 3 comments

#262 - RoPE Frequency Base and Frequency Scale Support

Issue - State: open - Opened by ChrisCates almost 2 years ago - 3 comments

#261 - Codellama 16K context length?

Issue - State: open - Opened by ShahZ181 almost 2 years ago - 3 comments

#260 - Codellama support

Issue - State: open - Opened by lucasjinreal almost 2 years ago - 10 comments

#259 - Cache size below max_seq_len?

Issue - State: closed - Opened by fahadh4ilyas almost 2 years ago - 2 comments

#257 - stop-string support?

Issue - State: open - Opened by krypterro almost 2 years ago - 2 comments

#256 - Request: Some improvements to web app.py

Issue - State: open - Opened by Midaychi almost 2 years ago

#255 - refine json dicts for ws example

Pull Request - State: closed - Opened by Kerushii almost 2 years ago

#254 - Bad output for 2080 ti

Issue - State: open - Opened by filipemesquita almost 2 years ago - 2 comments

#253 - GPU Usage Keeps High Even Without Inference Load

Issue - State: open - Opened by leonxia1018 almost 2 years ago - 7 comments

#252 - Is it possible to do batch generate?

Issue - State: open - Opened by fahadh4ilyas almost 2 years ago - 7 comments

#251 - Are we *really* using nvlink?

Issue - State: closed - Opened by Ph0rk0z almost 2 years ago - 1 comment

#250 - recover unsaved modification

Pull Request - State: closed - Opened by Kerushii almost 2 years ago - 3 comments

#249 - ws example for streaming with context reuse and token testing

Pull Request - State: closed - Opened by Kerushii almost 2 years ago

#248 - Custom multiple stop token (for roleplay / conversation)

Pull Request - State: closed - Opened by wangerzi almost 2 years ago - 6 comments

#245 - Possible to load model with low system ram?

Issue - State: open - Opened by gros87 almost 2 years ago - 4 comments

#244 - RuntimeError: temp_state buffer is too small

Issue - State: closed - Opened by daniel-kukiela almost 2 years ago - 1 comment

#243 - Modify generator.py > generate_simple to accept encode_special_characters?

Issue - State: open - Opened by zmarty almost 2 years ago - 1 comment

#242 - Header too large error when running benchmark

Issue - State: closed - Opened by DKormann almost 2 years ago - 2 comments

#241 - Is there a way to make compress_pos_emb dynamic?

Issue - State: closed - Opened by fahadh4ilyas almost 2 years ago - 2 comments

#240 - Can max_seq_len be set via CLI or GUI in webui?

Issue - State: closed - Opened by int19h almost 2 years ago - 2 comments

#238 - KV caching?

Issue - State: open - Opened by bryanhpchiang almost 2 years ago - 2 comments

#237 - Continuous Batching support

Issue - State: open - Opened by FireMasterK almost 2 years ago

#236 - Generation uses config.max_seq_len instead of default 2048

Pull Request - State: closed - Opened by flotos almost 2 years ago - 1 comment

#235 - Question about example_flask.py

Issue - State: open - Opened by ZeroYuJie almost 2 years ago - 1 comment

#234 - Question about sampling and kernel fusion

Issue - State: closed - Opened by sleepwalker2017 almost 2 years ago - 6 comments

#233 - RuntimeError with airoboros-l2-13b

Issue - State: closed - Opened by corv89 almost 2 years ago - 2 comments

#232 - Strange output / doesn't make any sense

Issue - State: closed - Opened by lordwebbie almost 2 years ago - 5 comments

#231 - Slower tokens/s than expecting

Issue - State: open - Opened by teknium1 almost 2 years ago - 14 comments

#230 - Support for NF4?

Issue - State: open - Opened by hoagy-davis-digges almost 2 years ago - 1 comment

#223 - custom stop tokens in generator.py

Pull Request - State: closed - Opened by Kerushii almost 2 years ago - 1 comment

#221 - Llama 2 Chat implementation

Pull Request - State: open - Opened by SinanAkkoyun almost 2 years ago - 10 comments

#220 - Weird issue with context length

Issue - State: open - Opened by zzzacwork almost 2 years ago - 6 comments

#218 - Speculative decoding?

Issue - State: open - Opened by bryanhpchiang almost 2 years ago - 17 comments

#217 - Very bad response

Issue - State: closed - Opened by pourfard almost 2 years ago - 9 comments

#214 - Question about storing models in Container

Issue - State: open - Opened by JacobGoldenArt almost 2 years ago - 2 comments

#212 - [Feature Request] OpenAI-compatible API

Issue - State: closed - Opened by langchain4j almost 2 years ago - 11 comments

#192 - Exllama tutorials?

Issue - State: open - Opened by NickDatLe about 2 years ago - 23 comments

#180 - Illegal memory access when using a lora

Issue - State: open - Opened by sampbarrow about 2 years ago - 32 comments

#178 - A potential rotation inconsistency of Dynamically Scaled RoPE

Issue - State: closed - Opened by NormXU about 2 years ago - 3 comments

#172 - Add "min tokens" slider to webui

Pull Request - State: closed - Opened by EyeDeck about 2 years ago - 1 comment

#164 - Dual GPU setup on 13900k

Issue - State: closed - Opened by SinanAkkoyun about 2 years ago - 7 comments

#146 - Fix half2 with HIP

Pull Request - State: open - Opened by Engininja2 about 2 years ago - 9 comments

#145 - File not found when compiling exllama_ext

Issue - State: closed - Opened by Flameish about 2 years ago - 3 comments

#144 - Get nan error when prompt is more that 8 tokens.

Issue - State: closed - Opened by dandm1 about 2 years ago

#126 - Trying to apply Dynamic NTK RoPE scaling into exllama.

Issue - State: closed - Opened by Panchovix about 2 years ago - 6 comments

#125 - Setup package

Pull Request - State: closed - Opened by paolorechia about 2 years ago - 5 comments

#120 - OOM/CUDA errors when running in batch mode?

Issue - State: closed - Opened by nikshepsvn about 2 years ago - 6 comments

#118 - (Experimental) Add support to NTK RoPE scaling

Pull Request - State: closed - Opened by Panchovix about 2 years ago - 6 comments

#117 - Issue when attempting to run exllama (P40)

Issue - State: closed - Opened by wereretot about 2 years ago - 14 comments

#116 - Should we add option to set seed?

Issue - State: closed - Opened by bdqfork about 2 years ago - 2 comments