GitHub / turboderp/exllama issues and pull requests
#318 - Beating the (probably dead) Metal horse
Issue -
State: open - Opened by bleedingedgedebian 4 months ago
#317 - [WinError 183] Cannot create a file when that file already exists
Issue -
State: closed - Opened by 71cj34 7 months ago
- 1 comment
#315 - Run on CPU without AVX2
Issue -
State: open - Opened by ZanMax over 1 year ago
- 3 comments
#314 - piece id is out of range
Issue -
State: open - Opened by chethanwiz over 1 year ago
- 3 comments
#313 - ValueError: Unrecognized layer: lm_head.q_groups on a new install
Issue -
State: closed - Opened by Fuckingnameless over 1 year ago
- 2 comments
#312 - ERROR: Could not install packages due to an OSError: [Errno 13] Permission denied: '/home/exllama/env/lib/python3.11/site-packages/sentencepiece' Check the permissions.
Issue -
State: closed - Opened by Fuckingnameless over 1 year ago
#311 - updates since 0.0.11 causing code to not compile on Ubuntu (23.04, 23.10) with AMD HIP / ROCm ( 5.6 5.7 6.0 ... )
Issue -
State: closed - Opened by nktice over 1 year ago
- 3 comments
#310 - When will the bfloat16 type of GPTQ algorithm be supported?
Issue -
State: open - Opened by Kelang-Tian over 1 year ago
#309 - Does it support safetensors format?
Issue -
State: open - Opened by lucasjinreal over 1 year ago
#308 - Error when using Beam Search
Issue -
State: open - Opened by bibekyess over 1 year ago
#307 - Occasionally RuntimeError
Issue -
State: open - Opened by leegohi04517 over 1 year ago
#306 - Using Exllama backend requires all the modules to be on GPU - how?
Issue -
State: open - Opened by tigerinus over 1 year ago
- 1 comment
#305 - Issue with how the --gpu_split / -gs argument works
Issue -
State: closed - Opened by JustinKunzi almost 2 years ago
- 2 comments
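(#305 concerns how the -gs / --gpu_split values map to per-device memory. As a hypothetical illustration only, not exllama's actual loader code: the flag is a comma-separated list of per-GPU gigabyte budgets, and layers could be assigned greedily until each budget fills. The function names here are invented for the sketch.)

```python
def parse_gpu_split(gs: str) -> list[float]:
    """Parse a -gs style string like "10,24" into per-GPU GB budgets."""
    return [float(x) for x in gs.split(",")]

def assign_layers(layer_sizes_gb: list[float], budgets_gb: list[float]) -> list[int]:
    """Greedy sketch: place layers on device 0 until its budget is full,
    then spill to the next device, and so on."""
    device, used, placement = 0, 0.0, []
    for size in layer_sizes_gb:
        if used + size > budgets_gb[device] and device < len(budgets_gb) - 1:
            device, used = device + 1, 0.0
        placement.append(device)
        used += size
    return placement
```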
#304 - does the benchmark support batch size>1?
Issue -
State: closed - Opened by deltaguo almost 2 years ago
- 1 comment
#302 - test_inference.py : AttributeError: module 'exllamav2_ext' has no attribute 'rms_norm'
Issue -
State: closed - Opened by DFuller134 almost 2 years ago
- 1 comment
#301 - test_benchmark_inference.py broken?
Issue -
State: closed - Opened by 11415142513152119 almost 2 years ago
- 1 comment
#300 - llama_cpp_python_cuda is not a supported wheel on this platform
Issue -
State: closed - Opened by arif599 almost 2 years ago
- 1 comment
#299 - Changing hyper-parameters after initialization without reloading weights from disk
Issue -
State: open - Opened by kmccleary3301 almost 2 years ago
#298 - finetuned Llama-2-7B-32K-Instruct-GPTQ only returns '\n'
Issue -
State: closed - Opened by Napuh almost 2 years ago
- 1 comment
#295 - Why can't the llama2 model output EOS id?
Issue -
State: closed - Opened by pangr almost 2 years ago
- 4 comments
#293 - doesn't use CUDA_HOME?
Issue -
State: open - Opened by j2l almost 2 years ago
#292 - list index out of range
Issue -
State: closed - Opened by j2l almost 2 years ago
- 1 comment
#291 - OSError: CUDA_HOME environment variable is not set.
Issue -
State: open - Opened by jamesbraza almost 2 years ago
- 8 comments
#290 - CodeLLaMA + LoRA: RuntimeError: CUDA error: an illegal memory access was encountered
Issue -
State: open - Opened by juanps90 almost 2 years ago
- 3 comments
#289 - GPU Inference from IPython
Issue -
State: open - Opened by Rajmehta123 almost 2 years ago
#288 - followed instructions with error
Issue -
State: open - Opened by hiqsociety almost 2 years ago
- 2 comments
#286 - is it too much of me to ask for an MPI option like llama.cpp?
Issue -
State: closed - Opened by hiqsociety almost 2 years ago
- 5 comments
#285 - exception about replacing the op q4_matmul_kernel
Issue -
State: closed - Opened by deltaguo almost 2 years ago
- 2 comments
#284 - phi-1.5 support?
Issue -
State: closed - Opened by SinanAkkoyun almost 2 years ago
- 5 comments
#283 - multi stoptoken
Pull Request -
State: closed - Opened by Kerushii almost 2 years ago
#281 - Multi-GPU issues
Issue -
State: open - Opened by nktice almost 2 years ago
- 9 comments
#280 - Support for Baichuan2 models
Issue -
State: open - Opened by bernardx almost 2 years ago
- 1 comment
#279 - Progress on the rewrite for older cards (Like the P40)
Issue -
State: open - Opened by TimyIsCool almost 2 years ago
- 1 comment
#278 - LoRA appears to not be used after the first run
Issue -
State: closed - Opened by technillogue almost 2 years ago
- 1 comment
#277 - Is Tesla T4 supported?
Issue -
State: closed - Opened by ivsanro1 almost 2 years ago
- 2 comments
#276 - Multi-GPU inference?
Issue -
State: closed - Opened by mbhenaff almost 2 years ago
- 1 comment
#275 - Optimize q4_matmul
Pull Request -
State: closed - Opened by QuarticCat almost 2 years ago
- 21 comments
#274 - remove tokens that exceed the max_seq_len
Issue -
State: open - Opened by p11188536 almost 2 years ago
- 1 comment
#273 - Completion abruptly stopped - RuntimeError: CUDA error: an illegal memory access was encountered
Issue -
State: open - Opened by Thireus almost 2 years ago
- 1 comment
#272 - YaRN Support
Issue -
State: open - Opened by grimulkan almost 2 years ago
- 8 comments
#270 - Codellama support
Issue -
State: open - Opened by ParisNeo almost 2 years ago
- 11 comments
#269 - Running Llama2 on multiple GPUs outputs gibberish
Issue -
State: closed - Opened by mirth almost 2 years ago
- 2 comments
#268 - Support for AMD ROCM
Issue -
State: open - Opened by yehowshuaradialrad almost 2 years ago
- 1 comment
#267 - Is it possible and efficient if load layer on demand?
Issue -
State: open - Opened by fahadh4ilyas almost 2 years ago
- 2 comments
#266 - Speed on A100
Issue -
State: open - Opened by Ber666 almost 2 years ago
- 4 comments
#265 - Optimize and extend ws example for chatbots
Pull Request -
State: closed - Opened by Kerushii almost 2 years ago
#264 - Any blogs on the project?
Issue -
State: open - Opened by qizzzh almost 2 years ago
#263 - Performance issues
Issue -
State: open - Opened by bryanhpchiang almost 2 years ago
- 3 comments
#262 - RoPE Frequency Base and Frequency Scale Support
Issue -
State: open - Opened by ChrisCates almost 2 years ago
- 3 comments
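(#262 asks about exposing the RoPE frequency base and scale. As a minimal sketch of the underlying math only, not exllama's kernels: each head-dimension pair gets an inverse frequency derived from the base, and a linear scale divides positions, which is what compress_pos_emb-style scaling does. Function names are invented for the example.)

```python
def rope_inv_freq(dim: int, base: float = 10000.0) -> list[float]:
    """Per-pair inverse frequencies for rotary embeddings:
    base ** (-2i / dim) for each pair i."""
    return [base ** (-2 * i / dim) for i in range(dim // 2)]

def rope_angles(pos: int, inv_freq: list[float], scale: float = 1.0) -> list[float]:
    """Rotation angle per pair at position `pos`; `scale` linearly
    compresses positions to stretch the usable context."""
    return [(pos / scale) * f for f in inv_freq]
```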
#261 - Codellama 16K context length?
Issue -
State: open - Opened by ShahZ181 almost 2 years ago
- 3 comments
#260 - Codellama support
Issue -
State: open - Opened by lucasjinreal almost 2 years ago
- 10 comments
#259 - Cache size below max_seq_len?
Issue -
State: closed - Opened by fahadh4ilyas almost 2 years ago
- 2 comments
#258 - Tried to build and set up exllama but encountered ninja-related errors; can someone please help me?
Issue -
State: open - Opened by BwandoWando almost 2 years ago
- 3 comments
#257 - stop-string support?
Issue -
State: open - Opened by krypterro almost 2 years ago
- 2 comments
#256 - Request: Some improvements to web app.py
Issue -
State: open - Opened by Midaychi almost 2 years ago
#255 - refine json dicts for ws example
Pull Request -
State: closed - Opened by Kerushii almost 2 years ago
#254 - Bad output for 2080 ti
Issue -
State: open - Opened by filipemesquita almost 2 years ago
- 2 comments
#253 - GPU Usage Keeps High Even Without Inference Load
Issue -
State: open - Opened by leonxia1018 almost 2 years ago
- 7 comments
#252 - Is it possible to do batch generate?
Issue -
State: open - Opened by fahadh4ilyas almost 2 years ago
- 7 comments
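(#252 asks about batch generation. A generic sketch of the usual prerequisite, not exllama-specific code: sequences of different lengths must be padded to a common length, with a mask marking real tokens, before a single batched forward pass. Left-padding is the common decoder-only convention assumed here.)

```python
def pad_batch(token_ids: list[list[int]], pad_id: int):
    """Left-pad sequences to a common length and build an attention mask
    (1 = real token, 0 = padding) so the batch can run in one pass."""
    max_len = max(len(seq) for seq in token_ids)
    padded = [[pad_id] * (max_len - len(s)) + s for s in token_ids]
    mask = [[0] * (max_len - len(s)) + [1] * len(s) for s in token_ids]
    return padded, mask
```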
#251 - Are we *really* using nvlink?
Issue -
State: closed - Opened by Ph0rk0z almost 2 years ago
- 1 comment
#250 - recover unsaved modification
Pull Request -
State: closed - Opened by Kerushii almost 2 years ago
- 3 comments
#249 - ws example for streaming with context reuse and token testing
Pull Request -
State: closed - Opened by Kerushii almost 2 years ago
#248 - Custom multiple stop token (for roleplay / conversation)
Pull Request -
State: closed - Opened by wangerzi almost 2 years ago
- 6 comments
#245 - Possible to load model with low system ram?
Issue -
State: open - Opened by gros87 almost 2 years ago
- 4 comments
#244 - RuntimeError: temp_state buffer is too small
Issue -
State: closed - Opened by daniel-kukiela almost 2 years ago
- 1 comment
#243 - Modify generator.py > generate_simple to accept encode_special_characters?
Issue -
State: open - Opened by zmarty almost 2 years ago
- 1 comment
#242 - Header too large error when running benchmark
Issue -
State: closed - Opened by DKormann almost 2 years ago
- 2 comments
#241 - Is there a way to make compress_pos_emb dynamic?
Issue -
State: closed - Opened by fahadh4ilyas almost 2 years ago
- 2 comments
#240 - Can max_seq_len be set via CLI or GUI in webui?
Issue -
State: closed - Opened by int19h almost 2 years ago
- 2 comments
#238 - KV caching?
Issue -
State: open - Opened by bryanhpchiang almost 2 years ago
- 2 comments
#237 - Continuous Batching support
Issue -
State: open - Opened by FireMasterK almost 2 years ago
#236 - Generation uses config.max_seq_len instead of default 2048
Pull Request -
State: closed - Opened by flotos almost 2 years ago
- 1 comment
#235 - Question about example_flask.py
Issue -
State: open - Opened by ZeroYuJie almost 2 years ago
- 1 comment
#234 - Question about sampling and kernel fusion
Issue -
State: closed - Opened by sleepwalker2017 almost 2 years ago
- 6 comments
#233 - RuntimeError with airoboros-l2-13b
Issue -
State: closed - Opened by corv89 almost 2 years ago
- 2 comments
#232 - Strange output / doesn't make any sense
Issue -
State: closed - Opened by lordwebbie almost 2 years ago
- 5 comments
#231 - Slower tokens/s than expected
Issue -
State: open - Opened by teknium1 almost 2 years ago
- 14 comments
#230 - Support for NF4?
Issue -
State: open - Opened by hoagy-davis-digges almost 2 years ago
- 1 comment
#223 - custom stop tokens in generator.py
Pull Request -
State: closed - Opened by Kerushii almost 2 years ago
- 1 comment
#221 - Llama 2 Chat implementation
Pull Request -
State: open - Opened by SinanAkkoyun almost 2 years ago
- 10 comments
#220 - Weird issue with context length
Issue -
State: open - Opened by zzzacwork almost 2 years ago
- 6 comments
#218 - Speculative decoding?
Issue -
State: open - Opened by bryanhpchiang almost 2 years ago
- 17 comments
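(#218 discusses speculative decoding. A toy greedy-acceptance sketch of the idea, heavily simplified from the stochastic scheme and unrelated to any exllama implementation: a cheap draft model proposes k tokens, the target model verifies them, and proposals are kept up to the first disagreement, where the target's own token is substituted.)

```python
def speculative_step(draft, verify, prefix: list[int], k: int = 4) -> list[int]:
    """One speculative-decoding step (greedy acceptance): `draft` and
    `verify` each map a context to the next token id."""
    # Draft model proposes k tokens autoregressively.
    proposed, ctx = [], list(prefix)
    for _ in range(k):
        t = draft(ctx)
        proposed.append(t)
        ctx.append(t)
    # Target model checks each proposal in order.
    accepted, ctx = [], list(prefix)
    for t in proposed:
        if verify(ctx) == t:
            accepted.append(t)
            ctx.append(t)
        else:
            accepted.append(verify(ctx))  # target's token replaces the miss
            break
    return accepted
```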
#217 - Very bad response
Issue -
State: closed - Opened by pourfard almost 2 years ago
- 9 comments
#214 - Question about storing models in Container
Issue -
State: open - Opened by JacobGoldenArt almost 2 years ago
- 2 comments
#212 - [Feature Request] OpenAI-compatible API
Issue -
State: closed - Opened by langchain4j almost 2 years ago
- 11 comments
#202 - Latency grows substantially as batch size increases, even with small batch sizes
Issue -
State: open - Opened by joehoover about 2 years ago
- 2 comments
#192 - Exllama tutorials?
Issue -
State: open - Opened by NickDatLe about 2 years ago
- 23 comments
#180 - Illegal memory access when using a lora
Issue -
State: open - Opened by sampbarrow about 2 years ago
- 32 comments
#178 - A potential rotation inconsistency of Dynamically Scaled RoPE
Issue -
State: closed - Opened by NormXU about 2 years ago
- 3 comments
#172 - Add "min tokens" slider to webui
Pull Request -
State: closed - Opened by EyeDeck about 2 years ago
- 1 comment
#164 - Dual GPU setup on 13900k
Issue -
State: closed - Opened by SinanAkkoyun about 2 years ago
- 7 comments
#146 - Fix half2 with HIP
Pull Request -
State: open - Opened by Engininja2 about 2 years ago
- 9 comments
#145 - File not found when compiling exllama_ext
Issue -
State: closed - Opened by Flameish about 2 years ago
- 3 comments
#144 - Get nan error when prompt is more than 8 tokens
Issue -
State: closed - Opened by dandm1 about 2 years ago
#126 - Trying to apply Dynamic NTK RoPE scaling into exllama.
Issue -
State: closed - Opened by Panchovix about 2 years ago
- 6 comments
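(#126 and #118 concern NTK-aware RoPE scaling. The commonly cited adjustment, shown here as a standalone formula sketch rather than exllama's code, stretches the frequency base by an alpha factor instead of compressing positions; the exponent d/(d-2) is the form popularized in the NTK-aware scaling write-ups.)

```python
def ntk_scaled_base(base: float, alpha: float, head_dim: int) -> float:
    """NTK-aware RoPE scaling: adjust the frequency base as
    base' = base * alpha ** (d / (d - 2)), leaving positions unscaled."""
    return base * alpha ** (head_dim / (head_dim - 2))
```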
#125 - Setup package
Pull Request -
State: closed - Opened by paolorechia about 2 years ago
- 5 comments
#120 - OOM/CUDA errors when running in batch mode?
Issue -
State: closed - Opened by nikshepsvn about 2 years ago
- 6 comments
#118 - (Experimental) Add support to NTK RoPE scaling
Pull Request -
State: closed - Opened by Panchovix about 2 years ago
- 6 comments
#117 - Issue when attempting to run exllama (P40)
Issue -
State: closed - Opened by wereretot about 2 years ago
- 14 comments
#116 - Should we add option to set seed?
Issue -
State: closed - Opened by bdqfork about 2 years ago
- 2 comments
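(#116 asks about a seed option. A self-contained demo of why seeding makes sampling reproducible, using Python's stdlib RNG for illustration; a real implementation in this project would seed torch's generators instead. The function name and top-k tuple format are invented for the sketch.)

```python
import random

def sample_with_seed(topk: list[tuple[int, float]], seed: int) -> int:
    """Sample one token id from (token, weight) pairs; a fixed seed
    yields the same choice on every run."""
    rng = random.Random(seed)
    tokens, weights = zip(*topk)
    return rng.choices(tokens, weights=weights, k=1)[0]
```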