Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / turboderp/exllama issues and pull requests
#315 - Run on CPU without AVX2
Issue - State: open - Opened by ZanMax 8 months ago - 3 comments
#314 - piece id is out of range
Issue - State: open - Opened by chethanwiz 8 months ago - 3 comments
#313 - ValueError: Unrecognized layer: lm_head.q_groups on a new install
Issue - State: closed - Opened by Fuckingnameless 9 months ago - 2 comments
#312 - ERROR: Could not install packages due to an OSError: [Errno 13] Permission denied: '/home/exllama/env/lib/python3.11/site-packages/sentencepiece' Check the permissions.
Issue - State: closed - Opened by Fuckingnameless 9 months ago
#311 - updates since 0.0.11 causing code to not compile on Ubuntu (23.04, 23.10) with AMD HIP / ROCm ( 5.6 5.7 6.0 ... )
Issue - State: closed - Opened by nktice 10 months ago - 3 comments
#310 - When will the bfloat16 type of GPTQ algorithm be supported?
Issue - State: open - Opened by Kelang-Tian 11 months ago
#309 - Does it support safetytensor formate?>
Issue - State: open - Opened by lucasjinreal 12 months ago
#308 - Error when using Beam Search
Issue - State: open - Opened by bibekyess about 1 year ago
#307 - Occasionally RuntimeError
Issue - State: open - Opened by leegohi04517 about 1 year ago
#306 - Using Exllama backend requires all the modules to be on GPU - how?
Issue - State: open - Opened by tigerinus about 1 year ago - 1 comment
#305 - Issue with How --gpu_split / -gs argument works.
Issue - State: closed - Opened by JustinKunzi about 1 year ago - 2 comments
#304 - does the benchmark support batch size>1?
Issue - State: closed - Opened by deltaguo about 1 year ago - 1 comment
#302 - test_inference.py : AttributeError: module 'exllamav2_ext' has no attribute 'rms_norm'
Issue - State: closed - Opened by DFuller134 about 1 year ago - 1 comment
#301 - test_benchmark_inference.py broken?
Issue - State: closed - Opened by 11415142513152119 about 1 year ago - 1 comment
#300 - llama_cpp_python_cuda is not a supported wheel on this platform
Issue - State: closed - Opened by arif599 about 1 year ago - 1 comment
#299 - Changing hyper-parameters after initilization without reloading weights from disk.
Issue - State: open - Opened by kmccleary3301 about 1 year ago
#298 - finetuned Llama-2-7B-32K-Instruct-GPTQ only returns '\n'
Issue - State: closed - Opened by Napuh about 1 year ago - 1 comment
#295 - Why can't the llama2 model output EOS id?
Issue - State: closed - Opened by pangr about 1 year ago - 4 comments
#293 - doesn't use CUDA_HOME?
Issue - State: open - Opened by j2l about 1 year ago
#292 - list index out of range
Issue - State: closed - Opened by j2l about 1 year ago - 1 comment
#291 - OSError: CUDA_HOME environment variable is not set.
Issue - State: open - Opened by jamesbraza about 1 year ago - 8 comments
#290 - CodeLLaMA + LoRA: RuntimeError: CUDA error: an illegal memory access was encountered
Issue - State: open - Opened by juanps90 about 1 year ago - 3 comments
#289 - GPU Inference from IPython
Issue - State: open - Opened by Rajmehta123 about 1 year ago
#288 - followed instructions with error
Issue - State: open - Opened by hiqsociety about 1 year ago - 2 comments
#286 - is it too much of me to ask for an MPI option like llama.cpp?
Issue - State: closed - Opened by hiqsociety about 1 year ago - 5 comments
#285 - exception about replacing the op q4_matmul_kernel
Issue - State: closed - Opened by deltaguo about 1 year ago - 2 comments
#284 - phi-1.5 support?
Issue - State: closed - Opened by SinanAkkoyun about 1 year ago - 5 comments
#283 - multi stoptoken
Pull Request - State: closed - Opened by Kerushii about 1 year ago
#281 - Multi-GPU issues
Issue - State: open - Opened by nktice about 1 year ago - 9 comments
#280 - Support for Baichuan2 models
Issue - State: open - Opened by bernardx about 1 year ago - 1 comment
#279 - Progress on the rewrite for older cards (Like the P40)
Issue - State: open - Opened by TimyIsCool about 1 year ago - 1 comment
#278 - LoRA appears to not be used after the first run
Issue - State: closed - Opened by technillogue about 1 year ago - 1 comment
#277 - Is Tesla T4 supported?
Issue - State: closed - Opened by ivsanro1 about 1 year ago - 2 comments
#276 - Multi-GPU inference?
Issue - State: closed - Opened by mbhenaff about 1 year ago - 1 comment
#275 - Optimize q4_matmul
Pull Request - State: closed - Opened by QuarticCat about 1 year ago - 21 comments
#274 - remove tokens that exceed the max_seq_len
Issue - State: open - Opened by p11188536 about 1 year ago - 1 comment
#273 - Completion abruptly stopped - RuntimeError: CUDA error: an illegal memory access was encountered
Issue - State: open - Opened by Thireus about 1 year ago - 1 comment
#272 - YaRN Support
Issue - State: open - Opened by grimulkan about 1 year ago - 8 comments
#270 - Codelama support
Issue - State: open - Opened by ParisNeo about 1 year ago - 11 comments
#269 - Running Llama2 on multiple GPUs outputs gibberish
Issue - State: closed - Opened by mirth about 1 year ago - 2 comments
#268 - Support for AMD ROCM
Issue - State: open - Opened by yehowshuaradialrad about 1 year ago - 1 comment
#267 - Is it possible and efficient if load layer on demand?
Issue - State: open - Opened by fahadh4ilyas about 1 year ago - 2 comments
#266 - Speed on A100
Issue - State: open - Opened by Ber666 about 1 year ago - 4 comments
#265 - Optimize and extend ws example for chatborts
Pull Request - State: closed - Opened by Kerushii about 1 year ago
#264 - Any blogs on the project?
Issue - State: open - Opened by qizzzh about 1 year ago
#263 - Performance issues
Issue - State: open - Opened by bryanhpchiang about 1 year ago - 3 comments
#262 - RoPE Frequency Base and Frequency Scale Support
Issue - State: open - Opened by ChrisCates about 1 year ago - 3 comments
#261 - Codellama 16K context length?
Issue - State: open - Opened by ShahZ181 about 1 year ago - 3 comments
#260 - Codellama support
Issue - State: open - Opened by lucasjinreal over 1 year ago - 10 comments
#259 - Cache size below max_seq_len?
Issue - State: closed - Opened by fahadh4ilyas over 1 year ago - 2 comments
#258 - Tried to build setup exllama but encountering ninja related errors, can someone please help me?
Issue - State: open - Opened by BwandoWando over 1 year ago - 3 comments
#257 - stop-string support?
Issue - State: open - Opened by krypterro over 1 year ago - 2 comments
#256 - Request: Some improvements to web app.py
Issue - State: open - Opened by Midaychi over 1 year ago
#255 - refine json dicts for ws example
Pull Request - State: closed - Opened by Kerushii over 1 year ago
#254 - Bad output for 2080 ti
Issue - State: open - Opened by filipemesquita over 1 year ago - 1 comment
#253 - GPU Usage Keeps High Even Without Inference Load
Issue - State: open - Opened by leonxia1018 over 1 year ago - 7 comments
#252 - Is it possible to do batch generate?
Issue - State: open - Opened by fahadh4ilyas over 1 year ago - 7 comments
#251 - Are we *really* using nvlink?
Issue - State: closed - Opened by Ph0rk0z over 1 year ago - 1 comment
#250 - recover unsaved modification
Pull Request - State: closed - Opened by Kerushii over 1 year ago - 3 comments
#249 - ws example for streaming with context reuse and token testing
Pull Request - State: closed - Opened by Kerushii over 1 year ago
#248 - Custom multiple stop token (for roleplay / conversation)
Pull Request - State: closed - Opened by wangerzi over 1 year ago - 6 comments
#245 - Possible to load model with low system ram?
Issue - State: open - Opened by gros87 over 1 year ago - 4 comments
#244 - RuntimeError: temp_state buffer is too small
Issue - State: closed - Opened by daniel-kukiela over 1 year ago - 1 comment
#243 - Modify generator.py > generate_simple to accept encode_special_characters?
Issue - State: open - Opened by zmarty over 1 year ago - 1 comment
#242 - Header too large error when running benchmark
Issue - State: closed - Opened by DKormann over 1 year ago - 2 comments
#241 - Is there a way to make compress_pos_emb dynamic?
Issue - State: closed - Opened by fahadh4ilyas over 1 year ago - 2 comments
#240 - Can max_seq_len be set via CLI or GUI in webui?
Issue - State: closed - Opened by int19h over 1 year ago - 2 comments
#238 - KV caching?
Issue - State: open - Opened by bryanhpchiang over 1 year ago - 2 comments
#237 - Continuous Batching support
Issue - State: open - Opened by FireMasterK over 1 year ago
#236 - Generation uses config.max_seq_len instead of default 2048
Pull Request - State: closed - Opened by flotos over 1 year ago - 1 comment
#235 - Question about example_flask.py
Issue - State: open - Opened by ZeroYuJie over 1 year ago - 1 comment
#234 - Question about sampling and kernel fusion
Issue - State: closed - Opened by sleepwalker2017 over 1 year ago - 6 comments
#233 - RuntimeError with airoboros-l2-13b
Issue - State: closed - Opened by corv89 over 1 year ago - 2 comments
#232 - Strange output / doesn't make any sense
Issue - State: closed - Opened by lordwebbie over 1 year ago - 5 comments
#231 - Slower tokens/s than expecting
Issue - State: open - Opened by teknium1 over 1 year ago - 14 comments
#230 - Support for NF4?
Issue - State: open - Opened by hoagy-davis-digges over 1 year ago - 1 comment
#223 - custom stop tokens in generator.py
Pull Request - State: closed - Opened by Kerushii over 1 year ago - 1 comment
#221 - Llama 2 Chat implementation
Pull Request - State: open - Opened by SinanAkkoyun over 1 year ago - 10 comments
#220 - Weird issue with context length
Issue - State: open - Opened by zzzacwork over 1 year ago - 6 comments
#218 - Speculative decoding?
Issue - State: open - Opened by bryanhpchiang over 1 year ago - 17 comments
#217 - Very bad response
Issue - State: closed - Opened by pourfard over 1 year ago - 9 comments
#214 - Question about storing models in Container
Issue - State: open - Opened by JacobGoldenArt over 1 year ago - 2 comments
#212 - [Feature Request] OpenAI-compatible API
Issue - State: closed - Opened by langchain4j over 1 year ago - 11 comments
#202 - Latency grows substantially as batch size increases, even with small batch sizes
Issue - State: open - Opened by joehoover over 1 year ago - 2 comments
#192 - Exllama tutorials?
Issue - State: open - Opened by NickDatLe over 1 year ago - 23 comments
#180 - Illegal memory access when using a lora
Issue - State: open - Opened by sampbarrow over 1 year ago - 32 comments
#178 - A potential rotation inconsistency of Dynamically Scaled RoPE
Issue - State: closed - Opened by NormXU over 1 year ago - 3 comments
#172 - Add "min tokens" slider to webui
Pull Request - State: closed - Opened by EyeDeck over 1 year ago - 1 comment
#164 - Dual GPU setup on 13900k
Issue - State: closed - Opened by SinanAkkoyun over 1 year ago - 7 comments
#146 - Fix half2 with HIP
Pull Request - State: open - Opened by Engininja2 over 1 year ago - 9 comments
#145 - File not found when compiling exllama_ext
Issue - State: closed - Opened by Flameish over 1 year ago - 3 comments
#144 - Get nan error when prompt is more that 8 tokens.
Issue - State: closed - Opened by dandm1 over 1 year ago
#126 - Trying to apply Dynamic NTK RoPE scaling into exllama.
Issue - State: closed - Opened by Panchovix over 1 year ago - 6 comments
#125 - Setup package
Pull Request - State: closed - Opened by paolorechia over 1 year ago - 5 comments
#120 - OOM/CUDA errors when running in batch mode?
Issue - State: closed - Opened by nikshepsvn over 1 year ago - 6 comments
#118 - (Experimental) Add support to NTK RoPE scaling
Pull Request - State: closed - Opened by Panchovix over 1 year ago - 6 comments
#117 - Issue when attempting to run exllama (P40)
Issue - State: closed - Opened by wereretot over 1 year ago - 14 comments
#116 - Should we add option to set seed?
Issue - State: closed - Opened by bdqfork over 1 year ago - 2 comments
#115 - NTK RoPE scaling.
Issue - State: closed - Opened by alkeryn over 1 year ago - 24 comments
#114 - Is there any way to support multiple parallel generation request to the same model?
Issue - State: closed - Opened by marcoripa96 over 1 year ago - 11 comments