turboderp/exllama issues and pull requests

#112 - Update model_compatibility.md

Pull Request - State: open - Opened by eltociear over 1 year ago - 1 comment

#111 - Why is there a huge lag between reading the prompt and starting to generate output?

Issue - State: closed - Opened by ENjoyBlue2021 over 1 year ago - 6 comments

#110 - Strange behavior with caching on 8K models

Issue - State: closed - Opened by kaiokendev over 1 year ago - 2 comments

#108 - not support lora with autogptq/peft?

Issue - State: closed - Opened by laoda513 over 1 year ago - 6 comments

#107 - openllama support

Issue - State: closed - Opened by cnut1648 over 1 year ago - 4 comments

#106 - More functions in webui, interface is more adapted to mobile

Pull Request - State: open - Opened by CORRUPTOR2037 over 1 year ago - 1 comment

#105 - Compiling issue on Sagemaker

Issue - State: closed - Opened by buzzCraft over 1 year ago - 7 comments

#104 - Adds the possibility to influence prediction with bias

Pull Request - State: closed - Opened by paolorechia over 1 year ago - 18 comments

#103 - Integrating with Guidance: adding a positive bias to certain tokens

Issue - State: closed - Opened by paolorechia over 1 year ago - 5 comments

#101 - Fixed: batching lead to faulty results, crashes and men wielding bananas.

Pull Request - State: closed - Opened by aljungberg over 1 year ago - 14 comments

#100 - ImportError: DLL load failed while importing exllama_ext: The specified module could not be found.

Issue - State: closed - Opened by onexixi over 1 year ago - 7 comments

#99 - TheBloke/robin-13B-v2-GPTQ - models keeps generating tokens

Issue - State: closed - Opened by marcoripa96 over 1 year ago - 2 comments

#98 - OOM even with multiple GPUs (4x 3090 @ 24GB)

Issue - State: closed - Opened by nikshepsvn over 1 year ago - 21 comments

#97 - Fix download_dataset and perplexity wrt to downloaded datasets on Windows

Pull Request - State: closed - Opened by allenbenz over 1 year ago

#96 - Fix AttributeError: 'torch.device' object has no attribute 'startswith' when using gpu_peer_fix.

Pull Request - State: closed - Opened by Panchovix over 1 year ago

#95 - 3-bit and 2-bit GPTQ support

Issue - State: closed - Opened by TechnotechGit over 1 year ago - 23 comments

#94 - About Llama checkpoint 4-bit

Issue - State: closed - Opened by Iambestfeed over 1 year ago - 3 comments

#93 - Custom this repo for another architecture like BLOOM, MPT, Falcon

Issue - State: closed - Opened by Iambestfeed over 1 year ago - 1 comment

#92 - Interesting method to extend a model's max context length.

Issue - State: closed - Opened by allenbenz over 1 year ago - 49 comments

#89 - Fix compiling in venv on Windows

Pull Request - State: closed - Opened by EyeDeck over 1 year ago - 5 comments

#88 - Benchmarks vs vLLM?

Issue - State: closed - Opened by nikshepsvn over 1 year ago - 6 comments

#87 - Request for server API script without sessions

Issue - State: closed - Opened by CORRUPTOR2037 over 1 year ago - 5 comments

#86 - elapsed can be 0 for prompt processing on windows

Pull Request - State: closed - Opened by allenbenz over 1 year ago - 1 comment

#85 - Support for models with 8-bit quants?

Issue - State: closed - Opened by Panchovix over 1 year ago - 3 comments

#84 - Minor import time output suppression for windows

Pull Request - State: closed - Opened by allenbenz over 1 year ago - 1 comment

#83 - Add option to run docker container as root user

Pull Request - State: closed - Opened by nopperl over 1 year ago - 2 comments

#82 - Add waitress to Dockerfile please

Issue - State: closed - Opened by ghost over 1 year ago - 1 comment

#81 - performance & quality drop (3x) when setting top_p = 1.0 vs. 0.99

Issue - State: closed - Opened by matatonic over 1 year ago - 4 comments

#79 - TypeError: 'type' object is not subscriptable

Issue - State: closed - Opened by KPTK over 1 year ago - 5 comments

#78 - Multimodal support

Issue - State: closed - Opened by realsammyt over 1 year ago - 9 comments

#77 - Problem with generation leading space.

Issue - State: closed - Opened by Larryvrh over 1 year ago

#76 - "fatal error LNK1104: cannot open file 'python310.lib'" + Solution (Windows)

Issue - State: closed - Opened by JLuke73 over 1 year ago - 8 comments

#75 - Tesla P40 only using 70W under load

Issue - State: closed - Opened by TimyIsCool over 1 year ago - 15 comments

#74 - Support for llama models with >2048 context?

Issue - State: closed - Opened by Panchovix over 1 year ago - 1 comment

#73 - Using cache cause random behavior during generation

Issue - State: closed - Opened by Larryvrh over 1 year ago - 6 comments

#72 - Is it possible to do tuning with exllama?

Issue - State: closed - Opened by laoda513 over 1 year ago - 21 comments

#71 - Fix some cublas hipification

Pull Request - State: closed - Opened by ardfork over 1 year ago

#70 - OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.

Issue - State: closed - Opened by zero-thermo over 1 year ago - 4 comments

#69 - Using QLoRA?

Issue - State: closed - Opened by gameveloster over 1 year ago - 1 comment

#68 - Added streaming langchain example.

Pull Request - State: open - Opened by CoffeeVampir3 over 1 year ago - 2 comments

#65 - Error running. ArgTypes. Ninja: Build stopped: subcommand failed

Issue - State: closed - Opened by ckasimis over 1 year ago - 5 comments

#64 - Support for StarCoder

Issue - State: closed - Opened by bkutasi over 1 year ago - 1 comment

#63 - Possible to add a pip package?

Issue - State: closed - Opened by CoffeeVampir3 over 1 year ago - 2 comments

#62 - KeyError when loading GPTQ Model

Issue - State: closed - Opened by mambug over 1 year ago - 2 comments

#59 - Batch generation support

Issue - State: closed - Opened by ri938 over 1 year ago - 2 comments

#58 - Add option to run docker container as root user (fixes #57)

Pull Request - State: closed - Opened by nopperl over 1 year ago - 5 comments

#57 - Docker and ownership permissions

Issue - State: closed - Opened by chrisbward over 1 year ago - 5 comments

#56 - Add flask inference example

Pull Request - State: closed - Opened by Kerushii over 1 year ago - 2 comments

#55 - Lora support

Issue - State: open - Opened by alain40 over 1 year ago - 18 comments

#54 - SqueezeLLM Support?

Issue - State: closed - Opened by nikshepsvn over 1 year ago - 1 comment

#53 - New API endpoint

Pull Request - State: closed - Opened by jisungk2 over 1 year ago - 5 comments

#52 - ExLlamaDeviceMap's layers offload to CPU?

Issue - State: closed - Opened by tiendung over 1 year ago - 1 comment

#51 - Correct years from 2024 to 2023

Pull Request - State: closed - Opened by tiendung over 1 year ago - 2 comments

#50 - API for batched input?

Issue - State: closed - Opened by 0x1997 over 1 year ago - 8 comments

#49 - how to get correct model type?

Issue - State: closed - Opened by lx0126z over 1 year ago - 5 comments

#48 - Feature Request: length_penalty support

Issue - State: closed - Opened by Qubitium over 1 year ago - 3 comments

#47 - Very poor output quality

Issue - State: open - Opened by calebmor460 over 1 year ago - 55 comments

#46 - Landmark Attention support

Issue - State: closed - Opened by grimulkan over 1 year ago - 17 comments

#45 - Perplexity refactor

Pull Request - State: closed - Opened by lhl over 1 year ago

#44 - "ValueError: Found group index but no groupsize. What do?"

Issue - State: closed - Opened by dvoidus over 1 year ago - 4 comments

#43 - Add docker support

Pull Request - State: closed - Opened by nopperl over 1 year ago - 4 comments

#42 - make pascal compile

Pull Request - State: closed - Opened by Ph0rk0z over 1 year ago

#41 - Perplexity Data Format/Testing Data Question

Issue - State: closed - Opened by lhl over 1 year ago - 20 comments

#40 - RuntimeError: CUDA error: an illegal memory access was encountered

Issue - State: open - Opened by TianqiYe over 1 year ago - 11 comments

#38 - 65B working on multi-gpu

Issue - State: closed - Opened by ortegaalfredo over 1 year ago - 1 comment

#37 - Streaming API

Issue - State: open - Opened by bkutasi over 1 year ago - 5 comments

#36 - Improve Windows compatibility

Pull Request - State: closed - Opened by EyeDeck over 1 year ago - 2 comments

#35 - Pure C++ core instead of Python

Issue - State: closed - Opened by gotzmann over 1 year ago - 1 comment

#33 - Can't compile on Windows

Issue - State: closed - Opened by Panchovix over 1 year ago - 13 comments

#32 - 2 x RTX A5000 performance

Issue - State: closed - Opened by alain40 over 1 year ago - 14 comments

#30 - Typo in model.py

Issue - State: closed - Opened by g0morra over 1 year ago

#29 - Performance degradation

Issue - State: open - Opened by dvoidus over 1 year ago - 20 comments

#28 - --host for running webui across the network

Pull Request - State: closed - Opened by disarmyouwitha over 1 year ago

#27 - will it work with Nvidia P40 24GB on Linux?

Issue - State: open - Opened by waan1 over 1 year ago - 29 comments

#26 - WebUI Multi-bot

Issue - State: closed - Opened by Fairfax-Mooresby over 1 year ago - 3 comments

#25 - Get error when compiling.

Issue - State: closed - Opened by Cortega13 over 1 year ago - 2 comments

#23 - Fix reuse

Pull Request - State: closed - Opened by osmarks over 1 year ago - 1 comment

#21 - TransformerEngine FP8 support

Issue - State: closed - Opened by SinanAkkoyun over 1 year ago - 4 comments

#20 - Kernel wouldn't compile in my conda env

Issue - State: closed - Opened by Ph0rk0z over 1 year ago - 8 comments

#19 - the inference speed of GPTQ 4bit quantized model

Issue - State: closed - Opened by pineking over 1 year ago - 25 comments

#17 - Are you able to help?

Issue - State: closed - Opened by NO-ob over 1 year ago - 4 comments

#15 - Gradio error: "Not implemented yet"

Issue - State: closed - Opened by mmealman over 1 year ago - 2 comments

#14 - Question - possible to run starcoder with exllama?

Issue - State: closed - Opened by tpfwrz over 1 year ago - 8 comments

#13 - ExLlama API spec / discussion

Issue - State: closed - Opened by nikshepsvn over 1 year ago - 6 comments

#12 - Error when trying to run Wizard-Vicuna-13B-Uncensored-GPTQ

Issue - State: closed - Opened by nikshepsvn over 1 year ago - 8 comments

#11 - Pushing working code to master

Pull Request - State: closed - Opened by disarmyouwitha over 1 year ago - 1 comment

#10 - Splitting model on multiple GPUs produces RuntimeError

Issue - State: closed - Opened by h3ss over 1 year ago - 19 comments

#8 - Turn into Python module, hack in transformers support

Pull Request - State: closed - Opened by 0cc4m over 1 year ago - 4 comments

#7 - Add ROCm support

Pull Request - State: closed - Opened by ardfork over 1 year ago - 75 comments

#6 - RTX 3060 12GB Benchmarking

Issue - State: closed - Opened by 1aienthusiast over 1 year ago - 6 comments

#5 - Working with TheBloke/WizardLM-30B-Uncensored-GPTQ

Issue - State: closed - Opened by gabriel-peracio over 1 year ago - 4 comments

#3 - Multi-GPU

Issue - State: closed - Opened by Fairfax-Mooresby over 1 year ago - 6 comments

#2 - Cuda 12.1 - Fails to Build Here

Issue - State: closed - Opened by ilikenwf over 1 year ago - 29 comments

#1 - Crashing with act order and no act order since latest changes.

Issue - State: closed - Opened by disarmyouwitha over 1 year ago - 3 comments
