Ecosyste.ms: Issues

An open API service for providing issue and pull request metadata for open source projects.

GitHub / hamelsmu/llama-inference issues and pull requests

#17 - CTranslate2: support multiple GPUs (if run with mpirun) and Flash Attention 2.

Pull Request - State: open - Opened by ivanbaldo 10 months ago

#16 - OpenAI compatible servers benchmark based on the anyscale and exllama benchmarks.

Pull Request - State: open - Opened by ivanbaldo 11 months ago

#15 - Update exllama/bench.py from OpenAI 0.27.8 to 1.16.2.

Pull Request - State: open - Opened by ivanbaldo 11 months ago

#14 - Upstream renamed from mlc_chat to mlc_llm.

Pull Request - State: open - Opened by ivanbaldo 11 months ago

#13 - Add new candle-vllm Dockerfile with instructions to benchmark it.

Pull Request - State: open - Opened by ivanbaldo about 1 year ago

#12 - Add PowerInfer benchmark.

Pull Request - State: open - Opened by ivanbaldo about 1 year ago

#11 - Add mlc/Dockerfile with instructions inside it.

Pull Request - State: open - Opened by ivanbaldo about 1 year ago

#10 - Add ctranslate/Dockerfile with instructions to use it.

Pull Request - State: open - Opened by ivanbaldo about 1 year ago

#9 - hf/bench.py: need to specify bfloat16 otherwise it consumes twice as much memory in A10.

Pull Request - State: open - Opened by ivanbaldo about 1 year ago

#8 - New hf/bench-bt_fa.py for testing Optimum BetterTransformer and Flash Attention.

Pull Request - State: closed - Opened by ivanbaldo about 1 year ago - 1 comment

#7 - Add a Dockerfile for the /hf benchmarks with instructions to build and run them.

Pull Request - State: open - Opened by ivanbaldo about 1 year ago

#6 - Fix hf/bench-gptq.py.

Pull Request - State: open - Opened by ivanbaldo about 1 year ago

#5 - Add plot in summary notebook

Pull Request - State: open - Opened by emattia about 1 year ago - 2 comments

#4 - Can you do a benchmark on what happens when you load the basic HF model with bfloat16?

Issue - State: open - Opened by mihaipora over 1 year ago

#3 - Add support for all Hugging Face Chat, Text models + OpenAI, Claude2, Cohere, Palm, Replicate models

Pull Request - State: open - Opened by ishaan-jaff over 1 year ago - 2 comments

#2 - Incorrect results for exllama

Issue - State: closed - Opened by arbi-dev over 1 year ago - 3 comments

#1 - How is it compared with the Deepspeed inference?

Issue - State: open - Opened by allanj over 1 year ago