Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / alipay/painlessinferenceacceleration issues and pull requests
#31 - Generation of Draft tokens and Trie tree creation
Issue -
State: closed - Opened by Vithulep about 2 months ago
#30 - Question about Table 8 in the paper: Inference Latency with Lookahead for vLLM
Issue -
State: open - Opened by zjjznw123 3 months ago
#29 - What does batch_indices mean? What is the meaning of ids, the last parameter of LookaheadCache's put and stream_put functions? What does idx represent in class Tree()?
Issue -
State: open - Opened by handsome-chips 4 months ago
#28 - stop_ids does not seem to take effect?
Issue -
State: open - Opened by jianyuheng 5 months ago
#27 - modeling_qwen attention does not use multi-branch position ids & attention_mask
Issue -
State: open - Opened by snippetzero 6 months ago
- 1 comment
#26 - lookahead with do_sample=True does not take temperature, top_k, top_p
Issue -
State: open - Opened by learning-chip 7 months ago
- 2 comments
#25 - How is verification done in PAIN?
Issue -
State: open - Opened by jivanph 8 months ago
#24 - Do lookahead and repetition_penalty conflict?
Issue -
State: open - Opened by zhanweiw 8 months ago
- 1 comment
#23 - AntRAG
Issue -
State: open - Opened by nrmer 8 months ago
- 1 comment
#22 - Changing naive attention to SDPA gives wrong result for batched llama example
Issue -
State: open - Opened by learning-chip 9 months ago
- 3 comments
#21 - size of memory footprint
Issue -
State: closed - Opened by nrmer 9 months ago
- 1 comment
#20 - Is Qwen 1.5 supported?
Issue -
State: open - Opened by hwang824 9 months ago
#19 - TODO in PainlessInferenceAcceleration/pia/lookahead/common/lookahead_cache.py
Issue -
State: closed - Opened by nrmer 9 months ago
- 1 comment
#18 - Clarification on edls/dls/ft in perf_check
Issue -
State: closed - Opened by nrmer 9 months ago
- 1 comment
#17 - When batchSize > 1, lookahead does not work.
Issue -
State: closed - Opened by yuenyu1 9 months ago
#16 - Error: no attribute `rope_theta` for llama2 model
Issue -
State: closed - Opened by learning-chip 9 months ago
- 1 comment
#15 - Consultation on Trie Tree Maintenance?
Issue -
State: closed - Opened by ZipECHO 10 months ago
- 7 comments
#14 - Counting how many forward passes/steps were done when using PAIN
Issue -
State: open - Opened by jivanph 10 months ago
- 4 comments
#13 - Consider Support CodeLlama?
Issue -
State: open - Opened by RainYQ 10 months ago
- 2 comments
#12 - Are p-tuned models not supported?
Issue -
State: closed - Opened by 13269279918 10 months ago
- 1 comment
#11 - Why do my tests show no performance improvement?
Issue -
State: closed - Opened by MeJerry215 10 months ago
- 3 comments
#10 - pia integrated via Dockerfile cannot be used
Issue -
State: closed - Opened by May-Yaha 10 months ago
- 7 comments
#9 - In the benchmark studies, how are the draft tokens generated?
Issue -
State: open - Opened by jivanph 10 months ago
- 9 comments
#8 - Update README.md
Pull Request -
State: closed - Opened by eltociear 10 months ago
#7 - BUG for chatglm3-6b and Qwen-14B-Int4
Issue -
State: closed - Opened by AGI-Jarvis 10 months ago
- 7 comments
#6 - Why does it seem to have no effect?
Issue -
State: closed - Opened by shinerdeng 10 months ago
- 6 comments
#5 - Speed is indeed improved, but there are problems with generation quality
Issue -
State: closed - Opened by dafen12 10 months ago
- 2 comments
#4 - What are the optimizations in this lookahead implementation compared with the original hao-ailab implementation?
Issue -
State: closed - Opened by janelu9 10 months ago
- 1 comment
#3 - How does the performance compare with vLLM inference? (vLLM vs Lookahead)
Issue -
State: open - Opened by buptygz 10 months ago
- 5 comments
#2 - Is batch inference supported? For example, 256 inputs in one pass? Squeeze the GPU to the max! ^_^
Issue -
State: closed - Opened by janelu9 10 months ago
- 3 comments
#1 - Are there plans to support quantized versions of the Qwen architecture?
Issue -
State: closed - Opened by xiningnlp 10 months ago
- 2 comments