alibaba/rtp-llm issues and pull requests

#120 - examples/test.py fails to run even on two 16 GB T4 cards

Issue - State: open - Opened by zhangtaibo 9 days ago

#119 - [cpu] add sampleGreedy implementation

Pull Request - State: closed - Opened by wenhuanh 10 days ago

#118 - fix: open source build and deps on Arm

Pull Request - State: open - Opened by TianyuLi0 11 days ago

#117 - perf: optimization of attention, softmax, layernorm

Pull Request - State: closed - Opened by Reyfone 12 days ago

#116 - Add grouped query attention support

Pull Request - State: closed - Opened by Reyfone 17 days ago

#115 - [Doc] Suggested revisions to the multi-GPU parallelism documentation

Issue - State: open - Opened by flliny 23 days ago

#114 - llama3.1 FP16 results differ under RTP-LLM mode

Issue - State: open - Opened by anigi98932 23 days ago

#113 - support to run example/test.py and integrate optimized gemm/attention operator

Pull Request - State: closed - Opened by TianyuLi0 24 days ago - 1 comment

#112 - support to run example/test.py on Arm

Pull Request - State: closed - Opened by TianyuLi0 about 1 month ago

#111 - Dual A6000 inference: after model inference finishes, one card shows 0% GPU utilization and the other 100%

Issue - State: open - Opened by zf761 about 1 month ago - 1 comment

#110 - Cannot run the Python test scripts in the tests directory; libtest_ops.so is missing

Issue - State: open - Opened by leepoly about 1 month ago - 1 comment

#109 - Dual A6000 inference: after model inference finishes, one card shows 0% GPU utilization and the other 100%

Issue - State: open - Opened by zf761 about 1 month ago - 1 comment

#108 - fix: unit test and cpp model test

Pull Request - State: closed - Opened by Reyfone about 1 month ago

#107 - Enable MHA parallel on Arm

Pull Request - State: closed - Opened by Reyfone about 1 month ago

#106 - attention: add MHA parallel support

Pull Request - State: closed - Opened by Reyfone about 1 month ago - 1 comment

#105 - Speculative sampling: error when using medusa to load the official medusa model

Issue - State: open - Opened by wcsjtu about 1 month ago - 6 comments

#104 - reranker token-length interception behaves abnormally

Issue - State: closed - Opened by invisifire about 2 months ago - 2 comments

#103 - add opt_125M

Pull Request - State: open - Opened by Nanuion about 2 months ago - 2 comments

#102 - Added the OPT model, but model output does not match expectations

Issue - State: closed - Opened by samaritan1998 about 2 months ago

#101 - [CPU] add implementation for GEMM and token embedding

Pull Request - State: closed - Opened by wenhuanh about 2 months ago

#100 - Inference produces garbled output (показать показать показать показать показать) (USE_NEW_DEVICE_IMPL=1)

Issue - State: open - Opened by w066650 about 2 months ago - 1 comment

#99 - [ROCm] refine quantization related code

Pull Request - State: closed - Opened by feifei14119 2 months ago - 2 comments

#98 - [ROCm] MoE version1

Pull Request - State: closed - Opened by feifei14119 2 months ago - 1 comment

#97 - [ROCm] Support Int4 and bf16 for rocm version

Pull Request - State: closed - Opened by feifei14119 2 months ago

#96 - [ROCm] add quant op and port rccl

Pull Request - State: closed - Opened by feifei14119 2 months ago

#95 - Fails to run after adding the OPT model; reports a CUDA error

Issue - State: closed - Opened by samaritan1998 2 months ago - 5 comments

#94 - [ROCm] Includes docker container creation script file for rocm build

Pull Request - State: closed - Opened by feifei14119 2 months ago - 1 comment

#93 - [ROCm] Fix ROCm sampler OP test

Pull Request - State: closed - Opened by feifei14119 2 months ago

#92 - [cpu-impl] Add for layernorm and rmsnorm

Pull Request - State: closed - Opened by wenhuanh 3 months ago

#91 - HELP: No matching distribution found for torch==2.1.0+cu121 error while installing maga_transformer with .whl in release 0.2.0

Issue - State: open - Opened by HuXinjing 3 months ago - 7 comments

#90 - fix: adapt to index based kv cache for Arm device

Pull Request - State: closed - Opened by Reyfone 3 months ago - 1 comment

#89 - `Illegal instruction` error when running version 0.2.0

Issue - State: closed - Opened by frankang 3 months ago - 2 comments

#88 - bazel build error

Issue - State: closed - Opened by frankang 3 months ago - 2 comments

#87 - [ROCm] Port basic gpt model to rocm. qwen2 end-to-end test pass

Pull Request - State: closed - Opened by feifei14119 3 months ago - 5 comments

#86 - Hello, I'd like to ask a question that might not be very professional. In the code, the weights are loaded through Python. Where are they passed to the C++ (fastertransformer) part?

Issue - State: closed - Opened by samaritan1998 3 months ago - 1 comment

#85 - [DRAFT] not ready, please do NOT review

Pull Request - State: closed - Opened by feifei14119 3 months ago - 1 comment

#84 - support DeepSeek-V2-Lite-Chat

Issue - State: open - Opened by jianglan89 3 months ago - 1 comment

#83 - feat: add arm cpu device support

Pull Request - State: closed - Opened by TianyuLi0 3 months ago - 1 comment

#82 - Multi-node single-GPU/multi-GPU: error gang_info self None

Issue - State: closed - Opened by MasterJanus 3 months ago

#81 - [ROCm] Init rocm_impl device and add test op

Pull Request - State: closed - Opened by feifei14119 3 months ago - 4 comments

#80 - feat: add cpu attention api

Pull Request - State: closed - Opened by wenhuanh 3 months ago

#79 - [ROCm] Initial enablement

Pull Request - State: closed - Opened by draganmladjenovic 3 months ago - 6 comments

#78 - git clone Error

Issue - State: closed - Opened by hz0ne 3 months ago - 3 comments

#77 - fix(src): fix bazel build special type cast and template match for cuda118

Pull Request - State: closed - Opened by khan-yin 3 months ago - 14 comments

#76 - How to specify GPU IDs on a single machine with multiple GPUs

Issue - State: closed - Opened by 256785 3 months ago - 1 comment

#75 - Problem running Glm4v

Issue - State: closed - Opened by samaritan1998 3 months ago - 3 comments

#74 - v0.2.0 (cuda12) performs worse than v0.1.13 (cuda11)

Issue - State: open - Opened by invisifire 3 months ago - 1 comment

#73 - glm4v: CUDA out of memory on a single GPU

Issue - State: closed - Opened by samaritan1998 3 months ago - 1 comment

#72 - Does 0.2.0 support a cuda 11 environment?

Issue - State: closed - Opened by samaritan1998 3 months ago

#71 - qwen2 gptq tp=4 error: AssertionError: error config

Issue - State: open - Opened by xinge666 3 months ago - 3 comments

#70 - Compile error in v0.2.0: error: cannot convert ‘<brace-enclosed initializer list>’ to ‘const fastertransformer::AllGatherParams&’

Issue - State: closed - Opened by invisifire 3 months ago - 8 comments

#69 - ChatGLM4-9B fails to run

Issue - State: closed - Opened by samaritan1998 3 months ago - 1 comment

#68 - v0.1.13 fails to load qwen2 gptq

Issue - State: closed - Opened by xinge666 4 months ago - 2 comments

#67 - Question about self.vit_weights in the _get_vit_info function of BaseMultiModalWeightInfo in multimodel_mixin.py

Issue - State: closed - Opened by samaritan1998 4 months ago - 8 comments

#66 - openai.InternalServerError: Error code: 500 - {'error_code': 514, 'message': 'ErrorMsg: failed to malloc 134 blocks, only 28 blocks left

Issue - State: closed - Opened by wanglaiqi 4 months ago - 1 comment

#65 - Does it support Qwen2 and ChatGLM4-9B?

Issue - State: closed - Opened by ZCDu 4 months ago - 4 comments

#64 - An idle multi-GPU deployment significantly slows down other models

Issue - State: closed - Opened by invisifire 4 months ago - 2 comments

#63 - Qwen Chat CUDA OutOfMemory

Issue - State: open - Opened by xorange 4 months ago - 2 comments

#62 - [Feature Request] llama3

Issue - State: closed - Opened by samaritan1998 4 months ago - 1 comment

#61 - [Feature Request] Add support for CogVLM2

Issue - State: closed - Opened by samaritan1998 5 months ago - 5 comments

#60 - Qwen-vl-chat results differ from the transformer results and oddly read like a text continuation

Issue - State: closed - Opened by chiquitita-101 5 months ago - 3 comments

#59 - Is streaming supported?

Issue - State: closed - Opened by lcvcl 5 months ago - 1 comment

#58 - + Add ffn layer cpu impl

Pull Request - State: closed - Opened by wenhuanh 5 months ago - 1 comment

#57 - Buffer overflow at CudaAttentionOpTest::selfAttentionOpTest

Issue - State: closed - Opened by skmkt 5 months ago - 1 comment

#56 - rtp-llm example test issue

Issue - State: closed - Opened by haic0 5 months ago - 1 comment

#55 - Remove print statements

Issue - State: closed - Opened by mrchi 5 months ago - 1 comment

#54 - update multi-gpu.md

Pull Request - State: closed - Opened by gujingit 6 months ago

#53 - Feature request: encoder-decoder model support

Issue - State: closed - Opened by samaritan1998 6 months ago - 1 comment

#52 - Build error in cuda:tensor_utils: ‘getTypeSize’ is not a member of ‘fastertransformer::Tensor

Issue - State: closed - Opened by skmkt 6 months ago - 1 comment

#51 - 多卡推理

Issue - State: closed - Opened by Vincent131499 6 months ago - 8 comments

#50 - failed to run : RuntimeError: torch.cat()

Issue - State: closed - Opened by davideuler 6 months ago - 1 comment

#49 - What is the concept of ContextAttention in src/fastertransformer/models/multi_gpu_gpt/ParallelAttentionWrapper.cc, and how does it differ from SelfAttention?

Issue - State: closed - Opened by samaritan1998 6 months ago - 2 comments

#48 - Deploying qwen1.5-14b-chat with awq

Issue - State: closed - Opened by Vincent131499 6 months ago - 3 comments

#47 - awq

Issue - State: closed - Opened by Vincent131499 6 months ago - 2 comments

#46 - ValueError: max() arg is an empty sequence

Issue - State: closed - Opened by boxiaowave 6 months ago - 4 comments

#45 - How do I enable printing of kmonitor metrics? I want to measure the time spent in each stage

Issue - State: closed - Opened by samaritan1998 6 months ago - 1 comment

#44 - bazel build succeeds, but the tests report errors

Issue - State: closed - Opened by samaritan1998 6 months ago - 4 comments

#43 - 2 GPUs with TP=2 run Lora inference, one GPU

Issue - State: closed - Opened by cwlseu 6 months ago - 1 comment

#42 - bazel build fails

Issue - State: closed - Opened by samaritan1998 6 months ago - 10 comments

#41 - Error in DeployDocker.md

Issue - State: closed - Opened by vegetable-yx 6 months ago - 1 comment

#40 - Poor performance at batchsize=1 on V100

Issue - State: closed - Opened by cwlseu 6 months ago - 12 comments

#39 - Following the official example https://github.com/alibaba/rtp-llm/blob/main/docs/Multimodal-Tutorial.md raises maga_transformer.config.exceptions.FtRuntimeException: raw request format cannot accept dict prompt

Issue - State: closed - Opened by samaritan1998 6 months ago - 1 comment

#38 - bazel cu11x compilation fails

Issue - State: closed - Opened by cwlseu 6 months ago - 1 comment

#37 - How to use qwen medusa for inference acceleration

Issue - State: closed - Opened by BucherLi 6 months ago - 2 comments

#36 - 0.1.8 release cuda12.1 whl package is incomplete

Issue - State: closed - Opened by is 6 months ago - 3 comments

#35 - Latest whl package cannot start the server

Issue - State: closed - Opened by frankang 7 months ago - 5 comments

#34 - follow readme then error

Issue - State: closed - Opened by okwinds 7 months ago - 2 comments

#33 - bazel compilation fails

Issue - State: closed - Opened by yuhui-xie 7 months ago - 1 comment

#32 - Problem: How is the multimodal part handled?

Issue - State: closed - Opened by t90tank 7 months ago - 1 comment

#31 - Is there a plan to support Eagle?

Issue - State: closed - Opened by cdliang11 7 months ago - 1 comment

#30 - docs: fix link

Pull Request - State: closed - Opened by cdliang11 7 months ago - 1 comment

#29 - BUG: MISSING QUOTATION MARKS AND LINE BREAKS

Issue - State: closed - Opened by invisifire 7 months ago - 2 comments

#28 - build error: ERROR: An error occurred during the fetch of repository 'pip_gpu_cuda12_torch

Issue - State: closed - Opened by FightingMan 7 months ago - 2 comments

#27 - random_seed does not take effect

Issue - State: closed - Opened by frankang 7 months ago - 1 comment

#26 - [bug ?] encode_images method in mega_transformer/models/llava.py

Issue - State: closed - Opened by samaritan1998 7 months ago - 1 comment

#25 - #RTP-LLM Developer Event# Spring limited-time event: catch a bug, get a delicious coffee ☕️

Issue - State: open - Opened by tt0718 7 months ago - 2 comments

#24 - KeyError: 'MODEL_TYPE'

Issue - State: closed - Opened by York-RDWang 7 months ago - 1 comment

#23 - support Yi-Vl

Issue - State: open - Opened by Lzhang-hub 7 months ago - 5 comments

#22 - NameError: name 'Middleware' is not defined, Did you mean: 'CoRsMiddleware'?

Issue - State: closed - Opened by zzhdbw 7 months ago - 1 comment

#21 - doc: update cuda12 dep file path

Pull Request - State: closed - Opened by gujingit 7 months ago - 1 comment
