NVIDIA/FasterTransformer issues and pull requests

#795 - Can FasterTransformer support SAM2 ( Meta segment anything model 2)?

Issue - State: open - Opened by jackwei86 2 months ago

#794 - How to implement a gemm with FP16 and INT4 using kernel in FasterTransformer/src/fastertransformer/kernels/cutlass_kernels/fpA_intB_gemm

Issue - State: open - Opened by AkatsukiChiri 4 months ago

#793 - An error occurred for the specific cuda version

Issue - State: open - Opened by CSLiuPeng 6 months ago
Labels: bug

#792 - can be used in diffusion models,like sd and sdxl? how?where is the demos?tks

Issue - State: open - Opened by henbucuoshanghai 7 months ago
Labels: bug

#791 - fix: fix position_encoding_table memory error.

Pull Request - State: open - Opened by johnson-magic 8 months ago

#790 - bug: memory of position_encoding_table is not malloced correctly.

Issue - State: open - Opened by johnson-magic 8 months ago
Labels: bug

#789 - error: ‘CUDNN_DATA_BFLOAT16’ was not declared in this scope; did you mean ‘CUDNN_DATA_FLOAT’

Issue - State: closed - Opened by johnson-magic 9 months ago

#788 - How to know the correspondence between versions vcr.io/nvidia/pytorch:xx.xx-py3 and pytorch?

Issue - State: open - Opened by johnson-magic 9 months ago

#787 - what is the mean of EFF-FT?

Issue - State: open - Opened by johnson-magic 9 months ago

#786 - Are `fuseQKV masked attention` and Flash Attention the same?

Issue - State: open - Opened by likejazz 9 months ago

#785 - M2M

Pull Request - State: closed - Opened by sfc-gh-ybsat 10 months ago - 1 comment

#784 - on H800 can not exec nvidia/pytorch:23.09-py3 container success

Issue - State: open - Opened by chenglinjun 11 months ago

#783 - Confidence is not returned in the decoding example？

Issue - State: open - Opened by liuzhuang1024 11 months ago
Labels: bug

#782 - multi_block_mode performance issue

Issue - State: closed - Opened by akhoroshev 11 months ago - 1 comment

#781 - Does FasterTransformer support multi-stream pipeline parallelism ?

Issue - State: open - Opened by FlyingPotatoZ 12 months ago

#780 - Free memory buffer - Llama

Pull Request - State: closed - Opened by sfc-gh-ybsat about 1 year ago

#779 - error You need C++17 to compile PyTorch

Issue - State: open - Opened by ranggihwang about 1 year ago
Labels: bug

#778 - can support decoder only bart? such as MBartForCausalLM

Issue - State: open - Opened by sjtu-cz about 1 year ago

#777 - repetition_penalty logic in FT has bug

Issue - State: closed - Opened by hezeli123 about 1 year ago - 1 comment
Labels: bug

#776 - Update README.md

Pull Request - State: open - Opened by eltociear about 1 year ago

#775 - Sparsity support

Issue - State: open - Opened by zhang662817 about 1 year ago
Labels: bug

#774 - How to get started?

Issue - State: open - Opened by turbobuilt about 1 year ago

#773 - Fix shape mismatch on the masked_tokens param in decoder masked multi-head attention kernel.

Pull Request - State: open - Opened by FengDSP about 1 year ago

#772 - How to serving multi-gpu inference？

Issue - State: closed - Opened by Alone-wl about 1 year ago - 1 comment

#771 - Is llama2 70b supported? Do you know minimal configuration?

Issue - State: open - Opened by ChristineSeven about 1 year ago - 1 comment

#770 - Include stdio.h

Pull Request - State: open - Opened by JihaoXin about 1 year ago - 2 comments

#769 - Supporting for expert parallelism in MoE inference

Issue - State: open - Opened by iteratorlee about 1 year ago

#768 - Whether fastertransformer supports gpt-2 classification model, such as GPT2ForSequenceClassification？

Issue - State: open - Opened by cabbagetalk about 1 year ago

#767 - cuSPARSELt is slower?

Issue - State: open - Opened by BDHU about 1 year ago - 1 comment

#766 - Incorrect inline ptx device assembly code usage

Issue - State: open - Opened by zhiweij1 about 1 year ago
Labels: bug

#765 - CUDA code compile error with clang: function template partial specialization is not allowed

Issue - State: open - Opened by zhiweij1 about 1 year ago
Labels: bug

#764 - How to calculate local batch size?

Issue - State: open - Opened by fotstrt about 1 year ago

#763 - src/fastertransformer/kernels/decoder_masked_multihead_attention /decoder_masked_multihead_attention_template.hpp:36 open this macro definition, it'll find a build error

Issue - State: open - Opened by pengl about 1 year ago
Labels: bug

#762 - Ft llama opt

Pull Request - State: open - Opened by dypshong about 1 year ago

#761 - terminate called after throwing an instance of 'std::runtime_error'

Issue - State: open - Opened by HalFTeen about 1 year ago

#760 - fastertransformer/utils/nccl_utils.cc:62 'unhandled cuda error'

Issue - State: open - Opened by wangweiwei1188 about 1 year ago
Labels: bug

#759 - Support for "no_repeat_ngram_size" parameter for generation

Issue - State: open - Opened by shreysingla11 about 1 year ago - 2 comments

#758 - Does FasterTransformer use FlashAttention?

Issue - State: open - Opened by niyunsheng about 1 year ago

#757 - Which part should I modify to achieve inference pipeline schedule (like micro-batch)?

Issue - State: open - Opened by dannyxiaocn about 1 year ago

#756 - Support Seq length up to 8K

Pull Request - State: open - Opened by zhen-jia about 1 year ago

#755 - [cmake] fix cmake policy for ENABLE_FP8

Pull Request - State: closed - Opened by DefTruth about 1 year ago

#754 - flashattention only enabled for gpt-styled models

Issue - State: open - Opened by flexwang about 1 year ago - 7 comments

#753 - How to get a npz file that satisfy the input requirement?

Issue - State: open - Opened by jy00161yang about 1 year ago - 1 comment
Labels: bug

#752 - [Long seq length] GPT Seq length constrain

Issue - State: open - Opened by zhen-jia about 1 year ago - 14 comments

#751 - specify the recognition language for Whisper

Issue - State: open - Opened by echodjx about 1 year ago

#750 - [BugFix] GPT inference error when pipeline_para_size > 1 and int8_mode != 0

Pull Request - State: open - Opened by 00why00 about 1 year ago

#749 - Is it possible to serve GPT-NeoX ONNX exported through optimum?

Issue - State: open - Opened by sonientaegi about 1 year ago

#748 - [feature request] transformer on orin

Issue - State: open - Opened by superpigforever about 1 year ago

#747 - How to run multi_gpu_gpt_examples.py without mpirun/mpiexe

Issue - State: closed - Opened by ZZWHU about 1 year ago

#746 - core dumped of swin model

Issue - State: open - Opened by chiemon about 1 year ago - 1 comment
Labels: bug

#744 - Failed building t5 model in FastTransformer (Reached 82% then stopped)

Issue - State: open - Opened by EmanElrefai12 about 1 year ago - 3 comments
Labels: bug

#736 - Using faster transformers to infer the bloom model, the accuracy rate is 0

Issue - State: open - Opened by hurun over 1 year ago - 2 comments
Labels: bug

#735 - OSError: lib/libth_transformer.so: cannot open shared object file: No such file or directory

Issue - State: open - Opened by arnabmanna619 over 1 year ago - 1 comment

#734 - TP=2， Loss of accuracy

Issue - State: open - Opened by coderchem over 1 year ago - 2 comments

#730 - Compatibility issue with CUDA 12.2

Issue - State: open - Opened by MinghaoYan over 1 year ago - 6 comments
Labels: bug

#729 - llama support inference？

Issue - State: open - Opened by double-vin over 1 year ago - 2 comments

#728 - [Question] Is it possible to use my own pretrained weights for ViT QAT

Issue - State: open - Opened by proevgenii over 1 year ago - 3 comments
Labels: bug

#727 - Are MQA and GQA in development?

Issue - State: open - Opened by ljayx over 1 year ago - 8 comments

#720 - docker/Dockerfile.torch occurs errors

Issue - State: closed - Opened by b3y0nd over 1 year ago - 4 comments
Labels: bug
