GitHub / triton-inference-server/fastertransformer_backend issues and pull requests
#177 - Unable to Run triton inference testing on 2 COS VM nodes.
Issue -
State: open - Opened by Yangyixin27 4 months ago
#110 - docs: fix README.md
Pull Request -
State: open - Opened by lkm2835 over 2 years ago
#109 - When I compile ft_backend based on cuda10.2 report nvcc fatal : Unsupported gpu architecture 'compute_80'
Issue -
State: closed - Opened by nercoeus over 2 years ago
- 2 comments
Labels: bug
#108 - could end_to_end_test.py with model_name 'ensemble' support decoupled mode
Issue -
State: open - Opened by jimmyforrest over 2 years ago
- 6 comments
Labels: question
#107 - Support dockerhub push
Pull Request -
State: closed - Opened by wsxiaoys over 2 years ago
#106 - Support dockerhub push
Pull Request -
State: closed - Opened by wsxiaoys over 2 years ago
#103 - E0315 1107 server.cc:201] Failed to finalize CUDA memory manager: CNMEM_STATUS_CUDA_ERROR
Issue -
State: open - Opened by WangYizhang01 over 2 years ago
Labels: bug
#102 - I don't know the cause of this error.
Issue -
State: closed - Opened by amazingkmy over 2 years ago
- 8 comments
Labels: bug
#101 - CUDA architecture ignored when passed to CMake
Issue -
State: open - Opened by hillct over 2 years ago
- 5 comments
Labels: bug
#100 - CPU maxed out, no GPU utilization, inference never completing
Issue -
State: closed - Opened by zoltan-fedor over 2 years ago
- 1 comment
Labels: bug
#99 - gptj: fix abysmally slow postprocessor performance; don't read a file for each new batch
Pull Request -
State: closed - Opened by git-bruh over 2 years ago
#98 - CUDA function architecture error when trying to query the triton server.
Issue -
State: closed - Opened by gd1m3y over 2 years ago
- 4 comments
#97 - triton server crashed after reloading the same model
Issue -
State: open - Opened by heiruwu over 2 years ago
- 2 comments
Labels: bug
#95 - Flan-T5 quality decreases with bigger models when using fastertransformer
Issue -
State: open - Opened by lakshaykc over 2 years ago
- 10 comments
Labels: bug
#94 - Some docker build fixes
Pull Request -
State: closed - Opened by tanmayv25 over 2 years ago
- 2 comments
#93 - repo fails to build using Triton Image 23.01
Issue -
State: open - Opened by Chris113113 over 2 years ago
- 2 comments
Labels: bug
#92 - Using the Triton client to call the interface asynchronously and concurrently returns an empty result
Issue -
State: closed - Opened by PAOPAO6 over 2 years ago
- 2 comments
Labels: bug
#91 - GPT-J streaming: getting garbage response
Issue -
State: open - Opened by vax-dev over 2 years ago
- 1 comment
Labels: bug
#90 - Dynamic batching is not working for gptj
Issue -
State: closed - Opened by PoodleWang over 2 years ago
- 2 comments
Labels: bug
#89 - fix error handling
Pull Request -
State: closed - Opened by rr0gi over 2 years ago
- 1 comment
#88 - Getting empty response from GPT-J Model
Issue -
State: open - Opened by vax-dev over 2 years ago
- 8 comments
Labels: bug
#86 - Serving large models with FT backend keeps Triton server crashing and restarting
Issue -
State: open - Opened by RajeshThallam over 2 years ago
#85 - Ragged Batching on Megatron Fast Transformer Backend
Issue -
State: open - Opened by mshuffett over 2 years ago
- 4 comments
#84 - feat: update v1.4
Pull Request -
State: closed - Opened by byshiue over 2 years ago
#83 - Create multistage build script for docker build
Pull Request -
State: closed - Opened by jbkyang-nvi over 2 years ago
#82 - Supporting for Flan-t5 with gated activation and non-shared embeddings
Issue -
State: closed - Opened by LydiaXiaohongLi over 2 years ago
- 3 comments
#81 - T5 cross_attention output cannot be accessed
Issue -
State: open - Opened by JustinAWei over 2 years ago
- 1 comment
Labels: bug
#80 - Not getting response with warning "response is nullptr"
Issue -
State: open - Opened by t13m over 2 years ago
- 1 comment
Labels: bug
#79 - How can I get the logits of all tokens in vocab at each step?
Issue -
State: open - Opened by kevinlee819 over 2 years ago
- 6 comments
#78 - With the triton fastertransformer backend, inference speed is severely reduced
Issue -
State: closed - Opened by PAOPAO6 over 2 years ago
- 34 comments
Labels: bug
#77 - server crashes when traffic is a little bit high
Issue -
State: open - Opened by rahuan over 2 years ago
- 10 comments
Labels: bug
#76 - How much VRAM BLOOM consumes?
Issue -
State: open - Opened by pai4451 over 2 years ago
- 6 comments
#75 - feat: update v1.3 codes
Pull Request -
State: closed - Opened by byshiue over 2 years ago
#74 - [ERROR] Does not find the section encoder with name relative_attention_num_buckets_or_max_pos_seq_len
Issue -
State: closed - Opened by 520jefferson over 2 years ago
- 17 comments
Labels: bug
#73 - Config.pbtxt for all_models/t5/fastertransformer incorrect
Issue -
State: open - Opened by dhaval24 over 2 years ago
- 1 comment
Labels: bug
#72 - does it support multiple instances of the same model on one GPU device?
Issue -
State: closed - Opened by changleilei over 2 years ago
- 5 comments
#69 - Support BLOOM model?
Issue -
State: closed - Opened by pai4451 over 2 years ago
- 4 comments
#68 - does fastertransformer support version nvcr.io/nvidia/tritonserver:21.07-py3
Issue -
State: closed - Opened by changleilei over 2 years ago
- 2 comments
Labels: bug
#67 - How to support different models with different tensor_para_size?
Issue -
State: open - Opened by TopIdiot over 2 years ago
- 29 comments
#66 - T5: Triton Model Repository (containing model weights and configuration) on S3 doesn't work as expected
Issue -
State: open - Opened by dhaval24 over 2 years ago
- 5 comments
Labels: bug
#64 - T5 not performing as expected
Issue -
State: open - Opened by nrakltx over 2 years ago
- 3 comments
Labels: bug
#63 - Multi-instance inference fails in (n-1)/n runs (where n is the number of GPUs/instances)
Issue -
State: open - Opened by timofeev1995 almost 3 years ago
- 29 comments
#62 - Memory usage not going up with model instances
Issue -
State: open - Opened by samipdahalr almost 3 years ago
- 1 comment
#61 - Can't deploy multiple versions of BERT.
Issue -
State: closed - Opened by ogis-uno almost 3 years ago
- 10 comments
Labels: bug
#60 - Fastertransformer BERT returns wrong value in my environment.
Issue -
State: closed - Opened by ogis-uno almost 3 years ago
- 7 comments
Labels: bug
#59 - Can't re-load any T5 model after a first load/unload iteration
Issue -
State: open - Opened by Thytu almost 3 years ago
- 5 comments
Labels: bug
#58 - build: ci
Pull Request -
State: closed - Opened by Thytu almost 3 years ago
- 1 comment
#57 - Request to support GCS file path
Issue -
State: open - Opened by aasthajh almost 3 years ago
- 2 comments
#56 - docs: fix formatting in README
Pull Request -
State: closed - Opened by Thytu almost 3 years ago
#55 - Is there any kind of caching?
Issue -
State: closed - Opened by timofeev1995 almost 3 years ago
- 2 comments
#54 - GPTJ end_id usage and behavior
Issue -
State: closed - Opened by timofeev1995 almost 3 years ago
- 3 comments
#53 - Unexpected behavior of batched inference of GPT-J
Issue -
State: closed - Opened by AlekseyKorshuk almost 3 years ago
- 24 comments
Labels: bug
#52 - Can't run multi-node GPTJ inference
Issue -
State: open - Opened by BDHU almost 3 years ago
- 11 comments
#51 - Adding option in identity_test.py client to support decoupled=True
Pull Request -
State: closed - Opened by pcastonguay almost 3 years ago
#49 - Using GEMM files in fastertransformer_backend.
Issue -
State: closed - Opened by SnoozingSimian almost 3 years ago
- 3 comments
#46 - Recommendation for the complete BERT model deployment on Triton + fastertransformer backend
Issue -
State: closed - Opened by vblagoje almost 3 years ago
- 4 comments
Labels: bug
#45 - GPT-J Preprocessing Incorrectly Tokenizes `<|endoftext|>`
Issue -
State: open - Opened by mitchellgordon95 almost 3 years ago
- 8 comments
Labels: bug
#44 - Streaming throwing queue.get() error
Issue -
State: open - Opened by rtalaricw almost 3 years ago
- 2 comments
Labels: bug
#43 - GPT-NeoX throws Segmentation Fault (Signal 6)
Issue -
State: closed - Opened by rtalaricw almost 3 years ago
- 15 comments
#42 - Byshiue patch 1
Pull Request -
State: closed - Opened by byshiue almost 3 years ago
#41 - Crash GPT-J if 'output0_len' is greater than 240.
Issue -
State: closed - Opened by daemyung almost 3 years ago
- 4 comments
Labels: bug
#40 - Crash GPT-J on mGPU
Issue -
State: closed - Opened by daemyung almost 3 years ago
- 10 comments
Labels: bug
#39 - Can you share data.json to run perf_analyzer?
Issue -
State: closed - Opened by daemyung almost 3 years ago
- 2 comments
Labels: bug
#38 - Added fauxpilot changes
Pull Request -
State: closed - Opened by lucataco almost 3 years ago
#37 - Support mt5 (t5 v1.1)?
Issue -
State: closed - Opened by hong8c almost 3 years ago
- 3 comments
#36 - Update CMakeLists.txt
Pull Request -
State: closed - Opened by byshiue almost 3 years ago
#35 - Does FT support serving multiple models concurrently?
Issue -
State: closed - Opened by PKUFlyingPig almost 3 years ago
- 1 comment
#34 - Failed to run FasterTransformer BERT Triton Backend with multiple instances.
Issue -
State: closed - Opened by PKUFlyingPig almost 3 years ago
- 21 comments
Labels: bug
#33 - Pipeline parallelism does not work for FasterTransformer BERT Triton Backend.
Issue -
State: closed - Opened by PKUFlyingPig almost 3 years ago
- 14 comments
Labels: bug
#32 - t5_guide.md shows 0 BLEU score
Issue -
State: closed - Opened by hong8c almost 3 years ago
- 4 comments
Labels: bug
#31 - feat: update v1.2
Pull Request -
State: closed - Opened by byshiue almost 3 years ago
#30 - Spelling
Pull Request -
State: closed - Opened by jsoref almost 3 years ago
- 1 comment
#29 - FT backend crashes Triton server if batch size is too large
Issue -
State: open - Opened by moyix almost 3 years ago
Labels: bug
#28 - FasterTransformer freezes on 4 GPUs while running GPT with NCCL_LAUNCH_MODE=GROUP
Issue -
State: closed - Opened by saramcallister about 3 years ago
- 8 comments
Labels: bug
#27 - FasterTransformer freezes on 4 GPUs while running GPT with NCCL_LAUNCH_MODE=GROUP
Issue -
State: closed - Opened by saramcallister about 3 years ago
- 2 comments
#26 - Streaming for fastertransformer using gRPC
Issue -
State: closed - Opened by rtalaricw about 3 years ago
- 6 comments
#25 - Results output same value with zero probability in GPTJ-6B
Issue -
State: closed - Opened by rtalaricw about 3 years ago
- 16 comments
#24 - Segmentation fault: address not mapped to object at address (nil)
Issue -
State: closed - Opened by shimoshida about 3 years ago
- 8 comments
#23 - Dynamic Batching with Different Sized Context (Ragged)
Issue -
State: closed - Opened by jimwu6 about 3 years ago
- 4 comments
#22 - Merge v1.1 branch to main branch
Pull Request -
State: closed - Opened by byshiue over 3 years ago
#21 - Allow mT5 support alongside T5
Issue -
State: closed - Opened by RegaliaXYZ over 3 years ago
- 3 comments
#20 - dynamic_batching with model config
Issue -
State: closed - Opened by hajime9652 over 3 years ago
- 2 comments
#19 - FasterTransformer might freeze after few requests
Issue -
State: closed - Opened by jimwu6 over 3 years ago
- 4 comments
#18 - does it also support general transformer encoders like BERT?
Issue -
State: closed - Opened by zhanghaoie over 3 years ago
- 3 comments
#17 - Fix config.pbtxt file path in README
Pull Request -
State: closed - Opened by jimwu6 over 3 years ago
#16 - Error if Triton Binary is started early
Issue -
State: closed - Opened by jimwu6 over 3 years ago
- 2 comments
#15 - will FT 5.0 be supported?
Issue -
State: closed - Opened by 520jefferson over 3 years ago
- 2 comments
#14 - Install Go 1.16 with precompiled binary
Pull Request -
State: closed - Opened by jimwu6 over 3 years ago
- 1 comment
#13 - update identity_test script
Pull Request -
State: closed - Opened by yuanzhedong almost 4 years ago
#12 - use nvidia-smi to track mem usage
Pull Request -
State: closed - Opened by yuanzhedong about 4 years ago
#11 - Refine benchmark script with mem usage
Pull Request -
State: closed - Opened by yuanzhedong about 4 years ago
#10 - refine benchmark script
Pull Request -
State: closed - Opened by yuanzhedong about 4 years ago
#9 - add script to benchmark latency on single node
Pull Request -
State: closed - Opened by yuanzhedong about 4 years ago
#8 - add more params to identity_test.py
Pull Request -
State: closed - Opened by yuanzhedong about 4 years ago
#7 - feat: Support multi-node serving
Pull Request -
State: closed - Opened by byshiue about 4 years ago
#6 - V1.1 dev - Add Multi-Node Support
Pull Request -
State: closed - Opened by PerkzZheng about 4 years ago
#5 - Fix backend naming to use root 'fastertransformer' instead of 'transformer'
Pull Request -
State: closed - Opened by deadeyegoodwin about 4 years ago
#4 - Triton backend API version issue
Pull Request -
State: closed - Opened by GwangsooHong about 4 years ago
- 2 comments
#3 - Triton backend API version issue
Pull Request -
State: closed - Opened by GwangsooHong about 4 years ago
#2 - V1.0 dev
Pull Request -
State: closed - Opened by byshiue over 4 years ago