An open API service providing issue and pull request metadata for open source projects.

GitHub / triton-inference-server/fastertransformer_backend issues and pull requests
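Each entry below follows the same pattern: a `#N - title` header, a metadata line, and an optional `Labels:` line. A minimal sketch of parsing that text into structured records; the `Entry` fields and helper names here are illustrative assumptions, not part of the service's API:

```python
import re
from dataclasses import dataclass, field

@dataclass
class Entry:
    number: int
    title: str
    kind: str            # "Issue" or "Pull Request"
    state: str           # "open" or "closed"
    author: str
    comments: int = 0
    labels: list = field(default_factory=list)

HEADER = re.compile(r"^#(\d+) - (.+)$")
META = re.compile(r"^(Issue|Pull Request) - State: (\w+) - Opened by (\S+)")
COMMENTS = re.compile(r"- (\d+) comments?$")

def parse_listing(text):
    """Parse '#N - title' headers, each followed by a metadata line
    and an optional 'Labels:' line, into Entry records."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    entries = []
    i = 0
    while i < len(lines):
        head = HEADER.match(lines[i])
        if not head or i + 1 >= len(lines):
            i += 1
            continue
        meta = META.match(lines[i + 1])
        if not meta:
            i += 1
            continue
        n_comments = COMMENTS.search(lines[i + 1])
        entry = Entry(
            number=int(head.group(1)),
            title=head.group(2),
            kind=meta.group(1),
            state=meta.group(2),
            author=meta.group(3),
            comments=int(n_comments.group(1)) if n_comments else 0,
        )
        i += 2
        # Labels are optional and comma-separated when present.
        if i < len(lines) and lines[i].startswith("Labels:"):
            entry.labels = [l.strip() for l in lines[i][len("Labels:"):].split(",")]
            i += 1
        entries.append(entry)
    return entries
```

This makes the listing queryable, e.g. filtering open issues labeled `bug` or counting comments per author.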

#110 - docs: fix README.md

Pull Request - State: open - Opened by lkm2835 over 2 years ago

#108 - Could end_to_end_test.py with model_name 'ensemble' support decoupled mode?

Issue - State: open - Opened by jimmyforrest over 2 years ago - 6 comments
Labels: question

#107 - Support dockerhub push

Pull Request - State: closed - Opened by wsxiaoys over 2 years ago

#106 - Support dockerhub push

Pull Request - State: closed - Opened by wsxiaoys over 2 years ago

#102 - I don't know the cause of this error.

Issue - State: closed - Opened by amazingkmy over 2 years ago - 8 comments
Labels: bug

#101 - CUDA architecture ignored when passed to CMake

Issue - State: open - Opened by hillct over 2 years ago - 5 comments
Labels: bug

#100 - CPU maxed out, no GPU utilization, inference never completing

Issue - State: closed - Opened by zoltan-fedor over 2 years ago - 1 comment
Labels: bug

#98 - CUDA function architecture error when trying to query the Triton server.

Issue - State: closed - Opened by gd1m3y over 2 years ago - 4 comments

#97 - Triton server crashes after reloading the same model

Issue - State: open - Opened by heiruwu over 2 years ago - 2 comments
Labels: bug

#95 - Flan-T5 quality decreases with bigger models when using fastertransformer

Issue - State: open - Opened by lakshaykc over 2 years ago - 10 comments
Labels: bug

#94 - Some docker build fixes

Pull Request - State: closed - Opened by tanmayv25 over 2 years ago - 2 comments

#93 - repo fails to build using Triton Image 23.01

Issue - State: open - Opened by Chris113113 over 2 years ago - 2 comments
Labels: bug

#91 - GPT-J streaming: getting garbage response

Issue - State: open - Opened by vax-dev over 2 years ago - 1 comment
Labels: bug

#90 - Dynamic batching is not working for gptj

Issue - State: closed - Opened by PoodleWang over 2 years ago - 2 comments
Labels: bug

#89 - fix error handling

Pull Request - State: closed - Opened by rr0gi over 2 years ago - 1 comment

#88 - Getting empty response from GPT-J Model

Issue - State: open - Opened by vax-dev over 2 years ago - 8 comments
Labels: bug

#85 - Ragged Batching on Megatron Fast Transformer Backend

Issue - State: open - Opened by mshuffett over 2 years ago - 4 comments

#84 - feat: update v1.4

Pull Request - State: closed - Opened by byshiue over 2 years ago

#83 - Create multistage build script for docker build

Pull Request - State: closed - Opened by jbkyang-nvi over 2 years ago

#82 - Supporting for Flan-t5 with gated activation and non-shared embeddings

Issue - State: closed - Opened by LydiaXiaohongLi over 2 years ago - 3 comments

#81 - T5 cross_attention output cannot be accessed

Issue - State: open - Opened by JustinAWei over 2 years ago - 1 comment
Labels: bug

#80 - Not getting response with warning "response is nullptr"

Issue - State: open - Opened by t13m over 2 years ago - 1 comment
Labels: bug

#79 - How can I get the logits of all tokens in vocab at each step?

Issue - State: open - Opened by kevinlee819 over 2 years ago - 6 comments

#78 - With the Triton fastertransformer backend, inference speed is severely reduced

Issue - State: closed - Opened by PAOPAO6 over 2 years ago - 34 comments
Labels: bug

#77 - Server crashes when traffic is a little bit high

Issue - State: open - Opened by rahuan over 2 years ago - 10 comments
Labels: bug

#76 - How much VRAM does BLOOM consume?

Issue - State: open - Opened by pai4451 over 2 years ago - 6 comments

#75 - feat: update v1.3 codes

Pull Request - State: closed - Opened by byshiue over 2 years ago

#73 - Config.pbtxt for all_models/t5/fastertransformer incorrect

Issue - State: open - Opened by dhaval24 over 2 years ago - 1 comment
Labels: bug

#72 - Does it support multiple instances of the same model on one GPU device?

Issue - State: closed - Opened by changleilei over 2 years ago - 5 comments

#69 - Support BLOOM model?

Issue - State: closed - Opened by pai4451 over 2 years ago - 4 comments

#68 - Does fastertransformer support nvcr.io/nvidia/tritonserver:21.07-py3?

Issue - State: closed - Opened by changleilei over 2 years ago - 2 comments
Labels: bug

#67 - How to support different models with different tensor_para_size?

Issue - State: open - Opened by TopIdiot over 2 years ago - 29 comments

#64 - T5 not performing as expected

Issue - State: open - Opened by nrakltx over 2 years ago - 3 comments
Labels: bug

#62 - Memory usage not going up with model instances

Issue - State: open - Opened by samipdahalr almost 3 years ago - 1 comment

#61 - Can't deploy multiple versions of BERT.

Issue - State: closed - Opened by ogis-uno almost 3 years ago - 10 comments
Labels: bug

#60 - Fastertransformer BERT returns wrong value in my environment.

Issue - State: closed - Opened by ogis-uno almost 3 years ago - 7 comments
Labels: bug

#59 - Can't re-load any T5 model after a first load/unload iteration

Issue - State: open - Opened by Thytu almost 3 years ago - 5 comments
Labels: bug

#58 - build: ci

Pull Request - State: closed - Opened by Thytu almost 3 years ago - 1 comment

#57 - Request to support GCS file path

Issue - State: open - Opened by aasthajh almost 3 years ago - 2 comments

#56 - docs: fix formatting in README

Pull Request - State: closed - Opened by Thytu almost 3 years ago

#55 - Is there any kind of caching?

Issue - State: closed - Opened by timofeev1995 almost 3 years ago - 2 comments

#54 - GPTJ end_id usage and behavior

Issue - State: closed - Opened by timofeev1995 almost 3 years ago - 3 comments

#53 - Unexpected behavior of batched inference of GPT-J

Issue - State: closed - Opened by AlekseyKorshuk almost 3 years ago - 24 comments
Labels: bug

#52 - Can't run multi-node GPTJ inference

Issue - State: open - Opened by BDHU almost 3 years ago - 11 comments

#51 - Adding option in identity_test.py client to support decoupled=True

Pull Request - State: closed - Opened by pcastonguay almost 3 years ago

#49 - Using GEMM files in fastertransformer_backend.

Issue - State: closed - Opened by SnoozingSimian almost 3 years ago - 3 comments

#46 - Recommendation for the complete BERT model deployment on Triton + fastertransformer backend

Issue - State: closed - Opened by vblagoje almost 3 years ago - 4 comments
Labels: bug

#45 - GPT-J Preprocessing Incorrectly Tokenizes `<|endoftext|>`

Issue - State: open - Opened by mitchellgordon95 almost 3 years ago - 8 comments
Labels: bug

#44 - Streaming throwing queue.get() error

Issue - State: open - Opened by rtalaricw almost 3 years ago - 2 comments
Labels: bug

#43 - GPT-NeoX throws Segmentation Fault (Signal 6)

Issue - State: closed - Opened by rtalaricw almost 3 years ago - 15 comments

#42 - Byshiue patch 1

Pull Request - State: closed - Opened by byshiue almost 3 years ago

#41 - GPT-J crashes if 'output0_len' is greater than 240.

Issue - State: closed - Opened by daemyung almost 3 years ago - 4 comments
Labels: bug

#40 - GPT-J crashes on mGPU

Issue - State: closed - Opened by daemyung almost 3 years ago - 10 comments
Labels: bug

#39 - Can you share data.json to run perf_analyzer?

Issue - State: closed - Opened by daemyung almost 3 years ago - 2 comments
Labels: bug

#38 - Added fauxpilot changes

Pull Request - State: closed - Opened by lucataco almost 3 years ago

#37 - Support mt5 (t5 v1.1)?

Issue - State: closed - Opened by hong8c almost 3 years ago - 3 comments

#36 - Update CMakeLists.txt

Pull Request - State: closed - Opened by byshiue almost 3 years ago

#35 - Does FT support serving multiple models concurrently?

Issue - State: closed - Opened by PKUFlyingPig almost 3 years ago - 1 comment

#34 - Failed to run FasterTransformer BERT Triton Backend with multiple instances.

Issue - State: closed - Opened by PKUFlyingPig almost 3 years ago - 21 comments
Labels: bug

#33 - Pipeline parallelism does not work for FasterTransformer BERT Triton Backend.

Issue - State: closed - Opened by PKUFlyingPig almost 3 years ago - 14 comments
Labels: bug

#32 - t5_guide.md shows 0 BLEU score

Issue - State: closed - Opened by hong8c almost 3 years ago - 4 comments
Labels: bug

#31 - feat: update v1.2

Pull Request - State: closed - Opened by byshiue almost 3 years ago

#30 - Spelling

Pull Request - State: closed - Opened by jsoref almost 3 years ago - 1 comment

#29 - FT backend crashes Triton server if batch size is too large

Issue - State: open - Opened by moyix almost 3 years ago
Labels: bug

#28 - FasterTransformer freezes on 4 GPUs while running GPT with NCCL_LAUNCH_MODE=GROUP

Issue - State: closed - Opened by saramcallister about 3 years ago - 8 comments
Labels: bug

#26 - Streaming for fastertransformer using gRPC

Issue - State: closed - Opened by rtalaricw about 3 years ago - 6 comments

#25 - Results output same value with zero probability in GPTJ-6B

Issue - State: closed - Opened by rtalaricw about 3 years ago - 16 comments

#24 - Segmentation fault: address not mapped to object at address (nil)

Issue - State: closed - Opened by shimoshida about 3 years ago - 8 comments

#23 - Dynamic Batching with Different Sized Context (Ragged)

Issue - State: closed - Opened by jimwu6 about 3 years ago - 4 comments

#22 - Merge v1.1 branch to main branch

Pull Request - State: closed - Opened by byshiue over 3 years ago

#21 - Allow mT5 support alongside T5

Issue - State: closed - Opened by RegaliaXYZ over 3 years ago - 3 comments

#20 - dynamic_batching with model config

Issue - State: closed - Opened by hajime9652 over 3 years ago - 2 comments

#19 - FasterTransformer might freeze after a few requests

Issue - State: closed - Opened by jimwu6 over 3 years ago - 4 comments

#18 - Does it also support general transformer encoders like BERT?

Issue - State: closed - Opened by zhanghaoie over 3 years ago - 3 comments

#17 - Fix config.pbtxt file path in README

Pull Request - State: closed - Opened by jimwu6 over 3 years ago

#16 - Error if Triton Binary is started early

Issue - State: closed - Opened by jimwu6 over 3 years ago - 2 comments

#15 - Will FT 5.0 be supported?

Issue - State: closed - Opened by 520jefferson over 3 years ago - 2 comments

#14 - Install Go 1.16 with precompiled binary

Pull Request - State: closed - Opened by jimwu6 over 3 years ago - 1 comment

#13 - update identity_test script

Pull Request - State: closed - Opened by yuanzhedong almost 4 years ago

#12 - use nvidia-smi to track mem usage

Pull Request - State: closed - Opened by yuanzhedong about 4 years ago

#11 - Refine benchmark script with mem usage

Pull Request - State: closed - Opened by yuanzhedong about 4 years ago

#10 - refine benchmark script

Pull Request - State: closed - Opened by yuanzhedong about 4 years ago

#9 - add script to benchmark latency on single node

Pull Request - State: closed - Opened by yuanzhedong about 4 years ago

#8 - add more params to identity_test.py

Pull Request - State: closed - Opened by yuanzhedong about 4 years ago

#7 - feat: Support multi-node serving

Pull Request - State: closed - Opened by byshiue about 4 years ago

#6 - V1.1 dev - Add Multi-Node Support

Pull Request - State: closed - Opened by PerkzZheng about 4 years ago

#4 - Triton backend API version issue

Pull Request - State: closed - Opened by GwangsooHong about 4 years ago - 2 comments

#3 - Triton backend API version issue

Pull Request - State: closed - Opened by GwangsooHong about 4 years ago

#2 - V1.0 dev

Pull Request - State: closed - Opened by byshiue over 4 years ago