GitHub / triton-inference-server/fastertransformer_backend issues and pull requests
#177 - Unable to Run triton inference testing on 2 COS VM nodes.
Issue -
State: open - Opened by Yangyixin27 4 months ago
#110 - docs: fix README.md
Pull Request -
State: open - Opened by lkm2835 over 2 years ago
#109 - When I compile ft_backend based on cuda10.2 report nvcc fatal : Unsupported gpu architecture 'compute_80'
Issue -
State: closed - Opened by nercoeus over 2 years ago
- 2 comments
Labels: bug
#108 - could end_to_end_test.py with model_name 'ensemble' support decoupled mode
Issue -
State: open - Opened by jimmyforrest over 2 years ago
- 6 comments
Labels: question
#107 - Support dockerhub push
Pull Request -
State: closed - Opened by wsxiaoys over 2 years ago
#106 - Support dockerhub push
Pull Request -
State: closed - Opened by wsxiaoys over 2 years ago
#103 - E0315 1107 server.cc:201] Failed to finalize CUDA memory manager: CNMEM_STATUS_CUDA_ERROR
Issue -
State: open - Opened by WangYizhang01 over 2 years ago
Labels: bug
#102 - I don't know the cause of this error.
Issue -
State: closed - Opened by amazingkmy over 2 years ago
- 8 comments
Labels: bug
#101 - CUDA architecture ignored when passed to CMake
Issue -
State: open - Opened by hillct over 2 years ago
- 5 comments
Labels: bug
#100 - CPU maxed out, no GPU utilization, inference never completing
Issue -
State: closed - Opened by zoltan-fedor over 2 years ago
- 1 comment
Labels: bug
#99 - gptj: fix abysmally slow postprocessor performance; don't read a file for each new batch
Pull Request -
State: closed - Opened by git-bruh over 2 years ago
#98 - CUDA function architecture error when trying to query the triton server.
Issue -
State: closed - Opened by gd1m3y over 2 years ago
- 4 comments
#97 - triton server crashed after reloading the same model
Issue -
State: open - Opened by heiruwu over 2 years ago
- 2 comments
Labels: bug
#95 - Flan-T5 quality decreases with bigger models when using fastertransformer
Issue -
State: open - Opened by lakshaykc over 2 years ago
- 10 comments
Labels: bug
#94 - Some docker build fixes
Pull Request -
State: closed - Opened by tanmayv25 over 2 years ago
- 2 comments
#93 - repo fails to build using Triton Image 23.01
Issue -
State: open - Opened by Chris113113 over 2 years ago
- 2 comments
Labels: bug
#92 - Using the Triton client to call the interface asynchronously and concurrently returns an empty result
Issue -
State: closed - Opened by PAOPAO6 over 2 years ago
- 2 comments
Labels: bug
#91 - GPT-J streaming: getting garbage response
Issue -
State: open - Opened by vax-dev over 2 years ago
- 1 comment
Labels: bug
#90 - Dynamic batching is not working for gptj
Issue -
State: closed - Opened by PoodleWang over 2 years ago
- 2 comments
Labels: bug
#89 - fix error handling
Pull Request -
State: closed - Opened by rr0gi over 2 years ago
- 1 comment
#88 - Getting empty response from GPT-J Model
Issue -
State: open - Opened by vax-dev over 2 years ago
- 8 comments
Labels: bug
#86 - Serving large models with FT backend keeps Triton server crashing and restarting
Issue -
State: open - Opened by RajeshThallam over 2 years ago
#85 - Ragged Batching on Megatron Fast Transformer Backend
Issue -
State: open - Opened by mshuffett over 2 years ago
- 4 comments
#84 - feat: update v1.4
Pull Request -
State: closed - Opened by byshiue over 2 years ago
#83 - Create multistage build script for docker build
Pull Request -
State: closed - Opened by jbkyang-nvi over 2 years ago
#82 - Supporting for Flan-t5 with gated activation and non-shared embeddings
Issue -
State: closed - Opened by LydiaXiaohongLi over 2 years ago
- 3 comments
#81 - T5 cross_attention output cannot be accessed
Issue -
State: open - Opened by JustinAWei over 2 years ago
- 1 comment
Labels: bug
#80 - Not getting response with warning "response is nullptr"
Issue -
State: open - Opened by t13m over 2 years ago
- 1 comment
Labels: bug
#79 - How can I get the logits of all tokens in vocab at each step?
Issue -
State: open - Opened by kevinlee819 over 2 years ago
- 6 comments
#78 - With the triton fastertransformer backend, inference speed is severely reduced
Issue -
State: closed - Opened by PAOPAO6 over 2 years ago
- 34 comments
Labels: bug
#77 - server crashes when traffic is a little bit high
Issue -
State: open - Opened by rahuan over 2 years ago
- 10 comments
Labels: bug
#76 - How much VRAM BLOOM consumes?
Issue -
State: open - Opened by pai4451 over 2 years ago
- 6 comments
#75 - feat: update v1.3 codes
Pull Request -
State: closed - Opened by byshiue over 2 years ago
#74 - [ERROR] Does not find the section encoder with name relative_attention_num_buckets_or_max_pos_seq_len
Issue -
State: closed - Opened by 520jefferson over 2 years ago
- 17 comments
Labels: bug
#73 - Config.pbtxt for all_models/t5/fastertransformer incorrect
Issue -
State: open - Opened by dhaval24 over 2 years ago
- 1 comment
Labels: bug
#72 - does it support multiple instances of the same model on one GPU device?
Issue -
State: closed - Opened by changleilei over 2 years ago
- 5 comments
#69 - Support BLOOM model?
Issue -
State: closed - Opened by pai4451 over 2 years ago
- 4 comments
#68 - does fastertransformer support version nvcr.io/nvidia/tritonserver:21.07-py3
Issue -
State: closed - Opened by changleilei over 2 years ago
- 2 comments
Labels: bug
#67 - How to support different models with different tensor_para_size?
Issue -
State: open - Opened by TopIdiot over 2 years ago
- 29 comments
#66 - T5: Triton Model Repository (containing model weights and configuration) on S3 doesn't work as expected
Issue -
State: open - Opened by dhaval24 over 2 years ago
- 5 comments
Labels: bug
#64 - T5 not performing as expected
Issue -
State: open - Opened by nrakltx over 2 years ago
- 3 comments
Labels: bug
#63 - Multi-instance inference fails in (n-1)/n runs (where n is the number of GPUs/instances)
Issue -
State: open - Opened by timofeev1995 almost 3 years ago
- 29 comments
#62 - Memory usage not going up with model instances
Issue -
State: open - Opened by samipdahalr almost 3 years ago
- 1 comment
#61 - Can't deploy multiple versions of BERT.
Issue -
State: closed - Opened by ogis-uno almost 3 years ago
- 10 comments
Labels: bug
#60 - Fastertransformer BERT returns wrong value in my environment.
Issue -
State: closed - Opened by ogis-uno almost 3 years ago
- 7 comments
Labels: bug
#59 - Can't re-load any T5 model after a first load/unload iteration
Issue -
State: open - Opened by Thytu almost 3 years ago
- 5 comments
Labels: bug
#58 - build: ci
Pull Request -
State: closed - Opened by Thytu almost 3 years ago
- 1 comment
#57 - Request to support GCS file path
Issue -
State: open - Opened by aasthajh almost 3 years ago
- 2 comments
#56 - docs: fix formatting in README
Pull Request -
State: closed - Opened by Thytu almost 3 years ago
#55 - Is there any kind of caching?
Issue -
State: closed - Opened by timofeev1995 almost 3 years ago
- 2 comments
#54 - GPTJ end_id usage and behavior
Issue -
State: closed - Opened by timofeev1995 almost 3 years ago
- 3 comments
#53 - Unexpected behavior of batched inference of GPT-J
Issue -
State: closed - Opened by AlekseyKorshuk almost 3 years ago
- 24 comments
Labels: bug
#52 - Can't run multi-node GPTJ inference
Issue -
State: open - Opened by BDHU almost 3 years ago
- 11 comments
#51 - Adding option in identity_test.py client to support decoupled=True
Pull Request -
State: closed - Opened by pcastonguay almost 3 years ago
#49 - Using GEMM files in fastertransformer_backend.
Issue -
State: closed - Opened by SnoozingSimian almost 3 years ago
- 3 comments
#46 - Recommendation for the complete BERT model deployment on Triton + fastertransformer backend
Issue -
State: closed - Opened by vblagoje almost 3 years ago
- 4 comments
Labels: bug
#45 - GPT-J Preprocessing Incorrectly Tokenizes `<|endoftext|>`
Issue -
State: open - Opened by mitchellgordon95 almost 3 years ago
- 8 comments
Labels: bug
#44 - Streaming throwing queue.get() error
Issue -
State: open - Opened by rtalaricw almost 3 years ago
- 2 comments
Labels: bug
#43 - GPT-NeoX throws Segmentation Fault (Signal 6)
Issue -
State: closed - Opened by rtalaricw almost 3 years ago
- 15 comments
#42 - Byshiue patch 1
Pull Request -
State: closed - Opened by byshiue almost 3 years ago
#41 - Crash GPT-J if 'output0_len' is greater than 240.
Issue -
State: closed - Opened by daemyung almost 3 years ago
- 4 comments
Labels: bug
#40 - Crash GPT-J on mGPU
Issue -
State: closed - Opened by daemyung almost 3 years ago
- 10 comments
Labels: bug
#39 - Can you share data.json to run perf_analyzer?
Issue -
State: closed - Opened by daemyung almost 3 years ago
- 2 comments
Labels: bug
#38 - Added fauxpilot changes
Pull Request -
State: closed - Opened by lucataco almost 3 years ago
#37 - Support mt5 (t5 v1.1)?
Issue -
State: closed - Opened by hong8c almost 3 years ago
- 3 comments
#36 - Update CMakeLists.txt
Pull Request -
State: closed - Opened by byshiue almost 3 years ago
#35 - Does FT support serving multiple models concurrently?
Issue -
State: closed - Opened by PKUFlyingPig almost 3 years ago
- 1 comment
#34 - Failed to run FasterTransformer BERT Triton Backend with multiple instances.
Issue -
State: closed - Opened by PKUFlyingPig almost 3 years ago
- 21 comments
Labels: bug
#33 - Pipeline parallelism does not work for FasterTransformer BERT Triton Backend.
Issue -
State: closed - Opened by PKUFlyingPig almost 3 years ago
- 14 comments
Labels: bug
#32 - t5_guide.md shows 0 BLEU score
Issue -
State: closed - Opened by hong8c almost 3 years ago
- 4 comments
Labels: bug
#31 - feat: update v1.2
Pull Request -
State: closed - Opened by byshiue almost 3 years ago
#30 - Spelling
Pull Request -
State: closed - Opened by jsoref almost 3 years ago
- 1 comment
#29 - FT backend crashes Triton server if batch size is too large
Issue -
State: open - Opened by moyix almost 3 years ago
Labels: bug
#28 - FasterTransformer freezes on 4 GPUs while running GPT with NCCL_LAUNCH_MODE=GROUP
Issue -
State: closed - Opened by saramcallister about 3 years ago
- 8 comments
Labels: bug
#27 - FasterTransformer freezes on 4 GPUs while running GPT with NCCL_LAUNCH_MODE=GROUP
Issue -
State: closed - Opened by saramcallister about 3 years ago
- 2 comments
#26 - Streaming for fastertransformer using gRPC
Issue -
State: closed - Opened by rtalaricw about 3 years ago
- 6 comments
#25 - Results output same value with zero probability in GPTJ-6B
Issue -
State: closed - Opened by rtalaricw about 3 years ago
- 16 comments
#24 - Segmentation fault: address not mapped to object at address (nil)
Issue -
State: closed - Opened by shimoshida about 3 years ago
- 8 comments
#23 - Dynamic Batching with Different Sized Context (Ragged)
Issue -
State: closed - Opened by jimwu6 about 3 years ago
- 4 comments
#22 - Merge v1.1 branch to main branch
Pull Request -
State: closed - Opened by byshiue over 3 years ago
#21 - Allow mT5 support alongside T5
Issue -
State: closed - Opened by RegaliaXYZ over 3 years ago
- 3 comments
#20 - dynamic_batching with model config
Issue -
State: closed - Opened by hajime9652 over 3 years ago
- 2 comments
#19 - FasterTransformer might freeze after few requests
Issue -
State: closed - Opened by jimwu6 over 3 years ago
- 4 comments
#18 - does it also support general transformer encoders like BERT?
Issue -
State: closed - Opened by zhanghaoie over 3 years ago
- 3 comments
#17 - Fix config.pbtxt file path in README
Pull Request -
State: closed - Opened by jimwu6 over 3 years ago
#16 - Error if Triton Binary is started early
Issue -
State: closed - Opened by jimwu6 over 3 years ago
- 2 comments
#15 - will FT 5.0 be supported?
Issue -
State: closed - Opened by 520jefferson over 3 years ago
- 2 comments
#14 - Install Go 1.16 with precompiled binary
Pull Request -
State: closed - Opened by jimwu6 over 3 years ago
- 1 comment
#13 - update identity_test script
Pull Request -
State: closed - Opened by yuanzhedong almost 4 years ago
#12 - use nvidia-smi to track mem usage
Pull Request -
State: closed - Opened by yuanzhedong about 4 years ago
#11 - Refine benchmark script with mem usage
Pull Request -
State: closed - Opened by yuanzhedong about 4 years ago
#10 - refine benchmark script
Pull Request -
State: closed - Opened by yuanzhedong about 4 years ago
#9 - add script to benchmark latency on single node
Pull Request -
State: closed - Opened by yuanzhedong about 4 years ago
#8 - add more params to identity_test.py
Pull Request -
State: closed - Opened by yuanzhedong about 4 years ago
#7 - feat: Support multi-node serving
Pull Request -
State: closed - Opened by byshiue about 4 years ago
#6 - V1.1 dev - Add Multi-Node Support
Pull Request -
State: closed - Opened by PerkzZheng about 4 years ago
#5 - Fix backend naming to use root 'fastertransformer' instead of 'transformer'
Pull Request -
State: closed - Opened by deadeyegoodwin about 4 years ago
#4 - Triton backend API version issue
Pull Request -
State: closed - Opened by GwangsooHong about 4 years ago
- 2 comments
#3 - Triton backend API version issue
Pull Request -
State: closed - Opened by GwangsooHong about 4 years ago
#2 - V1.0 dev
Pull Request -
State: closed - Opened by byshiue over 4 years ago