Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / NVIDIA/TransformerEngine issues and pull requests
#1151 - question about Model FLOPs Utilization
Issue - State: closed - Opened by jinz2014 2 months ago - 4 comments
Labels: question
#1150 - layer normalization after Linear
Issue - State: closed - Opened by ftgreat 3 months ago - 2 comments
Labels: question
#1149 - [PyTorch/C] Exposed Userbuffers configuration option to control comm and compute stream priorities
Pull Request - State: open - Opened by denera 3 months ago
Labels: enhancement
#1148 - Improvements for building wheels
Pull Request - State: closed - Opened by ksivaman 3 months ago - 4 comments
Labels: build, 1.10.0
#1147 - Unable to import transformer_engine.pytorch using TE v1.9.0
Issue - State: closed - Opened by snarayan21 3 months ago - 1 comment
#1146 - [PyTorch] Add contiguous check for `te_grouped_gemm`
Pull Request - State: closed - Opened by BeingGod 3 months ago - 2 comments
#1145 - [PyTorch] Remove `dtype` from args of permutation
Pull Request - State: closed - Opened by yaox12 3 months ago - 2 comments
#1144 - Does the FA3 commit of TE support bf16 or mixed precision?
Issue - State: open - Opened by Desperadoze 3 months ago
#1143 - [PyTorch] Avoid saving fp8_tensors in certain scenarios
Pull Request - State: open - Opened by cyanguwa 3 months ago
#1142 - [PyTorch] Userbuffers support in operation-based API
Pull Request - State: open - Opened by timmoon10 3 months ago - 4 comments
#1141 - [PyTorch] Fix FP8 logic related to FA2/FA3
Pull Request - State: closed - Opened by cyanguwa 3 months ago - 6 comments
Labels: 1.11
#1140 - Norms Refactor
Pull Request - State: open - Opened by phu0ngng 3 months ago - 1 comment
#1139 - Don't save fp8 q/k/v/out tensors when using bf16 bprop
Pull Request - State: open - Opened by guyueh1 3 months ago - 1 comment
#1138 - Fix param input order for cudagraph
Pull Request - State: open - Opened by yifeis-nv 3 months ago - 2 comments
Labels: bug
#1137 - [PyTorch] Remove some direct calls to PyTorch extensions in `Float8Tensor`
Pull Request - State: closed - Opened by timmoon10 3 months ago - 2 comments
#1136 - Hide non-necessary symbols from shared object
Pull Request - State: closed - Opened by ksivaman 3 months ago - 2 comments
Labels: bug, build, 1.10.0
#1135 - fp8_model_init doesn't work with DDP
Issue - State: open - Opened by MaciejBalaNV 3 months ago - 3 comments
#1134 - Fix QKV dtype in the bwd of FP8+CP
Pull Request - State: closed - Opened by xrennvidia 3 months ago - 5 comments
Labels: 1.10.0
#1133 - Bump cudnn-frontend version to 1.6.1
Pull Request - State: closed - Opened by ksivaman 3 months ago
#1132 - RMSNorm precision different from HF implementation
Issue - State: open - Opened by void-main 3 months ago - 5 comments
#1131 - Added offloading support for FP8 attention
Pull Request - State: closed - Opened by sanandaraj5597 3 months ago - 2 comments
#1130 - don't put master_param to state if None
Pull Request - State: closed - Opened by akoumpa 3 months ago - 3 comments
#1129 - [PyTorch] Implement Fp8 padding and unpadding module
Pull Request - State: closed - Opened by BeingGod 3 months ago - 5 comments
#1128 - [PyTorch] Propagate fp8 scale-inverse modification to `GroupedLinear`
Pull Request - State: closed - Opened by yaox12 3 months ago - 8 comments
#1127 - [PyTorch] Proxy class for low-precision tensor
Pull Request - State: closed - Opened by timmoon10 3 months ago - 5 comments
#1126 - Let user limit number of architectures, to improve build time
Pull Request - State: closed - Opened by hXl3s 3 months ago - 1 comment
#1125 - Transformer Engine using FlashAttention V3
Issue - State: open - Opened by heavyrain-lzy 3 months ago - 1 comment
#1124 - Re-add framework specific required dependencies for source build
Pull Request - State: closed - Opened by ksivaman 3 months ago
Labels: bug, build, 1.10.0
#1123 - how to use TransformerEngine without flash attention
Issue - State: closed - Opened by ben-8878 3 months ago - 4 comments
#1121 - Add high_precision_init_val to model params when using fp8_model_init
Pull Request - State: open - Opened by kunlunl 3 months ago - 8 comments
#1120 - [PyTorch] make GroupedLinear inp support collection of torch.Tensor
Pull Request - State: closed - Opened by BeingGod 3 months ago - 7 comments
#1119 - TransformerEngine FP8 is slower & more memory intensive than FlashAttention FP16?
Issue - State: closed - Opened by darius-lam 3 months ago - 4 comments
#1117 - [PyTorch] Debug CUDA graph support with operation-based API
Pull Request - State: open - Opened by timmoon10 3 months ago - 5 comments
Labels: bug
#1116 - How to debug CUDNN_STATUS_EXECUTION_FAILED?
Issue - State: open - Opened by vedantroy 3 months ago - 7 comments
#1114 - Add FP8 support to CP implementation with KV P2P
Pull Request - State: closed - Opened by xrennvidia 3 months ago - 5 comments
Labels: 1.10.0
#1108 - Update cudnn-frontend to v1.6.1
Pull Request - State: closed - Opened by cyanguwa 3 months ago - 4 comments
#1107 - Jax example cleanup and replace pjit with jit.
Pull Request - State: closed - Opened by nouiz 3 months ago - 4 comments
#1106 - [JAX] Context Parallel Attention with All-Gather
Pull Request - State: closed - Opened by mgoldfarb-nvidia 3 months ago - 9 comments
#1100 - [PyTorch] FP8 MHA with RoPE and Miscellaneous Improvements
Pull Request - State: closed - Opened by yaox12 3 months ago - 13 comments
#1086 - [ERROR] in the last step of `pip install . `
Issue - State: closed - Opened by wplf 3 months ago - 5 comments
#1083 - Update FP8 scale-inverse in kernels with FP8 output
Pull Request - State: closed - Opened by timmoon10 3 months ago - 6 comments
Labels: performance
#1077 - stuck at building wheel
Issue - State: closed - Opened by neurosynapse 3 months ago - 4 comments
#1073 - [PyTorch] Add support for padding mask in `UnfusedDotProductAttention`
Pull Request - State: closed - Opened by cyanguwa 3 months ago - 7 comments
Labels: 1.10.0
#1071 - When will comm-gemm-overlap support multi nodes?
Issue - State: open - Opened by umiswing 3 months ago - 6 comments
#1070 - AttnFuncWithCP with seq_len==1 breaks
Issue - State: closed - Opened by MaciejBalaNV 3 months ago - 4 comments
#1067 - [C/PyTorch] Userbuffers and comm+GEMM overlap algorithms refactored and moved to TE/common
Pull Request - State: open - Opened by denera 3 months ago
Labels: enhancement
#1063 - [PyTorch] Debug checkpointing with operation-based API
Pull Request - State: open - Opened by timmoon10 4 months ago - 3 comments
Labels: bug
#1043 - Error pre-training BERT
Issue - State: open - Opened by fabiancpl 4 months ago - 1 comment
#1033 - [PyTorch] Normalization ops
Pull Request - State: open - Opened by timmoon10 4 months ago - 11 comments
Labels: enhancement
#1019 - Add support for flash-attn 3
Pull Request - State: closed - Opened by cyanguwa 4 months ago - 5 comments
#1014 - AttributeError: module 'transformer_engine' has no attribute 'pytorch'
Issue - State: open - Opened by Lzhang-hub 4 months ago - 4 comments
#1011 - Could not work, even using the official script
Issue - State: open - Opened by hellangleZ 4 months ago - 5 comments
#978 - Building wheel error during installation
Issue - State: closed - Opened by Drzhishi 4 months ago - 3 comments
Labels: bug, build
#972 - no boost in performance with Ada GPU
Issue - State: open - Opened by saurabh-kataria 5 months ago - 1 comment
Labels: performance
#965 - How to cast 16/32-bit to FP8?
Issue - State: closed - Opened by mxjmtxrm 5 months ago - 3 comments
Labels: question
#946 - [TE/JAX] Prototype for New XLA Custom Calls with FFI
Pull Request - State: closed - Opened by phu0ngng 5 months ago - 2 comments
Labels: enhancement, jax
#944 - Expose `rotary_base` as an arg instead of hardcoding
Pull Request - State: closed - Opened by sudhakarsingh27 5 months ago - 1 comment
#936 - [MoE][Common/PyTorch] Add permutation
Pull Request - State: closed - Opened by StudyingShao 5 months ago - 5 comments
Labels: enhancement
#930 - How to install with cuDNN 9.0+?
Issue - State: closed - Opened by tianyan01 5 months ago - 3 comments
#922 - How to use FP8 of TransformerEngine in inference
Issue - State: open - Opened by Godlovecui 5 months ago - 3 comments
#885 - [PyTorch] Add support for cuDNN FusedAttention + THD + CP
Pull Request - State: closed - Opened by xrennvidia 5 months ago - 19 comments
#856 - Cannot import and use transformer_engine after successful installation with No module named 'transformer_engine_extensions'
Issue - State: closed - Opened by sam-h-bean 6 months ago - 4 comments
Labels: bug, build
#762 - Could TransformerEngine work with Deepspeed Zero w/ offloading?
Issue - State: open - Opened by leiwen83 7 months ago - 1 comment
Labels: question
#700 - ERROR: Failed building wheel for transformer-engine
Issue - State: closed - Opened by ShabnamRA 8 months ago - 7 comments
Labels: build
#694 - main branch cannot compile due to incompatibility with the main branch of cudnn-frontend
Issue - State: closed - Opened by lucifer1004 9 months ago - 2 comments
Labels: build
#689 - Version constraint of `flash-attn` needs to be updated
Issue - State: closed - Opened by lucifer1004 9 months ago - 3 comments
#679 - [Feature Request] Grouped GEMM kernel
Issue - State: open - Opened by LiyuanLucasLiu 9 months ago - 1 comment
Labels: enhancement
#553 - installing error
Issue - State: closed - Opened by foreverpiano 11 months ago - 1 comment
#526 - Failed Installation
Issue - State: closed - Opened by sudy-super 12 months ago - 1 comment
#517 - [Common][PyTorch] Fused `apply_rotary_pos_emb`
Pull Request - State: closed - Opened by yaox12 almost 1 year ago - 10 comments
#516 - question for building wheel for transformer-engine
Issue - State: open - Opened by Mrzhang-dada about 1 year ago - 6 comments
#459 - Failed building wheel for transformer-engine
Issue - State: closed - Opened by RuslanSel about 1 year ago - 1 comment
#359 - Optimize flash-attention transposes
Pull Request - State: closed - Opened by ksivaman over 1 year ago - 1 comment
#355 - Installation failed with cmake error
Issue - State: closed - Opened by RuiWang1998 over 1 year ago - 23 comments
#100 - Update PyTorch comm API
Pull Request - State: closed - Opened by ksivaman over 1 year ago - 1 comment
#99 - Fix FlashAttention tests
Pull Request - State: closed - Opened by tcherckez-nvidia over 1 year ago - 12 comments
#98 - Adding JAX to README.rst
Pull Request - State: closed - Opened by mingxu1067 over 1 year ago - 2 comments
#97 - Catch FP8 modulo16 error before cublas and fp8 kernels
Pull Request - State: closed - Opened by ksivaman over 1 year ago - 1 comment
#96 - [WIP] add cuDNN Flash Attention for FP8
Pull Request - State: closed - Opened by cyanguwa over 1 year ago
#95 - Add a temporary workaround to layernorm ONNX export
Pull Request - State: closed - Opened by nzmora-nvidia over 1 year ago - 6 comments
#94 - Add an option to serialize test i/o to file
Pull Request - State: closed - Opened by nzmora-nvidia over 1 year ago - 1 comment
#93 - Raise autocast usage error
Pull Request - State: closed - Opened by ksivaman over 1 year ago - 4 comments
#92 - Move from Sphinx Autodoc to sphinx-autoapi
Pull Request - State: closed - Opened by ptrendx over 1 year ago - 1 comment
#91 - Fix the link to the documentation archives
Pull Request - State: closed - Opened by ptrendx over 1 year ago - 1 comment
#90 - deprecate qk layer scaling and fp32 softmax args
Pull Request - State: closed - Opened by ksivaman over 1 year ago - 2 comments
#89 - Adding slice to fix failure with multi-devices.
Pull Request - State: closed - Opened by mingxu1067 over 1 year ago - 1 comment
#88 - Exporting MajorShardingType, ShardingType and LayerNorm for TE/JAX.
Pull Request - State: closed - Opened by mingxu1067 over 1 year ago - 1 comment
#87 - Adding documents to TE/JAX
Pull Request - State: closed - Opened by mingxu1067 over 1 year ago - 10 comments
#86 - Separate linting passes for PyTorch and JAX
Pull Request - State: closed - Opened by timmoon10 over 1 year ago - 2 comments
Labels: enhancement
#85 - Add TensorFlow module and extensions
Pull Request - State: closed - Opened by trevor-m over 1 year ago - 7 comments
#84 - Fix flash attention
Pull Request - State: closed - Opened by ksivaman over 1 year ago - 5 comments
#83 - Fix unfused QKV params case; stack vs interleave option
Pull Request - State: closed - Opened by ksivaman over 1 year ago - 2 comments
#82 - 3rd party acknowledgements
Pull Request - State: closed - Opened by ksivaman over 1 year ago
#81 - fix bug in non-FP8 nvfuser path
Pull Request - State: closed - Opened by ksivaman over 1 year ago - 1 comment
#80 - Relax checks for flash-attn
Pull Request - State: closed - Opened by cyanguwa over 1 year ago - 4 comments
#79 - Remove redundant AR for SP case
Pull Request - State: closed - Opened by ksivaman over 1 year ago - 4 comments
#78 - Move TE/PyTorch UT to tests/pytorch/
Pull Request - State: closed - Opened by jeng1220 over 1 year ago - 5 comments
#77 - Change version to 0.7.0dev
Pull Request - State: closed - Opened by ksivaman over 1 year ago
#76 - Add an option to serialize test i/o to file
Pull Request - State: closed - Opened by nzmora-nvidia over 1 year ago - 4 comments
#75 - Support arbitrary output dtypes in PyT GEMM functions
Pull Request - State: closed - Opened by timmoon10 over 1 year ago - 3 comments
Labels: enhancement