GitHub / NVIDIA/TransformerEngine issues and pull requests
#1721 - [JAX] GroupedDense v.2 without dynamic shape
Pull Request -
State: open - Opened by phu0ngng 5 months ago
#1720 - Update recommended docker container in README
Pull Request -
State: closed - Opened by ksivaman 5 months ago
- 1 comment
#1719 - [PyTorch] Update FSDP example instructions
Pull Request -
State: closed - Opened by ksivaman 5 months ago
#1718 - 🐛 Bug Report: Build Fails on `transformer_engine_torch` with CUDA 12.4 and Conda Modular CUDA
Issue -
State: open - Opened by ghoshsoumyajit7 5 months ago
Labels: bug
#1717 - Support `nvidia-cu*` wheels for core lib compilation; miscellaneous build improvements
Pull Request -
State: closed - Opened by ksivaman 5 months ago
- 3 comments
#1716 - Introduce nvte_memset to provide a fill kernel that is faster than cudaMemsetAsync for small sizes
Pull Request -
State: closed - Opened by jberchtold-nvidia 5 months ago
- 1 comment
#1715 - [PyTorch] Fix cuBLAS workspace leak in applications that initialize+destroy Userbuffers more than once
Pull Request -
State: closed - Opened by denera 5 months ago
- 3 comments
Labels: bug, 2.3.0
#1714 - Add user to TE CI
Pull Request -
State: closed - Opened by ksivaman 5 months ago
#1713 - [PyTorch] Draft of weight offloading + fused wgrad accumulation
Pull Request -
State: closed - Opened by pggPL 5 months ago
#1712 - Warn when using fp8 weights + non-fp8 computation
Pull Request -
State: closed - Opened by kunlunl 5 months ago
- 2 comments
#1711 - MXFP8 support in Userbuffers
Pull Request -
State: closed - Opened by timmoon10 5 months ago
- 10 comments
Labels: enhancement
#1710 - Check CUDA driver KMD version for multicast symbol support
Pull Request -
State: closed - Opened by nvcastet 5 months ago
- 1 comment
#1709 - [JAX] Updated: unbalanced CP with THD format
Pull Request -
State: open - Opened by huanghua1994 5 months ago
- 2 comments
#1708 - Kwyss/new shape owns data
Pull Request -
State: closed - Opened by kwyss-nvidia 5 months ago
- 5 comments
Labels: 2.4.0
#1707 - [PyTorch] FP8 Subchannel Recipe With FP8 Gather And Configurable Scaling Factor Tensor Swizzling
Pull Request -
State: closed - Opened by zhongbozhu 5 months ago
- 15 comments
Labels: enhancement, 2.5.0
#1706 - make grouplinear accept the fp8 input
Pull Request -
State: open - Opened by Autumn1998 5 months ago
- 1 comment
#1705 - FSDP2 Deadlock with fp8_autocast
Issue -
State: open - Opened by cassanof 5 months ago
Labels: bug
#1704 - Refactor attention.py part 2
Pull Request -
State: closed - Opened by KshitijLakhani 5 months ago
- 3 comments
Labels: 2.3.0
#1703 - Revert "Allow NVTEShape to own data."
Pull Request -
State: closed - Opened by timmoon10 5 months ago
- 1 comment
Labels: bug, 2.3.0
#1702 - [C][PyTorch] Move cuda kernels from pytorch extensions to core
Pull Request -
State: open - Opened by ksivaman 5 months ago
#1702 - [C][PyTorch] Move cuda kernels from pytorch extensions to core part 1
Pull Request -
State: closed - Opened by ksivaman 5 months ago
- 3 comments
#1701 - FP4 Training
Issue -
State: open - Opened by cassanof 5 months ago
#1700 - [JAX] WAR for CuDNN MXFP8 norm incorrect result
Pull Request -
State: open - Opened by jberchtold-nvidia 5 months ago
- 1 comment
#1699 - [JAX] Distributed Current Scaling
Pull Request -
State: closed - Opened by jberchtold-nvidia 5 months ago
- 5 comments
#1698 - Dummy PR to test docs
Pull Request -
State: open - Opened by ksivaman 5 months ago
#1697 - [C][Jax] Move cuda kernels from Jax extensions to core
Pull Request -
State: closed - Opened by ksivaman 5 months ago
- 1 comment
#1696 - [JAX] WAR for CuDNN MXFP8 norm incorrect result
Pull Request -
State: closed - Opened by jberchtold-nvidia 5 months ago
- 4 comments
Labels: 2.3.0
#1695 - Cpu reload double buffer
Pull Request -
State: open - Opened by sanandaraj5597 5 months ago
- 1 comment
#1694 - [JAX] Deprecate Praxis layers
Pull Request -
State: closed - Opened by phu0ngng 5 months ago
- 7 comments
Labels: 2.3.0
#1693 - [MXFP8] grad_output is quantized columnwise even if weight doesn't require gradients.
Issue -
State: open - Opened by kshitij12345 5 months ago
Labels: bug
#1692 - RuntimeError: /tmp/pip-req-build-iq_flo47/transformer_engine/common/util/cuda_runtime.cpp:118 in function operator(): CUDA Error: invalid argument
Issue -
State: open - Opened by Lynnzake 5 months ago
#1691 - Added attention offloading
Pull Request -
State: open - Opened by sanandaraj5597 5 months ago
- 1 comment
#1690 - Support computing zero-centered gamma in compute dtype for CuDNN
Pull Request -
State: closed - Opened by jberchtold-nvidia 5 months ago
- 3 comments
#1689 - README.md - Installation section
Pull Request -
State: closed - Opened by sbhavani 5 months ago
- 1 comment
#1688 - fp8_model_init does nothing when used with FSDP2
Issue -
State: open - Opened by MaciejBalaNV 5 months ago
Labels: bug
#1687 - fp8_model_init fails with MXFP8BlockScaling
Issue -
State: open - Opened by MaciejBalaNV 5 months ago
- 1 comment
Labels: bug
#1686 - [PyTorch] Bunch of memory management fixes
Pull Request -
State: closed - Opened by pggPL 5 months ago
- 9 comments
#1685 - kernel executed fail in multi_tensor_scale
Issue -
State: open - Opened by Louis-J 5 months ago
Labels: bug
#1684 - `ImportError` with PyTorch 2.5.1 and Transformer Engine 2.1.0 on CUDA 12.4, Python 3.11
Issue -
State: closed - Opened by ghoshsoumyajit7 5 months ago
- 2 comments
Labels: bug
#1683 - [PyTorch] Move swizzle scaling factor to cpp
Pull Request -
State: closed - Opened by yaox12 5 months ago
- 2 comments
#1682 - Re Do symmetric memory merge request
Pull Request -
State: closed - Opened by wdykas 5 months ago
- 3 comments
#1681 - Fix #1524 and other softmax mask functionality
Pull Request -
State: closed - Opened by KshitijLakhani 5 months ago
- 1 comment
Labels: bug, 2.3.0
#1680 - Does transformerEngine support 2080ti?
Issue -
State: open - Opened by SeekPoint 5 months ago
Labels: bug
#1679 - [PyTorch] Fix for checkpointing for callables.
Pull Request -
State: closed - Opened by pggPL 5 months ago
- 1 comment
#1678 - [PyTorch] Deprecate the weight offloading
Pull Request -
State: closed - Opened by pggPL 5 months ago
- 1 comment
#1677 - [BUG] Inconsistent LayerNorm Parameter Gradient with TP+CP+FP8
Issue -
State: open - Opened by i-love-megatron 5 months ago
Labels: bug
#1676 - [PyTorch] Avoid unnecessary tensor usages when caching for linear op backward
Pull Request -
State: closed - Opened by timmoon10 5 months ago
- 2 comments
Labels: bug
#1675 - [JAX] Add collective GEMM without compute/communication overlap
Pull Request -
State: open - Opened by philipphack 5 months ago
#1674 - Allow NVTEShape to own data.
Pull Request -
State: closed - Opened by kwyss-nvidia 5 months ago
- 3 comments
Labels: bug, 2.3.0
#1673 - [JAX] Improving the test_multiprocessing_encoder.py run script
Pull Request -
State: closed - Opened by phu0ngng 5 months ago
- 6 comments
#1672 - fix(grouped_gemm): fix error when at::from_blob pass zero shape
Pull Request -
State: closed - Opened by cos120 5 months ago
- 2 comments
#1671 - Added attention activation offloading support for TE v2.0
Pull Request -
State: closed - Opened by sanandaraj5597 5 months ago
- 1 comment
#1670 - Make shape cache invalidation more conservative.
Pull Request -
State: closed - Opened by kwyss-nvidia 5 months ago
- 2 comments
#1669 - Add user to TE CI
Pull Request -
State: closed - Opened by ksivaman 5 months ago
#1668 - [PyTorch] More precise test for the CPU offloading.
Pull Request -
State: closed - Opened by pggPL 5 months ago
- 3 comments
#1667 - [QA] Encapsulate functions in test_utils.sh
Pull Request -
State: open - Opened by linxiddd 5 months ago
#1666 - [JAX] GroupedQuantizer and GroupedScaledTensor
Pull Request -
State: closed - Opened by phu0ngng 5 months ago
- 13 comments
#1665 - [PyTorch] Add option in activation ops to cache input in FP8
Pull Request -
State: closed - Opened by timmoon10 5 months ago
- 1 comment
Labels: enhancement
#1664 - [JAX] Update helper tests
Pull Request -
State: closed - Opened by jberchtold-nvidia 5 months ago
- 1 comment
#1663 - [PyTorch] Draft of new weight offloading
Pull Request -
State: open - Opened by pggPL 5 months ago