NVIDIA/TransformerEngine issues and pull requests

#1721 - [JAX] GroupedDense v.2 without dynamic shape

Pull Request - State: open - Opened by phu0ngng 5 months ago

#1720 - Update recommended docker container in README

Pull Request - State: closed - Opened by ksivaman 5 months ago - 1 comment

#1719 - [PyTorch] Update FSDP example instructions

Pull Request - State: closed - Opened by ksivaman 5 months ago

#1718 - 🐛 Bug Report: Build Fails on `transformer_engine_torch` with CUDA 12.4 and Conda Modular CUDA

Issue - State: open - Opened by ghoshsoumyajit7 5 months ago
Labels: bug

#1717 - Support `nvidia-cu*` wheels for core lib compilation; miscellaneous build improvements

Pull Request - State: closed - Opened by ksivaman 5 months ago - 3 comments

#1716 - Introduce nvte_memset to provide a fill kernel that is faster than cudaMemsetAsync for small sizes

Pull Request - State: closed - Opened by jberchtold-nvidia 5 months ago - 1 comment

#1715 - [PyTorch] Fix cuBLAS workspace leak in applications that initialize+destroy Userbuffers more than once

Pull Request - State: closed - Opened by denera 5 months ago - 3 comments
Labels: bug, 2.3.0

#1714 - Add user to TE CI

Pull Request - State: closed - Opened by ksivaman 5 months ago

#1713 - [PyTorch] Draft of weight offloading + fused wgrad accumulation

Pull Request - State: closed - Opened by pggPL 5 months ago

#1712 - Warn when using fp8 weights + non-fp8 computation

Pull Request - State: closed - Opened by kunlunl 5 months ago - 2 comments

#1711 - MXFP8 support in Userbuffers

Pull Request - State: closed - Opened by timmoon10 5 months ago - 10 comments
Labels: enhancement

#1710 - Check CUDA driver KMD version for multicast symbol support

Pull Request - State: closed - Opened by nvcastet 5 months ago - 1 comment

#1709 - [JAX] Updated: unbalanced CP with THD format

Pull Request - State: open - Opened by huanghua1994 5 months ago - 2 comments

#1709 - [JAX] Updated: unbalanced CP with THD format

Pull Request - State: open - Opened by huanghua1994 5 months ago

#1708 - Kwyss/new shape owns data

Pull Request - State: open - Opened by kwyss-nvidia 5 months ago

#1708 - Kwyss/new shape owns data

Pull Request - State: closed - Opened by kwyss-nvidia 5 months ago - 5 comments
Labels: 2.4.0

#1707 - [PyTorch] FP8 Subchannel Recipe With FP8 Gather And Configurable Scaling Factor Tensor Swizzling

Pull Request - State: closed - Opened by zhongbozhu 5 months ago - 15 comments
Labels: enhancement, 2.5.0

#1707 - [PyTorch] FP8 Subchannel Recipe With FP8 Gather And Configurable Scaling Factor Tensor Swizzling

Pull Request - State: open - Opened by zhongbozhu 5 months ago - 5 comments

#1706 - make grouplinear accept the fp8 input

Pull Request - State: open - Opened by Autumn1998 5 months ago

#1706 - make grouplinear accept the fp8 input

Pull Request - State: open - Opened by Autumn1998 5 months ago - 1 comment

#1705 - FSDP2 Deadlock with fp8_autocast

Issue - State: open - Opened by cassanof 5 months ago
Labels: bug

#1704 - Refactor attention.py part 2

Pull Request - State: closed - Opened by KshitijLakhani 5 months ago - 3 comments
Labels: 2.3.0

#1703 - Revert "Allow NVTEShape to own data."

Pull Request - State: closed - Opened by timmoon10 5 months ago - 1 comment
Labels: bug, 2.3.0

#1702 - [C][PyTorch] Move cuda kernels from pytorch extensions to core

Pull Request - State: open - Opened by ksivaman 5 months ago

#1702 - [C][PyTorch] Move cuda kernels from pytorch extensions to core part 1

Pull Request - State: closed - Opened by ksivaman 5 months ago - 3 comments

#1701 - FP4 Training

Issue - State: open - Opened by cassanof 5 months ago

#1700 - [JAX] WAR for CuDNN MXFP8 norm incorrect result

Pull Request - State: open - Opened by jberchtold-nvidia 5 months ago - 1 comment

#1700 - [JAX] WAR for CuDNN MXFP8 norm incorrect result

Pull Request - State: open - Opened by jberchtold-nvidia 5 months ago

#1699 - [JAX] Distributed Current Scaling

Pull Request - State: closed - Opened by jberchtold-nvidia 5 months ago - 5 comments

#1698 - Dummy PR to test docs

Pull Request - State: open - Opened by ksivaman 5 months ago

#1697 - [C][Jax] Move cuda kernels from Jax extensions to core

Pull Request - State: closed - Opened by ksivaman 5 months ago - 1 comment

#1696 - [JAX] WAR for CuDNN MXFP8 norm incorrect result

Pull Request - State: closed - Opened by jberchtold-nvidia 5 months ago - 4 comments
Labels: 2.3.0

#1695 - Cpu reload double buffer

Pull Request - State: open - Opened by sanandaraj5597 5 months ago - 1 comment

#1694 - [JAX] Deprecate Praxis layers

Pull Request - State: open - Opened by phu0ngng 5 months ago

#1694 - [JAX] Deprecate Praxis layers

Pull Request - State: closed - Opened by phu0ngng 5 months ago - 7 comments
Labels: 2.3.0

#1693 - [MXFP8] grad_output is quantized columnwise even if weight doesn't require gradients.

Issue - State: open - Opened by kshitij12345 5 months ago
Labels: bug

#1692 - RuntimeError: /tmp/pip-req-build-iq_flo47/transformer_engine/common/util/cuda_runtime.cpp:118 in function operator(): CUDA Error: invalid argument

Issue - State: open - Opened by Lynnzake 5 months ago

#1691 - Added attention offloading

Pull Request - State: open - Opened by sanandaraj5597 5 months ago

#1691 - Added attention offloading

Pull Request - State: open - Opened by sanandaraj5597 5 months ago - 1 comment

#1690 - Support computing zero-centered gamma in compute dtype for CuDNN

Pull Request - State: closed - Opened by jberchtold-nvidia 5 months ago - 3 comments

#1689 - README.md - Installation section

Pull Request - State: closed - Opened by sbhavani 5 months ago - 1 comment

#1688 - fp8_model_init does nothing when used with FSDP2

Issue - State: open - Opened by MaciejBalaNV 5 months ago
Labels: bug

#1687 - fp8_model_init fails with MXFP8BlockScaling

Issue - State: open - Opened by MaciejBalaNV 5 months ago - 1 comment
Labels: bug

#1686 - [PyTorch] Bunch of memory management fixes

Pull Request - State: closed - Opened by pggPL 5 months ago - 9 comments

#1685 - kernel executed fail in multi_tensor_scale

Issue - State: open - Opened by Louis-J 5 months ago
Labels: bug

#1684 - `ImportError` with PyTorch 2.5.1 and Transformer Engine 2.1.0 on CUDA 12.4, Python 3.11

Issue - State: closed - Opened by ghoshsoumyajit7 5 months ago - 2 comments
Labels: bug

#1683 - [PyTorch] Move swizzle scaling factor to cpp

Pull Request - State: closed - Opened by yaox12 5 months ago - 2 comments

#1682 - Re Do symmetric memory merge request

Pull Request - State: closed - Opened by wdykas 5 months ago - 3 comments

#1681 - Fix #1524 and other softmax mask functionality

Pull Request - State: closed - Opened by KshitijLakhani 5 months ago - 1 comment
Labels: bug, 2.3.0

#1680 - Does transformerEngine support 2080ti?

Issue - State: open - Opened by SeekPoint 5 months ago
Labels: bug

#1679 - [PyTorch] Fix for checkpointing for callables.

Pull Request - State: open - Opened by pggPL 5 months ago

#1679 - [PyTorch] Fix for checkpointing for callables.

Pull Request - State: closed - Opened by pggPL 5 months ago - 1 comment

#1678 - [PyTorch] Deprecate the weight offloading

Pull Request - State: closed - Opened by pggPL 5 months ago - 1 comment

#1677 - [BUG] Inconsistent LayerNorm Parameter Gradient with TP+CP+FP8

Issue - State: open - Opened by i-love-megatron 5 months ago
Labels: bug

#1676 - [PyTorch] Avoid unnecessary tensor usages when caching for linear op backward

Pull Request - State: closed - Opened by timmoon10 5 months ago - 2 comments
Labels: bug

#1675 - [JAX] Add collective GEMM without compute/communication overlap

Pull Request - State: open - Opened by philipphack 5 months ago

#1674 - Allow NVTEShape to own data.

Pull Request - State: closed - Opened by kwyss-nvidia 5 months ago - 3 comments
Labels: bug, 2.3.0

#1673 - [JAX] Improving the test_multiprocessing_encoder.py run script

Pull Request - State: closed - Opened by phu0ngng 5 months ago - 6 comments

#1672 - fix(grouped_gemm): fix error when at::from_blob pass zero shape

Pull Request - State: closed - Opened by cos120 5 months ago - 2 comments

#1671 - Added attention activation offloading support for TE v2.0

Pull Request - State: closed - Opened by sanandaraj5597 5 months ago - 1 comment

#1670 - Make shape cache invalidation more conservative.

Pull Request - State: closed - Opened by kwyss-nvidia 5 months ago - 2 comments

#1670 - Make shape cache invalidation more conservative.

Pull Request - State: open - Opened by kwyss-nvidia 5 months ago

#1669 - Add user to TE CI

Pull Request - State: closed - Opened by ksivaman 5 months ago

#1668 - [PyTorch] More precise test for the CPU offloading.

Pull Request - State: closed - Opened by pggPL 5 months ago - 3 comments

#1668 - [PyTorch] More precise test for the CPU offloading.

Pull Request - State: open - Opened by pggPL 5 months ago

#1667 - [QA] Encapsulate functions in test_utils.sh

Pull Request - State: open - Opened by linxiddd 5 months ago

#1666 - [JAX] GroupedQuantizer and GroupedScaledTensor

Pull Request - State: closed - Opened by phu0ngng 5 months ago - 13 comments

#1666 - [JAX] GroupedQuantizer and GroupedScaledTensor

Pull Request - State: open - Opened by phu0ngng 5 months ago - 1 comment

#1665 - [PyTorch] Add option in activation ops to cache input in FP8

Pull Request - State: closed - Opened by timmoon10 5 months ago - 1 comment
Labels: enhancement

#1664 - [JAX] Update helper tests

Pull Request - State: closed - Opened by jberchtold-nvidia 5 months ago - 1 comment

#1663 - [PyTorch] Draft of new weight offloading

Pull Request - State: open - Opened by pggPL 5 months ago
