Ecosyste.ms: Issues

An open API service for providing issue and pull request metadata for open source projects.

GitHub / NVIDIA/TransformerEngine issues and pull requests

#1253 - Draft: reduce cudagraph mem via preallocations

Pull Request - State: open - Opened by JimmyZhang12 27 days ago

#1252 - [PyTorch] Build custom ORT ops before running ONNX export tests

Pull Request - State: open - Opened by timmoon10 27 days ago - 1 comment
Labels: bug, testing

#1251 - [pyTorch] Infrastructure for C++ QuantizedTensor

Pull Request - State: open - Opened by ptrendx 28 days ago

#1250 - Fix layernorm fsdp

Pull Request - State: open - Opened by eljandoubi 28 days ago

#1249 - TransformerEngine install fails with no clear cause

Issue - State: open - Opened by sytelus 28 days ago - 1 comment
Labels: bug, build

#1248 - fused out correction in CP

Pull Request - State: open - Opened by xiaoyao0115 28 days ago

#1247 - FSDP with FP8 is not working

Issue - State: open - Opened by eljandoubi 28 days ago

#1246 - [Bugfix] Fix bias for 0-dim tensors in gemm

Pull Request - State: open - Opened by yaox12 about 1 month ago

#1245 - Lower precision RoPE computation leads to training instability

Issue - State: open - Opened by viclzhu about 1 month ago - 2 comments

#1244 - [C] Add `max_t` support for THD

Pull Request - State: open - Opened by cyanguwa about 1 month ago

#1243 - fused out correction in CP

Pull Request - State: closed - Opened by xiaoyao0115 about 1 month ago

#1242 - fix assertion bug for SWA API in TE-JAX

Pull Request - State: open - Opened by kocchop about 1 month ago - 8 comments

#1241 - How about the torch.compile in TransformerEngine ?

Issue - State: open - Opened by south-ocean about 1 month ago - 2 comments
Labels: question

#1240 - Do not link against CUDA driver when building

Pull Request - State: closed - Opened by timmoon10 about 1 month ago - 1 comment
Labels: bug, build

#1239 - Fused out correction

Pull Request - State: closed - Opened by xiaoyao0115 about 1 month ago

#1238 - [PyTorch] Let Fused RoPE support CP with THD format

Pull Request - State: closed - Opened by yaox12 about 1 month ago - 4 comments

#1236 - Bug in TransformerEngine v1.11 for PyTorch when using flash-attn>=2.5.7

Issue - State: open - Opened by saimidu about 1 month ago - 2 comments

#1235 - [PyTorch] Failed running call_method movedim

Issue - State: open - Opened by RedRAINXXXX about 1 month ago - 2 comments

#1234 - Save CUDA Graph memory by reusing input and output tensors

Pull Request - State: open - Opened by buptzyb about 1 month ago

#1233 - Support CUDA Graph for MoE models

Pull Request - State: open - Opened by buptzyb about 1 month ago - 4 comments

#1232 - Add FlashAttention3 to CP implementations

Pull Request - State: closed - Opened by xrennvidia about 1 month ago - 1 comment

#1230 - Fused Attention Support 64-bit Ragged Offsets for Large THD Tensors

Pull Request - State: open - Opened by mgoldfarb-nvidia about 1 month ago - 8 comments

#1229 - [Pytorch] Check gradient in test numerics

Pull Request - State: open - Opened by pggPL about 1 month ago - 2 comments

#1228 - [TE/JAX] Enabling CudaGraph for custom calls with FFI

Pull Request - State: open - Opened by phu0ngng about 1 month ago
Labels: jax

#1227 - Check for backend support in Jax context parallel fused attention test

Pull Request - State: open - Opened by mgoldfarb-nvidia about 1 month ago - 4 comments

#1226 - [PyTorch] Drop FA as an installation requirement

Pull Request - State: open - Opened by cyanguwa about 1 month ago - 4 comments

#1225 - Small fixes to Float8Tensor

Pull Request - State: closed - Opened by ptrendx about 1 month ago - 1 comment

#1224 - Test THD

Pull Request - State: closed - Opened by zlsh80826 about 1 month ago - 1 comment

#1223 - [PyTorch] Add documentation for FP8 attention checkpointing

Pull Request - State: closed - Opened by cyanguwa about 1 month ago - 1 comment

#1222 - [PyTorch] Move `block_table` argument to FA varlen function

Pull Request - State: closed - Opened by cyanguwa about 1 month ago - 1 comment
Labels: 1.11.0.late

#1221 - Create README.md for examples/

Pull Request - State: open - Opened by sbhavani about 1 month ago

#1220 - Removed the unused options from GroupedLinear docs and fixed the bug with offsets

Pull Request - State: closed - Opened by ptrendx about 1 month ago - 2 comments
Labels: 1.11

#1219 - [PyTorch] Fix distributed testing

Pull Request - State: closed - Opened by ksivaman about 1 month ago - 1 comment

#1218 - [PyTorch] Add pool argument to make_graphed_callable

Pull Request - State: closed - Opened by ksivaman about 1 month ago - 2 comments

#1217 - Fix bug in torch compile and seqdim is integer

Pull Request - State: closed - Opened by wplf about 1 month ago - 9 comments

#1216 - importlib.metadata.PackageNotFoundError: transformer-engine

Issue - State: open - Opened by zmtttt about 1 month ago - 5 comments

#1215 - [PyTorch] remove duplicate code

Pull Request - State: closed - Opened by emmanuel-ferdman about 1 month ago - 1 comment

#1214 - [PyTorch] Improve `get_qkv_layout`

Pull Request - State: closed - Opened by cyanguwa about 1 month ago - 3 comments

#1213 - Passing nonexistent argument when flash_attn version is >= 2.5.7

Issue - State: closed - Opened by MaciejBalaNV about 1 month ago - 2 comments

#1212 - Fix cuDNN sliding window size

Pull Request - State: closed - Opened by cyanguwa about 1 month ago - 3 comments

#1211 - Fix CP unit test on A100 and L40s

Pull Request - State: closed - Opened by xrennvidia about 1 month ago - 2 comments

#1210 - Punctuation and Capitalization Model not working

Issue - State: closed - Opened by ican24 about 1 month ago - 6 comments

#1209 - Hierarchical CP implementation (Ulysses + Ring)

Pull Request - State: closed - Opened by xrennvidia about 2 months ago - 6 comments

#1208 - [PyTorch] Improve CP P2P efficiency

Pull Request - State: open - Opened by yenchenlin about 2 months ago

#1206 - [PyTorch] Add GroupedLinear to the docs and fix typos

Pull Request - State: closed - Opened by pggPL about 2 months ago

#1205 - [JAX] Expose sliding window attn to TE-JAX API

Pull Request - State: closed - Opened by huanghua1994 about 2 months ago - 9 comments
Labels: enhancement, jax

#1204 - Enable fuse_wgrad_accumulation flag if using cudagraphs

Pull Request - State: closed - Opened by JimmyZhang12 about 2 months ago

#1203 - Update list of CI users

Pull Request - State: closed - Opened by ksivaman about 2 months ago
Labels: testing

#1202 - [PyTorch] Debug dtype casting in operation-based API

Pull Request - State: closed - Opened by timmoon10 about 2 months ago - 4 comments
Labels: bug

#1201 - No "pool" argument in make_graphed_callables function

Issue - State: closed - Opened by MaciejBalaNV about 2 months ago

#1200 - Draft: Use fused push_send_recv kernel for TP AG and RS overlaps

Pull Request - State: open - Opened by erhoo82 about 2 months ago

#1199 - [Dummy] add d64 support

Pull Request - State: closed - Opened by cyanguwa about 2 months ago

#1198 - Update list of CI users

Pull Request - State: closed - Opened by timmoon10 about 2 months ago
Labels: testing

#1197 - [PyTorch] fused CUDNN attention kernel and sliding window attention

Issue - State: closed - Opened by Marks101 about 2 months ago - 3 comments

#1196 - Tests for distributed

Pull Request - State: closed - Opened by pggPL about 2 months ago - 1 comment

#1195 - [PyTorch] fused CUDNN attention kernel not properly handling strides

Issue - State: closed - Opened by Marks101 about 2 months ago - 4 comments

#1194 - fix NVTE_UB_WITH_MPI read

Pull Request - State: closed - Opened by erhoo82 about 2 months ago - 2 comments

#1193 - FP8 for norm inputs and residuals?

Issue - State: open - Opened by cbcase about 2 months ago - 1 comment
Labels: question

#1191 - [PyTorch] Minor optimizations to reduce CPU overheads in modules

Pull Request - State: closed - Opened by timmoon10 about 2 months ago - 5 comments
Labels: enhancement

#1189 - Restore compatibility with Python 3.8

Pull Request - State: closed - Opened by ptrendx about 2 months ago - 2 comments

#1188 - FSDP: How to do all-gather using FP8?

Issue - State: open - Opened by vgoklani about 2 months ago - 2 comments

#1187 - [PyTorch] Fix detection of 3 in 3hd/h3d layouts

Pull Request - State: closed - Opened by cyanguwa about 2 months ago - 2 comments

#1186 - Allow specifying cmake setup directory

Pull Request - State: closed - Opened by ryxli about 2 months ago - 2 comments

#1185 - [PyTorch] Port fused optimizer tests to pytest

Pull Request - State: closed - Opened by timmoon10 about 2 months ago - 1 comment
Labels: testing

#1184 - Add docs for installing from PyPI

Pull Request - State: closed - Opened by ksivaman about 2 months ago

#1183 - [Common] Default CUDA_HOME to /usr/local/cuda when dynamically loading cuDNN and NVRTC

Pull Request - State: closed - Opened by denera about 2 months ago - 2 comments

#1182 - fused_attn_fwd_qkvpacked silently doesn't support 3 or 7 heads

Issue - State: closed - Opened by ajayjain about 2 months ago - 1 comment

#1181 - Update CI users

Pull Request - State: closed - Opened by timmoon10 2 months ago
Labels: testing

#1180 - Update CI users

Pull Request - State: closed - Opened by timmoon10 2 months ago
Labels: testing

#1178 - Allow to pass architectures like 90a, without being overridden

Pull Request - State: closed - Opened by aurianer 2 months ago - 2 comments

#1176 - [PyTorch] Relax the contiguous check for flash attention

Pull Request - State: closed - Opened by yaox12 2 months ago - 3 comments

#1175 - [PyTorch] Check network interface name when initializing Userbuffers

Pull Request - State: closed - Opened by denera 2 months ago - 1 comment
Labels: bug

#1174 - [PyTorch] Miscellaneous fixes for FA3 attention

Pull Request - State: closed - Opened by cyanguwa 2 months ago - 13 comments

#1173 - [WIP] [PyTorch] Proof-of-concept for using operation-based API in modules

Pull Request - State: open - Opened by timmoon10 2 months ago - 1 comment

#1172 - Allow downloading of model weights automatically

Pull Request - State: closed - Opened by sudhakarsingh27 2 months ago

#1171 - Add dtensor support for TE optimizers

Pull Request - State: closed - Opened by blahBlahhhJ 2 months ago - 2 comments

#1170 - [Bug] A bug in the initialize_ub function.

Issue - State: closed - Opened by wangzihe1996 2 months ago - 5 comments

#1169 - [Question] fp8 amax_history setup

Issue - State: open - Opened by tylaar 2 months ago

#1168 - [PyTorch] Fused dbias-cast-transpose in bias operation

Pull Request - State: open - Opened by timmoon10 2 months ago - 2 comments

#1167 - Fix autocast deprecation warning.

Pull Request - State: open - Opened by jondeaton 2 months ago - 4 comments

#1166 - cannot find MHA example for FA3

Issue - State: closed - Opened by saurabh-kataria 2 months ago

#1165 - AssertionError: Outputs not close enough in tensor in test_numerics.py

Issue - State: open - Opened by sirutBuasai 2 months ago - 1 comment
Labels: bug

#1164 - [PyTorch] Activation operations

Pull Request - State: open - Opened by timmoon10 2 months ago - 4 comments

#1163 - Applying LayerNorm After TEColumnParallelLinear in Tensor Parallel Setup

Issue - State: closed - Opened by ftgreat 2 months ago - 2 comments

#1162 - Added Adobe analytics to the documentation

Pull Request - State: closed - Opened by ptrendx 2 months ago

#1161 - Revert "[C] Suppress 128-D warning from cudnn-frontend"

Pull Request - State: closed - Opened by ksivaman 2 months ago

#1160 - Add a context parallelism implementation with QKVO all-to-all

Pull Request - State: closed - Opened by xrennvidia 2 months ago - 4 comments

#1158 - [C] Suppress 128-D warning from cudnn-frontend

Pull Request - State: closed - Opened by cyanguwa 2 months ago - 1 comment

#1157 - [PyTorch] Lower atol/rtol for F16 attention tests

Pull Request - State: closed - Opened by cyanguwa 2 months ago - 2 comments

#1156 - installation guide for NVHPC SDK

Issue - State: closed - Opened by jinz2014 2 months ago - 1 comment

#1155 - Add user to TE CI

Pull Request - State: closed - Opened by timmoon10 2 months ago

#1154 - Bump actions/download-artifact from 3 to 4.1.7 in /.github/workflows

Pull Request - State: closed - Opened by dependabot[bot] 2 months ago
Labels: dependencies

#1153 - The question about flash attention and fused attention

Issue - State: open - Opened by HenHenry-Z 2 months ago - 1 comment

#1152 - Question about the cublaslt_gemm.cu

Issue - State: closed - Opened by south-ocean 2 months ago - 3 comments