Ecosyste.ms: Issues
An open API service that provides issue and pull request metadata for open source projects.
GitHub / NVIDIA/TransformerEngine issues and pull requests
#1253 - Draft: reduce cudagraph mem via preoallcations
Pull Request -
State: open - Opened by JimmyZhang12 about 1 month ago
#1252 - [PyTorch] Build custom ORT ops before running ONNX export tests
Pull Request -
State: open - Opened by timmoon10 about 1 month ago
- 1 comment
Labels: bug, testing
#1251 - [pyTorch] Infrastructure for C++ QuantizedTensor
Pull Request -
State: open - Opened by ptrendx about 1 month ago
#1250 - Fix layernorm fsdp
Pull Request -
State: open - Opened by eljandoubi about 1 month ago
#1249 - TransformerEngine install fails with no clear cause
Issue -
State: open - Opened by sytelus about 1 month ago
- 1 comment
Labels: bug, build
#1248 - fused out correction in CP
Pull Request -
State: open - Opened by xiaoyao0115 about 1 month ago
#1247 - FSDP with FP8 is not working
Issue -
State: open - Opened by eljandoubi about 1 month ago
#1246 - [Bugfix] Fix bias for 0-dim tensors in gemm
Pull Request -
State: open - Opened by yaox12 about 1 month ago
#1245 - Lower precision RoPE computation leads to training instability
Issue -
State: open - Opened by viclzhu about 1 month ago
- 2 comments
#1244 - [C] Add `max_t` support for THD
Pull Request -
State: open - Opened by cyanguwa about 1 month ago
#1243 - fused out correction in CP
Pull Request -
State: closed - Opened by xiaoyao0115 about 1 month ago
#1242 - fix assertion bug for SWA API in TE-JAX
Pull Request -
State: open - Opened by kocchop about 1 month ago
- 8 comments
#1241 - How about the torch.compile in TransformerEngine ?
Issue -
State: open - Opened by south-ocean about 1 month ago
- 2 comments
Labels: question
#1240 - Do not link against CUDA driver when building
Pull Request -
State: closed - Opened by timmoon10 about 1 month ago
- 1 comment
Labels: bug, build
#1239 - Fused out correction
Pull Request -
State: closed - Opened by xiaoyao0115 about 1 month ago
#1238 - [PyTorch] Let Fused RoPE support CP with THD format
Pull Request -
State: closed - Opened by yaox12 about 1 month ago
- 4 comments
#1236 - Bug in TransformerEngine v1.11 for PyTorch when using flash-attn>=2.5.7
Issue -
State: open - Opened by saimidu about 1 month ago
- 2 comments
#1235 - [PyTorch] Failed running call_method movedim
Issue -
State: open - Opened by RedRAINXXXX about 1 month ago
- 2 comments
#1234 - Save CUDA Graph memory by reusing input and output tensors
Pull Request -
State: open - Opened by buptzyb about 1 month ago
#1233 - Support CUDA Graph for MoE models
Pull Request -
State: open - Opened by buptzyb about 1 month ago
- 4 comments
#1232 - Add FlashAttention3 to CP implementations
Pull Request -
State: closed - Opened by xrennvidia about 1 month ago
- 1 comment
#1230 - Fused Attention Support 64-bit Ragged Offsets for Large THD Tensors
Pull Request -
State: open - Opened by mgoldfarb-nvidia about 1 month ago
- 8 comments
#1229 - [Pytorch] Check gradient in test numerics
Pull Request -
State: open - Opened by pggPL about 1 month ago
- 2 comments
#1228 - [TE/JAX] Enabling CudaGraph for custom calls with FFI
Pull Request -
State: open - Opened by phu0ngng about 1 month ago
Labels: jax
#1227 - Check for backend support in Jax context parallel fused attention test
Pull Request -
State: open - Opened by mgoldfarb-nvidia about 1 month ago
- 4 comments
#1226 - [PyTorch] Drop FA as an installation requirement
Pull Request -
State: open - Opened by cyanguwa about 1 month ago
- 4 comments
#1225 - Small fixes to Float8Tensor
Pull Request -
State: closed - Opened by ptrendx about 1 month ago
- 1 comment
#1224 - Test THD
Pull Request -
State: closed - Opened by zlsh80826 about 1 month ago
- 1 comment
#1223 - [PyTorch] Add documentation for FP8 attention checkpointing
Pull Request -
State: closed - Opened by cyanguwa about 1 month ago
- 1 comment
#1222 - [PyTorch] Move `block_table` argument to FA varlen function
Pull Request -
State: closed - Opened by cyanguwa about 1 month ago
- 1 comment
Labels: 1.11.0.late
#1221 - Create README.md for examples/
Pull Request -
State: open - Opened by sbhavani about 1 month ago
#1220 - Removed the unused options from GroupedLinear docs and fixed the bug with offsets
Pull Request -
State: closed - Opened by ptrendx about 1 month ago
- 2 comments
Labels: 1.11
#1219 - [PyTorch] Fix distributed testing
Pull Request -
State: closed - Opened by ksivaman about 1 month ago
- 1 comment
#1218 - [PyTorch] Add pool argument to make_graphed_callable
Pull Request -
State: closed - Opened by ksivaman about 1 month ago
- 2 comments
#1217 - Fix bug in torch compile and seqdim is integer
Pull Request -
State: closed - Opened by wplf about 2 months ago
- 9 comments
#1216 - importlib.metadata.PackageNotFoundError: transformer-engine
Issue -
State: open - Opened by zmtttt about 2 months ago
- 5 comments
#1215 - [PyTorch] remove duplicate code
Pull Request -
State: closed - Opened by emmanuel-ferdman about 2 months ago
- 1 comment
#1214 - [PyTorch] Improve `get_qkv_layout`
Pull Request -
State: closed - Opened by cyanguwa about 2 months ago
- 3 comments
#1213 - Passing nonexistent argument when flash_attn version is >= 2.5.7
Issue -
State: closed - Opened by MaciejBalaNV about 2 months ago
- 2 comments
#1212 - Fix cuDNN sliding window size
Pull Request -
State: closed - Opened by cyanguwa about 2 months ago
- 3 comments
#1211 - Fix CP unit test on A100 and L40s
Pull Request -
State: closed - Opened by xrennvidia about 2 months ago
- 2 comments
#1210 - Punctuation and Capitalization Model not working
Issue -
State: closed - Opened by ican24 about 2 months ago
- 6 comments
#1209 - Hierarchical CP implementation (Ulysses + Ring)
Pull Request -
State: closed - Opened by xrennvidia about 2 months ago
- 6 comments
#1208 - [PyTorch] Improve CP P2P efficiency
Pull Request -
State: open - Opened by yenchenlin about 2 months ago
#1207 - No option to change FP8 status in graphed module after using "make_graphed_callables"
Issue -
State: open - Opened by MaciejBalaNV about 2 months ago
Labels: bug
#1206 - [PyTorch] Add GroupedLinear to the docs and fix typos
Pull Request -
State: closed - Opened by pggPL about 2 months ago
#1205 - [JAX] Expose sliding window attn to TE-JAX API
Pull Request -
State: closed - Opened by huanghua1994 about 2 months ago
- 9 comments
Labels: enhancement, jax
#1204 - Enable fuse_wgrad_accumulation flag if using cudagraphs
Pull Request -
State: closed - Opened by JimmyZhang12 about 2 months ago
#1203 - Update list of CI users
Pull Request -
State: closed - Opened by ksivaman about 2 months ago
Labels: testing
#1202 - [PyTorch] Debug dtype casting in operation-based API
Pull Request -
State: closed - Opened by timmoon10 about 2 months ago
- 4 comments
Labels: bug
#1201 - No "pool" argument in make_graphed_callables function
Issue -
State: closed - Opened by MaciejBalaNV about 2 months ago
#1200 - Draft: Use fused push_send_recv kernel for TP AG and RS overlaps
Pull Request -
State: open - Opened by erhoo82 about 2 months ago
#1199 - [Dummy] add d64 support
Pull Request -
State: closed - Opened by cyanguwa about 2 months ago
#1198 - Update list of CI users
Pull Request -
State: closed - Opened by timmoon10 about 2 months ago
Labels: testing
#1197 - [PyTorch] fused CUDNN attention kernel and sliding window attention
Issue -
State: closed - Opened by Marks101 about 2 months ago
- 3 comments
#1196 - Tests for distributed
Pull Request -
State: closed - Opened by pggPL about 2 months ago
- 1 comment
#1195 - [PyTorch] fused CUDNN attention kernel not properly handling strides
Issue -
State: closed - Opened by Marks101 about 2 months ago
- 4 comments
#1194 - fix NVTE_UB_WITH_MPI read
Pull Request -
State: closed - Opened by erhoo82 about 2 months ago
- 2 comments
#1193 - FP8 for norm inputs and residuals?
Issue -
State: open - Opened by cbcase about 2 months ago
- 1 comment
Labels: question
#1192 - Importing torch gives ImportError - undefined symbol: _ZN3c105ErrorC2ENS_14SourceLocationENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
Issue -
State: closed - Opened by zimmerrol about 2 months ago
#1191 - [PyTorch] Minor optimizations to reduce CPU overheads in modules
Pull Request -
State: closed - Opened by timmoon10 about 2 months ago
- 5 comments
Labels: enhancement
#1190 - [PyTorch] FP8 and activation checkpointing causes training instabilities
Issue -
State: open - Opened by Marks101 about 2 months ago
#1189 - Restore compatibility with Python 3.8
Pull Request -
State: closed - Opened by ptrendx about 2 months ago
- 2 comments
#1188 - FSDP: How to do all-gather using FP8?
Issue -
State: open - Opened by vgoklani about 2 months ago
- 2 comments
#1187 - [PyTorch] Fix detection of 3 in 3hd/h3d layouts
Pull Request -
State: closed - Opened by cyanguwa about 2 months ago
- 2 comments
#1186 - Allow specifying cmake setup directory
Pull Request -
State: closed - Opened by ryxli about 2 months ago
- 2 comments
#1185 - [PyTorch] Port fused optimizer tests to pytest
Pull Request -
State: closed - Opened by timmoon10 about 2 months ago
- 1 comment
Labels: testing
#1184 - Add docs for installing from PyPI
Pull Request -
State: closed - Opened by ksivaman about 2 months ago
#1183 - [Common] Default CUDA_HOME to /usr/local/cuda when dynamically loading cuDNN and NVRTC
Pull Request -
State: closed - Opened by denera about 2 months ago
- 2 comments
#1182 - fused_attn_fwd_qkvpacked silently doesn't support 3 or 7 heads
Issue -
State: closed - Opened by ajayjain 2 months ago
- 1 comment
#1181 - Update CI users
Pull Request -
State: closed - Opened by timmoon10 2 months ago
Labels: testing
#1180 - Update CI users
Pull Request -
State: closed - Opened by timmoon10 2 months ago
Labels: testing
#1179 - [JAX] Fix unit tests to work around cuDNN 9.4 regression of 0 length sequences
Pull Request -
State: closed - Opened by mgoldfarb-nvidia 2 months ago
- 2 comments
#1178 - Allow to pass architectures like 90a, without being overriden
Pull Request -
State: closed - Opened by aurianer 2 months ago
- 2 comments
#1177 - New format for with statement does not compatible with python 3.8
Issue -
State: closed - Opened by skydoorkai 2 months ago
#1176 - [PyTorch] Relax the contiguous check for flash attention
Pull Request -
State: closed - Opened by yaox12 2 months ago
- 3 comments
#1175 - [PyTorch] Check network interface name when initializing Userbuffers
Pull Request -
State: closed - Opened by denera 2 months ago
- 1 comment
Labels: bug
#1174 - [PyTorch] Miscellaneous fixes for FA3 attention
Pull Request -
State: closed - Opened by cyanguwa 2 months ago
- 13 comments
#1173 - [WIP] [PyTorch] Proof-of-concept for using operation-based API in modules
Pull Request -
State: open - Opened by timmoon10 2 months ago
- 1 comment
#1172 - Allow downloading of model weights automatically
Pull Request -
State: closed - Opened by sudhakarsingh27 2 months ago
#1171 - Add dtensor support for TE optimizers
Pull Request -
State: closed - Opened by blahBlahhhJ 2 months ago
- 2 comments
#1170 - [Bug] A bug in the initialize_ub function.
Issue -
State: closed - Opened by wangzihe1996 2 months ago
- 5 comments
#1169 - [Question] fp8 amax_history setup
Issue -
State: open - Opened by tylaar 2 months ago
#1168 - [PyTorch] Fused dbias-cast-transpose in bias operation
Pull Request -
State: open - Opened by timmoon10 2 months ago
- 2 comments
#1167 - Fix autocast deprecation warning.
Pull Request -
State: open - Opened by jondeaton 2 months ago
- 4 comments
#1166 - cannot find MHA example for FA3
Issue -
State: closed - Opened by saurabh-kataria 2 months ago
#1165 - AssertionError: Outputs not close enough in tensor in test_numerics.py
Issue -
State: open - Opened by sirutBuasai 2 months ago
- 1 comment
Labels: bug
#1164 - [PyTorch] Activation operations
Pull Request -
State: open - Opened by timmoon10 2 months ago
- 4 comments
#1163 - Applying LayerNorm After TEColumnParallelLinear in Tensor Parallel Setup
Issue -
State: closed - Opened by ftgreat 2 months ago
- 2 comments
#1162 - Added Adobe analytics to the documentation
Pull Request -
State: closed - Opened by ptrendx 2 months ago
#1161 - Revert "[C] Suppress 128-D warning from cudnn-frontend"
Pull Request -
State: closed - Opened by ksivaman 2 months ago
#1160 - Add a context parallelism implementation with QKVO all-to-all
Pull Request -
State: closed - Opened by xrennvidia 2 months ago
- 4 comments
#1159 - AssertionError: Device compute capability 8.9 or higher required for FP8 execution.
Issue -
State: open - Opened by kamrul-NSL 2 months ago
- 1 comment
#1158 - [C] Suppress 128-D warning from cudnn-frontend
Pull Request -
State: closed - Opened by cyanguwa 2 months ago
- 1 comment
#1157 - [PyTorch] Lower atol/rtol for F16 attention tests
Pull Request -
State: closed - Opened by cyanguwa 2 months ago
- 2 comments
#1156 - installation guide for NVHPC SDK
Issue -
State: closed - Opened by jinz2014 2 months ago
- 1 comment
#1155 - Add user to TE CI
Pull Request -
State: closed - Opened by timmoon10 2 months ago
#1154 - Bump actions/download-artifact from 3 to 4.1.7 in /.github/workflows
Pull Request -
State: closed - Opened by dependabot[bot] 2 months ago
Labels: dependencies
#1153 - The question about flash attention and fused attention
Issue -
State: open - Opened by HenHenry-Z 2 months ago
- 1 comment
#1152 - Question about the cublaslt_gemm.cu
Issue -
State: closed - Opened by south-ocean 2 months ago
- 3 comments