Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / NVIDIA/TransformerEngine issues and pull requests
#1412 - [PyTorch] `te.Linear` FP8 DGRAD+RS output bugfix
Pull Request -
State: open - Opened by denera 2 days ago
- 1 comment
Labels: bug
#1411 - Plans for block-wise FP8 quantization during training?
Issue -
State: open - Opened by beccohov 3 days ago
- 1 comment
#1410 - Make it an option to compile activation functions with fast math
Pull Request -
State: closed - Opened by guyueh1 3 days ago
- 1 comment
#1409 - Questions on DotProductAttention API Usage in Flash Attention thd Mode
Issue -
State: open - Opened by pipSu 4 days ago
#1408 - Support `store_param_remainders` feature from Apex in TE Fused Adam
Pull Request -
State: open - Opened by sanandaraj5597 4 days ago
#1407 - Fused attention error while running Nvidia Cosmos
Issue -
State: open - Opened by deepbeepmeep 4 days ago
#1406 - [JAX] Support segment_ids/pos as FA inputs
Pull Request -
State: open - Opened by zlsh80826 5 days ago
- 2 comments
#1405 - [JAX] Consolidate the distributed fused attention test code
Pull Request -
State: open - Opened by mgoldfarb-nvidia 5 days ago
- 5 comments
#1404 - Not compile in wsl2 pytorch wheels
Issue -
State: closed - Opened by johnnynunez 6 days ago
- 1 comment
#1403 - [PyTorch] Avoid `parameters` function in op backward pass
Pull Request -
State: open - Opened by timmoon10 7 days ago
- 1 comment
Labels: bug
#1402 - Fix "refractor" typo in the PR template
Pull Request -
State: closed - Opened by kit1980 7 days ago
#1401 - Use log1p(x) instead of log(1+x)
Pull Request -
State: open - Opened by kit1980 7 days ago
- 4 comments
#1400 - Import fails when working from a TE directory
Issue -
State: open - Opened by ksivaman 7 days ago
Labels: good first issue
#1399 - Installation stuck at 97%
Issue -
State: open - Opened by lorenzbaraldi 8 days ago
- 1 comment
#1398 - why close ag overlap when is_grad_enabled is False
Issue -
State: open - Opened by sallyjunjun 8 days ago
- 1 comment
#1397 - [PyTorch] Fix AttentionParams comparison logic
Pull Request -
State: open - Opened by cyanguwa 9 days ago
- 1 comment
#1396 - Take token count quantization of fused attention into consideration for CP results correction
Pull Request -
State: closed - Opened by xrennvidia 9 days ago
- 1 comment
#1395 - [PyTorch] Fix fusible ops checkpoint
Pull Request -
State: closed - Opened by ksivaman 9 days ago
Labels: bug
#1394 - [JAX] Test_multiprocessing_encoder with process spawn in bash
Pull Request -
State: closed - Opened by phu0ngng 10 days ago
- 1 comment
#1393 - [JAX] Correct fused attention output after each step of ring attention
Pull Request -
State: closed - Opened by mgoldfarb-nvidia 10 days ago
- 3 comments
#1392 - support new flash_attn_interface
Issue -
State: open - Opened by rgtjf 11 days ago
- 2 comments
#1391 - FP8 GEMM Kernels
Issue -
State: open - Opened by xiaoxiao26 11 days ago
#1390 - [JAX] Add THD + SWA unit tests
Pull Request -
State: closed - Opened by zlsh80826 12 days ago
- 1 comment
#1389 - Better cuBLAS handle management
Pull Request -
State: open - Opened by ptrendx 15 days ago
- 6 comments
#1388 - Update copyright to include 2025
Pull Request -
State: closed - Opened by ksivaman 16 days ago
#1387 - clean CP implementation for flash attention and cuDNN 9.6
Pull Request -
State: closed - Opened by xrennvidia 19 days ago
- 3 comments
#1386 - How about the grouplinear?
Issue -
State: open - Opened by south-ocean 23 days ago
- 2 comments
#1385 - Update README.rst
Pull Request -
State: open - Opened by sbhavani 26 days ago
#1384 - _NoopCatFunc in transformer layer
Issue -
State: open - Opened by robot-transformer 26 days ago
Labels: bug
#1383 - thd qkv-format in transformer layer
Issue -
State: open - Opened by robot-transformer 26 days ago
#1382 - bug fix for using `return_layernorm_output=True`
Pull Request -
State: closed - Opened by LiyuanLucasLiu 29 days ago
- 1 comment
#1381 - [PyTorch] Add caching for attention backend selection results
Pull Request -
State: open - Opened by cyanguwa 30 days ago
#1380 - Don't touch nor send messages to the root logger.
Pull Request -
State: open - Opened by sagostinho-nvidia 30 days ago
#1379 - AttributeError: module 'transformer_engine' has no attribute 'pytorch'
Issue -
State: open - Opened by carrot0117 about 1 month ago
- 2 comments
#1378 - [common/PyTorch] Add cuDNN SWA (left, 0) + padding + bottom right causal
Pull Request -
State: closed - Opened by cyanguwa about 1 month ago
- 5 comments
Labels: 1.14.0
#1377 - ViT Support
Issue -
State: open - Opened by cnut1648 about 1 month ago
- 1 comment
#1376 - TypeError: initialize_ub() got an unexpected keyword argument 'tp_size'
Issue -
State: closed - Opened by wccccp about 1 month ago
- 3 comments
#1375 - [JAX] Bug Fix: Softmax FFIs with correct Encapsulates
Pull Request -
State: closed - Opened by phu0ngng about 1 month ago
- 1 comment
#1374 - [PyTorch] Add weights_only=False for torch.load
Pull Request -
State: closed - Opened by cyanguwa about 1 month ago
- 1 comment
Labels: 1.14.0
#1373 - [MoE][PyTorch] Add mask-based MoE permutation
Pull Request -
State: open - Opened by hxbai about 1 month ago
#1372 - Should cublasLtHandle_t be Destroyed?
Issue -
State: open - Opened by shenzhenghai about 1 month ago
- 2 comments
#1371 - Add user to CI
Pull Request -
State: closed - Opened by ksivaman about 1 month ago
#1370 - [common] Add max_t support for KV in THD
Pull Request -
State: closed - Opened by cyanguwa about 1 month ago
- 1 comment
Labels: 1.14.0
#1369 - [common/PyTorch] Add FusedAttention support for SWA (left, right)
Pull Request -
State: open - Opened by cyanguwa about 1 month ago
- 1 comment
#1368 - How to use thd format qkv with cp + packed_seq_params
Issue -
State: open - Opened by Wraythh about 1 month ago
- 4 comments
#1366 - [JAX] Bug fix for distributed normalization
Pull Request -
State: closed - Opened by phu0ngng about 1 month ago
- 1 comment
Labels: 1.14.0
#1365 - TypeError: UbufP2PCommOverlap(): incompatible function arguments.
Issue -
State: closed - Opened by sallyjunjun about 1 month ago
- 5 comments
#1364 - [JAX] Use default factory for not sharing mutable default values
Pull Request -
State: closed - Opened by zlsh80826 about 1 month ago
- 2 comments
#1363 - The comm/gemm overlap example failed with "ran out of input".
Issue -
State: closed - Opened by wujingyue about 1 month ago
- 2 comments
#1362 - Fix an invalid reference in the doc
Pull Request -
State: open - Opened by wujingyue about 1 month ago
#1362 - Fix an invalid reference in the doc
Pull Request -
State: closed - Opened by wujingyue about 1 month ago
- 1 comment
#1361 - [JAX] Bug Fix: WeightInit with field
Pull Request -
State: closed - Opened by phu0ngng about 1 month ago
- 1 comment
#1360 - [Bug] Failed to pass pytorch's numerical test on A800 SXM
Issue -
State: closed - Opened by junjzhang about 1 month ago
- 1 comment
#1359 - support float8 in flash-attn v3
Issue -
State: open - Opened by Monekyzoon about 1 month ago
#1358 - Enabling FP8 all-gather for TE Float8Tensor when using Torch FSDP2
Pull Request -
State: closed - Opened by youngeunkwon0405 about 1 month ago
- 1 comment
#1357 - Disable FP8 in Mcore integration test on older GPUs
Pull Request -
State: closed - Opened by timmoon10 about 1 month ago
- 1 comment
Labels: bug, testing, 1.14.0
#1356 - [JAX] Move parallel encoder tests to L0 distributed test set.
Pull Request -
State: closed - Opened by phu0ngng about 1 month ago
- 1 comment
#1355 - Add paged attention support
Pull Request -
State: open - Opened by cyanguwa about 2 months ago
- 2 comments
#1354 - Fix attention mask type for Flash Attention + CP + THD
Pull Request -
State: closed - Opened by xrennvidia about 2 months ago
- 1 comment
#1353 - overlapping issue about backward of LayerNormLinear
Issue -
State: closed - Opened by cos120 about 2 months ago
- 5 comments
#1352 - [JAX] Fused attention unit tests fixes and refinements
Pull Request -
State: closed - Opened by zlsh80826 about 2 months ago
- 6 comments
#1351 - Can this project support jetson orin nx?
Issue -
State: closed - Opened by zzk2021 about 2 months ago
#1350 - te.TransformerLayer fails on H100 with cudnn errors.
Issue -
State: closed - Opened by wujingyue about 2 months ago
- 2 comments
#1349 - Support more than 1 shape/attention_params for DotProductAttention decision cache
Issue -
State: open - Opened by parthmannan about 2 months ago
#1348 - [Bug] attention_backend update throttle
Issue -
State: closed - Opened by Jianbing-D about 2 months ago
- 1 comment
#1347 - [JAX] Scale sequence length in CP tests to avoid tiny sizes.
Pull Request -
State: closed - Opened by mgoldfarb-nvidia about 2 months ago
- 2 comments
#1346 - [Draft] Introduce NVSHMEM based communication API for pytorch
Pull Request -
State: open - Opened by gdengk about 2 months ago
#1345 - Fix cuda graph capture for grouped gemm
Pull Request -
State: closed - Opened by xrennvidia about 2 months ago
- 3 comments
#1344 - How to setup TP Overlap configs
Issue -
State: open - Opened by TJ-Solergibert about 2 months ago
- 1 comment
#1343 - [PyTorch] Adding TP overlap support for `te.Linear` with `parallel_mode="column"`
Pull Request -
State: closed - Opened by denera about 2 months ago
- 3 comments
Labels: enhancement, 1.14.0
#1342 - [Core] Add function to convert container to string
Pull Request -
State: closed - Opened by timmoon10 about 2 months ago
- 1 comment
#1341 - [PyTorch] Bugfix for wgrad bulk overlap conflict when dgrad overlap is reduce-scatter
Pull Request -
State: open - Opened by denera 2 months ago
- 2 comments
Labels: bug
#1340 - Update list of CI users
Pull Request -
State: closed - Opened by timmoon10 2 months ago
- 1 comment
Labels: testing
#1339 - [Common] Moved framework agnostic THD kernels to common.
Pull Request -
State: closed - Opened by mgoldfarb-nvidia 2 months ago
- 8 comments
#1338 - Debug nightly docs
Pull Request -
State: closed - Opened by timmoon10 2 months ago
- 1 comment
Labels: documentation, testing
#1337 - [C/JAX] Comm+GEMM Overlap API for TE/JAX
Pull Request -
State: open - Opened by denera 2 months ago
Labels: enhancement, jax
#1336 - the max error of moe_permute/unpermute.grad could reach 3.6e+00
Issue -
State: open - Opened by NiuMa-1234 2 months ago
- 1 comment
#1335 - [PyTorch] Store module extra state in tensor
Pull Request -
State: open - Opened by timmoon10 2 months ago
- 1 comment
Labels: bug
#1335 - [PyTorch] Store module extra state in tensor
Pull Request -
State: closed - Opened by timmoon10 2 months ago
- 1 comment
Labels: bug, 1.14.0
#1334 - [PyTorch] Fix multiple calls to saved_tensors in CP attention
Pull Request -
State: closed - Opened by ksivaman 2 months ago
- 1 comment
Labels: bug
#1333 - Use `CMAKE_CURRENT_SOURCE_DIR` instead of `CMAKE_SOURCE_DIR`
Pull Request -
State: closed - Opened by kmaehashi 2 months ago
#1332 - [TP comm overlap unit test]`CUDA Error: misaligned address` error when testing with recent cublas (or pytorch container)
Issue -
State: open - Opened by erhoo82 2 months ago
- 3 comments
#1332 - [TP comm overlap unit test]`CUDA Error: misaligned address` error when testing with recent cublas (or pytorch container)
Issue -
State: open - Opened by erhoo82 2 months ago
- 4 comments
#1331 - [JAX] WIP Added L0 Distributed Tests
Pull Request -
State: open - Opened by phu0ngng 2 months ago
#1331 - [JAX] WIP Added L0 Distributed Tests
Pull Request -
State: closed - Opened by phu0ngng 2 months ago
#1330 - [Dummy] Testing branch for #1326
Pull Request -
State: closed - Opened by timmoon10 2 months ago
Labels: invalid
#1329 - [PyTorch] Integration test for Megatron-LM
Pull Request -
State: closed - Opened by timmoon10 2 months ago
- 2 comments
Labels: bug, 1.13.0
#1328 - [PyTorch] Fix GQA error message
Pull Request -
State: closed - Opened by cyanguwa 2 months ago
- 1 comment
Labels: 1.13.0