Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / NVIDIA/TransformerEngine issues and pull requests
#762 - Could TransformerEngine work with Deepspeed Zero w/ offloading?
Issue -
State: open - Opened by leiwen83 10 months ago
- 1 comment
Labels: question
#700 - ERROR: Failed building wheel for transformer-engine
Issue -
State: closed - Opened by ShabnamRA 11 months ago
- 7 comments
Labels: build
#694 - main branch cannot compile due to incompatibility with the main branch of cudnn-frontend
Issue -
State: closed - Opened by lucifer1004 11 months ago
- 2 comments
Labels: build
#689 - Version constraint of `flash-attn` needs to be updated
Issue -
State: closed - Opened by lucifer1004 11 months ago
- 3 comments
#683 - Doesn't work on wsl2
Issue -
State: open - Opened by Pzzzzz5142 11 months ago
- 5 comments
#679 - [Feature Request] Grouped GEMM kernel
Issue -
State: open - Opened by LiyuanLucasLiu 11 months ago
- 1 comment
Labels: enhancement
#651 - TransformerEngine v1.2.1 throws CuDNN frontend error on H100 GPU (AWS p5.48xlarge instance)
Issue -
State: open - Opened by sirutBuasai 12 months ago
- 10 comments
Labels: bug
#553 - installing error
Issue -
State: closed - Opened by foreverpiano about 1 year ago
- 1 comment
#526 - Failed Installation
Issue -
State: closed - Opened by sudy-super about 1 year ago
- 1 comment
#517 - [Common][PyTorch] Fused `apply_rotorary_pos_emb`
Pull Request -
State: closed - Opened by yaox12 about 1 year ago
- 10 comments
#516 - question for building wheel for transformer-engine
Issue -
State: open - Opened by Mrzhang-dada about 1 year ago
- 6 comments
#459 - Failed building wheel for transformer-engine
Issue -
State: closed - Opened by RuslanSel over 1 year ago
- 3 comments
#456 - Dummy PR to test CI
Pull Request -
State: closed - Opened by timmoon10 over 1 year ago
- 15 comments
Labels: invalid
#359 - Optimize flash-attention transposes
Pull Request -
State: closed - Opened by ksivaman over 1 year ago
- 1 comment
#355 - Installation failed with cmake error
Issue -
State: closed - Opened by RuiWang1998 over 1 year ago
- 23 comments
#298 - ModuleNotFoundError: No module named 'torch'
Issue -
State: closed - Opened by conceptofmind over 1 year ago
- 6 comments
#235 - Refactor build system
Pull Request -
State: closed - Opened by timmoon10 over 1 year ago
- 7 comments
#100 - Update PyTorch comm API
Pull Request -
State: closed - Opened by ksivaman almost 2 years ago
- 1 comment
#99 - Fix FlashAttention tests
Pull Request -
State: closed - Opened by tcherckez-nvidia almost 2 years ago
- 12 comments
#98 - Adding JAX to README.rst
Pull Request -
State: closed - Opened by mingxu1067 almost 2 years ago
- 2 comments
#97 - Catch FP8 modulo16 error before cublas and fp8 kernels
Pull Request -
State: closed - Opened by ksivaman almost 2 years ago
- 1 comment
#96 - [WIP] add cuDNN Flash Attention for FP8
Pull Request -
State: closed - Opened by cyanguwa almost 2 years ago
#95 - Add a temporary workaround to layernorm ONNX export
Pull Request -
State: closed - Opened by nzmora-nvidia almost 2 years ago
- 6 comments
#94 - Add an option to serialize test i/o to file
Pull Request -
State: closed - Opened by nzmora-nvidia almost 2 years ago
- 1 comment
#93 - Raise autocast usage error
Pull Request -
State: closed - Opened by ksivaman almost 2 years ago
- 4 comments
#92 - Move from Sphinx Autodoc to sphinx-autoapi
Pull Request -
State: closed - Opened by ptrendx almost 2 years ago
- 1 comment
#91 - Fix the link to the documentation archives
Pull Request -
State: closed - Opened by ptrendx almost 2 years ago
- 1 comment
#90 - deprecate qk layer scaling and fp32 softmax args
Pull Request -
State: closed - Opened by ksivaman almost 2 years ago
- 2 comments
#89 - Adding slice to fix failure with multi-devices.
Pull Request -
State: closed - Opened by mingxu1067 almost 2 years ago
- 1 comment
#88 - Exporting MajorShardingType, ShardingType and LayerNorm for TE/JAX.
Pull Request -
State: closed - Opened by mingxu1067 almost 2 years ago
- 1 comment
#87 - Adding documents to TE/JAX
Pull Request -
State: closed - Opened by mingxu1067 almost 2 years ago
- 10 comments
#86 - Separate linting passes for PyTorch and JAX
Pull Request -
State: closed - Opened by timmoon10 almost 2 years ago
- 2 comments
Labels: enhancement
#85 - Add TensorFlow module and extensions
Pull Request -
State: closed - Opened by trevor-m almost 2 years ago
- 7 comments
#84 - Fix flash attention
Pull Request -
State: closed - Opened by ksivaman almost 2 years ago
- 5 comments
#83 - Fix unfused QKV params case; stack vs interleave option
Pull Request -
State: closed - Opened by ksivaman almost 2 years ago
- 2 comments
#82 - 3rd party acknowledgements
Pull Request -
State: closed - Opened by ksivaman almost 2 years ago
#81 - fix bug in non-FP8 nvfuser path
Pull Request -
State: closed - Opened by ksivaman almost 2 years ago
- 1 comment
#80 - Relax checks for flash-attn
Pull Request -
State: closed - Opened by cyanguwa almost 2 years ago
- 4 comments
#79 - Remove redundant AR for SP case
Pull Request -
State: closed - Opened by ksivaman almost 2 years ago
- 4 comments
#78 - Move TE/PyTorch UT to tests/pytorch/
Pull Request -
State: closed - Opened by jeng1220 almost 2 years ago
- 5 comments
#77 - Change version to 0.7.0dev
Pull Request -
State: closed - Opened by ksivaman almost 2 years ago
#76 - Add an option to serialize test i/o to file
Pull Request -
State: closed - Opened by nzmora-nvidia almost 2 years ago
- 4 comments
#75 - Support arbitrary output dtypes in PyT GEMM functions
Pull Request -
State: closed - Opened by timmoon10 almost 2 years ago
- 3 comments
Labels: enhancement
#74 - Sequence-parallel amax reduction fix
Pull Request -
State: closed - Opened by ksivaman almost 2 years ago
- 6 comments
#73 - New fp8_transpose_dbias kernel
Pull Request -
State: closed - Opened by vasunvidia almost 2 years ago
- 2 comments
#72 - Gradient enablement bug fix
Pull Request -
State: closed - Opened by ksivaman almost 2 years ago
- 2 comments
#71 - Support simulating FP8 on older hardware
Issue -
State: open - Opened by zplizzi almost 2 years ago
- 1 comment
Labels: enhancement
#70 - Fix gradients when using AMP
Pull Request -
State: closed - Opened by ksivaman almost 2 years ago
- 1 comment
#69 - Installation errors on Ampere GPUs
Issue -
State: open - Opened by realAsma almost 2 years ago
- 3 comments
Labels: documentation
#68 - New transpose_dbias kernel
Pull Request -
State: closed - Opened by vasunvidia almost 2 years ago
- 1 comment
#67 - Zero-centered gamma support in LayerNorm (LayerNorm1p)
Pull Request -
State: closed - Opened by ptrendx almost 2 years ago
- 6 comments
#66 - QKV parameters unfused path fixes and optimization
Pull Request -
State: closed - Opened by ksivaman almost 2 years ago
- 8 comments
#65 - Bug fixes from PR 22
Pull Request -
State: closed - Opened by ksivaman almost 2 years ago
- 6 comments
#64 - remove d2d copies
Pull Request -
State: closed - Opened by ksivaman almost 2 years ago
- 3 comments
#63 - Address steady memory increase and bloated checkpoints
Pull Request -
State: closed - Opened by ksivaman almost 2 years ago
- 2 comments
#62 - flash-attn integration
Pull Request -
State: closed - Opened by cyanguwa almost 2 years ago
- 8 comments
#61 - Add docs for FP8 calibration
Pull Request -
State: closed - Opened by ksivaman about 2 years ago
- 1 comment
#60 - Fix the integer overflow in fused softmax
Pull Request -
State: closed - Opened by ptrendx about 2 years ago
- 2 comments
#59 - Numerics fix from #40
Pull Request -
State: closed - Opened by ksivaman about 2 years ago
- 3 comments
#58 - Bug fixes from #40
Pull Request -
State: closed - Opened by ksivaman about 2 years ago
- 2 comments
#57 - Add margin for LayerNorm kernel SM usage
Pull Request -
State: closed - Opened by ksivaman about 2 years ago
- 2 comments
#56 - Remove intermediate dispatch functions
Pull Request -
State: closed - Opened by ksivaman about 2 years ago
- 1 comment
#55 - Fix NVTX name for LN backward
Pull Request -
State: closed - Opened by ksivaman about 2 years ago
#54 - Add TE/JAX high-level modules, unittests and examples
Pull Request -
State: closed - Opened by jeng1220 about 2 years ago
- 11 comments
#53 - add building workflow for TE/Jax
Pull Request -
State: closed - Opened by jeng1220 about 2 years ago
- 18 comments
#52 - Indexing fix for bug in virtual interleaved pipelining configs
Pull Request -
State: closed - Opened by ksivaman about 2 years ago
- 5 comments
#51 - Move calculation of scale inverse to framework
Pull Request -
State: closed - Opened by ksivaman about 2 years ago
- 5 comments
#50 - Add NVTX to TE modules
Pull Request -
State: closed - Opened by ptrendx about 2 years ago
- 4 comments
#49 - Enforce boolean attention mask type
Pull Request -
State: closed - Opened by ksivaman about 2 years ago
- 2 comments
#48 - Update copyright year
Pull Request -
State: closed - Opened by ptrendx about 2 years ago
- 2 comments
#47 - Add GeGLU and the corresponding gradient kernels
Pull Request -
State: closed - Opened by zlsh80826 about 2 years ago
- 4 comments
#46 - Reduce unit tests time
Pull Request -
State: closed - Opened by zlsh80826 about 2 years ago
- 3 comments
#45 - Add RMSNorm
Pull Request -
State: closed - Opened by zlsh80826 about 2 years ago
- 4 comments
#44 - Docs: remove build warnings and add FP8 caching note
Pull Request -
State: closed - Opened by ksivaman about 2 years ago
- 2 comments
#43 - Fix in MHA cross attention path
Pull Request -
State: closed - Opened by ksivaman about 2 years ago
- 6 comments
#42 - Fix LayerNorm API param names
Pull Request -
State: closed - Opened by ksivaman about 2 years ago
- 1 comment
#41 - Add ONNX export support for TE modules
Pull Request -
State: closed - Opened by asfiyab-nvidia about 2 years ago
- 13 comments
#40 - Schetlur/fp8 calibration
Pull Request -
State: closed - Opened by schetlur-nv about 2 years ago
- 6 comments
#39 - Standardize formatting
Pull Request -
State: closed - Opened by ksivaman about 2 years ago
#38 - Ensure contiguous inputs
Pull Request -
State: closed - Opened by ksivaman about 2 years ago
- 1 comment
#37 - Softmax docstrings and type fixes
Pull Request -
State: closed - Opened by ksivaman about 2 years ago
- 1 comment
#36 - Link performance optimization tutorial to docs
Pull Request -
State: closed - Opened by ptrendx about 2 years ago
#35 - cleanup pylintrc
Pull Request -
State: closed - Opened by ksivaman about 2 years ago
#34 - Fix illegal memory access in general layer norm backward kernel
Pull Request -
State: closed - Opened by timmoon10 about 2 years ago
#33 - Move the amax/scale/scale_inv into the TE Tensor struct.
Pull Request -
State: closed - Opened by ptrendx about 2 years ago
- 6 comments
#32 - Don't update FP8 weights during validation/inference
Pull Request -
State: closed - Opened by ksivaman about 2 years ago
- 2 comments
#31 - Full activation recompute checkpointing bug fix
Pull Request -
State: closed - Opened by ksivaman about 2 years ago
- 1 comment
#30 - Framework agnostic softmax kernels
Pull Request -
State: closed - Opened by ksivaman about 2 years ago
- 3 comments
#29 - Fixes #26
Pull Request -
State: closed - Opened by ksivaman about 2 years ago
- 1 comment
#28 - Fix the out-of-bounds access in the C+T+dbias kernel
Pull Request -
State: closed - Opened by ptrendx about 2 years ago
- 3 comments
#27 - Update README.md
Pull Request -
State: closed - Opened by nzmora-nvidia about 2 years ago
#26 - The fc2.bias of LayerNormMLP is not used
Issue -
State: closed - Opened by wkcn about 2 years ago
#25 - Incorrect parameter in landing page example.
Issue -
State: closed - Opened by jomayeri about 2 years ago
- 1 comment
#24 - Fix bugs for full activation recompute in FP8
Pull Request -
State: closed - Opened by ksivaman about 2 years ago
- 7 comments
#23 - [DO NOT MERGE UPSTREAM]
Pull Request -
State: closed - Opened by mjsML about 2 years ago
- 3 comments
#22 - Increase number of FP8 tensors per GEMM
Pull Request -
State: closed - Opened by vasunvidia over 2 years ago
- 7 comments
#21 - Conditional wgrad support
Pull Request -
State: closed - Opened by schetlur-nv over 2 years ago
- 3 comments