Ecosyste.ms: Issues
An open API service that provides issue and pull request metadata for open source projects.
GitHub / NVIDIA/TransformerEngine issues and pull requests
#74 - Sequence-parallel amax reduction fix
Pull Request -
State: closed - Opened by ksivaman over 1 year ago
- 6 comments
#73 - New fp8_transpose_dbias kernel
Pull Request -
State: closed - Opened by vasunvidia over 1 year ago
- 2 comments
#72 - Gradient enablement bug fix
Pull Request -
State: closed - Opened by ksivaman almost 2 years ago
- 2 comments
#71 - Support simulating FP8 on older hardware
Issue -
State: open - Opened by zplizzi almost 2 years ago
- 1 comment
Labels: enhancement
#70 - Fix gradients when using AMP
Pull Request -
State: closed - Opened by ksivaman almost 2 years ago
- 1 comment
#69 - Installation errors on Ampere GPUs
Issue -
State: open - Opened by realAsma almost 2 years ago
- 3 comments
Labels: documentation
#68 - New transpose_dbias kernel
Pull Request -
State: closed - Opened by vasunvidia almost 2 years ago
- 1 comment
#67 - Zero-centered gamma support in LayerNorm (LayerNorm1p)
Pull Request -
State: closed - Opened by ptrendx almost 2 years ago
- 6 comments
#66 - QKV parameters unfused path fixes and optimization
Pull Request -
State: closed - Opened by ksivaman almost 2 years ago
- 8 comments
#65 - Bug fixes from PR 22
Pull Request -
State: closed - Opened by ksivaman almost 2 years ago
- 6 comments
#64 - remove d2d copies
Pull Request -
State: closed - Opened by ksivaman almost 2 years ago
- 3 comments
#63 - Address steady memory increase and bloated checkpoints
Pull Request -
State: closed - Opened by ksivaman almost 2 years ago
- 2 comments
#62 - flash-attn integration
Pull Request -
State: closed - Opened by cyanguwa almost 2 years ago
- 8 comments
#61 - Add docs for FP8 calibration
Pull Request -
State: closed - Opened by ksivaman almost 2 years ago
- 1 comment
#60 - Fix the integer overflow in fused softmax
Pull Request -
State: closed - Opened by ptrendx almost 2 years ago
- 2 comments
#59 - Numerics fix from #40
Pull Request -
State: closed - Opened by ksivaman almost 2 years ago
- 3 comments
#58 - Bug fixes from #40
Pull Request -
State: closed - Opened by ksivaman almost 2 years ago
- 2 comments
#57 - Add margin for LayerNorm kernel SM usage
Pull Request -
State: closed - Opened by ksivaman almost 2 years ago
- 2 comments
#56 - Remove intermediate dispatch functions
Pull Request -
State: closed - Opened by ksivaman almost 2 years ago
- 1 comment
#55 - Fix NVTX name for LN backward
Pull Request -
State: closed - Opened by ksivaman almost 2 years ago
#54 - Add TE/JAX high-level modules, unittests and examples
Pull Request -
State: closed - Opened by jeng1220 almost 2 years ago
- 11 comments
#53 - add building workflow for TE/Jax
Pull Request -
State: closed - Opened by jeng1220 almost 2 years ago
- 18 comments
#52 - Indexing fix for bug in virtual interleaved pipelining configs
Pull Request -
State: closed - Opened by ksivaman almost 2 years ago
- 5 comments
#51 - Move calculation of scale inverse to framework
Pull Request -
State: closed - Opened by ksivaman almost 2 years ago
- 5 comments
#50 - Add NVTX to TE modules
Pull Request -
State: closed - Opened by ptrendx almost 2 years ago
- 4 comments
#49 - Enforce boolean attention mask type
Pull Request -
State: closed - Opened by ksivaman almost 2 years ago
- 2 comments
#48 - Update copyright year
Pull Request -
State: closed - Opened by ptrendx almost 2 years ago
- 2 comments
#47 - Add GeGLU and the corresponding gradient kernels
Pull Request -
State: closed - Opened by zlsh80826 almost 2 years ago
- 4 comments
#46 - Reduce unit tests time
Pull Request -
State: closed - Opened by zlsh80826 almost 2 years ago
- 3 comments
#45 - Add RMSNorm
Pull Request -
State: closed - Opened by zlsh80826 almost 2 years ago
- 4 comments
#44 - Docs: remove build warnings and add FP8 caching note
Pull Request -
State: closed - Opened by ksivaman almost 2 years ago
- 2 comments
#43 - Fix in MHA cross attention path
Pull Request -
State: closed - Opened by ksivaman almost 2 years ago
- 6 comments
#42 - Fix LayerNorm API param names
Pull Request -
State: closed - Opened by ksivaman almost 2 years ago
- 1 comment
#41 - Add ONNX export support for TE modules
Pull Request -
State: closed - Opened by asfiyab-nvidia almost 2 years ago
- 13 comments
#40 - Schetlur/fp8 calibration
Pull Request -
State: closed - Opened by schetlur-nv almost 2 years ago
- 6 comments
#39 - Standardize formatting
Pull Request -
State: closed - Opened by ksivaman almost 2 years ago
#38 - Ensure contiguous inputs
Pull Request -
State: closed - Opened by ksivaman almost 2 years ago
- 1 comment
#37 - Softmax docstrings and type fixes
Pull Request -
State: closed - Opened by ksivaman almost 2 years ago
- 1 comment
#36 - Link performance optimization tutorial to docs
Pull Request -
State: closed - Opened by ptrendx almost 2 years ago
#35 - cleanup pylintrc
Pull Request -
State: closed - Opened by ksivaman almost 2 years ago
#34 - Fix illegal memory access in general layer norm backward kernel
Pull Request -
State: closed - Opened by timmoon10 almost 2 years ago
#33 - Move the amax/scale/scale_inv into the TE Tensor struct.
Pull Request -
State: closed - Opened by ptrendx almost 2 years ago
- 6 comments
#32 - Don't update FP8 weights during validation/inference
Pull Request -
State: closed - Opened by ksivaman almost 2 years ago
- 2 comments
#31 - Full activation recompute checkpointing bug fix
Pull Request -
State: closed - Opened by ksivaman almost 2 years ago
- 1 comment
#30 - Framework agnostic softmax kernels
Pull Request -
State: closed - Opened by ksivaman almost 2 years ago
- 3 comments
#29 - Fixes #26
Pull Request -
State: closed - Opened by ksivaman almost 2 years ago
- 1 comment
#28 - Fix the out-of-bounds access in the C+T+dbias kernel
Pull Request -
State: closed - Opened by ptrendx about 2 years ago
- 3 comments
#27 - Update README.md
Pull Request -
State: closed - Opened by nzmora-nvidia about 2 years ago
#26 - The fc2.bias of LayerNormMLP is not used
Issue -
State: closed - Opened by wkcn about 2 years ago
#25 - Incorrect parameter in landing page example.
Issue -
State: closed - Opened by jomayeri about 2 years ago
- 1 comment
#24 - Fix bugs for full activation recompute in FP8
Pull Request -
State: closed - Opened by ksivaman about 2 years ago
- 7 comments
#23 - [DO NOT MERGE UPSTREAM]
Pull Request -
State: closed - Opened by mjsML about 2 years ago
- 3 comments
#22 - Increase number of FP8 tensors per GEMM
Pull Request -
State: closed - Opened by vasunvidia about 2 years ago
- 7 comments
#21 - Conditional wgrad support
Pull Request -
State: closed - Opened by schetlur-nv about 2 years ago
- 3 comments
#20 - Documentation for advanced performance optimizations
Pull Request -
State: closed - Opened by timmoon10 about 2 years ago
- 5 comments
#19 - Add pylint to Lint action
Pull Request -
State: closed - Opened by ptrendx about 2 years ago
- 2 comments
#18 - Multi-tensor cast-transpose
Pull Request -
State: closed - Opened by timmoon10 about 2 years ago
- 5 comments
#17 - Please consider supporting Windows
Issue -
State: closed - Opened by C43H66N12O12S2 about 2 years ago
- 4 comments
#16 - Test
Pull Request -
State: closed - Opened by cyanguwa about 2 years ago
#15 - It doesn't support the latest RTX 40-series card
Issue -
State: closed - Opened by hxssgaa about 2 years ago
- 30 comments
#14 - Add link to the documentation archives in the docs
Pull Request -
State: closed - Opened by ptrendx about 2 years ago
#13 - Test build as GitHub action
Pull Request -
State: closed - Opened by ptrendx about 2 years ago
#12 - Test Blossom CI
Pull Request -
State: closed - Opened by ptrendx about 2 years ago
- 26 comments
#11 - Make amax reduction optional
Pull Request -
State: closed - Opened by ksivaman about 2 years ago
- 2 comments
#10 - Add C++ lint as GitHub action
Pull Request -
State: closed - Opened by ptrendx about 2 years ago
- 1 comment
#9 - Add Blossom CI yml
Pull Request -
State: closed - Opened by ptrendx about 2 years ago
- 1 comment
#8 - Remove fp8_out from the LN API
Pull Request -
State: closed - Opened by ptrendx about 2 years ago
- 2 comments
#7 - Remove pytest-runner from setup requirements
Pull Request -
State: closed - Opened by ksivaman about 2 years ago
#6 - Fix docs for default FP8 format in recipe
Pull Request -
State: closed - Opened by ksivaman about 2 years ago
#5 - Efficient Multi-Head Attention (EMHA) support
Pull Request -
State: closed - Opened by ksivaman about 2 years ago
- 2 comments
#4 - Bug fix for distributed TE case
Pull Request -
State: closed - Opened by ksivaman about 2 years ago
#3 - Add checks for tensor parallel use case to ensure all-reduce is called only when necessary
Pull Request -
State: closed - Opened by ksivaman about 2 years ago
#2 - fp8_autocast bug fix when switching from non-fp8 execution
Pull Request -
State: closed - Opened by ksivaman about 2 years ago
#1 - Added the link to the User Guide
Pull Request -
State: closed - Opened by ptrendx about 2 years ago