Ecosyste.ms: Issues

An open API service for providing issue and pull request metadata for open source projects.

GitHub / NVIDIA/TransformerEngine issues and pull requests

#762 - Could TransformerEngine work with Deepspeed Zero w/ offloading?

Issue - State: open - Opened by leiwen83 10 months ago - 1 comment
Labels: question

#700 - ERROR: Failed building wheel for transformer-engine

Issue - State: closed - Opened by ShabnamRA 11 months ago - 7 comments
Labels: build

#694 - main branch cannot compile due to incompatibility with the main branch of cudnn-frontend

Issue - State: closed - Opened by lucifer1004 11 months ago - 2 comments
Labels: build

#689 - Version constraint of `flash-attn` needs to be updated

Issue - State: closed - Opened by lucifer1004 11 months ago - 3 comments

#683 - Doesn't work on wsl2

Issue - State: open - Opened by Pzzzzz5142 11 months ago - 5 comments

#679 - [Feature Request] Grouped GEMM kernel

Issue - State: open - Opened by LiyuanLucasLiu 11 months ago - 1 comment
Labels: enhancement

#553 - installing error

Issue - State: closed - Opened by foreverpiano about 1 year ago - 1 comment

#526 - Failed Installation

Issue - State: closed - Opened by sudy-super about 1 year ago - 1 comment

#517 - [Common][PyTorch] Fused `apply_rotorary_pos_emb`

Pull Request - State: closed - Opened by yaox12 about 1 year ago - 10 comments

#516 - question for building wheel for transformer-engine

Issue - State: open - Opened by Mrzhang-dada about 1 year ago - 6 comments

#459 - Failed building wheel for transformer-engine

Issue - State: closed - Opened by RuslanSel over 1 year ago - 3 comments

#456 - Dummy PR to test CI

Pull Request - State: closed - Opened by timmoon10 over 1 year ago - 15 comments
Labels: invalid

#359 - Optimize flash-attention transposes

Pull Request - State: closed - Opened by ksivaman over 1 year ago - 1 comment

#355 - Installation failed with cmake error

Issue - State: closed - Opened by RuiWang1998 over 1 year ago - 23 comments

#298 - ModuleNotFoundError: No module named 'torch'

Issue - State: closed - Opened by conceptofmind over 1 year ago - 6 comments

#235 - Refactor build system

Pull Request - State: closed - Opened by timmoon10 over 1 year ago - 7 comments

#100 - Update PyTorch comm API

Pull Request - State: closed - Opened by ksivaman almost 2 years ago - 1 comment

#99 - Fix FlashAttention tests

Pull Request - State: closed - Opened by tcherckez-nvidia almost 2 years ago - 12 comments

#98 - Adding JAX to README.rst

Pull Request - State: closed - Opened by mingxu1067 almost 2 years ago - 2 comments

#97 - Catch FP8 modulo16 error before cublas and fp8 kernels

Pull Request - State: closed - Opened by ksivaman almost 2 years ago - 1 comment

#96 - [WIP] add cuDNN Flash Attention for FP8

Pull Request - State: closed - Opened by cyanguwa almost 2 years ago

#95 - Add a temporary workaround to layernorm ONNX export

Pull Request - State: closed - Opened by nzmora-nvidia almost 2 years ago - 6 comments

#94 - Add an option to serialize test i/o to file

Pull Request - State: closed - Opened by nzmora-nvidia almost 2 years ago - 1 comment

#93 - Raise autocast usage error

Pull Request - State: closed - Opened by ksivaman almost 2 years ago - 4 comments

#92 - Move from Sphinx Autodoc to sphinx-autoapi

Pull Request - State: closed - Opened by ptrendx almost 2 years ago - 1 comment

#91 - Fix the link to the documentation archives

Pull Request - State: closed - Opened by ptrendx almost 2 years ago - 1 comment

#90 - deprecate qk layer scaling and fp32 softmax args

Pull Request - State: closed - Opened by ksivaman almost 2 years ago - 2 comments

#89 - Adding slice to fix failure with multi-devices.

Pull Request - State: closed - Opened by mingxu1067 almost 2 years ago - 1 comment

#88 - Exporting MajorShardingType, ShardingType and LayerNorm for TE/JAX.

Pull Request - State: closed - Opened by mingxu1067 almost 2 years ago - 1 comment

#87 - Adding documents to TE/JAX

Pull Request - State: closed - Opened by mingxu1067 almost 2 years ago - 10 comments

#86 - Separate linting passes for PyTorch and JAX

Pull Request - State: closed - Opened by timmoon10 almost 2 years ago - 2 comments
Labels: enhancement

#85 - Add TensorFlow module and extensions

Pull Request - State: closed - Opened by trevor-m almost 2 years ago - 7 comments

#84 - Fix flash attention

Pull Request - State: closed - Opened by ksivaman almost 2 years ago - 5 comments

#83 - Fix unfused QKV params case; stack vs interleave option

Pull Request - State: closed - Opened by ksivaman almost 2 years ago - 2 comments

#82 - 3rd party acknowledgements

Pull Request - State: closed - Opened by ksivaman almost 2 years ago

#81 - fix bug in non-FP8 nvfuser path

Pull Request - State: closed - Opened by ksivaman almost 2 years ago - 1 comment

#80 - Relax checks for flash-attn

Pull Request - State: closed - Opened by cyanguwa almost 2 years ago - 4 comments

#79 - Remove redundant AR for SP case

Pull Request - State: closed - Opened by ksivaman almost 2 years ago - 4 comments

#78 - Move TE/PyTorch UT to tests/pytorch/

Pull Request - State: closed - Opened by jeng1220 almost 2 years ago - 5 comments

#77 - Change version to 0.7.0dev

Pull Request - State: closed - Opened by ksivaman almost 2 years ago

#76 - Add an option to serialize test i/o to file

Pull Request - State: closed - Opened by nzmora-nvidia almost 2 years ago - 4 comments

#75 - Support arbitrary output dtypes in PyT GEMM functions

Pull Request - State: closed - Opened by timmoon10 almost 2 years ago - 3 comments
Labels: enhancement

#74 - Sequence-parallel amax reduction fix

Pull Request - State: closed - Opened by ksivaman almost 2 years ago - 6 comments

#73 - New fp8_transpose_dbias kernel

Pull Request - State: closed - Opened by vasunvidia almost 2 years ago - 2 comments

#72 - Gradient enablement bug fix

Pull Request - State: closed - Opened by ksivaman almost 2 years ago - 2 comments

#71 - Support simulating FP8 on older hardware

Issue - State: open - Opened by zplizzi almost 2 years ago - 1 comment
Labels: enhancement

#70 - Fix gradients when using AMP

Pull Request - State: closed - Opened by ksivaman almost 2 years ago - 1 comment

#69 - Installation errors on Ampere GPUs

Issue - State: open - Opened by realAsma almost 2 years ago - 3 comments
Labels: documentation

#68 - New transpose_dbias kernel

Pull Request - State: closed - Opened by vasunvidia almost 2 years ago - 1 comment

#67 - Zero-centered gamma support in LayerNorm (LayerNorm1p)

Pull Request - State: closed - Opened by ptrendx almost 2 years ago - 6 comments

#66 - QKV parameters unfused path fixes and optimization

Pull Request - State: closed - Opened by ksivaman almost 2 years ago - 8 comments

#65 - Bug fixes from PR 22

Pull Request - State: closed - Opened by ksivaman almost 2 years ago - 6 comments

#64 - remove d2d copies

Pull Request - State: closed - Opened by ksivaman almost 2 years ago - 3 comments

#63 - Address steady memory increase and bloated checkpoints

Pull Request - State: closed - Opened by ksivaman almost 2 years ago - 2 comments

#62 - flash-attn integration

Pull Request - State: closed - Opened by cyanguwa almost 2 years ago - 8 comments

#61 - Add docs for FP8 calibration

Pull Request - State: closed - Opened by ksivaman about 2 years ago - 1 comment

#60 - Fix the integer overflow in fused softmax

Pull Request - State: closed - Opened by ptrendx about 2 years ago - 2 comments

#59 - Numerics fix from #40

Pull Request - State: closed - Opened by ksivaman about 2 years ago - 3 comments

#58 - Bug fixes from #40

Pull Request - State: closed - Opened by ksivaman about 2 years ago - 2 comments

#57 - Add margin for LayerNorm kernel SM usage

Pull Request - State: closed - Opened by ksivaman about 2 years ago - 2 comments

#56 - Remove intermediate dispatch functions

Pull Request - State: closed - Opened by ksivaman about 2 years ago - 1 comment

#55 - Fix NVTX name for LN backward

Pull Request - State: closed - Opened by ksivaman about 2 years ago

#54 - Add TE/JAX high-level modules, unittests and examples

Pull Request - State: closed - Opened by jeng1220 about 2 years ago - 11 comments

#53 - add building workflow for TE/Jax

Pull Request - State: closed - Opened by jeng1220 about 2 years ago - 18 comments

#52 - Indexing fix for bug in virtual interleaved pipelining configs

Pull Request - State: closed - Opened by ksivaman about 2 years ago - 5 comments

#51 - Move calculation of scale inverse to framework

Pull Request - State: closed - Opened by ksivaman about 2 years ago - 5 comments

#50 - Add NVTX to TE modules

Pull Request - State: closed - Opened by ptrendx about 2 years ago - 4 comments

#49 - Enforce boolean attention mask type

Pull Request - State: closed - Opened by ksivaman about 2 years ago - 2 comments

#48 - Update copyright year

Pull Request - State: closed - Opened by ptrendx about 2 years ago - 2 comments

#47 - Add GeGLU and the corresponding gradient kernels

Pull Request - State: closed - Opened by zlsh80826 about 2 years ago - 4 comments

#46 - Reduce unit tests time

Pull Request - State: closed - Opened by zlsh80826 about 2 years ago - 3 comments

#45 - Add RMSNorm

Pull Request - State: closed - Opened by zlsh80826 about 2 years ago - 4 comments

#44 - Docs: remove build warnings and add FP8 caching note

Pull Request - State: closed - Opened by ksivaman about 2 years ago - 2 comments

#43 - Fix in MHA cross attention path

Pull Request - State: closed - Opened by ksivaman about 2 years ago - 6 comments

#42 - Fix LayerNorm API param names

Pull Request - State: closed - Opened by ksivaman about 2 years ago - 1 comment

#41 - Add ONNX export support for TE modules

Pull Request - State: closed - Opened by asfiyab-nvidia about 2 years ago - 13 comments

#40 - Schetlur/fp8 calibration

Pull Request - State: closed - Opened by schetlur-nv about 2 years ago - 6 comments

#39 - Standardize formatting

Pull Request - State: closed - Opened by ksivaman about 2 years ago

#38 - Ensure contiguous inputs

Pull Request - State: closed - Opened by ksivaman about 2 years ago - 1 comment

#37 - Softmax docstrings and type fixes

Pull Request - State: closed - Opened by ksivaman about 2 years ago - 1 comment

#36 - Link performance optimization tutorial to docs

Pull Request - State: closed - Opened by ptrendx about 2 years ago

#35 - cleanup pylintrc

Pull Request - State: closed - Opened by ksivaman about 2 years ago

#34 - Fix illegal memory access in general layer norm backward kernel

Pull Request - State: closed - Opened by timmoon10 about 2 years ago

#33 - Move the amax/scale/scale_inv into the TE Tensor struct.

Pull Request - State: closed - Opened by ptrendx about 2 years ago - 6 comments

#32 - Don't update FP8 weights during validation/inference

Pull Request - State: closed - Opened by ksivaman about 2 years ago - 2 comments

#31 - Full activation recompute checkpointing bug fix

Pull Request - State: closed - Opened by ksivaman about 2 years ago - 1 comment

#30 - Framework agnostic softmax kernels

Pull Request - State: closed - Opened by ksivaman about 2 years ago - 3 comments

#29 - Fixes #26

Pull Request - State: closed - Opened by ksivaman about 2 years ago - 1 comment

#28 - Fix the out-of-bounds access in the C+T+dbias kernel

Pull Request - State: closed - Opened by ptrendx about 2 years ago - 3 comments

#27 - Update README.md

Pull Request - State: closed - Opened by nzmora-nvidia about 2 years ago

#26 - The fc2.bias of LayerNormMLP is not used

Issue - State: closed - Opened by wkcn about 2 years ago

#25 - Incorrect parameter in landing page example.

Issue - State: closed - Opened by jomayeri about 2 years ago - 1 comment

#24 - Fix bugs for full activation recompute in FP8

Pull Request - State: closed - Opened by ksivaman about 2 years ago - 7 comments

#23 - [DO NOT MERGE UPSTREAM]

Pull Request - State: closed - Opened by mjsML about 2 years ago - 3 comments

#22 - Increase number of FP8 tensors per GEMM

Pull Request - State: closed - Opened by vasunvidia over 2 years ago - 7 comments

#21 - Conditional wgrad support

Pull Request - State: closed - Opened by schetlur-nv over 2 years ago - 3 comments