Ecosyste.ms: Issues

GitHub / rocm/transformerengine issues and pull requests

#88 - [ROCm] add options to disable fused attn backend compilation

Pull Request - State: closed - Opened by wangye805 about 1 month ago - 1 comment

#87 - Support Deepseekv2

Pull Request - State: open - Opened by hongywei about 1 month ago - 2 comments

#86 - [ROCm] support swa in ROCm TE fused attn CK backend

Pull Request - State: closed - Opened by wangye805 about 1 month ago

#85 - Ipanfilo/ci script

Pull Request - State: open - Opened by ipanfilo about 1 month ago

#84 - Match ROCm platform detection in CMake and setup.py. Issue #9690

Pull Request - State: closed - Opened by ipanfilo about 1 month ago - 1 comment

#83 - Wen/opt cast transpose noop

Pull Request - State: closed - Opened by wenchenvincent about 1 month ago

#81 - Fix the use_fused_attention filtering

Pull Request - State: closed - Opened by wangye805 about 1 month ago

#80 - Fix undesired nv_fusion disabling after reentrant_activation_recompute

Pull Request - State: closed - Opened by ipanfilo about 1 month ago

#78 - [FSDP 8xMI300X]: LLama3 70B 4 Layer Proxy Model GPU Core Dumps

Issue - State: open - Opened by OrenLeung about 2 months ago - 23 comments

#77 - Add default compile arch to ck_fused_attn for building TE in docker image

Pull Request - State: closed - Opened by wangye805 about 2 months ago

#75 - Skip distributed tests on single-GPU systems

Pull Request - State: closed - Opened by ipanfilo about 2 months ago

#73 - MI300X FP8 TE.Linear 2x Slower than AMP BF16 F.Linear

Issue - State: open - Opened by OrenLeung about 2 months ago - 14 comments

#72 - [1xMI300X] GPT-2 XL 1.5B FP8 Training ~30% slower than H100 FP8

Issue - State: open - Opened by OrenLeung about 2 months ago - 19 comments

#70 - fix: rocm transformer engine install instructions

Pull Request - State: closed - Opened by OrenLeung about 2 months ago

#69 - remove -ffast-math in ck_fused_attn compilation

Pull Request - State: open - Opened by wangye805 about 2 months ago

#67 - Add hipBLASLt autotune results persistent storage

Pull Request - State: closed - Opened by ipanfilo about 2 months ago - 2 comments

#66 - Ifu release v1.9

Pull Request - State: closed - Opened by wangye805 about 2 months ago

#65 - Add simultaneous support of hipBlasLt and rocBlas

Pull Request - State: closed - Opened by ipanfilo about 2 months ago

#64 - Ipanfilo/ifu20240614

Pull Request - State: closed - Opened by ipanfilo about 2 months ago

#63 - [ROCm] remove the extra sync in gqa/mqa bwd

Pull Request - State: closed - Opened by wangye805 2 months ago - 1 comment

#62 - [ROCm] upgrade aotriton to version release/0.7

Pull Request - State: open - Opened by wangye805 2 months ago

#61 - Update supported tests in README.rst

Pull Request - State: closed - Opened by ipanfilo 2 months ago

#60 - Fixed wrong test methods arguments in some cases

Pull Request - State: closed - Opened by ipanfilo 2 months ago - 1 comment

#59 - Ifu20240625 group gemm yewang12

Pull Request - State: closed - Opened by wangye805 2 months ago - 1 comment

#58 - Fix JAX examples, fix ROCm device capability check

Pull Request - State: closed - Opened by ipanfilo 2 months ago

#57 - Update README.rst

Pull Request - State: closed - Opened by wangye805 3 months ago

#56 - Pytorch: fixed ONNX test, control fused attn for cuda graph tests

Pull Request - State: closed - Opened by ipanfilo 3 months ago

#55 - Ifu 20240613 r1

Pull Request - State: closed - Opened by ipanfilo 3 months ago

#54 - [ROCm] enable MQA/GQA in CK but use dk dv expand workaround

Pull Request - State: closed - Opened by wangye805 3 months ago

#53 - Ifu 20240613

Pull Request - State: closed - Opened by ipanfilo 3 months ago

#52 - Revert "[ROCm] change llvm url to local amd server"

Pull Request - State: closed - Opened by wangye805 3 months ago

#51 - [ROCm] Enable context parallelism in pytorch TE

Pull Request - State: closed - Opened by wangye805 3 months ago

#50 - [Issue]: install TransformerEngine error, cannot reach llvm tar file

Issue - State: closed - Opened by amd-fuweiy 3 months ago - 2 comments

#49 - Update ck_fused_attn CMakeLists.txt to clean gen_src

Pull Request - State: closed - Opened by wangye805 3 months ago

#48 - [ROCm] change llvm url to local amd server

Pull Request - State: closed - Opened by wangye805 3 months ago

#47 - [ROCm] temporary workaround to disable CK compilation in rocm6.2

Pull Request - State: closed - Opened by wangye805 3 months ago

#46 - [ROCm] update CK version to fix the compilation issue in ROCm6.2

Pull Request - State: closed - Opened by wangye805 4 months ago

#45 - Update README.rst with PYTORCH_ROCM_ARCH=gfx942

Pull Request - State: closed - Opened by wangye805 4 months ago

#44 - Added flag to skip aotriton build for faster incremental builds

Pull Request - State: closed - Opened by ipanfilo 4 months ago

#43 - Integrate ck fused attn

Pull Request - State: closed - Opened by wangye805 4 months ago - 1 comment

#42 - Issue6445 - revert w/a. Issue8516 - fix test run

Pull Request - State: closed - Opened by ipanfilo 5 months ago

#41 - adding nanogpt submodule example

Pull Request - State: closed - Opened by floraamd 5 months ago

#40 - Fix memory corruption due to wrong destructors order - issue #8239

Pull Request - State: closed - Opened by ipanfilo 6 months ago

#39 - Add hipblaslt heuristic cache

Pull Request - State: closed - Opened by ipanfilo 6 months ago - 1 comment

#38 - AOTriton fused attn integration

Pull Request - State: closed - Opened by wangye805 6 months ago

#37 - Hipblaslt handle caching

Pull Request - State: closed - Opened by ipanfilo 7 months ago - 3 comments

#36 - Ifu 20240222

Pull Request - State: closed - Opened by wangye805 7 months ago

#35 - IFU 20240222

Pull Request - State: closed - Opened by wangye805 7 months ago

#34 - [TE] Investigate parallelism implementation in Transformer Engine

Issue - State: open - Opened by wangye805 7 months ago - 1 comment

#33 - GEMM test: add HW support filter for FP8, fix some HIPBLASLT

Pull Request - State: closed - Opened by ipanfilo 8 months ago

#31 - Fixed build with new hipify_torch, fix switching to HIPBLAS codepath

Pull Request - State: closed - Opened by ipanfilo 8 months ago - 3 comments

#30 - IFU 20240221

Pull Request - State: closed - Opened by wangye805 9 months ago

#29 - [ROCm] denorm fix for rocblas path in gemm

Pull Request - State: closed - Opened by wangye805 10 months ago

#27 - [ROCm] support jax in transformer engine

Pull Request - State: closed - Opened by wangye805 10 months ago

#25 - Fix setup exception if cmake.__file__ is None

Pull Request - State: closed - Opened by ipanfilo 10 months ago

#24 - [ROCm] add fp8 output support in rocblas gemm path

Pull Request - State: closed - Opened by wangye805 11 months ago

#22 - Fix TE RTC on ROCm 6.0

Pull Request - State: closed - Opened by ipanfilo 11 months ago - 1 comment

#20 - Added __HIP_PLATFORM_HCC__ to building cpp tests.

Pull Request - State: closed - Opened by wenchenvincent 12 months ago

#19 - [ROCm] enable nvfuser

Pull Request - State: closed - Opened by wangye805 12 months ago

#18 - Enable roctx usage

Pull Request - State: closed - Opened by ipanfilo 12 months ago - 6 comments

#17 - [ROCm] Re-organize the readme to add a dedicated ROCm and AMDGPU

Pull Request - State: closed - Opened by wangye805 12 months ago

#16 - Fixed bugs with bf16 GEMM when using rocblas path.

Pull Request - State: closed - Opened by wenchenvincent about 1 year ago

#15 - Support TE transpose RTC

Pull Request - State: closed - Opened by ipanfilo about 1 year ago - 3 comments

#14 - Ifu 20230906

Pull Request - State: closed - Opened by wangye805 about 1 year ago

#13 - HIPRTC initial support

Pull Request - State: closed - Opened by ipanfilo about 1 year ago

#12 - Merge recent commits into the dev branch

Pull Request - State: closed - Opened by wangye805 about 1 year ago

#11 - add install option of use_hipblaslt into pip install/cmake

Pull Request - State: closed - Opened by wangye805 over 1 year ago - 1 comment

#10 - Worked around an issue with intrinsics for f8 upcasting.

Pull Request - State: closed - Opened by wenchenvincent over 1 year ago

#9 - gfx940 performance improvement

Pull Request - State: closed - Opened by wenchenvincent over 1 year ago - 1 comment

#8 - Fp8 gemm for gfx940 enabled.

Pull Request - State: closed - Opened by wenchenvincent over 1 year ago

#7 - Fp8 gemm enabled

Pull Request - State: closed - Opened by wenchenvincent over 1 year ago

#6 - F8: Interop investigation considering G's discussion and feedback

Issue - State: closed - Opened by HaiShaw almost 2 years ago

#5 - 2897 nvte port prior fp8

Pull Request - State: closed - Opened by HaiShaw almost 2 years ago

#4 - 2897 nvte unit tests

Pull Request - State: closed - Opened by wenchenvincent almost 2 years ago

#2 - Fixed build issues with Pytorch extensions

Pull Request - State: closed - Opened by wenchenvincent about 2 years ago

#1 - 2897 prep

Pull Request - State: closed - Opened by HaiShaw about 2 years ago - 1 comment