Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / rocm/transformerengine issues and pull requests
#89 - add new branch heyi_cast_transpose and update cast_transpose optimizations
Pull Request -
State: open - Opened by eliotwang 26 days ago
#88 - [ROCm] add options to disable fused attn backend compilation
Pull Request -
State: closed - Opened by wangye805 about 1 month ago
- 1 comment
#87 - Support Deepseekv2
Pull Request -
State: open - Opened by hongywei about 1 month ago
- 2 comments
#86 - [ROCm] support swa in ROCm TE fused attn CK backend
Pull Request -
State: closed - Opened by wangye805 about 1 month ago
#85 - Ipanfilo/ci script
Pull Request -
State: open - Opened by ipanfilo about 1 month ago
#84 - Match ROCm platform detection in CMake and setup.py. Issue #9690
Pull Request -
State: closed - Opened by ipanfilo about 1 month ago
- 1 comment
#83 - Wen/opt cast transpose noop
Pull Request -
State: closed - Opened by wenchenvincent about 1 month ago
#82 - [Out of Box Experience]: ROCm Transformer Engine Should Be Included in AMD Pytorch Images
Issue -
State: open - Opened by OrenLeung about 1 month ago
- 1 comment
#81 - Fix the use_fused_attention filtering
Pull Request -
State: closed - Opened by wangye805 about 1 month ago
#80 - Fix undesired nv_fusion disabling after reentrant_activation_recompute
Pull Request -
State: closed - Opened by ipanfilo about 1 month ago
#79 - [FSDP 8xMI300X] Llama3 8B FP8 is 21% slower than BF16 & OOMs on the same batch size
Issue -
State: open - Opened by OrenLeung about 2 months ago
- 6 comments
#78 - [FSDP 8xMI300X]: LLama3 70B 4 Layer Proxy Model GPU Core Dumps
Issue -
State: open - Opened by OrenLeung about 2 months ago
- 23 comments
#77 - Add default compile arch to ck_fused_attn for building TE in docker image
Pull Request -
State: closed - Opened by wangye805 about 2 months ago
#76 - [DDP 8xMI300X] GPT2-1.5B FP8 is 25% slower than BF16 & OOMs on the same batch size
Issue -
State: open - Opened by OrenLeung about 2 months ago
- 3 comments
#75 - Skip distributest tests on single-GPU systems
Pull Request -
State: closed - Opened by ipanfilo about 2 months ago
#74 - [Issue]: MI300X fused_attn CK Backend Broken HIP runtime error: invalid device function 3rdparty/composable_kernel/include/ck_tile/host/hip_check_error.hpp: 18in function: hip_check_error
Issue -
State: open - Opened by OrenLeung about 2 months ago
- 9 comments
#73 - MI300X FP8 TE.Linear 2x Slower than AMP BF16 F.Linear
Issue -
State: open - Opened by OrenLeung about 2 months ago
- 14 comments
#72 - [1xMI300X] GPT-2 XL 1.5B FP8 Training ~30% slower than H100 FP8
Issue -
State: open - Opened by OrenLeung about 2 months ago
- 19 comments
#71 - [Issue]: ROCm TE Installation Error: no member named 'getCurrentHIPStreamMasqueradingAsCUDA' in namespace 'c10::hip'
Issue -
State: closed - Opened by OrenLeung about 2 months ago
- 3 comments
#70 - fix: rocm transformer engine install instructions
Pull Request -
State: closed - Opened by OrenLeung about 2 months ago
#69 - remove -ffast-math in ck_fused_attn compilation
Pull Request -
State: open - Opened by wangye805 about 2 months ago
#67 - Add hipBLASLt autotune results persistent storage
Pull Request -
State: closed - Opened by ipanfilo about 2 months ago
- 2 comments
#66 - Ifu release v1.9
Pull Request -
State: closed - Opened by wangye805 about 2 months ago
#65 - Add simultaneous support of hipBlasLt and rocBlas
Pull Request -
State: closed - Opened by ipanfilo about 2 months ago
#64 - Ipanfilo/ifu20240614
Pull Request -
State: closed - Opened by ipanfilo about 2 months ago
#63 - [ROCm] remove the extra sync in gqa/mqa bwd
Pull Request -
State: closed - Opened by wangye805 2 months ago
- 1 comment
#62 - [ROCm] upgrade aotriton to version release/0.7
Pull Request -
State: open - Opened by wangye805 2 months ago
#61 - Update supported tests in README.rst
Pull Request -
State: closed - Opened by ipanfilo 2 months ago
#60 - Fixed wrong test methods arguments in some cases
Pull Request -
State: closed - Opened by ipanfilo 2 months ago
- 1 comment
#59 - Ifu20240625 group gemm yewang12
Pull Request -
State: closed - Opened by wangye805 2 months ago
- 1 comment
#58 - Fix JAX examples, fix ROCm device capability check
Pull Request -
State: closed - Opened by ipanfilo 2 months ago
#57 - Update README.rst
Pull Request -
State: closed - Opened by wangye805 3 months ago
#56 - Pytorch: fixed ONNX test, control fused attn for cuda graph tests
Pull Request -
State: closed - Opened by ipanfilo 3 months ago
#55 - Ifu 20240613 r1
Pull Request -
State: closed - Opened by ipanfilo 3 months ago
#54 - [ROCm] enable MQA/GQA in CK but use dk dv expand walkaround
Pull Request -
State: closed - Opened by wangye805 3 months ago
#53 - Ifu 20240613
Pull Request -
State: closed - Opened by ipanfilo 3 months ago
#52 - Revert "[ROCm] change llvm url to local amd server"
Pull Request -
State: closed - Opened by wangye805 3 months ago
#51 - [ROCm] Enable context parallelism in pytorch TE
Pull Request -
State: closed - Opened by wangye805 3 months ago
#50 - [Issue]: install TransformerEngine error, cannot reach llvm tar file
Issue -
State: closed - Opened by amd-fuweiy 3 months ago
- 2 comments
#49 - Update ck_fused_attn CMakeLists.txt to clean gen_src
Pull Request -
State: closed - Opened by wangye805 3 months ago
#48 - [ROCm] change llvm url to local amd server
Pull Request -
State: closed - Opened by wangye805 3 months ago
#47 - [ROCm] temporary workaround to disable CK compilation in rocm6.2
Pull Request -
State: closed - Opened by wangye805 3 months ago
#46 - [ROCm] update CK version to fix the compilation issue in ROCm6.2
Pull Request -
State: closed - Opened by wangye805 4 months ago
#45 - Update README.rst with PYTORCH_ROCM_ARCH=gfx942
Pull Request -
State: closed - Opened by wangye805 4 months ago
#44 - Added flag to skip aotriton build for faster incremental builds
Pull Request -
State: closed - Opened by ipanfilo 4 months ago
#43 - Integrate ck fused attn
Pull Request -
State: closed - Opened by wangye805 4 months ago
- 1 comment
#42 - Issue6445 - revert w/a. Issue8516 - fix test run
Pull Request -
State: closed - Opened by ipanfilo 5 months ago
#41 - adding nanogpt submodule example
Pull Request -
State: closed - Opened by floraamd 5 months ago
#40 - Fix memory corruption due to wrong descructors order- issue #8239
Pull Request -
State: closed - Opened by ipanfilo 6 months ago
#39 - Add hipblaslt heuristic cache
Pull Request -
State: closed - Opened by ipanfilo 6 months ago
- 1 comment
#38 - AOTriton fused attn integration
Pull Request -
State: closed - Opened by wangye805 6 months ago
#37 - Hipblaslt handle caching
Pull Request -
State: closed - Opened by ipanfilo 7 months ago
- 3 comments
#36 - Ifu 20240222
Pull Request -
State: closed - Opened by wangye805 7 months ago
#35 - IFU 20240222
Pull Request -
State: closed - Opened by wangye805 7 months ago
#34 - [TE] Investigate parallelism implementation in Transformer Engine
Issue -
State: open - Opened by wangye805 7 months ago
- 1 comment
#33 - GEMM test: add HW support filter for FP8, fix some HIPBLASLT
Pull Request -
State: closed - Opened by ipanfilo 8 months ago
#32 - [ROCM] Fixed Copyright statement for hipify_torch in Acknowledgement.
Pull Request -
State: closed - Opened by wenchenvincent 8 months ago
#31 - Fixed build with new hipify_torch, fix switching to HIPBLAS codepath
Pull Request -
State: closed - Opened by ipanfilo 8 months ago
- 3 comments
#30 - IFU 20240221
Pull Request -
State: closed - Opened by wangye805 9 months ago
#29 - [ROCm] denorm fix for rocblas path in gemm
Pull Request -
State: closed - Opened by wangye805 10 months ago
#28 - Enable pytorch tests with hipgraph with workaround: reuse stream for …
Pull Request -
State: closed - Opened by ipanfilo 10 months ago
#27 - [ROCm] support jax in transformer engine
Pull Request -
State: closed - Opened by wangye805 10 months ago
#26 - [ROCM] Added AMD Copyright statements. Added MIT license for AMD cont…
Pull Request -
State: closed - Opened by wenchenvincent 10 months ago
#25 - Fix setup exception if cmake.__file__ is None
Pull Request -
State: closed - Opened by ipanfilo 10 months ago
#24 - [ROCm] add fp8 output support in rocblas gemm path
Pull Request -
State: closed - Opened by wangye805 11 months ago
#23 - [ROCm] add fp8/bf8 output support in gemm in rocblas simulation path
Pull Request -
State: closed - Opened by wangye805 11 months ago
#22 - Fix TE RTC on ROCm 6.0
Pull Request -
State: closed - Opened by ipanfilo 11 months ago
- 1 comment
#21 - [ROCm] Move from hipblasltDatatype_t to hipDataType for ROCm 6.0 release
Pull Request -
State: closed - Opened by wenchenvincent 12 months ago
#20 - Added __HIP_PLATFORM_HCC__ to building cpp tests.
Pull Request -
State: closed - Opened by wenchenvincent 12 months ago
#19 - [ROCm] enable nvfuser
Pull Request -
State: closed - Opened by wangye805 12 months ago
#18 - Enable roctx usage
Pull Request -
State: closed - Opened by ipanfilo 12 months ago
- 6 comments
#17 - [ROCm] Re-organize the readme to add a dedicated ROCm and AMDGPU
Pull Request -
State: closed - Opened by wangye805 12 months ago
#16 - Fixed bugs with bf16 GEMM when using rocblas path.
Pull Request -
State: closed - Opened by wenchenvincent about 1 year ago
#15 - Support TE transpose RTC
Pull Request -
State: closed - Opened by ipanfilo about 1 year ago
- 3 comments
#14 - Ifu 20230906
Pull Request -
State: closed - Opened by wangye805 about 1 year ago
#13 - HIPRTC initial support
Pull Request -
State: closed - Opened by ipanfilo about 1 year ago
#12 - Merge recent commits into the dev branch
Pull Request -
State: closed - Opened by wangye805 about 1 year ago
#11 - add install option of use_hipblaslt into pip install/cmake
Pull Request -
State: closed - Opened by wangye805 over 1 year ago
- 1 comment
#10 - Worked around an issue with intrinsics for f8 upcasting.
Pull Request -
State: closed - Opened by wenchenvincent over 1 year ago
#9 - gfx940 performance improvement
Pull Request -
State: closed - Opened by wenchenvincent over 1 year ago
- 1 comment
#8 - Fp8 gemm for gfx940 enabled.
Pull Request -
State: closed - Opened by wenchenvincent over 1 year ago
#7 - Fp8 gemm enabled
Pull Request -
State: closed - Opened by wenchenvincent over 1 year ago
#6 - F8: Interop investigation considering G's discussion and feedback
Issue -
State: closed - Opened by HaiShaw almost 2 years ago
#5 - 2897 nvte port prior fp8
Pull Request -
State: closed - Opened by HaiShaw almost 2 years ago
#4 - 2897 nvte unit tests
Pull Request -
State: closed - Opened by wenchenvincent almost 2 years ago
#3 - Replaced cublasLt calls with rocblas gemm + 6 manually crafted epilogues
Pull Request -
State: closed - Opened by wenchenvincent about 2 years ago
#2 - Fixed build issues with Pytorch extensions
Pull Request -
State: closed - Opened by wenchenvincent about 2 years ago
#1 - 2897 prep
Pull Request -
State: closed - Opened by HaiShaw about 2 years ago
- 1 comment