Ecosyste.ms: Issues

An open API service for providing issue and pull request metadata for open source projects.

GitHub / DefTruth/CUDA-Learn-Notes issues and pull requests

#139 - [HGEMM] Add MMA HGEMM NN C++ benchmark

Pull Request - State: closed - Opened by DefTruth 7 days ago

#138 - [HGEMM] fix cublas hgemm handle error

Pull Request - State: closed - Opened by DefTruth 7 days ago

#137 - [HGEMM] Update HGEMM L20/4090 Bench

Pull Request - State: closed - Opened by DefTruth 7 days ago

#136 - [HGEMM] refactor HGEMM cpp benchmark

Pull Request - State: closed - Opened by DefTruth 7 days ago

#135 - [HGEMM] trans mat b from row major -> col major

Pull Request - State: closed - Opened by DefTruth 7 days ago

#134 - [HGEMM] Add CuTe HGEMM with SMEM Swizzle

Pull Request - State: closed - Opened by DefTruth 8 days ago

#133 - Update embedding.cu

Pull Request - State: closed - Opened by TheManWhoIsStupid 8 days ago

#132 - [HGEMM] Add large MNK block swizzle policy

Pull Request - State: closed - Opened by DefTruth 8 days ago

#131 - Bump up to v2.6

Pull Request - State: closed - Opened by DefTruth 12 days ago

#130 - [README] Update README.md

Pull Request - State: closed - Opened by DefTruth 13 days ago

#129 - [README] Update README

Pull Request - State: closed - Opened by DefTruth 14 days ago

#128 - [README] Add contents lists

Pull Request - State: closed - Opened by DefTruth 14 days ago

#127 - [Blog] Illustrated DeepSpeed-Ulysses & Megatron-LM TP/SP

Pull Request - State: closed - Opened by DefTruth 15 days ago

#126 - [HGEMM] Update NVIDIA L20/4090 Perf plots

Pull Request - State: closed - Opened by DefTruth 19 days ago

#125 - Bump up to v2.5

Pull Request - State: closed - Opened by DefTruth 22 days ago

#124 - [HGEMM] Add HGEMM L20/4090 benchmark figures

Pull Request - State: closed - Opened by DefTruth 22 days ago

#123 - [PERF] Update HGEMM benchmark scripts

Pull Request - State: closed - Opened by DefTruth 23 days ago - 5 comments

#122 - [HGEMM] Add NVIDIA RTX 3090 Laptop perf plot

Pull Request - State: closed - Opened by DefTruth 24 days ago

#121 - [HGEMM] Add plot tflops function

Pull Request - State: closed - Opened by DefTruth 25 days ago

#120 - [HGEMM] Update HGEMM README.md

Pull Request - State: closed - Opened by DefTruth 26 days ago

#119 - [HGEMM] Add NVIDIA RTX 4090 benchmark

Pull Request - State: closed - Opened by DefTruth 27 days ago

#118 - [README] Update HGEMM/SGEMM Supported Matrix

Pull Request - State: closed - Opened by DefTruth 28 days ago

#117 - [HGEMM] Update HGEMM/SGEMM Supported Matrix

Pull Request - State: closed - Opened by DefTruth 28 days ago

#116 - [HGEMM] Update HGEMM Supported Matrix

Pull Request - State: closed - Opened by DefTruth 28 days ago

#115 - Update README.md

Pull Request - State: closed - Opened by DefTruth 28 days ago

#114 - [Docs] Update HGEMM/SGEMM Supported Matrix

Pull Request - State: closed - Opened by DefTruth 29 days ago

#113 - [README] Update HGEMM/SGEMM Supported matrix

Pull Request - State: closed - Opened by DefTruth 29 days ago

#112 - [HGEMM] Update HGEMM/SGEMM Supported Matrix

Pull Request - State: closed - Opened by DefTruth 29 days ago

#111 - [HGEMM] Add M=N=K option for benchmark

Pull Request - State: closed - Opened by DefTruth 29 days ago

#110 - [HGEMM][Docs] Add HGEMM Supported Matrix

Pull Request - State: closed - Opened by DefTruth 29 days ago

#109 - [HGEMM] Update HGEMM MMA/WMMA Usage

Pull Request - State: closed - Opened by DefTruth 29 days ago

#108 - [HGEMM] Try reduce registers usage

Pull Request - State: closed - Opened by DefTruth 29 days ago

#107 - [HGEMM] add -Xptxas -v compile flag

Pull Request - State: closed - Opened by DefTruth about 1 month ago

#106 - [HGEMM] Add Warp Swizzle as template param

Pull Request - State: closed - Opened by DefTruth about 1 month ago

#105 - [HGEMM] Update HGEMM benchmark scripts

Pull Request - State: closed - Opened by DefTruth about 1 month ago

#104 - [HGEMM] Add HGEMM MMA Col Major Kernel

Pull Request - State: closed - Opened by DefTruth about 1 month ago

#103 - [HGEMM] Add some note to collective store

Pull Request - State: closed - Opened by DefTruth about 1 month ago

#102 - [NMS] Add nms f32 cuda kernel.

Pull Request - State: closed - Opened by bear-zd about 1 month ago - 1 comment

#101 - [HGEMM] collective store via warp shfl&reg reuse

Pull Request - State: closed - Opened by DefTruth about 1 month ago

#100 - [HGEMM] ldmatrix.x4.trans with reg double buffers

Pull Request - State: closed - Opened by DefTruth about 1 month ago

#99 - [HGEMM] HGEMM MMA with Reg Double Buffers

Pull Request - State: closed - Opened by DefTruth about 1 month ago

#98 - [HGEMM] Add MMA 16816 swizzle, Up to 115 TFLOPS

Pull Request - State: closed - Opened by DefTruth about 1 month ago

#97 - [HGEMM] Update HGEMM WMMA Benchmark

Pull Request - State: closed - Opened by DefTruth about 1 month ago

#96 - [HGEMM] Refactor HGEMM WMMA 161616 kernels

Pull Request - State: closed - Opened by DefTruth about 1 month ago

#95 - [HGEMM] update HGEMM benchmark option

Pull Request - State: closed - Opened by DefTruth about 1 month ago

#94 - [HGEMM] Add GeForce RTX 3080 Laptop benchmark

Pull Request - State: closed - Opened by DefTruth about 1 month ago

#93 - [Docs] rename mat_transpose -> mat-transpose

Pull Request - State: closed - Opened by DefTruth about 1 month ago

#92 - [HGEMM] optimize SMEM padding, up to 113 TFLOPS

Pull Request - State: closed - Opened by DefTruth about 1 month ago

#91 - [Mat][Trans] Add f32x4_shared/bcf row/col first kernel.

Pull Request - State: closed - Opened by bear-zd about 1 month ago - 2 comments

#90 - [Docs][Contribute] Add How to contribute Notes

Pull Request - State: closed - Opened by DefTruth about 1 month ago

#89 - [Mat][Trans] Add f32/f32x4 row/col first kernel

Pull Request - State: closed - Opened by bear-zd about 1 month ago - 1 comment

#88 - [Mat Transpose] Add mat transpose f32/x4_packed kernel.

Pull Request - State: closed - Opened by bear-zd about 1 month ago - 2 comments

#87 - [SGEMM] Update SGEMM TF32 Benchmark

Pull Request - State: closed - Opened by DefTruth about 1 month ago

#86 - [HGEMM] mma4x4_warp4x4_stages with swizzle

Pull Request - State: closed - Opened by DefTruth about 1 month ago

#85 - [SWISH] support Swish F32/F16 kernel

Pull Request - State: closed - Opened by wangzijian1010 about 1 month ago - 1 comment

#84 - [SGEMM] SGEMM TF32 Thread Block Swizzle

Pull Request - State: closed - Opened by DefTruth about 1 month ago

#83 - [HGEMM] make thread block swizzle stride as N/4

Pull Request - State: closed - Opened by DefTruth about 1 month ago

#82 - [HEGMM] HGEMM WMMA Thread Block Swizzle

Pull Request - State: closed - Opened by DefTruth about 1 month ago

#81 - [Docs] Update README.md

Pull Request - State: closed - Opened by DefTruth about 1 month ago

#80 - [RoPE] Add minimal RoPE f32/f32x4 pack impl

Pull Request - State: closed - Opened by bear-zd about 1 month ago - 2 comments

#79 - [SGEMM] Add Kernel cudaFuncSetAttribute hint

Pull Request - State: closed - Opened by DefTruth about 1 month ago

#78 - [SGEMM] Add cuBLAS SGEMM F32/TF32 baseline

Pull Request - State: closed - Opened by DefTruth about 1 month ago

#77 - [SGEMM] Add SGEMM WMMA TF32 Stage2/3

Pull Request - State: closed - Opened by DefTruth about 1 month ago

#76 - [HGEMM] HGEMM WMMA Stage mma4x2+warp4x4

Pull Request - State: closed - Opened by DefTruth about 1 month ago

#75 - [HEGMM][Bugfix] fix HGEMM Stage cp.async error

Pull Request - State: closed - Opened by DefTruth about 1 month ago

#74 - [HGEMM] Add HGEMM WMMA Stage 3/4 Kernel

Pull Request - State: closed - Opened by DefTruth about 2 months ago

#73 - [Softmax] Add online softmax f32x4 pack kernel

Pull Request - State: closed - Opened by bear-zd about 2 months ago

#72 - [Softmax] Add online softmax f32x4 pack kernel

Pull Request - State: closed - Opened by bear-zd about 2 months ago

#71 - [HGEMM] HGEMM WMMA with Reg double buffers

Pull Request - State: closed - Opened by DefTruth about 2 months ago

#70 - [HGEMM] Add HGEMM mma4x2, warp2x4x2 kernel

Pull Request - State: closed - Opened by DefTruth about 2 months ago

#69 - [HGEMM] Add HGEMM WMMA Double Buffers

Pull Request - State: closed - Opened by DefTruth about 2 months ago

#68 - [Embedding] Add embedding kernel f32/x4/x4_pack, f16/x8/x8_pack

Pull Request - State: closed - Opened by bear-zd about 2 months ago - 2 comments

#67 - [HGEMM] HGEMM Tensor Cores Support Part-1

Pull Request - State: closed - Opened by DefTruth about 2 months ago

#66 - [GELU] Add f32/x4, f16/x2/x8/x8pack kernel.

Pull Request - State: closed - Opened by bear-zd about 2 months ago - 1 comment

#65 - [SGEMM][Async] Add K16 + Copy Async Kernel

Pull Request - State: closed - Opened by DefTruth about 2 months ago

#64 - [SGEMM][Async] Add naive copy async SGEMM

Pull Request - State: closed - Opened by DefTruth about 2 months ago

#63 - [Softmax][Bugfix] fixed softmax compile error

Pull Request - State: closed - Opened by DefTruth about 2 months ago

#62 - [HGEMM][Async] support K16/32 pack+cp.async+dbuf

Pull Request - State: closed - Opened by DefTruth about 2 months ago

#61 - [Softmax] Add online softmax according to Nvidia Paper (#60)

Pull Request - State: closed - Opened by DefTruth about 2 months ago

#60 - [Softmax] Add online softmax according to Nvidia Paper

Pull Request - State: closed - Opened by bear-zd about 2 months ago - 1 comment

#59 - [HGEMM] Add PyTorch HGEMM profile

Pull Request - State: closed - Opened by DefTruth about 2 months ago

#58 - [Docs] Add docs for HGEMM/SGEMM double buffers

Pull Request - State: closed - Opened by DefTruth about 2 months ago

#57 - [HGEMM] HEGMM kernel with double buffers

Pull Request - State: closed - Opened by DefTruth about 2 months ago