Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / DefTruth/CUDA-Learn-Notes issues and pull requests
#139 - [HGEMM] Add MMA HGEMM NN C++ benchmark
Pull Request -
State: closed - Opened by DefTruth 7 days ago
#138 - [HGEMM] fix cublas hgemm handle error
Pull Request -
State: closed - Opened by DefTruth 7 days ago
#137 - [HGEMM] Update HGEMM L20/4090 Bench
Pull Request -
State: closed - Opened by DefTruth 7 days ago
#136 - [HGEMM] refactor HGEMM cpp benchmark
Pull Request -
State: closed - Opened by DefTruth 7 days ago
#135 - [HGEMM] trans mat b from row major -> col major
Pull Request -
State: closed - Opened by DefTruth 7 days ago
#134 - [HGEMM] Add CuTe HGEMM with SMEM Swizzle
Pull Request -
State: closed - Opened by DefTruth 8 days ago
#133 - Update embedding.cu
Pull Request -
State: closed - Opened by TheManWhoIsStupid 8 days ago
#132 - [HGEMM] Add large MNK block swizzle policy
Pull Request -
State: closed - Opened by DefTruth 8 days ago
#131 - Bump up to v2.6
Pull Request -
State: closed - Opened by DefTruth 12 days ago
#130 - [README] Update README.md
Pull Request -
State: closed - Opened by DefTruth 13 days ago
#129 - [README] Update README
Pull Request -
State: closed - Opened by DefTruth 14 days ago
#128 - [README] Add contents lists
Pull Request -
State: closed - Opened by DefTruth 14 days ago
#127 - [Blog] Illustrated DeepSpeed-Ulysses & Megatron-LM TP/SP
Pull Request -
State: closed - Opened by DefTruth 15 days ago
#126 - [HGEMM] Update NVIDIA L20/4090 Perf plots
Pull Request -
State: closed - Opened by DefTruth 19 days ago
#125 - Bump up to v2.5
Pull Request -
State: closed - Opened by DefTruth 22 days ago
#124 - [HGEMM] Add HGEMM L20/4090 benchmark figures
Pull Request -
State: closed - Opened by DefTruth 22 days ago
#123 - [PERF] Update HGEMM benchmark scripts
Pull Request -
State: closed - Opened by DefTruth 23 days ago
- 5 comments
#122 - [HGEMM] Add NVIDIA RTX 3090 Laptop perf plot
Pull Request -
State: closed - Opened by DefTruth 24 days ago
#121 - [HGEMM] Add plot tflops function
Pull Request -
State: closed - Opened by DefTruth 25 days ago
#120 - [HGEMM] Update HGEMM README.md
Pull Request -
State: closed - Opened by DefTruth 26 days ago
#119 - [HGEMM] Add NVIDIA RTX 4090 benchmark
Pull Request -
State: closed - Opened by DefTruth 27 days ago
#118 - [README] Update HGEMM/SGEMM Supported Matrix
Pull Request -
State: closed - Opened by DefTruth 28 days ago
#117 - [HGEMM] Update HGEMM/SGEMM Supported Matrix
Pull Request -
State: closed - Opened by DefTruth 28 days ago
#116 - [HGEMM] Update HGEMM Supported Matrix
Pull Request -
State: closed - Opened by DefTruth 28 days ago
#115 - Update README.md
Pull Request -
State: closed - Opened by DefTruth 28 days ago
#114 - [Docs] Update HGEMM/SGEMM Supported Matrix
Pull Request -
State: closed - Opened by DefTruth 29 days ago
#113 - [README] Update HGEMM/SGEMM Supported matrix
Pull Request -
State: closed - Opened by DefTruth 29 days ago
#112 - [HGEMM] Update HGEMM/SGEMM Supported Matrix
Pull Request -
State: closed - Opened by DefTruth 29 days ago
#111 - [HGEMM] Add M=N=K option for benchmark
Pull Request -
State: closed - Opened by DefTruth 29 days ago
#110 - [HGEMM][Docs] Add HGEMM Supported Matrix
Pull Request -
State: closed - Opened by DefTruth 29 days ago
#109 - [HGEMM] Update HGEMM MMA/WMMA Usage
Pull Request -
State: closed - Opened by DefTruth 29 days ago
#108 - [HGEMM] Try reduce registers usage
Pull Request -
State: closed - Opened by DefTruth 29 days ago
#107 - [HGEMM] add -Xptxas -v compile flag
Pull Request -
State: closed - Opened by DefTruth about 1 month ago
#106 - [HGEMM] Add Warp Swizzle as template param
Pull Request -
State: closed - Opened by DefTruth about 1 month ago
#105 - [HGEMM] Update HGEMM benchmark scripts
Pull Request -
State: closed - Opened by DefTruth about 1 month ago
#104 - [HGEMM] Add HGEMM MMA Col Major Kernel
Pull Request -
State: closed - Opened by DefTruth about 1 month ago
#103 - [HGEMM] Add some note to collective store
Pull Request -
State: closed - Opened by DefTruth about 1 month ago
#102 - [NMS] Add nms f32 cuda kernel.
Pull Request -
State: closed - Opened by bear-zd about 1 month ago
- 1 comment
#101 - [HGEMM] collective store via warp shfl&reg reuse
Pull Request -
State: closed - Opened by DefTruth about 1 month ago
#100 - [HGEMM] ldmatrix.x4.trans with reg double buffers
Pull Request -
State: closed - Opened by DefTruth about 1 month ago
#99 - [HGEMM] HGEMM MMA with Reg Double Buffers
Pull Request -
State: closed - Opened by DefTruth about 1 month ago
#98 - [HGEMM] Add MMA 16816 swizzle, Up to 115 TFLOPS
Pull Request -
State: closed - Opened by DefTruth about 1 month ago
#97 - [HGEMM] Update HGEMM WMMA Benchmark
Pull Request -
State: closed - Opened by DefTruth about 1 month ago
#96 - [HGEMM] Refactor HGEMM WMMA 161616 kernels
Pull Request -
State: closed - Opened by DefTruth about 1 month ago
#95 - [HGEMM] update HGEMM benchmark option
Pull Request -
State: closed - Opened by DefTruth about 1 month ago
#94 - [HGEMM] Add GeForce RTX 3080 Laptop benchmark
Pull Request -
State: closed - Opened by DefTruth about 1 month ago
#93 - [Docs] rename mat_transpose -> mat-transpose
Pull Request -
State: closed - Opened by DefTruth about 1 month ago
#92 - [HGEMM] optimize SMEM padding, up to 113 TFLOPS
Pull Request -
State: closed - Opened by DefTruth about 1 month ago
#91 - [Mat][Trans] Add f32x4_shared/bcf row/col first kernel.
Pull Request -
State: closed - Opened by bear-zd about 1 month ago
- 2 comments
#90 - [Docs][Contribute] Add How to contribute Notes
Pull Request -
State: closed - Opened by DefTruth about 1 month ago
#89 - [Mat][Trans] Add f32/f32x4 row/col first kernel
Pull Request -
State: closed - Opened by bear-zd about 1 month ago
- 1 comment
#88 - [Mat Transpose] Add mat transpose f32/x4_packed kernel.
Pull Request -
State: closed - Opened by bear-zd about 1 month ago
- 2 comments
#87 - [SGEMM] Update SGEMM TF32 Benchmark
Pull Request -
State: closed - Opened by DefTruth about 1 month ago
#86 - [HGEMM] mma4x4_warp4x4_stages with swizzle
Pull Request -
State: closed - Opened by DefTruth about 1 month ago
#85 - [SWISH] support Swish F32/F16 kernel
Pull Request -
State: closed - Opened by wangzijian1010 about 1 month ago
- 1 comment
#84 - [SGEMM] SGEMM TF32 Thread Block Swizzle
Pull Request -
State: closed - Opened by DefTruth about 1 month ago
#83 - [HGEMM] make thread block swizzle stride as N/4
Pull Request -
State: closed - Opened by DefTruth about 1 month ago
#82 - [HEGMM] HGEMM WMMA Thread Block Swizzle
Pull Request -
State: closed - Opened by DefTruth about 1 month ago
#81 - [Docs] Update README.md
Pull Request -
State: closed - Opened by DefTruth about 1 month ago
#80 - [RoPE] Add minimal RoPE f32/f32x4 pack impl
Pull Request -
State: closed - Opened by bear-zd about 1 month ago
- 2 comments
#79 - [SGEMM] Add Kernel cudaFuncSetAttribute hint
Pull Request -
State: closed - Opened by DefTruth about 1 month ago
#78 - [SGEMM] Add cuBLAS SGEMM F32/TF32 baseline
Pull Request -
State: closed - Opened by DefTruth about 1 month ago
#77 - [SGEMM] Add SGEMM WMMA TF32 Stage2/3
Pull Request -
State: closed - Opened by DefTruth about 1 month ago
#76 - [HGEMM] HGEMM WMMA Stage mma4x2+warp4x4
Pull Request -
State: closed - Opened by DefTruth about 1 month ago
#75 - [HEGMM][Bugfix] fix HGEMM Stage cp.async error
Pull Request -
State: closed - Opened by DefTruth about 1 month ago
#74 - [HGEMM] Add HGEMM WMMA Stage 3/4 Kernel
Pull Request -
State: closed - Opened by DefTruth about 2 months ago
#73 - [Softmax] Add online softmax f32x4 pack kernel
Pull Request -
State: closed - Opened by bear-zd about 2 months ago
#72 - [Softmax] Add online softmax f32x4 pack kernel
Pull Request -
State: closed - Opened by bear-zd about 2 months ago
#71 - [HGEMM] HGEMM WMMA with Reg double buffers
Pull Request -
State: closed - Opened by DefTruth about 2 months ago
#70 - [HGEMM] Add HGEMM mma4x2, warp2x4x2 kernel
Pull Request -
State: closed - Opened by DefTruth about 2 months ago
#69 - [HGEMM] Add HGEMM WMMA Double Buffers
Pull Request -
State: closed - Opened by DefTruth about 2 months ago
#68 - [Embedding] Add embedding kernel f32/x4/x4_pack, f16/x8/x8_pack
Pull Request -
State: closed - Opened by bear-zd about 2 months ago
- 2 comments
#67 - [HGEMM] HGEMM Tensor Cores Support Part-1
Pull Request -
State: closed - Opened by DefTruth about 2 months ago
#66 - [GELU] Add f32/x4, f16/x2/x8/x8pack kernel.
Pull Request -
State: closed - Opened by bear-zd about 2 months ago
- 1 comment
#65 - [SGEMM][Async] Add K16 + Copy Async Kernel
Pull Request -
State: closed - Opened by DefTruth about 2 months ago
#64 - [SGEMM][Async] Add naive copy async SGEMM
Pull Request -
State: closed - Opened by DefTruth about 2 months ago
#63 - [Softmax][Bugfix] fixed softmax compile error
Pull Request -
State: closed - Opened by DefTruth about 2 months ago
#62 - [HGEMM][Async] support K16/32 pack+cp.async+dbuf
Pull Request -
State: closed - Opened by DefTruth about 2 months ago
#61 - [Softmax] Add online softmax according to Nvidia Paper (#60)
Pull Request -
State: closed - Opened by DefTruth about 2 months ago
#60 - [Softmax] Add online softmax according to Nvidia Paper
Pull Request -
State: closed - Opened by bear-zd about 2 months ago
- 1 comment
#59 - [HGEMM] Add PyTorch HGEMM profile
Pull Request -
State: closed - Opened by DefTruth about 2 months ago
#58 - [Docs] Add docs for HGEMM/SGEMM double buffers
Pull Request -
State: closed - Opened by DefTruth about 2 months ago
#57 - [HGEMM] HEGMM kernel with double buffers
Pull Request -
State: closed - Opened by DefTruth about 2 months ago