Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / DefTruth/CUDA-Learn-Notes issues and pull requests
#56 - [SGEMM] test bank conflicts free with smem offset
Pull Request -
State: closed - Opened by DefTruth about 2 months ago
#55 - [FlashAttention] Refactor FlashAttention PyTorch bindings
Pull Request -
State: closed - Opened by DefTruth about 2 months ago
#54 - [HGEMM] Pack sliced_k f16x4/fp16x8 HGEMM
Pull Request -
State: closed - Opened by DefTruth about 2 months ago
#53 - [Misc][Benchmark] optimize benchmarks
Pull Request -
State: closed - Opened by DefTruth about 2 months ago
#52 - [SGEMM] bank conflicts free & double buffers
Pull Request -
State: closed - Opened by DefTruth 2 months ago
#51 - [SGEMM] Add naive sgemm kernel
Pull Request -
State: closed - Opened by DefTruth 2 months ago
#50 - 🌤🌤 CONTRIBUTE 🎉🎉
Issue -
State: open - Opened by DefTruth 2 months ago
Labels: documentation, good first issue, contribute
#49 - [Softmax][FP16] Pack f16x8 softmax kernel
Pull Request -
State: closed - Opened by DefTruth 2 months ago
#48 - [LayerNorm][FP16] support fp16x8_pack_f32 kernel
Pull Request -
State: closed - Opened by DefTruth 2 months ago
#47 - [RMSNorm][FP16] Pack f16x8 rmsnorm
Pull Request -
State: closed - Opened by DefTruth 2 months ago
#46 - [LayerNorm][FP16] Add pack support for f16x8 LD/ST
Pull Request -
State: closed - Opened by DefTruth 2 months ago
#45 - [DotProd][FP16] support f16x8_pack kernel
Pull Request -
State: closed - Opened by DefTruth 2 months ago
#44 - [Nsight] Add nsys/ncu usage, ptx/sass
Pull Request -
State: closed - Opened by DefTruth 2 months ago
#43 - [Reduce][Kernel] Pack f16/bf16x8 & fp8/i8x16 LD/ST
Pull Request -
State: closed - Opened by DefTruth 2 months ago
#42 - [RELU][FP16] Add f16x8_pack kernel, boost 2.1x
Pull Request -
State: closed - Opened by DefTruth 2 months ago
#41 - [FlashAttention] replace FLOAT4 with LDST128BITS macro
Pull Request -
State: closed - Opened by DefTruth 2 months ago
#40 - [Elementwise][Half] support f16x8_pack kernel, boost 1.1x
Pull Request -
State: closed - Opened by DefTruth 2 months ago
#39 - [Sigmoid][F16] Add f16x8_pack kernel, boost 1.5x ~
Pull Request -
State: closed - Opened by DefTruth 2 months ago
#38 - [Misc] Update refactor branch
Pull Request -
State: closed - Opened by DefTruth 2 months ago
#37 - Clamped input range in Sigmoid kernel to prevent overflow
Pull Request -
State: closed - Opened by Phoenix8215 2 months ago
#36 - [Refactor][7/N] CUDA Learn Notes refactor Part-7
Pull Request -
State: closed - Opened by DefTruth 2 months ago
#35 - [FA2][Half] Add FA2 f16_mma_m16n8k16 kernel
Pull Request -
State: closed - Opened by DefTruth 2 months ago
#34 - Bump up to v2.3
Pull Request -
State: closed - Opened by DefTruth 2 months ago
#33 - [FlashAttention] Refactor flash_attn_1_fwd_f32 kernel
Pull Request -
State: closed - Opened by DefTruth 2 months ago
#32 - [HGEMV][Half] support hgemv k32/k128/f16
Pull Request -
State: closed - Opened by DefTruth 2 months ago
#31 - [HGEMM] Add slicked_k&t_8x8_sliced_k_f16x4
Pull Request -
State: closed - Opened by DefTruth 2 months ago
#30 - [LayerNorm][Kernel] Add HALF2 SUM/SUB/VAR macro
Pull Request -
State: closed - Opened by DefTruth 2 months ago
#29 - [RMSNorm][Kernel] Add FLOAT2/HALF2_VARIANCE macro
Pull Request -
State: closed - Opened by DefTruth 2 months ago
#28 - [RMSNorm] support f16x8_f32 RMSNorm
Pull Request -
State: closed - Opened by DefTruth 2 months ago
#27 - update branch
Pull Request -
State: closed - Opened by DefTruth 2 months ago
#26 - [RELU][Half] support fp16x8 RELU kernel
Pull Request -
State: closed - Opened by DefTruth 2 months ago
#25 - [Elementwise][Half] support fp16x8 packed Elementwise
Pull Request -
State: closed - Opened by DefTruth 2 months ago
#24 - [Elementwise][Half] support fp16x8 packed Elementwise
Pull Request -
State: closed - Opened by DefTruth 2 months ago
#23 - [Bugfix][Kernel] fixed some kernel blocks calculate errors
Pull Request -
State: closed - Opened by DefTruth 2 months ago
#22 - [RMSNorm][Half] support fp16x8 packed RMSNorm
Pull Request -
State: closed - Opened by DefTruth 2 months ago
#21 - [Reduce][Half] add HALF2 & BFLOAT2 macro
Pull Request -
State: closed - Opened by DefTruth 2 months ago
#20 - [LayerNorm][Half] support fp16x8 packed LayerNorm (#19)
Pull Request -
State: closed - Opened by DefTruth 2 months ago
#19 - [LayerNorm][Half] support fp16x8 packed LayerNorm
Pull Request -
State: closed - Opened by DefTruth 2 months ago
#18 - [Refactor][5/N] CUDA Learn Notes refactor Part-6
Pull Request -
State: closed - Opened by DefTruth 3 months ago
#17 - [Refactor][6/N] CUDA Learn Notes refactor Part-6
Pull Request -
State: closed - Opened by DefTruth 3 months ago
#16 - Bump up to v2.2
Pull Request -
State: closed - Opened by DefTruth 3 months ago
#15 - [Refactor][5/N] CUDA Learn Notes refactor Part-5
Pull Request -
State: closed - Opened by DefTruth 3 months ago
#14 - [Refactor][4/N] CUDA Learn Notes refactor Part-4
Pull Request -
State: closed - Opened by DefTruth 3 months ago
#13 - [Refactor][4/N] CUDA Learn Notes refactor Part-4
Pull Request -
State: closed - Opened by DefTruth 3 months ago
#12 - [Refactor][4/N] CUDA Learn Notes refactor Part-4
Pull Request -
State: closed - Opened by DefTruth 3 months ago
#11 - [Refactor][4/N] CUDA Learn Notes refactor Part-4
Pull Request -
State: closed - Opened by DefTruth 3 months ago
#10 - [Refactor][4/N] CUDA Learn Notes refactor Part-4
Pull Request -
State: closed - Opened by DefTruth 3 months ago
#9 - Bump up to v0.8
Pull Request -
State: closed - Opened by DefTruth 3 months ago
#8 - Bump up to v0.7
Pull Request -
State: closed - Opened by DefTruth 4 months ago
#7 - What __threadfence() does
Issue -
State: closed - Opened by zbt78 7 months ago
- 3 comments
#6 - Hello, I have a question about the reduce-related code
Issue -
State: closed - Opened by Ss-shuang123 7 months ago
- 3 comments
#4 - Hello, why doesn't the sigmoid operator here account for exponential overflow?
Issue -
State: closed - Opened by Phoenix8215 8 months ago
- 2 comments
#3 - resources
Issue -
State: closed - Opened by DefTruth 8 months ago
- 2 comments
Labels: stale
#2 - layer norm implementation
Issue -
State: closed - Opened by zbt78 9 months ago
- 3 comments
Labels: stale
#1 - Update README.md
Pull Request -
State: closed - Opened by rjzhb 9 months ago
- 1 comment