Ecosyste.ms: Issues

An open API service providing issue and pull request metadata for open source projects.

GitHub / DefTruth/CUDA-Learn-Notes issues and pull requests

#56 - [SGEMM] test bank conflicts free with smem offset

Pull Request - State: closed - Opened by DefTruth about 2 months ago

#55 - [FlashAttention] Refactor FlashAttention PyTorch bindings

Pull Request - State: closed - Opened by DefTruth about 2 months ago

#54 - [HGEMM] Pack sliced_k f16x4/fp16x8 HGEMM

Pull Request - State: closed - Opened by DefTruth about 2 months ago

#53 - [Misc][Benchmark] optimize benchmarks

Pull Request - State: closed - Opened by DefTruth about 2 months ago

#52 - [SGEMM] bank conflicts free & double buffers

Pull Request - State: closed - Opened by DefTruth 2 months ago

#51 - [SGEMM] Add naive sgemm kernel

Pull Request - State: closed - Opened by DefTruth 2 months ago

#50 - 🌤🌤 CONTRIBUTE 🎉🎉

Issue - State: open - Opened by DefTruth 2 months ago
Labels: documentation, good first issue, contribute

#49 - [Softmax][FP16] Pack f16x8 softmax kernel

Pull Request - State: closed - Opened by DefTruth 2 months ago

#48 - [LayerNorm][FP16] support fp16x8_pack_f32 kernel

Pull Request - State: closed - Opened by DefTruth 2 months ago

#47 - [RMSNorm][FP16] Pack f16x8 rmsnorm

Pull Request - State: closed - Opened by DefTruth 2 months ago

#46 - [LayerNorm][FP16] Add pack support for f16x8 LD/ST

Pull Request - State: closed - Opened by DefTruth 2 months ago

#45 - [DotProd][FP16] support f16x8_pack kernel

Pull Request - State: closed - Opened by DefTruth 2 months ago

#44 - [Nsight] Add nsys/ncu usage, ptx/sass

Pull Request - State: closed - Opened by DefTruth 2 months ago

#43 - [Reduce][Kernel] Pack f16/bf16x8 & fp8/i8x16 LD/ST

Pull Request - State: closed - Opened by DefTruth 2 months ago

#42 - [RELU][FP16] Add f16x8_pack kernel, boost 2.1x

Pull Request - State: closed - Opened by DefTruth 2 months ago

#41 - [FlashAttention] replace FLOAT4 with LDST128BITS macro

Pull Request - State: closed - Opened by DefTruth 2 months ago

#40 - [Elementwise][Half] support f16x8_pack kernel, boost 1.1x

Pull Request - State: closed - Opened by DefTruth 2 months ago

#39 - [Sigmoid][F16] Add f16x8_pack kernel, boost 1.5x ~

Pull Request - State: closed - Opened by DefTruth 2 months ago

#38 - [Misc] Update refactor branch

Pull Request - State: closed - Opened by DefTruth 2 months ago

#37 - Clamped input range in Sigmoid kernel to prevent overflow

Pull Request - State: closed - Opened by Phoenix8215 2 months ago

#36 - [Refactor][7/N] CUDA Learn Notes refactor Part-7

Pull Request - State: closed - Opened by DefTruth 2 months ago

#35 - [FA2][Half] Add FA2 f16_mma_m16n8k16 kernel

Pull Request - State: closed - Opened by DefTruth 2 months ago

#34 - Bump up to v2.3

Pull Request - State: closed - Opened by DefTruth 2 months ago

#33 - [FlashAttention] Refactor flash_attn_1_fwd_f32 kernel

Pull Request - State: closed - Opened by DefTruth 2 months ago

#32 - [HGEMV][Half] support hgemv k32/k128/f16

Pull Request - State: closed - Opened by DefTruth 2 months ago

#31 - [HGEMM] Add slicked_k&t_8x8_sliced_k_f16x4

Pull Request - State: closed - Opened by DefTruth 2 months ago

#30 - [LayerNorm][Kernel] Add HALF2 SUM/SUB/VAR macro

Pull Request - State: closed - Opened by DefTruth 2 months ago

#29 - [RMSNorm][Kernel] Add FLOAT2/HALF2_VARIANCE macro

Pull Request - State: closed - Opened by DefTruth 2 months ago

#28 - [RMSNorm] support f16x8_f32 RMSNorm

Pull Request - State: closed - Opened by DefTruth 2 months ago

#27 - update branch

Pull Request - State: closed - Opened by DefTruth 2 months ago

#26 - [RELU][Half] support fp16x8 RELU kernel

Pull Request - State: closed - Opened by DefTruth 2 months ago

#25 - [Elementwise][Half] support fp16x8 packed Elementwise

Pull Request - State: closed - Opened by DefTruth 2 months ago

#24 - [Elementwise][Half] support fp16x8 packed Elementwise

Pull Request - State: closed - Opened by DefTruth 2 months ago

#23 - [Bugfix][Kernel] fixed some kernel blocks calculate errors

Pull Request - State: closed - Opened by DefTruth 2 months ago

#22 - [RMSNorm][Half] support fp16x8 packed RMSNorm

Pull Request - State: closed - Opened by DefTruth 2 months ago

#21 - [Reduce][Half] add HALF2 & BFLOAT2 macro

Pull Request - State: closed - Opened by DefTruth 2 months ago

#20 - [LayerNorm][Half] support fp16x8 packed LayerNorm (#19)

Pull Request - State: closed - Opened by DefTruth 2 months ago

#19 - [LayerNorm][Half] support fp16x8 packed LayerNorm

Pull Request - State: closed - Opened by DefTruth 2 months ago

#18 - [Refactor][5/N] CUDA Learn Notes refactor Part-6

Pull Request - State: closed - Opened by DefTruth 3 months ago

#17 - [Refactor][6/N] CUDA Learn Notes refactor Part-6

Pull Request - State: closed - Opened by DefTruth 3 months ago

#16 - Bump up to v2.2

Pull Request - State: closed - Opened by DefTruth 3 months ago

#15 - [Refactor][5/N] CUDA Learn Notes refactor Part-5

Pull Request - State: closed - Opened by DefTruth 3 months ago

#14 - [Refactor][4/N] CUDA Learn Notes refactor Part-4

Pull Request - State: closed - Opened by DefTruth 3 months ago

#13 - [Refactor][4/N] CUDA Learn Notes refactor Part-4

Pull Request - State: closed - Opened by DefTruth 3 months ago

#12 - [Refactor][4/N] CUDA Learn Notes refactor Part-4

Pull Request - State: closed - Opened by DefTruth 3 months ago

#11 - [Refactor][4/N] CUDA Learn Notes refactor Part-4

Pull Request - State: closed - Opened by DefTruth 3 months ago

#10 - [Refactor][4/N] CUDA Learn Notes refactor Part-4

Pull Request - State: closed - Opened by DefTruth 3 months ago

#9 - Bump up to v0.8

Pull Request - State: closed - Opened by DefTruth 3 months ago

#8 - Bump up to v0.7

Pull Request - State: closed - Opened by DefTruth 4 months ago

#7 - What __threadfence() does

Issue - State: closed - Opened by zbt78 7 months ago - 3 comments

#6 - Hello, I have a question about the reduce-related code

Issue - State: closed - Opened by Ss-shuang123 7 months ago - 3 comments

#4 - Hello, why doesn't the sigmoid operator here account for exponential overflow?

Issue - State: closed - Opened by Phoenix8215 8 months ago - 2 comments

#3 - resources

Issue - State: closed - Opened by DefTruth 8 months ago - 2 comments
Labels: stale

#2 - layer norm implementation

Issue - State: closed - Opened by zbt78 9 months ago - 3 comments
Labels: stale

#1 - Update README.md

Pull Request - State: closed - Opened by rjzhb 9 months ago - 1 comment