Ecosyste.ms: Issues
An open API service providing issue and pull request metadata for open source projects.
GitHub / Dao-AILab/flash-attention issues and pull requests
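The listing below is the kind of data the service exposes over its API, and it can also be fetched programmatically. The following is a minimal sketch in Python; the endpoint path, query parameters, and response field names (number, title, state, comments_count) are assumptions about the API schema rather than details confirmed by this page, so consult https://issues.ecosyste.ms/docs before relying on them.

# Minimal sketch: fetching issue/PR metadata for Dao-AILab/flash-attention from
# the Ecosyste.ms issues service. The endpoint path and field names used here
# are assumptions about the API schema; see https://issues.ecosyste.ms/docs.
import requests

BASE = "https://issues.ecosyste.ms/api/v1"
repo = requests.utils.quote("Dao-AILab/flash-attention", safe="")

resp = requests.get(f"{BASE}/hosts/GitHub/repositories/{repo}/issues",
                    params={"per_page": 25}, timeout=30)
resp.raise_for_status()

# The response body is assumed to be a JSON array of issue records.
for issue in resp.json():
    print(issue.get("number"), issue.get("state"), issue.get("title"),
          "-", issue.get("comments_count"), "comments")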
#1316 - Why Doesn’t Any Output Appear When Combining Two Conditional Print Statements in CUTLASS Consumer?
Issue -
State: closed - Opened by ziyuhuang123 3 months ago
- 1 comment
#1315 - Why we iteratively arrive at barrier_O??
Issue -
State: open - Opened by ziyuhuang123 3 months ago
- 4 comments
#1314 - question on the `block_table` and `cache_seqlens` in `flash_attn_with_kvcache`
Issue -
State: open - Opened by chakpongchung 3 months ago
- 8 comments
#1313 - Package is uninstallable
Issue -
State: open - Opened by chrisspen 3 months ago
- 1 comment
#1312 - Operation Error: /usr/bin/ld: cannot find -lcuda
Issue -
State: open - Opened by ying123ww 3 months ago
- 1 comment
#1311 - Varlen flash attention: CUDA illegal memory access
Issue -
State: open - Opened by clessig 3 months ago
- 13 comments
#1310 - flash attention 3 benchmark for H20 hopper
Issue -
State: closed - Opened by aftersnow 3 months ago
- 9 comments
#1309 - Looking for compatible version
Issue -
State: open - Opened by mahmoodn 3 months ago
- 1 comment
#1308 - updating nvidia-open broke a lot of things... let's see if we can get things working again...
Issue -
State: closed - Opened by kairin 3 months ago
- 3 comments
#1307 - ROCm compilation error with PyTorch 2.5.1
Issue -
State: open - Opened by calebthomas259 3 months ago
- 4 comments
#1306 - Result mismatch with headdim=256 bwd
Issue -
State: open - Opened by zidanehuang001 3 months ago
- 5 comments
#1305 - Make namespace comment consistent
Pull Request -
State: closed - Opened by ngocson2vn 3 months ago
#1304 - Question about disabling the causal mask
Issue -
State: closed - Opened by volcverse 3 months ago
- 2 comments
#1303 - Error in line 21 O_i adjustment in Algorithm 1 in FlashAttention-3 Paper
Issue -
State: closed - Opened by hasanunlu 3 months ago
- 1 comment
#1302 - whl for torch 2.5.0
Issue -
State: open - Opened by Galaxy-Husky 3 months ago
- 4 comments
#1301 - flash_attn_with_kvcach return block_lse or attention_score
Issue -
State: open - Opened by NonvolatileMemory 3 months ago
- 2 comments
#1300 - FlashSelfAttention and SelfAttention in flash_attn.modules.mha give different results
Issue -
State: open - Opened by senxiu-puleya 3 months ago
- 5 comments
#1299 - using `out` argument will change the output
Issue -
State: open - Opened by youkaichao 3 months ago
#1298 - Different nr. of KV and Q tokens
Issue -
State: closed - Opened by kilianhaefeli 3 months ago
- 2 comments
#1297 - Promote wheels as alternative to pip install flash-attn
Pull Request -
State: open - Opened by simonw 3 months ago
- 4 comments
#1296 - Fail to initialize the TMA descriptor for head_dim of 192
Issue -
State: closed - Opened by NiuMa-1234 3 months ago
- 2 comments
#1295 - Build stuck on torch2.5.0
Issue -
State: open - Opened by ycformal 4 months ago
- 15 comments
#1294 - any plan for varlen fwd support hopper FP8?
Issue -
State: closed - Opened by pengwu22 4 months ago
#1293 - Request for New Release with PT Compile Ops
Issue -
State: open - Opened by kostum123 4 months ago
#1292 - Support for CUDA 12.4 and above? URGENT PERHAPS?
Issue -
State: open - Opened by BBC-Esq 4 months ago
- 7 comments
#1291 - Support different shape attention mask
Issue -
State: open - Opened by SunzeY 4 months ago
#1290 - RuntimeError: Only support head size 64, 128, and 256 for now [flashattn_hopper_cuda]
Issue -
State: closed - Opened by NiuMa-1234 4 months ago
- 4 comments
#1289 - CUTLASS 3.5.1 makes Flash Attention 3 slower?
Issue -
State: open - Opened by fno2010 4 months ago
- 4 comments
#1288 - fix: in newer versions of triton, tl.dot should take as input only q …
Pull Request -
State: open - Opened by EdouardYvinec 4 months ago
#1287 - undefined symbol: _ZN2at4_ops5zeros4callEN3c108ArrayRefINS2_6SymIntEEENS2_8optionalINS2_10ScalarTypeEEENS6_INS2_6LayoutEEENS6_INS2_6DeviceEEENS6_IbEE
Issue -
State: open - Opened by LanXingXuan 4 months ago
- 1 comment
#1286 - In unit test,how is the dropout_fraction diff tolerance selected?
Issue -
State: open - Opened by muoshuosha 4 months ago
#1285 - Fix compilation with clang on ARM64
Pull Request -
State: closed - Opened by sclarkson 4 months ago
#1284 - Feat: Add support for PyTorch 2.5 in workflows
Pull Request -
State: closed - Opened by NanoCode012 4 months ago
- 5 comments
#1283 - How to profile standard attention written in pytorch?
Issue -
State: open - Opened by woongjoonchoi 4 months ago
#1282 - FlashAttention installation error: "CUDA 11.6 and above" requirement issue
Issue -
State: open - Opened by 21X5122 4 months ago
- 1 comment
#1281 - Softcap for FlashAttention v3
Issue -
State: open - Opened by Jeff-Zilence 4 months ago
- 1 comment
#1280 - ImportError: cannot import name 'flash_attn_unpadded_qkvpacked_func' from 'flash_attn.flash_attn_interface, why i cannot import it?? QAQ
Issue -
State: open - Opened by YANGTUOMAO 4 months ago
#1279 - Fix copy-paste error in hopper tests
Pull Request -
State: closed - Opened by milesvant 4 months ago
#1278 - Unable to import my new kernel function after compilation success.
Issue -
State: open - Opened by jpli02 4 months ago
- 2 comments
#1277 - Why does the flash_attn_varlen_func method increase GPU memory usage?
Issue -
State: open - Opened by shaonan1993 4 months ago
- 1 comment
#1276 - Is there a way to install flash-attention without specific cuda version ?
Issue -
State: open - Opened by HuangChiEn 4 months ago
#1275 - Concurrent Warp Group Execution in FA3: Tensor Core Resource Limitation?
Issue -
State: open - Opened by ziyuhuang123 4 months ago
#1274 - Does FA2 support 4D attention mask?
Issue -
State: open - Opened by XiangTodayEatsWhat 4 months ago
#1273 - why flash attention fp8 kernel using fp16 for output?
Issue -
State: closed - Opened by cccddd77 4 months ago
#1272 - Six Flash-Attention-3 unit tests fail on H20
Issue -
State: closed - Opened by cailun01 4 months ago
- 5 comments
#1271 - Would using both strategies simultaneously theoretically result in better overlap between TC and MUFU? How could this be explained with a diagram?
Issue -
State: open - Opened by ziyuhuang123 4 months ago
#1270 - How to use the function of flash-attn-1 to mimic the behavior of flash_attn_func in flash-attn-2?
Issue -
State: open - Opened by jpWang 4 months ago
#1269 - Unable to compile for MI300X (gfx942) with ROCm 6.2.2 due to getCurrentHIPStream().stream();
Issue -
State: open - Opened by lhl 4 months ago
- 1 comment
#1268 - Paged Attention support for FA3
Pull Request -
State: closed - Opened by kadeng 4 months ago
- 3 comments
#1267 - Why we use block shape like 176 80 192? How does they fit in the WGMMA?
Issue -
State: closed - Opened by ziyuhuang123 4 months ago
- 1 comment
#1266 - Where in the code demonstrate inter-warp policy?
Issue -
State: open - Opened by ziyuhuang123 4 months ago
- 4 comments
#1265 - Intra-Warpgroup Overlapping GEMMs and Softmax in FA3
Issue -
State: closed - Opened by ziyuhuang123 4 months ago
- 7 comments
#1264 - flash-attention
Issue -
State: open - Opened by 21X5122 4 months ago
- 1 comment
#1263 - FlashAttention3 support for forward pass with kv cache
Issue -
State: open - Opened by jorgeantonio21 4 months ago
- 1 comment
#1262 - No module named moe_kernel in Flash Attention Installation
Issue -
State: closed - Opened by abhasin14 4 months ago
- 1 comment
#1261 - Speeding up exp with lookup tables?
Issue -
State: open - Opened by ethansmith2000 4 months ago
- 3 comments
#1260 - Questions about calculating the number of hmb accesses
Issue -
State: closed - Opened by uniqueness 4 months ago
- 5 comments
#1259 - what if I want to reuse data through smem between gemm0 and gemm1?
Issue -
State: open - Opened by ziyuhuang123 4 months ago
- 1 comment
#1258 - Is TileShape_MNK shape 128, 176, 80, 192 kind of strange?
Issue -
State: closed - Opened by ziyuhuang123 4 months ago
- 1 comment
#1257 - Can I print value within function? (like load function)
Issue -
State: closed - Opened by ziyuhuang123 4 months ago
- 4 comments
#1256 - In non-casual case why we have mask?
Issue -
State: open - Opened by ziyuhuang123 4 months ago
- 2 comments
#1255 - how to remove softmax operations?
Issue -
State: closed - Opened by ziyuhuang123 4 months ago
- 3 comments
#1254 - FA3 varlen_bwd hangs (FA2 works in the same case)
Issue -
State: open - Opened by goldhuang 4 months ago
- 2 comments
#1253 - Why attn_ref use fp32 in fwd, but use fp16/bf16 in bwd?
Issue -
State: open - Opened by muoshuosha 4 months ago
- 4 comments
#1252 - Look into sequence packing
Issue -
State: closed - Opened by alex-hh 4 months ago
#1251 - dropout in FA3 needs get fixed?
Issue -
State: closed - Opened by jundaf2 4 months ago
- 4 comments
#1250 - Runtime error from transformers
Issue -
State: open - Opened by HarryK4673 4 months ago
- 2 comments
#1249 - Difference between FusedMLP and MLP?
Issue -
State: open - Opened by prmudgal 4 months ago
- 4 comments
#1248 - where is flash decoding second stage (reduce) code ?
Issue -
State: open - Opened by liuqi123123 4 months ago
- 9 comments
#1246 - AttributeError: 'Qwen2FlashAttention2' object has no attribute '_flash_attention_forward'
Issue -
State: closed - Opened by zhangyuqi-1 5 months ago
- 1 comment
#1245 - [QST]When params.deterministic is true, why backward prop still use atomicAdd?
Issue -
State: closed - Opened by zhang22222 5 months ago
- 3 comments
#1244 - Why do we have an all_reduce with wrong backward?
Issue -
State: closed - Opened by zhuzilin 5 months ago
- 1 comment
#1243 - CUDA versions > 12.3 do not correctly compile H100 Flash Attention 3
Issue -
State: open - Opened by rohany 5 months ago
- 1 comment
#1242 - Partial success with build from source for Windows 11, but the resulting wheel needed work
Issue -
State: closed - Opened by jim-plus 5 months ago
- 1 comment
#1241 - b
Issue -
State: closed - Opened by rgitfiletransfer 5 months ago
#1240 - Fix FAv3 compilation with MSVC
Pull Request -
State: closed - Opened by hlky 5 months ago
#1239 - Sync api change for ROCm Flash attention
Pull Request -
State: closed - Opened by rocking5566 5 months ago
#1238 - 【HDIM=96】head dim = 96 ?
Issue -
State: open - Opened by SunNy820828449 5 months ago
#1237 - Minify `torch.torch.int32` to `torch.int32` in Bert
Pull Request -
State: closed - Opened by imShZh 5 months ago
#1236 - FA3 kvcache + split kv + gqa parallelization
Pull Request -
State: closed - Opened by jayhshah 5 months ago
#1235 - WHEN can we get the flash-attention 2.x for Turing GPU ?
Issue -
State: open - Opened by eileen2003-w 5 months ago
- 3 comments
#1234 - 2.6.4 and FA3 release .whl for CUDA 12.4 torch2.4.1 python 3.11?
Issue -
State: open - Opened by tqangxl 5 months ago
#1233 - Add local attention in Hopper FAv3
Pull Request -
State: closed - Opened by ipiszy 5 months ago
#1232 - fp8 not enabled for mha_varlen_fwd
Issue -
State: open - Opened by goldhuang 5 months ago
#1231 - [BUG]2 tests failed...?
Issue -
State: open - Opened by ziyuhuang123 5 months ago
#1230 - Turing architecture error on Nvidia Quadro T1000
Issue -
State: open - Opened by Tortoise17 5 months ago
- 2 comments
#1229 - ERROR [12/13] RUN pip install flash-attn --no-build-isolation
Issue -
State: open - Opened by promaprogga 5 months ago
- 1 comment
#1228 - Avoid padding computation with `cu_seqlens`
Issue -
State: open - Opened by imoneoi 5 months ago
- 3 comments
#1227 - ImportError: fused_dense is not installed
Issue -
State: open - Opened by kanebay 5 months ago
- 2 comments
#1226 - Pytorch 2.4.1 with flash-attn 2.5.8
Issue -
State: closed - Opened by adtian2 5 months ago
- 2 comments
#1225 - Softmax (particularly exp operations) becomes a major bottleneck in full FP16 pipeline
Issue -
State: open - Opened by phantaurus 5 months ago
- 6 comments