Ecosyste.ms: Issues
An open API service providing issue and pull request metadata for open source projects.
GitHub / Dao-AILab/flash-attention issues and pull requests
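The listing below is the kind of data the service exposes over its API, and it can also be fetched programmatically. The following is a minimal sketch in Python; the endpoint path, query parameters, and response field names (number, title, state, comments_count) are assumptions about the API schema rather than details confirmed by this page, so consult https://issues.ecosyste.ms/docs before relying on them.

# Minimal sketch: fetching issue/PR metadata for Dao-AILab/flash-attention from
# the Ecosyste.ms issues service. The endpoint path and field names used here
# are assumptions about the API schema; see https://issues.ecosyste.ms/docs.
import requests

BASE = "https://issues.ecosyste.ms/api/v1"
repo = requests.utils.quote("Dao-AILab/flash-attention", safe="")

resp = requests.get(f"{BASE}/hosts/GitHub/repositories/{repo}/issues",
                    params={"per_page": 25}, timeout=30)
resp.raise_for_status()

# The response body is assumed to be a JSON array of issue records.
for issue in resp.json():
    print(issue.get("number"), issue.get("state"), issue.get("title"),
          "-", issue.get("comments_count"), "comments")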
#1316 - Why Doesn’t Any Output Appear When Combining Two Conditional Print Statements in CUTLASS Consumer?
Issue -
State: closed - Opened by ziyuhuang123 3 months ago
- 1 comment
#1315 - Why we iteratively arrive at barrier_O??
Issue -
State: open - Opened by ziyuhuang123 3 months ago
- 4 comments
#1314 - question on the `block_table` and `cache_seqlens` in `flash_attn_with_kvcache`
Issue -
State: open - Opened by chakpongchung 3 months ago
- 8 comments
#1313 - Package is uninstallable
Issue -
State: open - Opened by chrisspen 3 months ago
- 1 comment
#1312 - Operation Error: /usr/bin/ld: cannot find -lcuda
Issue -
State: open - Opened by ying123ww 3 months ago
- 1 comment
#1311 - Varlen flash attention: CUDA illegal memory access
Issue -
State: open - Opened by clessig 3 months ago
- 13 comments
#1310 - flash attention 3 benchmark for H20 hopper
Issue -
State: closed - Opened by aftersnow 3 months ago
- 9 comments
#1309 - Looking for compatible version
Issue -
State: open - Opened by mahmoodn 3 months ago
- 1 comment
#1308 - updating nvidia-open broke a lot of things... let's see if we can get things working again...
Issue -
State: closed - Opened by kairin 3 months ago
- 3 comments
#1307 - ROCm compilation error with PyTorch 2.5.1
Issue -
State: open - Opened by calebthomas259 3 months ago
- 4 comments
#1306 - Result mismatch with headdim=256 bwd
Issue -
State: open - Opened by zidanehuang001 3 months ago
- 5 comments
#1305 - Make namespace comment consistent
Pull Request -
State: closed - Opened by ngocson2vn 3 months ago
#1304 - Question about disabling the causal mask
Issue -
State: closed - Opened by volcverse 3 months ago
- 2 comments
#1303 - Error in line 21 O_i adjustment in Algorithm 1 in FlashAttention-3 Paper
Issue -
State: closed - Opened by hasanunlu 3 months ago
- 1 comment
#1302 - whl for torch 2.5.0
Issue -
State: open - Opened by Galaxy-Husky 3 months ago
- 4 comments
#1301 - flash_attn_with_kvcach return block_lse or attention_score
Issue -
State: open - Opened by NonvolatileMemory 3 months ago
- 2 comments
#1300 - FlashSelfAttention and SelfAttention in flash_attn.modules.mha give different results
Issue -
State: open - Opened by senxiu-puleya 3 months ago
- 5 comments
#1299 - using `out` argument will change the output
Issue -
State: open - Opened by youkaichao 3 months ago
#1298 - Different nr. of KV and Q tokens
Issue -
State: closed - Opened by kilianhaefeli 3 months ago
- 2 comments
#1297 - Promote wheels as alternative to pip install flash-attn
Pull Request -
State: open - Opened by simonw 3 months ago
- 4 comments
#1296 - Fail to initialize the TMA descriptor for head_dim of 192
Issue -
State: closed - Opened by NiuMa-1234 3 months ago
- 2 comments
#1295 - Build stuck on torch2.5.0
Issue -
State: open - Opened by ycformal 4 months ago
- 15 comments
#1294 - any plan for varlen fwd support hopper FP8?
Issue -
State: closed - Opened by pengwu22 4 months ago
#1293 - Request for New Release with PT Compile Ops
Issue -
State: open - Opened by kostum123 4 months ago
#1292 - Support for CUDA 12.4 and above? URGENT PERHAPS?
Issue -
State: open - Opened by BBC-Esq 4 months ago
- 7 comments
#1291 - Support different shape attention mask
Issue -
State: open - Opened by SunzeY 4 months ago
#1290 - RuntimeError: Only support head size 64, 128, and 256 for now [flashattn_hopper_cuda]
Issue -
State: closed - Opened by NiuMa-1234 4 months ago
- 4 comments
#1289 - CUTLASS 3.5.1 makes Flash Attention 3 slower?
Issue -
State: open - Opened by fno2010 4 months ago
- 4 comments
#1288 - fix: in newer versions of triton, tl.dot should take as input only q …
Pull Request -
State: open - Opened by EdouardYvinec 4 months ago
#1287 - undefined symbol: _ZN2at4_ops5zeros4callEN3c108ArrayRefINS2_6SymIntEEENS2_8optionalINS2_10ScalarTypeEEENS6_INS2_6LayoutEEENS6_INS2_6DeviceEEENS6_IbEE
Issue -
State: open - Opened by LanXingXuan 4 months ago
- 1 comment
#1286 - In unit test,how is the dropout_fraction diff tolerance selected?
Issue -
State: open - Opened by muoshuosha 4 months ago
#1285 - Fix compilation with clang on ARM64
Pull Request -
State: closed - Opened by sclarkson 4 months ago
#1284 - Feat: Add support for PyTorch 2.5 in workflows
Pull Request -
State: closed - Opened by NanoCode012 4 months ago
- 5 comments
#1283 - How to profile standard attention written in pytorch?
Issue -
State: open - Opened by woongjoonchoi 4 months ago
#1282 - FlashAttention installation error: "CUDA 11.6 and above" requirement issue
Issue -
State: open - Opened by 21X5122 4 months ago
- 1 comment
#1281 - Softcap for FlashAttention v3
Issue -
State: open - Opened by Jeff-Zilence 4 months ago
- 1 comment
#1280 - ImportError: cannot import name 'flash_attn_unpadded_qkvpacked_func' from 'flash_attn.flash_attn_interface, why i cannot import it?? QAQ
Issue -
State: open - Opened by YANGTUOMAO 4 months ago
#1279 - Fix copy-paste error in hopper tests
Pull Request -
State: closed - Opened by milesvant 4 months ago
#1278 - Unable to import my new kernel function after compilation success.
Issue -
State: open - Opened by jpli02 4 months ago
- 2 comments
#1277 - Why does the flash_attn_varlen_func method increase GPU memory usage?
Issue -
State: open - Opened by shaonan1993 4 months ago
- 1 comment
#1276 - Is there a way to install flash-attention without specific cuda version ?
Issue -
State: open - Opened by HuangChiEn 4 months ago
#1275 - Concurrent Warp Group Execution in FA3: Tensor Core Resource Limitation?
Issue -
State: open - Opened by ziyuhuang123 4 months ago
#1274 - Does FA2 support 4D attention mask?
Issue -
State: open - Opened by XiangTodayEatsWhat 4 months ago
#1273 - why flash attention fp8 kernel using fp16 for output?
Issue -
State: closed - Opened by cccddd77 4 months ago
#1272 - Six Flash-Attention-3 unit tests fail on H20
Issue -
State: closed - Opened by cailun01 4 months ago
- 5 comments
#1271 - Would using both strategies simultaneously theoretically result in better overlap between TC and MUFU? How could this be explained with a diagram?
Issue -
State: open - Opened by ziyuhuang123 4 months ago
#1270 - How to use the function of flash-attn-1 to mimic the behavior of flash_attn_func in flash-attn-2?
Issue -
State: open - Opened by jpWang 4 months ago
#1269 - Unable to compile for MI300X (gfx942) with ROCm 6.2.2 due to getCurrentHIPStream().stream();
Issue -
State: open - Opened by lhl 4 months ago
- 1 comment
#1268 - Paged Attention support for FA3
Pull Request -
State: closed - Opened by kadeng 4 months ago
- 3 comments
#1267 - Why we use block shape like 176 80 192? How does they fit in the WGMMA?
Issue -
State: closed - Opened by ziyuhuang123 4 months ago
- 1 comment
#1266 - Where in the code demonstrate inter-warp policy?
Issue -
State: open - Opened by ziyuhuang123 4 months ago
- 4 comments
#1265 - Intra-Warpgroup Overlapping GEMMs and Softmax in FA3
Issue -
State: closed - Opened by ziyuhuang123 4 months ago
- 7 comments
#1264 - flash-attention
Issue -
State: open - Opened by 21X5122 4 months ago
- 1 comment
#1263 - FlashAttention3 support for forward pass with kv cache
Issue -
State: open - Opened by jorgeantonio21 4 months ago
- 1 comment
#1262 - No module named moe_kernel in Flash Attention Installation
Issue -
State: closed - Opened by abhasin14 4 months ago
- 1 comment
#1261 - Speeding up exp with lookup tables?
Issue -
State: open - Opened by ethansmith2000 4 months ago
- 3 comments
#1260 - Questions about calculating the number of hmb accesses
Issue -
State: closed - Opened by uniqueness 4 months ago
- 5 comments
#1259 - what if I want to reuse data through smem between gemm0 and gemm1?
Issue -
State: open - Opened by ziyuhuang123 4 months ago
- 1 comment
#1258 - Is TileShape_MNK shape 128, 176, 80, 192 kind of strange?
Issue -
State: closed - Opened by ziyuhuang123 4 months ago
- 1 comment
#1257 - Can I print value within function? (like load function)
Issue -
State: closed - Opened by ziyuhuang123 4 months ago
- 4 comments
#1256 - In non-casual case why we have mask?
Issue -
State: open - Opened by ziyuhuang123 4 months ago
- 2 comments
#1255 - how to remove softmax operations?
Issue -
State: closed - Opened by ziyuhuang123 4 months ago
- 3 comments
#1254 - FA3 varlen_bwd hangs (FA2 works in the same case)
Issue -
State: open - Opened by goldhuang 4 months ago
- 2 comments
#1253 - Why attn_ref use fp32 in fwd, but use fp16/bf16 in bwd?
Issue -
State: open - Opened by muoshuosha 4 months ago
- 4 comments
#1252 - Look into sequence packing
Issue -
State: closed - Opened by alex-hh 4 months ago
#1251 - dropout in FA3 needs get fixed?
Issue -
State: closed - Opened by jundaf2 4 months ago
- 4 comments
#1250 - Runtime error from transformers
Issue -
State: open - Opened by HarryK4673 4 months ago
- 2 comments
#1249 - Difference between FusedMLP and MLP?
Issue -
State: open - Opened by prmudgal 4 months ago
- 4 comments
#1248 - where is flash decoding second stage (reduce) code ?
Issue -
State: open - Opened by liuqi123123 4 months ago
- 9 comments
#1246 - AttributeError: 'Qwen2FlashAttention2' object has no attribute '_flash_attention_forward'
Issue -
State: closed - Opened by zhangyuqi-1 5 months ago
- 1 comment
#1245 - [QST]When params.deterministic is true, why backward prop still use atomicAdd?
Issue -
State: closed - Opened by zhang22222 5 months ago
- 3 comments
#1244 - Why do we have an all_reduce with wrong backward?
Issue -
State: closed - Opened by zhuzilin 5 months ago
- 1 comment
#1243 - CUDA versions > 12.3 do not correctly compile H100 Flash Attention 3
Issue -
State: open - Opened by rohany 5 months ago
- 1 comment
#1242 - Partial success with build from source for Windows 11, but the resulting wheel needed work
Issue -
State: closed - Opened by jim-plus 5 months ago
- 1 comment
#1241 - b
Issue -
State: closed - Opened by rgitfiletransfer 5 months ago
#1240 - Fix FAv3 compilation with MSVC
Pull Request -
State: closed - Opened by hlky 5 months ago
#1239 - Sync api change for ROCm Flash attention
Pull Request -
State: closed - Opened by rocking5566 5 months ago
#1238 - 【HDIM=96】head dim = 96 ?
Issue -
State: open - Opened by SunNy820828449 5 months ago
#1237 - Minify `torch.torch.int32` to `torch.int32` in Bert
Pull Request -
State: closed - Opened by imShZh 5 months ago
#1236 - FA3 kvcache + split kv + gqa parallelization
Pull Request -
State: closed - Opened by jayhshah 5 months ago
#1235 - WHEN can we get the flash-attention 2.x for Turing GPU ?
Issue -
State: open - Opened by eileen2003-w 5 months ago
- 3 comments
#1234 - 2.6.4 and FA3 release .whl for CUDA 12.4 torch2.4.1 python 3.11?
Issue -
State: open - Opened by tqangxl 5 months ago
#1233 - Add local attention in Hopper FAv3
Pull Request -
State: closed - Opened by ipiszy 5 months ago
#1232 - fp8 not enabled for mha_varlen_fwd
Issue -
State: open - Opened by goldhuang 5 months ago
#1231 - [BUG]2 tests failed...?
Issue -
State: open - Opened by ziyuhuang123 5 months ago
#1230 - Turing architecture error on Nvidia Quadro T1000
Issue -
State: open - Opened by Tortoise17 5 months ago
- 2 comments
#1229 - ERROR [12/13] RUN pip install flash-attn --no-build-isolation
Issue -
State: open - Opened by promaprogga 5 months ago
- 1 comment
#1228 - Avoid padding computation with `cu_seqlens`
Issue -
State: open - Opened by imoneoi 5 months ago
- 3 comments
#1227 - ImportError: fused_dense is not installed
Issue -
State: open - Opened by kanebay 5 months ago
- 2 comments
#1226 - Pytorch 2.4.1 with flash-attn 2.5.8
Issue -
State: closed - Opened by adtian2 5 months ago
- 2 comments
#1225 - Softmax (particularly exp operations) becomes a major bottleneck in full FP16 pipeline
Issue -
State: open - Opened by phantaurus 5 months ago
- 6 comments