Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / Dao-AILab/flash-attention issues and pull requests
#1471 - Support 576 Head dim for MLA
Issue - State: open - Opened by sAviOr287 3 days ago
#1470 - Getting Error While Extracting
Issue - State: open - Opened by emirardagn 3 days ago
#1469 - [How-to] How to get Flash-Attention under Windows 11 with CUDA
Issue - State: open - Opened by mytait 7 days ago - 6 comments
#1468 - fa3: include bert_padding utilities
Pull Request - State: closed - Opened by tmm1 10 days ago - 1 comment
#1467 - FA3 package is missing padding utilities
Issue - State: open - Opened by tmm1 10 days ago
#1466 - What is `seqused_q` and `seqused_k`?
Issue - State: open - Opened by cassanof 10 days ago
#1465 - FA3 KV Cache is slower than FA2 KV Cache
Issue - State: open - Opened by DD-DuDa 11 days ago - 3 comments
#1464 - Add support for Cuda 12.8 and B200 GPUs
Issue - State: open - Opened by ofirkris 12 days ago
#1463 - Update Cuda Blackwell
Pull Request - State: open - Opened by johnnynunez 13 days ago
#1462 - fused dense lib warning
Issue - State: open - Opened by YuyueminAustin 13 days ago
#1461 - BUG? get the wrong value when logit_scale is 0
Issue - State: open - Opened by shunshen93 13 days ago - 1 comment
#1460 - [Build] Update version of setuptools used to generate core package
Pull Request - State: closed - Opened by tmm1 14 days ago
#1459 - Conflict When Installing flash-attn 2.7.3 and 3.0.0b1 Together
Issue - State: open - Opened by quanta42 14 days ago
#1458 - Using Flash Attention 2.5.7 after upgrading CUTLASS to 3.5 causes a compilation error
Issue - State: open - Opened by ccccjunkang 14 days ago - 1 comment
#1456 - dropout_layer_norm
Issue - State: closed - Opened by ADiko1997 16 days ago - 1 comment
#1455 - [BugFix] Fix a wrong reference to seqlen_k variable in the fwd_splitkv kernel
Pull Request - State: open - Opened by muoshuosha 16 days ago - 1 comment
#1454 - Usage of .item() in unpad_input()
Issue - State: closed - Opened by qwertyforce 16 days ago - 2 comments
#1453 - Main branch compilation on nvcc 12.6
Issue - State: open - Opened by roded2 16 days ago - 2 comments
#1452 - v2.7.3 build failed in NGC pytorch:24.12-py3
Issue - State: open - Opened by xuchunmei000 16 days ago - 4 comments
#1451 - FA3 consecutive failing tests after first failure
Issue - State: open - Opened by benjamin-kroeger 17 days ago
#1450 - BUG? static_assert(!(!Mma1_is_RS && !IntraWGOverlap), "Mma1 must be RS if IntraWGOverlap is enabled");
Issue - State: closed - Opened by ziyuhuang123 17 days ago - 1 comment
#1449 - [QST] masking steps in flash decoding
Issue - State: open - Opened by aws-jiadingg 20 days ago - 1 comment
#1448 - Clarification on MMA0 Results Handling in the Latest Code
Issue - State: open - Opened by ziyuhuang123 21 days ago - 1 comment
#1447 - subprocess.CalledProcessError: Command '['path/to/cuda-11.7/bin/nvcc', '-V']' returned non-zero exit status 255
Issue - State: open - Opened by ChosenOne-xx 22 days ago
#1446 - Support ROCM builds from source distribution, and improve error handling
Pull Request - State: closed - Opened by mgorny 22 days ago - 1 comment
#1445 - Is the output of FlashAttention completely identical to that of vanilla attention?
Issue - State: closed - Opened by sunsmarterjie 23 days ago - 1 comment
#1444 - Wheel names and version inconsistency
Issue - State: open - Opened by sfc-gh-mhazy 23 days ago - 2 comments
#1443 - Setup failure in the latest build
Issue - State: closed - Opened by complexfilter 24 days ago - 2 comments
#1442 - Replace c10::optional with std::optional in flash_attn
Pull Request - State: closed - Opened by houseroad 24 days ago - 1 comment
#1441 - Error when importing dropout_layer_norm
Issue - State: open - Opened by anfortas337 24 days ago - 1 comment
#1440 - Running flash_attn/flash_attn_triton_amd/bench.py with sequence length > 4096 causes RuntimeError: Triton Error [CUDA]: an illegal memory access was encountered
Issue - State: open - Opened by jiqimaoke 25 days ago - 5 comments
#1439 - IncompatibleTypeErrorImpl('invalid operands of type pointer<int64> and triton.language.int32')
Issue - State: open - Opened by wuyouliaoxi 26 days ago
#1438 - FA3 forward performance regression on H200
Issue - State: open - Opened by complexfilter 27 days ago - 7 comments
#1437 - Change version to 2.7.3
Pull Request - State: closed - Opened by ksivaman 27 days ago
#1436 - Blackwell support
Pull Request - State: closed - Opened by ksivaman 27 days ago - 1 comment
#1435 - FA3 does not work with torch.compile
Issue - State: open - Opened by nighting0le01 27 days ago
#1434 - GFX1100
Issue - State: closed - Opened by johnnynunez 27 days ago
#1433 - Expose `zero_tensors` arg in varlen functions
Pull Request - State: closed - Opened by ksivaman 28 days ago - 1 comment
#1432 - FA3 regression on H100 80GB?
Issue - State: open - Opened by bastianhagedorn 28 days ago - 8 comments
#1431 - [AMD ROCm] Support variable length of page attention
Pull Request - State: closed - Opened by rocking5566 28 days ago
#1430 - Fix calls to `torch.is_grad_enabled()`
Pull Request - State: closed - Opened by ksivaman 29 days ago
#1429 - [flash attn v2] Why V uses no-swizzle layout for registers?
Issue - State: open - Opened by phantaurus 29 days ago - 1 comment
#1428 - version `GLIBCXX_3.4.29' not found
Issue - State: open - Opened by zhanghanxing2022 29 days ago
#1427 - Generalize cuda version checks for A100 and above
Pull Request - State: closed - Opened by ksivaman 30 days ago
#1426 - [Delete]
Issue - State: closed - Opened by rebemika-amzn 30 days ago
#1425 - Remove unused 224 cu kernels
Pull Request - State: closed - Opened by drisspg about 1 month ago
#1424 - UnboundLocalError: cannot access local variable 'out' where it is not associated with a value
Issue - State: closed - Opened by CicelyCafe about 1 month ago - 1 comment
#1423 - ERROR: No matching distribution found for flash-attn==2.6.3+cu123torch2.4cxx11abifalse
Issue - State: open - Opened by carolynsoo about 1 month ago - 1 comment
#1422 - Unable to install flash_attn on H100 with CUDA 12.5
Issue - State: open - Opened by ghadiaravi13 about 1 month ago
#1421 - Unable to install `flash-attn` even if I first install `torch` alone
Issue - State: closed - Opened by ytxmobile98 about 1 month ago - 5 comments
#1420 - Is there a plan to support flash_attn_varlen_backward with fp8
Issue - State: open - Opened by gaodaheng about 1 month ago - 1 comment
#1419 - Add a macro for namespace
Pull Request - State: closed - Opened by drisspg about 1 month ago
#1418 - Encounter some problems when building wheel
Issue - State: open - Opened by ZarkPanda about 1 month ago
#1417 - `flash_attn_with_kvcache` discrepancy slicing kv_cache / cache_seqlens
Issue - State: open - Opened by jeromeku about 1 month ago
#1416 - [CK_TILE] FAv3 bwd bugfix
Pull Request - State: closed - Opened by poyenc about 1 month ago
#1415 - RuntimeError: Error compiling objects for extension
Issue - State: open - Opened by ProgramerSalar about 1 month ago - 2 comments
#1414 - looking for a test to verify cache correctness in `flash_attn_with_kvcache`
Issue - State: open - Opened by chakpongchung about 1 month ago - 2 comments
#1413 - Performance Impact of Using Three Warps per Group (WG) in FA3 Compared to Two WGs
Issue - State: open - Opened by ziyuhuang123 about 1 month ago - 1 comment
#1412 - UnboundLocalError: local variable 'out' referenced before assignment
Issue - State: open - Opened by chuangzhidan about 1 month ago - 3 comments
#1411 - Can't install it
Issue - State: open - Opened by TherrenceF about 1 month ago - 1 comment
#1410 - Impact of Register Spills on FA3 Kernel Performance
Issue - State: open - Opened by ziyuhuang123 about 1 month ago - 1 comment
#1409 - FA 2.4.2 is failing unit tests on A6000 and A5880
Issue - State: open - Opened by BoxiangW about 1 month ago - 5 comments
#1408 - Why Did FA3 Change SmemLayoutAtomO Definition in the New Version?
Issue - State: closed - Opened by ziyuhuang123 about 2 months ago
#1407 - Why Does FA3 Use Registers Instead of Directly Accessing SMEM with WGMMA on SM90?
Issue - State: open - Opened by ziyuhuang123 about 2 months ago - 1 comment
#1406 - fix bug when is_grad is false
Pull Request - State: closed - Opened by woaixiaoxiao about 2 months ago
#1405 - Add missing tests/__init__.py
Pull Request - State: open - Opened by BioGeek about 2 months ago
#1404 - 4 Failing `test_flash_attn_output_fp8` tests on H100
Issue - State: open - Opened by BioGeek about 2 months ago - 3 comments
#1403 - Does bar.sync Emit Semaphores Alongside bar.arrive?
Issue - State: closed - Opened by ziyuhuang123 about 2 months ago - 1 comment
#1402 - is flash_attn_with_kvcache() supposed to work for seqlen > 1?
Issue - State: closed - Opened by vince62s about 2 months ago - 1 comment
#1401 - Understanding sync and arrive in FA3 Store Function
Issue - State: open - Opened by ziyuhuang123 about 2 months ago
#1400 - Understanding the Role of arrive in NamedBarrier Synchronization
Issue - State: open - Opened by ziyuhuang123 about 2 months ago - 1 comment
#1399 - Fix incorrect torch dtype
Pull Request - State: closed - Opened by kevmo314 about 2 months ago
#1398 - The execution order between GEMM0 of the next iteration and GEMM1 of the current iteration in the pingpong scheduling pipeline for overlapping GEMMs and softmax between warpgroups
Issue - State: open - Opened by tengdecheng about 2 months ago