Ecosyste.ms: Issues

An open API service for providing issue and pull request metadata for open source projects.

GitHub / Dao-AILab/flash-attention issues and pull requests

#1471 - Support 576 Head dim for MLA

Issue - State: open - Opened by sAviOr287 3 days ago

#1470 - Getting Error While Extracting

Issue - State: open - Opened by emirardagn 3 days ago

#1469 - [How-to]How to get Flash-Attention under windows 11 CUDA

Issue - State: open - Opened by mytait 7 days ago - 6 comments

#1468 - fa3: include bert_padding utilities

Pull Request - State: closed - Opened by tmm1 10 days ago - 1 comment

#1467 - FA3 package is missing padding utilities

Issue - State: open - Opened by tmm1 10 days ago

#1466 - What is `seqused_q` and `seqused_k`?

Issue - State: open - Opened by cassanof 10 days ago

#1465 - FA3 KV Cache is slower than FA2 KV Cache

Issue - State: open - Opened by DD-DuDa 11 days ago - 3 comments

#1464 - Add support for Cuda 12.8 and B200 GPUs

Issue - State: open - Opened by ofirkris 12 days ago

#1463 - Update Cuda Blackwell

Pull Request - State: open - Opened by johnnynunez 13 days ago

#1462 - fused dense lib warning

Issue - State: open - Opened by YuyueminAustin 13 days ago

#1461 - BUG? get the wrong value when logit_scale is 0

Issue - State: open - Opened by shunshen93 13 days ago - 1 comment

#1460 - [Build] Update version of setuptools used to generate core package

Pull Request - State: closed - Opened by tmm1 14 days ago

#1456 - dropout_layer_norm

Issue - State: closed - Opened by ADiko1997 16 days ago - 1 comment

#1454 - Usage of .item() in unpad_input()

Issue - State: closed - Opened by qwertyforce 16 days ago - 2 comments

#1453 - Main branch compilation on nvcc 12.6

Issue - State: open - Opened by roded2 16 days ago - 2 comments

#1452 - v2.7.3 build failed in NGC pytorch:24.12-py3

Issue - State: open - Opened by xuchunmei000 16 days ago - 4 comments

#1449 - [QST] masking steps in flash decoding

Issue - State: open - Opened by aws-jiadingg 20 days ago - 1 comment

#1448 - Clarification on MMA0 Results Handling in the Latest Code

Issue - State: open - Opened by ziyuhuang123 21 days ago - 1 comment

#1446 - Support ROCM builds from source distribution, and improve error handling

Pull Request - State: closed - Opened by mgorny 22 days ago - 1 comment

#1444 - Wheel names and version inconsitency.

Issue - State: open - Opened by sfc-gh-mhazy 23 days ago - 2 comments

#1443 - Setup failure in the latest build

Issue - State: closed - Opened by complexfilter 24 days ago - 2 comments

#1442 - Replace c10::optional with std::optional in flash_attn

Pull Request - State: closed - Opened by houseroad 24 days ago - 1 comment

#1441 - Error when importing dropout_layer_norm

Issue - State: open - Opened by anfortas337 24 days ago - 1 comment

#1438 - FA3 forward performance regression on H200

Issue - State: open - Opened by complexfilter 27 days ago - 3 comments

#1437 - Change version to 2.7.3

Pull Request - State: closed - Opened by ksivaman 27 days ago

#1436 - Blackwell support

Pull Request - State: closed - Opened by ksivaman 27 days ago - 1 comment

#1435 - FA3 does not work with torch.compile

Issue - State: open - Opened by nighting0le01 27 days ago

#1434 - GFX1100

Issue - State: closed - Opened by johnnynunez 27 days ago

#1433 - Expose `zero_tensors` arg in varlen functions

Pull Request - State: closed - Opened by ksivaman 28 days ago - 1 comment

#1432 - FA3 regression on H100 80GB?

Issue - State: open - Opened by bastianhagedorn 28 days ago - 8 comments

#1431 - [AMD ROCm] Support variable length of page attention

Pull Request - State: closed - Opened by rocking5566 28 days ago

#1430 - Fix calls to `torch.is_grad_enabled()`

Pull Request - State: closed - Opened by ksivaman 29 days ago

#1429 - [flash attn v2] Why V uses no-swizzle layout for registers?

Issue - State: open - Opened by phantaurus 29 days ago - 1 comment

#1428 - version `GLIBCXX_3.4.29' not found

Issue - State: open - Opened by zhanghanxing2022 29 days ago

#1427 - Generalize cuda version checks for A100 and above

Pull Request - State: closed - Opened by ksivaman 30 days ago

#1426 - [Delete]

Issue - State: closed - Opened by rebemika-amzn 30 days ago

#1425 - Remove unused 224 cu kernels

Pull Request - State: closed - Opened by drisspg about 1 month ago

#1422 - Unable to install flash_attn on H100 with CUDA 12.5

Issue - State: open - Opened by ghadiaravi13 about 1 month ago

#1421 - Unable to install `flash-attn` even if I first install `torch` alone

Issue - State: closed - Opened by ytxmobile98 about 1 month ago - 5 comments

#1420 - Is there a plan to support flash_attn_varlen_backward with fp8

Issue - State: open - Opened by gaodaheng about 1 month ago - 1 comment

#1419 - Add a macro for namespace

Pull Request - State: closed - Opened by drisspg about 1 month ago

#1418 - Encounter some problems when building wheel

Issue - State: open - Opened by ZarkPanda about 1 month ago

#1416 - [CK_TILE] FAv3 bwd bugfix

Pull Request - State: closed - Opened by poyenc about 1 month ago

#1415 - RuntimeError: Error compiling objects for extension

Issue - State: open - Opened by ProgramerSalar about 1 month ago - 2 comments

#1412 - UnboundLocalError: local variable 'out' referenced before assignment

Issue - State: open - Opened by chuangzhidan about 1 month ago - 3 comments

#1411 - Can't intall it

Issue - State: open - Opened by TherrenceF about 1 month ago - 1 comment

#1410 - Impact of Register Spills on FA3 Kernel Performance

Issue - State: open - Opened by ziyuhuang123 about 1 month ago - 1 comment

#1409 - FA 2.4.2 is falling unitest on A6000 and A5880

Issue - State: open - Opened by BoxiangW about 1 month ago - 5 comments

#1406 - fix bug when is_grad is false

Pull Request - State: closed - Opened by woaixiaoxiao about 2 months ago

#1405 - Add missing tests/__init__.py

Pull Request - State: open - Opened by BioGeek about 2 months ago

#1404 - 4 Failing `test_flash_attn_output_fp8` tests on H100

Issue - State: open - Opened by BioGeek about 2 months ago - 3 comments

#1403 - Does bar.sync Emit Semaphores Alongside bar.arrive?

Issue - State: closed - Opened by ziyuhuang123 about 2 months ago - 1 comment

#1402 - is flash_attn_with_kvcache() supposed to work for seqlen > 1 ?

Issue - State: closed - Opened by vince62s about 2 months ago - 1 comment

#1401 - Understanding sync and arrive in FA3 Store Function

Issue - State: open - Opened by ziyuhuang123 about 2 months ago

#1400 - Understanding the Role of arrive in NamedBarrier Synchronization

Issue - State: open - Opened by ziyuhuang123 about 2 months ago - 1 comment

#1399 - Fix incorrect torch dtype

Pull Request - State: closed - Opened by kevmo314 about 2 months ago