Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / Dao-AILab/flash-attention issues and pull requests
#1224 - [Question] Computation and register/shared memory wasted during decoding phase?
Issue -
State: open - Opened by sleepwalker2017 5 months ago
#1223 - You are attempting to use Flash Attention 2.0 without specifying a torch dtype. This might lead to unexpected behaviour
Issue -
State: open - Opened by AraratSaribekyan 5 months ago
- 1 comment
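The warning in #1223 comes from Hugging Face Transformers when a model is loaded with the Flash Attention 2 backend but without an explicit half-precision dtype. A minimal sketch of the usual fix, assuming a Transformers install with flash-attn available (the checkpoint name below is a placeholder):

```python
import torch
from transformers import AutoModelForCausalLM

# Placeholder checkpoint; any FA2-compatible model is loaded the same way.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    torch_dtype=torch.bfloat16,               # explicit fp16/bf16 dtype avoids the warning
    attn_implementation="flash_attention_2",  # requires flash-attn to be installed
).to("cuda")
```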
#1222 - Plan to support V100
Issue -
State: closed - Opened by hiker-lw 5 months ago
- 2 comments
#1221 - Which file is the source code of flash_attn_varlen_qkvpacked_func?
Issue -
State: open - Opened by scuizhibin 5 months ago
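For #1221: the Python wrapper for flash_attn_varlen_qkvpacked_func lives in flash_attn/flash_attn_interface.py, with the CUDA kernels under csrc/. A minimal usage sketch, assuming a CUDA build of flash-attn and arbitrary example shapes:

```python
import torch
from flash_attn import flash_attn_varlen_qkvpacked_func

# Two sequences of lengths 3 and 5 packed into one (total, 3, nheads, headdim) tensor.
nheads, headdim = 4, 64
seqlens = [3, 5]
qkv = torch.randn(sum(seqlens), 3, nheads, headdim, dtype=torch.float16, device="cuda")
cu_seqlens = torch.tensor([0, 3, 8], dtype=torch.int32, device="cuda")  # prefix sums of seqlens

out = flash_attn_varlen_qkvpacked_func(qkv, cu_seqlens, max_seqlen=max(seqlens), causal=True)
print(out.shape)  # (total, nheads, headdim)
```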
#1220 - Unable to install flash attention in docker
Issue -
State: open - Opened by shivance 5 months ago
#1219 - Additive Bias in Flash Attention
Issue -
State: open - Opened by kkh517 5 months ago
#1218 - Support sliding window attention in FA3
Issue -
State: open - Opened by lin-ht 5 months ago
- 3 comments
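For context on #1218: FA2's flash_attn_func already exposes sliding-window (local) attention through its window_size argument; the issue asks for the same in FA3. A minimal FA2 sketch with arbitrary example shapes:

```python
import torch
from flash_attn import flash_attn_func

batch, seqlen, nheads, headdim = 2, 1024, 8, 64
q = torch.randn(batch, seqlen, nheads, headdim, dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

# Causal sliding-window attention: each query attends to itself and
# roughly the previous 256 tokens only.
out = flash_attn_func(q, k, v, causal=True, window_size=(256, 0))
```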
#1217 - Is the combination of var-len, paged KV and split KV supported?
Issue -
State: open - Opened by masahi 5 months ago
#1216 - export onnx issue
Issue -
State: open - Opened by scuizhibin 5 months ago
#1215 - Can't compile from source on ROCm 6.1.3 with gfx1100... error: "static assertion failed" (2.6.3)
Issue -
State: open - Opened by nktice 5 months ago
- 9 comments
#1214 - [FA3][Varlen] bug for head_dim not in [64, 128, 256] for varlen
Issue -
State: open - Opened by YLGH 5 months ago
- 1 comment
#1213 - [FP8][FA3] Is there a plan to support _flash_attn_varlen_forward with fp8
Issue -
State: open - Opened by baoleai 5 months ago
#1212 - Question about FA3 supporting (256, 256)
Issue -
State: open - Opened by YTianZHU 5 months ago
#1211 - Failed to build installable wheels for some pyproject.toml based projects (flash-attn)
Issue -
State: open - Opened by danielchang1985 5 months ago
- 2 comments
#1210 - Add q, k, v descales to FA3 interface
Pull Request -
State: closed - Opened by cyanguwa 5 months ago
#1209 - [rfc][torch.compile] Make custom kernels torch.compile compatible
Pull Request -
State: closed - Opened by anijain2305 5 months ago
- 1 comment
#1208 - CUDA error (flash-attention/hopper/flash_fwd_launch_template.h:111): invalid argument
Issue -
State: open - Opened by saurabh-kataria 5 months ago
- 1 comment
#1207 - Pipelining GmemCopy on kHeadDim
Issue -
State: open - Opened by phantaurus 5 months ago
- 6 comments
#1206 - feat: change minimal supported CUDA version to 11.7
Pull Request -
State: closed - Opened by jue-jue-zi 5 months ago
#1205 - Increase TensorCore Active % for Flash Attention Kernels
Issue -
State: closed - Opened by phantaurus 5 months ago
- 6 comments
#1204 - TiledMMA scales KNWarps times on the M dimension
Issue -
State: closed - Opened by phantaurus 5 months ago
- 2 comments
#1203 - [AMD] Triton Backend for ROCm
Pull Request -
State: closed - Opened by micmelesse 5 months ago
- 6 comments
#1202 - Abnormal execution time / Mismatch of FLOPs obtained from Nsys / Ncu
Issue -
State: closed - Opened by phantaurus 5 months ago
- 3 comments
#1201 - [Question] Compatibility and Support for NVIDIA RTX™ 6000 Ada Generation GPU
Issue -
State: open - Opened by surajpatil4899 5 months ago
- 1 comment
#1200 - [Question] Why multiply number of SMs by 2 in num_splits_heuristic?
Issue -
State: open - Opened by WanchaoYao 5 months ago
#1199 - Is bf16 datatype available for FA3?
Issue -
State: closed - Opened by YTianZHU 5 months ago
- 1 comment
#1198 - Support page kvcache in AMD ROCm
Pull Request -
State: closed - Opened by rocking5566 5 months ago
- 2 comments
#1197 - Add local attention in Hopper FAv3
Pull Request -
State: closed - Opened by ipiszy 5 months ago
- 1 comment
#1196 - [Question] Does training and inference use the same quantization method in FA3?
Issue -
State: open - Opened by moses3017 5 months ago
- 2 comments
#1195 - Bug in RotaryEmbed Kernel
Issue -
State: open - Opened by tianyan01 5 months ago
#1193 - Error installing flash-attn on Windows 11
Issue -
State: open - Opened by AbsoluteMode 5 months ago
- 5 comments
#1192 - Fix a wrong reference to seqlen_k variable in the varlen kernel
Pull Request -
State: closed - Opened by cakeng 5 months ago
#1191 - Does the GPU experience load imbalance when decoding queries with different KV cache lengths?
Issue -
State: open - Opened by eljrte 5 months ago
- 2 comments
#1190 - Is flash-attention supported as a replacement for multi-head attention on NVIDIA DRIVE Orin?
Issue -
State: open - Opened by wutheringcoo 5 months ago
- 1 comment
#1189 - Sync compile flags with CK tile for ROCm 6.2
Pull Request -
State: closed - Opened by rocking5566 5 months ago
#1188 - flash_attn_varlen support for tree attention
Pull Request -
State: open - Opened by efsotr 5 months ago
- 4 comments
#1187 - Hi, will there be an XPU implementation?
Issue -
State: closed - Opened by yasha1255 5 months ago
- 1 comment
#1186 - How does flash-attention support transfusion attention mask?
Issue -
State: open - Opened by YuzaChongyi 5 months ago
#1185 - FA3 RuntimeError: q must be on CUDA
Issue -
State: closed - Opened by GMALP 5 months ago
- 1 comment
#1184 - flash_attn_with_kvcache has a performance issue with torch 2.5.0
Issue -
State: open - Opened by jianc99 5 months ago
- 1 comment
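For reference on #1184: a minimal decoding-step sketch with flash_attn_with_kvcache, assuming a CUDA build of flash-attn; the shapes and cache lengths below are illustrative:

```python
import torch
from flash_attn import flash_attn_with_kvcache

batch, nheads, headdim, max_cache = 2, 8, 64, 4096
q = torch.randn(batch, 1, nheads, headdim, dtype=torch.float16, device="cuda")  # one new token per sequence
k_cache = torch.zeros(batch, max_cache, nheads, headdim, dtype=torch.float16, device="cuda")
v_cache = torch.zeros_like(k_cache)
k_new = torch.randn_like(q)
v_new = torch.randn_like(q)
cache_seqlens = torch.tensor([17, 42], dtype=torch.int32, device="cuda")  # current cache lengths

# Appends k_new/v_new into the caches at cache_seqlens and attends over the updated caches.
out = flash_attn_with_kvcache(q, k_cache, v_cache, k=k_new, v=v_new,
                              cache_seqlens=cache_seqlens, causal=True)
```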
#1183 - A question about Better Transformer, flash attention, and NVIDIA TensorRT
Issue -
State: closed - Opened by bzr1 5 months ago
- 1 comment
#1182 - Add seqused_q in fwd / bwd and seqused_k in bwd in hopper FA.
Pull Request -
State: closed - Opened by ipiszy 5 months ago
#1177 - [Feature] FA2 support for attention mask (shape: (seq_len, seq_len))
Issue -
State: closed - Opened by efsotr 5 months ago
- 3 comments
#1176 - What's the expected way to take advantage of FA3 block quantization?
Issue -
State: closed - Opened by goldhuang 5 months ago
#1174 - How can I use FA3 in NeMo/Megatron? How do I change the interface in Megatron?
Issue -
State: closed - Opened by Desperadoze 6 months ago
- 1 comment
#1169 - FP8 for flash attention 3 and possible concerns
Issue -
State: open - Opened by TheTinyTeddy 6 months ago
- 8 comments
#1166 - Add support for qk hidden dim different from v hidden dim
Pull Request -
State: open - Opened by smallscientist1 6 months ago
- 5 comments
#1158 - FAILED: /data/flash-attention/hopper/build/temp.linux-x86_64-cpython-310/flash_fwd_hdim64_bf16_sm90.o
Issue -
State: open - Opened by ArtificialZeng 6 months ago
- 2 comments
#1156 - google/gemma-2-2b
Issue -
State: closed - Opened by mhillebrand 6 months ago
- 4 comments
#1151 - Installation hangs when building the wheel and cannot complete. No errors pop up.
Issue -
State: open - Opened by Ngoson2004 6 months ago
- 3 comments
#1146 - CUDA Error: no kernel image is available for execution on the device
Issue -
State: closed - Opened by qiuqiu10 6 months ago
- 5 comments
#1142 - How can I install with CUDA 12.1?
Issue -
State: open - Opened by tian969 6 months ago
- 2 comments
#1139 - Add custom ops for compatibility with PT Compile
Pull Request -
State: closed - Opened by ani300 6 months ago
- 19 comments
#1138 - Is FA3 less accurate than FA2 in bf16 computation?
Issue -
State: closed - Opened by complexfilter 6 months ago
- 4 comments
#1137 - How to obtain differentiable softmax_lse
Issue -
State: open - Opened by albert-cwkuo 6 months ago
- 8 comments
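For #1137: the FA2 interface can return softmax_lse alongside the output when return_attn_probs=True, though the returned statistics are intended for inspection rather than as a differentiable graph output. A hedged sketch, assuming the (out, softmax_lse, S_dmask) return convention of the FA2 Python interface:

```python
import torch
from flash_attn import flash_attn_func

q = torch.randn(2, 128, 8, 64, dtype=torch.float16, device="cuda")
k, v = torch.randn_like(q), torch.randn_like(q)

# Assumed return convention: (out, softmax_lse, S_dmask) when return_attn_probs=True.
out, softmax_lse, _ = flash_attn_func(q, k, v, causal=True, return_attn_probs=True)
print(softmax_lse.shape)  # per-query, per-head log-sum-exp of the attention scores
```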
#1136 - FA3 unit test fails
Issue -
State: closed - Opened by zhipeng93 6 months ago
- 2 comments
#1134 - block scaling support not found
Issue -
State: open - Opened by complexfilter 6 months ago
- 4 comments
#1128 - Flash attn 3 has large numerical mismatches with torch SDPA
Issue -
State: open - Opened by Fuzzkatt 6 months ago
- 8 comments
#1125 - FA2's flash_attn_varlen_func is 300x slower than flash_attn_func
Issue -
State: open - Opened by ex3ndr 6 months ago
- 6 comments
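For #1125: flash_attn_varlen_func takes packed, unpadded sequences plus int32 cumulative sequence lengths, so timing comparisons against flash_attn_func are sensitive to how cu_seqlens is constructed. A minimal usage sketch with arbitrary example shapes:

```python
import torch
from flash_attn import flash_attn_varlen_func

nheads, headdim = 8, 64
seqlens = [100, 300, 57]                      # ragged batch, no padding
q = torch.randn(sum(seqlens), nheads, headdim, dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)
cu_seqlens = torch.tensor([0, 100, 400, 457], dtype=torch.int32, device="cuda")  # prefix sums

out = flash_attn_varlen_func(q, k, v,
                             cu_seqlens_q=cu_seqlens, cu_seqlens_k=cu_seqlens,
                             max_seqlen_q=max(seqlens), max_seqlen_k=max(seqlens),
                             causal=True)
```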
#1122 - Install flash-attn 2 with CUDA 12: flash-attn is looking for CUDA 11
Issue -
State: open - Opened by YerongLi 6 months ago
- 7 comments
#1121 - Does Flash Attention 3 fp8 support the 4090?
Issue -
State: open - Opened by huanpengchu 6 months ago
- 2 comments
#1112 - Add how to import FA3 to documentation.
Pull Request -
State: closed - Opened by AdamLouly 6 months ago
- 1 comment
#1107 - [QST] flash_attn2: why is tOrVt not swizzled?
Issue -
State: open - Opened by itsliupeng 6 months ago
- 3 comments
#1106 - [QST] How does flash-attn calculate the dropout?
Issue -
State: closed - Opened by zhang22222 6 months ago
- 3 comments
#1105 - gfx1100 installation fails due to `fatal error: 'fmha_bwd.hpp' file not found`
Issue -
State: open - Opened by ZhenyaPav 6 months ago
- 9 comments
#1094 - There is no cu123 but cu124 for PyTorch
Issue -
State: open - Opened by nasyxx 7 months ago
- 6 comments
#1075 - Changes For FP8
Pull Request -
State: closed - Opened by ganeshcolfax 7 months ago
#1072 - Add var-seq-len to FA3 fp16 / bf16 fwd
Pull Request -
State: closed - Opened by ipiszy 7 months ago
- 1 comment
#1061 - /envs/Qwen/lib/python3.11/site-packages/flash_attn/flash_attn_interface.py", line 10, in <module> import flash_attn_2_cuda as flash_attn_cuda ImportError: /home/apus/mambaforge/envs/Qwen/lib/python3.11/site-packages/flash_attn_2_cuda.cpython-311-x86_64-linux-gnu.so: undefined symbol: _ZN3c104cuda9SetDeviceEi
Issue -
State: open - Opened by ArtificialZeng 7 months ago
- 6 comments
#1048 - Compatibility of Flash Attention 3 FP8 Feature with L40 and A100 GPUs
Issue -
State: open - Opened by feifeibear 7 months ago
- 7 comments
#1043 - High memory requirements when compiling
Issue -
State: open - Opened by haampie 7 months ago
- 5 comments
#1039 - Why do the output results of flash attention and multi-head attention differ significantly under the same parameters?
Issue -
State: open - Opened by Dominic23331 7 months ago
- 2 comments
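For #1039: flash attention runs in fp16/bf16, so element-wise differences against an fp32 multi-head attention reference on the order of 1e-3 are expected rounding error, not a bug. A sketch of a tolerance-based comparison, assuming arbitrary example shapes:

```python
import math
import torch
from flash_attn import flash_attn_func

batch, seqlen, nheads, headdim = 2, 128, 4, 64
q = torch.randn(batch, seqlen, nheads, headdim, dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

out_flash = flash_attn_func(q, k, v, causal=False)

# fp32 reference attention: move heads to dim 1 for the matmuls, then move them back.
qf, kf, vf = (t.float().transpose(1, 2) for t in (q, k, v))
scores = qf @ kf.transpose(-2, -1) / math.sqrt(headdim)
out_ref = (scores.softmax(dim=-1) @ vf).transpose(1, 2)

# Differences on the order of 1e-3 are expected from fp16 rounding.
print((out_flash.float() - out_ref).abs().max())
```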
#1038 - Building flash-attn takes a lot of time
Issue -
State: open - Opened by Sayli2000 7 months ago
- 16 comments
#1036 - Windows actions
Pull Request -
State: open - Opened by bdashore3 7 months ago
- 3 comments
#1035 - How to debug?
Issue -
State: closed - Opened by Achazwl 7 months ago
- 2 comments
#1028 - Failed to build flash-attn
Issue -
State: open - Opened by xiaoyerrr 7 months ago
- 2 comments
#1026 - Could not build wheels for flash-attn
Issue -
State: open - Opened by FiReTiTi 7 months ago
- 6 comments
#1017 - build failure
Issue -
State: open - Opened by alxmke 7 months ago
- 9 comments
#1009 - Availability of wheel
Issue -
State: open - Opened by nikonikolov 8 months ago
- 2 comments
#1007 - Unable to build wheel of flash_attn
Issue -
State: open - Opened by Zer0TheObserver 8 months ago
- 2 comments
#1004 - flash attention is broken for CUDA 12.x
Issue -
State: open - Opened by Bhagyashreet20 8 months ago
- 3 comments
#992 - ImportError: libtorch_cuda_cpp.so: cannot open shared object file: No such file or directory
Issue -
State: open - Opened by jxxtin 8 months ago
- 2 comments
#991 - Error in Algorithm 1 of Flash Attention 2 paper
Issue -
State: open - Opened by mbchang 8 months ago
- 2 comments
#986 - Has anyone successfully used flash_attn on Jetson?
Issue -
State: open - Opened by cthulhu-tww 8 months ago
- 10 comments
#982 - Error Installing FlashAttention on Windows 11 with CUDA 11.8 - "CUDA_HOME environment variable is not set"
Issue -
State: open - Opened by Mr-Natural 8 months ago
- 19 comments
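For the CUDA_HOME errors in #982 (and similar install reports): the flash-attn build needs a local CUDA toolkit, and PyTorch exposes the path it detects. A small diagnostic sketch, assuming only that PyTorch is installed:

```python
# Quick check of what the flash-attn build would see for CUDA_HOME.
import os
import torch
from torch.utils.cpp_extension import CUDA_HOME

print("torch:", torch.__version__, "built with CUDA:", torch.version.cuda)
print("CUDA_HOME env var:", os.environ.get("CUDA_HOME"))
print("CUDA_HOME detected by torch:", CUDA_HOME)  # None means no CUDA toolkit was found
```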
#980 - [Draft] support qk head_dim different from vo head_dim
Pull Request -
State: open - Opened by defei-coder 8 months ago
- 2 comments
#978 - Fix +/-inf in LSE returned by forward
Pull Request -
State: open - Opened by sgrigory 8 months ago
- 3 comments
#977 - Apple Silicon Support
Issue -
State: open - Opened by chigkim 8 months ago
- 2 comments
#975 - ImportError: /home/linjl/anaconda3/envs/sd/lib/python3.10/site-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c105ErrorC2ENS_14SourceLocationENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
Issue -
State: open - Opened by zzc0208 8 months ago
- 18 comments
#969 - How to install flash_attn with torch==2.1.0
Issue -
State: open - Opened by foreverpiano 8 months ago
- 5 comments
#966 - ImportError: flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c104cuda9SetDeviceEi
Issue -
State: open - Opened by foreverpiano 9 months ago
- 15 comments
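For the "undefined symbol" import errors in #1061, #992, #975, and #966: these typically mean the installed flash-attn wheel was built against a different torch/CUDA/ABI combination than the current environment. A small diagnostic sketch for collecting the relevant versions:

```python
# Minimal environment report for "undefined symbol" import errors.
import torch
print("torch:", torch.__version__)
print("torch CUDA:", torch.version.cuda)
print("cxx11 abi:", torch._C._GLIBCXX_USE_CXX11_ABI)

import flash_attn
print("flash_attn:", flash_attn.__version__)
```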