Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / Dao-AILab/flash-attention issues and pull requests
#1224 - [Question] Computation and register/shared memory wasted during decoding phase?
Issue -
State: open - Opened by sleepwalker2017 5 months ago
#1223 - You are attempting to use Flash Attention 2.0 without specifying a torch dtype. This might lead to unexpected behaviour
Issue -
State: open - Opened by AraratSaribekyan 5 months ago
- 1 comment
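The warning in #1223 comes from Hugging Face Transformers when a model is loaded with the Flash Attention 2 backend but without an explicit half-precision dtype. A minimal sketch of the usual fix, assuming a Transformers install with flash-attn available (the checkpoint name below is a placeholder):

```python
import torch
from transformers import AutoModelForCausalLM

# Placeholder checkpoint; any FA2-compatible model is loaded the same way.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    torch_dtype=torch.bfloat16,               # explicit fp16/bf16 dtype avoids the warning
    attn_implementation="flash_attention_2",  # requires flash-attn to be installed
).to("cuda")
```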
#1222 - Plan to support V100
Issue -
State: closed - Opened by hiker-lw 5 months ago
- 2 comments
#1221 - Which file is the source code of flash_attn_varlen_qkvpacked_func?
Issue -
State: open - Opened by scuizhibin 5 months ago
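For #1221: the Python wrapper for flash_attn_varlen_qkvpacked_func lives in flash_attn/flash_attn_interface.py, with the CUDA kernels under csrc/. A minimal usage sketch, assuming a CUDA build of flash-attn and arbitrary example shapes:

```python
import torch
from flash_attn import flash_attn_varlen_qkvpacked_func

# Two sequences of lengths 3 and 5 packed into one (total, 3, nheads, headdim) tensor.
nheads, headdim = 4, 64
seqlens = [3, 5]
qkv = torch.randn(sum(seqlens), 3, nheads, headdim, dtype=torch.float16, device="cuda")
cu_seqlens = torch.tensor([0, 3, 8], dtype=torch.int32, device="cuda")  # prefix sums of seqlens

out = flash_attn_varlen_qkvpacked_func(qkv, cu_seqlens, max_seqlen=max(seqlens), causal=True)
print(out.shape)  # (total, nheads, headdim)
```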
#1220 - Unable to install flash attention in docker
Issue -
State: open - Opened by shivance 5 months ago
#1219 - Additive Bias in Flash Attention
Issue -
State: open - Opened by kkh517 5 months ago
#1218 - Support sliding window attention in FA3
Issue -
State: open - Opened by lin-ht 5 months ago
- 3 comments
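For context on #1218: FA2's flash_attn_func already exposes sliding-window (local) attention through its window_size argument; the issue asks for the same in FA3. A minimal FA2 sketch with arbitrary example shapes:

```python
import torch
from flash_attn import flash_attn_func

batch, seqlen, nheads, headdim = 2, 1024, 8, 64
q = torch.randn(batch, seqlen, nheads, headdim, dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

# Causal sliding-window attention: each query attends to itself and
# roughly the previous 256 tokens only.
out = flash_attn_func(q, k, v, causal=True, window_size=(256, 0))
```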
#1217 - Is the combination of var-len, paged KV and split KV supported?
Issue -
State: open - Opened by masahi 5 months ago
#1216 - export onnx issue
Issue -
State: open - Opened by scuizhibin 5 months ago
#1215 - Can't compile from source on ROCm 6.1.3 with gfx1100... error: "static assertion failed" (2.6.3)
Issue -
State: open - Opened by nktice 5 months ago
- 9 comments
#1214 - [FA3][Varlen] bug for head_dim not in [64, 128, 256] for varlen
Issue -
State: open - Opened by YLGH 5 months ago
- 1 comment
#1213 - [FP8][FA3] Is there a plan to support _flash_attn_varlen_forward with fp8
Issue -
State: open - Opened by baoleai 5 months ago
#1212 - Question about FA3 supporting (256, 256)
Issue -
State: open - Opened by YTianZHU 5 months ago
#1211 - Failed to build installable wheels for some pyproject.toml based projects (flash-attn)
Issue -
State: open - Opened by danielchang1985 5 months ago
- 2 comments
#1210 - Add q, k, v descales to FA3 interface
Pull Request -
State: closed - Opened by cyanguwa 5 months ago
#1209 - [rfc][torch.compile] Make custom kernels torch.compile compatible
Pull Request -
State: closed - Opened by anijain2305 5 months ago
- 1 comment
#1208 - CUDA error (flash-attention/hopper/flash_fwd_launch_template.h:111): invalid argument
Issue -
State: open - Opened by saurabh-kataria 5 months ago
- 1 comment
#1207 - Pipelining GmemCopy on kHeadDim
Issue -
State: open - Opened by phantaurus 5 months ago
- 6 comments
#1206 - feat: change minimal supported CUDA version to 11.7
Pull Request -
State: closed - Opened by jue-jue-zi 5 months ago
#1205 - Increase TensorCore Active % for Flash Attention Kernels
Issue -
State: closed - Opened by phantaurus 5 months ago
- 6 comments
#1204 - TiledMMA scales KNWarps times on the M dimension
Issue -
State: closed - Opened by phantaurus 5 months ago
- 2 comments
#1203 - [AMD] Triton Backend for ROCm
Pull Request -
State: closed - Opened by micmelesse 5 months ago
- 6 comments
#1202 - Abnormal execution time / Mismatch of FLOPs obtained from Nsys / Ncu
Issue -
State: closed - Opened by phantaurus 5 months ago
- 3 comments
#1201 - [Question] Compatibility and Support for NVIDIA RTX™ 6000 Ada Generation GPU
Issue -
State: open - Opened by surajpatil4899 5 months ago
- 1 comment
#1200 - [Question] Why multiply number of SMs by 2 in num_splits_heuristic?
Issue -
State: open - Opened by WanchaoYao 5 months ago
#1199 - Is bf16 datatype available for FA3?
Issue -
State: closed - Opened by YTianZHU 5 months ago
- 1 comment
#1198 - Support page kvcache in AMD ROCm
Pull Request -
State: closed - Opened by rocking5566 5 months ago
- 2 comments
#1197 - Add local attention in Hopper FAv3
Pull Request -
State: closed - Opened by ipiszy 5 months ago
- 1 comment
#1196 - [Question] Does training and inference use the same quantization method in FA3?
Issue -
State: open - Opened by moses3017 5 months ago
- 2 comments
#1195 - Bug in RotaryEmbed Kernel
Issue -
State: open - Opened by tianyan01 5 months ago
#1193 - Error installing flash-attn on Windows 11
Issue -
State: open - Opened by AbsoluteMode 5 months ago
- 5 comments
#1192 - Fix a wrong reference to seqlen_k variable in the varlen kernel
Pull Request -
State: closed - Opened by cakeng 5 months ago
#1191 - Does the GPU experience load imbalance when decoding queries with different KV cache lengths?
Issue -
State: open - Opened by eljrte 5 months ago
- 2 comments
#1190 - Is flash-attention supported as a replacement for multi-head attention on NVIDIA DRIVE Orin?
Issue -
State: open - Opened by wutheringcoo 5 months ago
- 1 comment
#1189 - Sync compile flags with CK tile for ROCm 6.2
Pull Request -
State: closed - Opened by rocking5566 5 months ago
#1188 - flash_attn_varlen support for tree attention
Pull Request -
State: open - Opened by efsotr 5 months ago
- 4 comments
#1187 - Hi, will there be an XPU implementation?
Issue -
State: closed - Opened by yasha1255 5 months ago
- 1 comment
#1186 - How does flash-attention support transfusion attention mask?
Issue -
State: open - Opened by YuzaChongyi 5 months ago
#1185 - FA3 RuntimeError: q must be on CUDA
Issue -
State: closed - Opened by GMALP 5 months ago
- 1 comment
#1184 - flash_attn_with_kvcache has a performance issue with torch 2.5.0
Issue -
State: open - Opened by jianc99 5 months ago
- 1 comment
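For reference on #1184: a minimal decoding-step sketch with flash_attn_with_kvcache, assuming a CUDA build of flash-attn; the shapes and cache lengths below are illustrative:

```python
import torch
from flash_attn import flash_attn_with_kvcache

batch, nheads, headdim, max_cache = 2, 8, 64, 4096
q = torch.randn(batch, 1, nheads, headdim, dtype=torch.float16, device="cuda")  # one new token per sequence
k_cache = torch.zeros(batch, max_cache, nheads, headdim, dtype=torch.float16, device="cuda")
v_cache = torch.zeros_like(k_cache)
k_new = torch.randn_like(q)
v_new = torch.randn_like(q)
cache_seqlens = torch.tensor([17, 42], dtype=torch.int32, device="cuda")  # current cache lengths

# Appends k_new/v_new into the caches at cache_seqlens and attends over the updated caches.
out = flash_attn_with_kvcache(q, k_cache, v_cache, k=k_new, v=v_new,
                              cache_seqlens=cache_seqlens, causal=True)
```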
#1183 - A question about Better Transformer, flash attention, and NVIDIA TensorRT
Issue -
State: closed - Opened by bzr1 5 months ago
- 1 comment
#1182 - Add seqused_q in fwd / bwd and seqused_k in bwd in hopper FA.
Pull Request -
State: closed - Opened by ipiszy 5 months ago
#1177 - [Feature] FA2 support for attention mask (shape: (seq_len, seq_len))
Issue -
State: closed - Opened by efsotr 5 months ago
- 3 comments
#1176 - What's the expected way to take advantage of FA3 block quantization?
Issue -
State: closed - Opened by goldhuang 5 months ago
#1174 - How can I use FA3 in NeMo/Megatron? How do I change the interface in Megatron?
Issue -
State: closed - Opened by Desperadoze 6 months ago
- 1 comment
#1169 - FP8 for flash attention 3 and possible concerns
Issue -
State: open - Opened by TheTinyTeddy 6 months ago
- 8 comments
#1166 - Add support for qk hidden dim different from v hidden dim
Pull Request -
State: open - Opened by smallscientist1 6 months ago
- 5 comments
#1158 - FAILED: /data/flash-attention/hopper/build/temp.linux-x86_64-cpython-310/flash_fwd_hdim64_bf16_sm90.o
Issue -
State: open - Opened by ArtificialZeng 6 months ago
- 2 comments
#1156 - google/gemma-2-2b
Issue -
State: closed - Opened by mhillebrand 6 months ago
- 4 comments
#1151 - Installation hangs when building the wheel and cannot complete. No errors pop up.
Issue -
State: open - Opened by Ngoson2004 6 months ago
- 3 comments
#1146 - CUDA Error: no kernel image is available for execution on the device
Issue -
State: closed - Opened by qiuqiu10 6 months ago
- 5 comments
#1142 - How can I install with CUDA 12.1?
Issue -
State: open - Opened by tian969 6 months ago
- 2 comments
#1139 - Add custom ops for compatibility with PT Compile
Pull Request -
State: closed - Opened by ani300 6 months ago
- 19 comments
#1138 - Is FA3 less accurate than FA2 in bf16 computation?
Issue -
State: closed - Opened by complexfilter 6 months ago
- 4 comments
#1137 - How to obtain differentiable softmax_lse
Issue -
State: open - Opened by albert-cwkuo 6 months ago
- 8 comments
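For #1137: the FA2 interface can return softmax_lse alongside the output when return_attn_probs=True, though the returned statistics are intended for inspection rather than as a differentiable graph output. A hedged sketch, assuming the (out, softmax_lse, S_dmask) return convention of the FA2 Python interface:

```python
import torch
from flash_attn import flash_attn_func

q = torch.randn(2, 128, 8, 64, dtype=torch.float16, device="cuda")
k, v = torch.randn_like(q), torch.randn_like(q)

# Assumed return convention: (out, softmax_lse, S_dmask) when return_attn_probs=True.
out, softmax_lse, _ = flash_attn_func(q, k, v, causal=True, return_attn_probs=True)
print(softmax_lse.shape)  # per-query, per-head log-sum-exp of the attention scores
```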
#1136 - FA3 unit test fails
Issue -
State: closed - Opened by zhipeng93 6 months ago
- 2 comments
#1134 - block scaling support not found
Issue -
State: open - Opened by complexfilter 6 months ago
- 4 comments
#1128 - Flash attn 3 has large numerical mismatches with torch SDPA
Issue -
State: open - Opened by Fuzzkatt 6 months ago
- 8 comments
#1125 - FA2's flash_attn_varlen_func is 300x slower than flash_attn_func
Issue -
State: open - Opened by ex3ndr 6 months ago
- 6 comments
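For #1125: flash_attn_varlen_func takes packed, unpadded sequences plus int32 cumulative sequence lengths, so timing comparisons against flash_attn_func are sensitive to how cu_seqlens is constructed. A minimal usage sketch with arbitrary example shapes:

```python
import torch
from flash_attn import flash_attn_varlen_func

nheads, headdim = 8, 64
seqlens = [100, 300, 57]                      # ragged batch, no padding
q = torch.randn(sum(seqlens), nheads, headdim, dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)
cu_seqlens = torch.tensor([0, 100, 400, 457], dtype=torch.int32, device="cuda")  # prefix sums

out = flash_attn_varlen_func(q, k, v,
                             cu_seqlens_q=cu_seqlens, cu_seqlens_k=cu_seqlens,
                             max_seqlen_q=max(seqlens), max_seqlen_k=max(seqlens),
                             causal=True)
```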
#1122 - Install flash-attn 2 with CUDA 12: flash-attn is looking for CUDA 11
Issue -
State: open - Opened by YerongLi 6 months ago
- 7 comments
#1121 - Does Flash Attention 3 fp8 support the 4090?
Issue -
State: open - Opened by huanpengchu 6 months ago
- 2 comments
#1112 - Add how to import FA3 to documentation.
Pull Request -
State: closed - Opened by AdamLouly 6 months ago
- 1 comment
#1107 - [QST] flash_attn2: why is tOrVt not swizzled?
Issue -
State: open - Opened by itsliupeng 6 months ago
- 3 comments
#1106 - [QST] How does flash-attn calculate the dropout?
Issue -
State: closed - Opened by zhang22222 6 months ago
- 3 comments
#1105 - gfx1100 installation fails due to `fatal error: 'fmha_bwd.hpp' file not found`
Issue -
State: open - Opened by ZhenyaPav 6 months ago
- 9 comments
#1094 - There is no cu123 but cu124 for PyTorch
Issue -
State: open - Opened by nasyxx 7 months ago
- 6 comments
#1075 - Changes For FP8
Pull Request -
State: closed - Opened by ganeshcolfax 7 months ago
#1072 - Add var-seq-len to FA3 fp16 / bf16 fwd
Pull Request -
State: closed - Opened by ipiszy 7 months ago
- 1 comment
#1061 - /envs/Qwen/lib/python3.11/site-packages/flash_attn/flash_attn_interface.py", line 10, in <module> import flash_attn_2_cuda as flash_attn_cuda ImportError: /home/apus/mambaforge/envs/Qwen/lib/python3.11/site-packages/flash_attn_2_cuda.cpython-311-x86_64-linux-gnu.so: undefined symbol: _ZN3c104cuda9SetDeviceEi
Issue -
State: open - Opened by ArtificialZeng 7 months ago
- 6 comments
#1048 - Compatibility of Flash Attention 3 FP8 Feature with L40 and A100 GPUs
Issue -
State: open - Opened by feifeibear 7 months ago
- 7 comments
#1043 - High memory requirements when compiling
Issue -
State: open - Opened by haampie 7 months ago
- 5 comments
#1039 - Why do the output results of flash attention and multi-head attention differ significantly under the same parameters?
Issue -
State: open - Opened by Dominic23331 7 months ago
- 2 comments
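For #1039: flash attention runs in fp16/bf16, so element-wise differences against an fp32 multi-head attention reference on the order of 1e-3 are expected rounding error, not a bug. A sketch of a tolerance-based comparison, assuming arbitrary example shapes:

```python
import math
import torch
from flash_attn import flash_attn_func

batch, seqlen, nheads, headdim = 2, 128, 4, 64
q = torch.randn(batch, seqlen, nheads, headdim, dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

out_flash = flash_attn_func(q, k, v, causal=False)

# fp32 reference attention: move heads to dim 1 for the matmuls, then move them back.
qf, kf, vf = (t.float().transpose(1, 2) for t in (q, k, v))
scores = qf @ kf.transpose(-2, -1) / math.sqrt(headdim)
out_ref = (scores.softmax(dim=-1) @ vf).transpose(1, 2)

# Differences on the order of 1e-3 are expected from fp16 rounding.
print((out_flash.float() - out_ref).abs().max())
```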
#1038 - Building flash-attn takes a lot of time
Issue -
State: open - Opened by Sayli2000 7 months ago
- 16 comments
#1036 - Windows actions
Pull Request -
State: open - Opened by bdashore3 7 months ago
- 3 comments
#1035 - How to debug?
Issue -
State: closed - Opened by Achazwl 7 months ago
- 2 comments
#1028 - Failed to build flash-attn
Issue -
State: open - Opened by xiaoyerrr 7 months ago
- 2 comments
#1026 - Could not build wheels for flash-attn
Issue -
State: open - Opened by FiReTiTi 7 months ago
- 6 comments
#1017 - build failure
Issue -
State: open - Opened by alxmke 7 months ago
- 9 comments
#1009 - Availability of wheel
Issue -
State: open - Opened by nikonikolov 8 months ago
- 2 comments
#1007 - Unable to build wheel of flash_attn
Issue -
State: open - Opened by Zer0TheObserver 8 months ago
- 2 comments
#1004 - flash attention is broken for CUDA 12.x
Issue -
State: open - Opened by Bhagyashreet20 8 months ago
- 3 comments
#992 - ImportError: libtorch_cuda_cpp.so: cannot open shared object file: No such file or directory
Issue -
State: open - Opened by jxxtin 8 months ago
- 2 comments
#991 - Error in Algorithm 1 of Flash Attention 2 paper
Issue -
State: open - Opened by mbchang 8 months ago
- 2 comments
#986 - Has anyone successfully used flash_attn on Jetson?
Issue -
State: open - Opened by cthulhu-tww 8 months ago
- 10 comments
#982 - Error Installing FlashAttention on Windows 11 with CUDA 11.8 - "CUDA_HOME environment variable is not set"
Issue -
State: open - Opened by Mr-Natural 8 months ago
- 19 comments
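For the CUDA_HOME errors in #982 (and similar install reports): the flash-attn build needs a local CUDA toolkit, and PyTorch exposes the path it detects. A small diagnostic sketch, assuming only that PyTorch is installed:

```python
# Quick check of what the flash-attn build would see for CUDA_HOME.
import os
import torch
from torch.utils.cpp_extension import CUDA_HOME

print("torch:", torch.__version__, "built with CUDA:", torch.version.cuda)
print("CUDA_HOME env var:", os.environ.get("CUDA_HOME"))
print("CUDA_HOME detected by torch:", CUDA_HOME)  # None means no CUDA toolkit was found
```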
#980 - [Draft] support qk head_dim different from vo head_dim
Pull Request -
State: open - Opened by defei-coder 8 months ago
- 2 comments
#978 - Fix +/-inf in LSE returned by forward
Pull Request -
State: open - Opened by sgrigory 8 months ago
- 3 comments
#977 - Apple Silicon Support
Issue -
State: open - Opened by chigkim 8 months ago
- 2 comments
#975 - ImportError: /home/linjl/anaconda3/envs/sd/lib/python3.10/site-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c105ErrorC2ENS_14SourceLocationENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
Issue -
State: open - Opened by zzc0208 8 months ago
- 18 comments
#969 - How to install flash_attn with torch==2.1.0
Issue -
State: open - Opened by foreverpiano 8 months ago
- 5 comments
#966 - ImportError: flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c104cuda9SetDeviceEi
Issue -
State: open - Opened by foreverpiano 9 months ago
- 15 comments
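For the "undefined symbol" import errors in #1061, #992, #975, and #966: these typically mean the installed flash-attn wheel was built against a different torch/CUDA/ABI combination than the current environment. A small diagnostic sketch for collecting the relevant versions:

```python
# Minimal environment report for "undefined symbol" import errors.
import torch
print("torch:", torch.__version__)
print("torch CUDA:", torch.version.cuda)
print("cxx11 abi:", torch._C._GLIBCXX_USE_CXX11_ABI)

import flash_attn
print("flash_attn:", flash_attn.__version__)
```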