Ecosyste.ms: Issues
An open API service providing issue and pull request metadata for open source projects.
GitHub / Dao-AILab/flash-attention issues and pull requests
#1420 - Is there a plan to support flash_attn_varlen_backward with fp8
Issue -
State: open - Opened by gaodaheng about 1 month ago
- 1 comment
#1419 - Add a macro for namespace
Pull Request -
State: closed - Opened by drisspg about 1 month ago
#1418 - Encounter some problems when building wheel
Issue -
State: open - Opened by ZarkPanda about 1 month ago
#1417 - `flash_attn_with_kvcache` discrepancy slicing kv_cache / cache_seqlens
Issue -
State: open - Opened by jeromeku about 1 month ago
#1416 - [CK_TILE] FAv3 bwd bugfix
Pull Request -
State: closed - Opened by poyenc about 2 months ago
#1415 - RuntimeError: Error compiling objects for extension
Issue -
State: open - Opened by ProgramerSalar about 2 months ago
- 2 comments
#1414 - looking for a test to verify cache correctness in `flash_attn_with_kvcache`
Issue -
State: open - Opened by chakpongchung about 2 months ago
- 2 comments
#1413 - Performance Impact of Using Three Warps per Group (WG) in FA3 Compared to Two WGs
Issue -
State: closed - Opened by ziyuhuang123 about 2 months ago
- 1 comment
#1412 - UnboundLocalError: local variable 'out' referenced before assignment
Issue -
State: closed - Opened by chuangzhidan about 2 months ago
- 6 comments
#1411 - Can't intall it
Issue -
State: open - Opened by TherrenceF about 2 months ago
- 1 comment
#1410 - Impact of Register Spills on FA3 Kernel Performance
Issue -
State: closed - Opened by ziyuhuang123 about 2 months ago
- 1 comment
#1409 - FA 2.4.2 is falling unitest on A6000 and A5880
Issue -
State: open - Opened by BoxiangW about 2 months ago
- 5 comments
#1408 - Why Did FA3 Change SmemLayoutAtomO Definition in the New Version?
Issue -
State: closed - Opened by ziyuhuang123 about 2 months ago
#1407 - Why Does FA3 Use Registers Instead of Directly Accessing SMEM with WGMMA on SM90?
Issue -
State: closed - Opened by ziyuhuang123 about 2 months ago
- 1 comment
#1406 - fix bug when is_grad is false
Pull Request -
State: closed - Opened by woaixiaoxiao about 2 months ago
#1405 - Add missing tests/__init__.py
Pull Request -
State: open - Opened by BioGeek about 2 months ago
#1404 - 4 Failing `test_flash_attn_output_fp8` tests on H100
Issue -
State: open - Opened by BioGeek about 2 months ago
- 3 comments
#1403 - Does bar.sync Emit Semaphores Alongside bar.arrive?
Issue -
State: closed - Opened by ziyuhuang123 about 2 months ago
- 1 comment
#1402 - is flash_attn_with_kvcache() supposed to work for seqlen > 1 ?
Issue -
State: closed - Opened by vince62s about 2 months ago
- 1 comment
#1401 - Understanding sync and arrive in FA3 Store Function
Issue -
State: open - Opened by ziyuhuang123 about 2 months ago
#1400 - Understanding the Role of arrive in NamedBarrier Synchronization
Issue -
State: open - Opened by ziyuhuang123 about 2 months ago
- 1 comment
#1399 - Fix incorrect torch dtype
Pull Request -
State: closed - Opened by kevmo314 about 2 months ago
#1398 - The execution order between GEMM0 of the next iteration and GEMM1 of the current iteration in Pingpong scheduling pipeline for overlapping gemms and softmax between warpgroups
Issue -
State: open - Opened by tengdecheng about 2 months ago
#1397 - check torch.is_grad_enabled before calling customer flash atten ops
Pull Request -
State: closed - Opened by XiaobingSuper about 2 months ago
- 5 comments
#1396 - Why Doesn't FlashAttention3 Allow KV and O to Share Memory Space?
Issue -
State: open - Opened by ziyuhuang123 about 2 months ago
- 1 comment
#1395 - g2s K tensor when handling padding in the seq_k, clear it rather than keeping the default SMEM values.
Issue -
State: open - Opened by NVIDIA-JerryChen about 2 months ago
#1394 - Create PEP 517 build metadata
Pull Request -
State: closed - Opened by frostming about 2 months ago
- 1 comment
#1393 - Add hipBLAS/cuBLAS distinction in benchmark_gemm.py
Pull Request -
State: closed - Opened by garrettbyrd about 2 months ago
#1392 - fix a bug (issue #1390) caused by typo
Pull Request -
State: closed - Opened by liguohao96 about 2 months ago
- 1 comment
#1391 - Large loss of accuracy between flashattention and native
Issue -
State: open - Opened by fanfanaaaa about 2 months ago
- 3 comments
#1390 - a small typo and fix
Issue -
State: open - Opened by liguohao96 about 2 months ago
- 3 comments
#1389 - Why does NamedBarrier in epilogue use NumMmaThreads(256) + NumThreadsPerWarp(32)?
Issue -
State: open - Opened by ziyuhuang123 about 2 months ago
- 2 comments
#1388 - Windows 11 Installation Error
Issue -
State: open - Opened by 404-xianjin about 2 months ago
#1387 - FA-3 installation errors
Issue -
State: closed - Opened by asahni04 about 2 months ago
- 1 comment
#1386 - is fwd_kvcache compatible with torch.compile in 2.7.2post1 ?
Issue -
State: open - Opened by vince62s about 2 months ago
- 6 comments
#1385 - How to get actual col idx
Issue -
State: open - Opened by wenkechen 2 months ago
#1384 - Support dedicated compile[For Research]
Pull Request -
State: open - Opened by AllenDou 2 months ago
#1383 - don't save inputs/outputs buffer of FlashAttenFunc to reduce memory usage for inference mode
Pull Request -
State: closed - Opened by XiaobingSuper 2 months ago
- 3 comments
#1382 - Fix deprecation warnings
Pull Request -
State: open - Opened by rongou 2 months ago
#1381 - [ROCm] benchmark_flash_attention.py failing with Memory Access Fault
Issue -
State: open - Opened by nikhil-tensorwave 2 months ago
- 3 comments
#1380 - Validate that `git` is available and `CUDA_HOME` is set in `setup.py`
Pull Request -
State: closed - Opened by davidmezzetti 2 months ago
#1379 - Possible to install with just `torch` installed?
Issue -
State: closed - Opened by davidmezzetti 2 months ago
- 6 comments
#1378 - seq_lens variable used in the attention kernel
Issue -
State: closed - Opened by chakpongchung 2 months ago
- 1 comment
#1377 - Flash attention 3 does not use Dropout_p?
Issue -
State: open - Opened by nighting0le01 2 months ago
- 6 comments
#1376 - Accuracy Drop with Flash-Attention Reimplementation in Encoder-Decoder Architecture (ViT)
Issue -
State: closed - Opened by ImaGonEs 2 months ago
- 2 comments
#1375 - FA3 for cuda12.3 but torch only releases cuda 12.4 version
Issue -
State: closed - Opened by wplf 2 months ago
- 2 comments
#1374 - Headdim==96 in FA3
Issue -
State: closed - Opened by wplf 2 months ago
- 2 comments
#1373 - Can wgmma.async and barrier.arrive Ensure GEMM Completion Before Moving Forward?
Issue -
State: closed - Opened by ziyuhuang123 2 months ago
- 2 comments
#1372 - Why we have a third barrier::QueryEmpty arrive?
Issue -
State: open - Opened by ziyuhuang123 2 months ago
- 1 comment
#1371 - Question About Initial sync Behavior Without Prior arrive in Warpgroup Scheduling
Issue -
State: closed - Opened by ziyuhuang123 2 months ago
- 2 comments
#1370 - Question about warp_scheduler_barrier_arrive in FA3 and cutlass::arch::NamedBarrier::arrive Usage
Issue -
State: closed - Opened by ziyuhuang123 2 months ago
- 2 comments
#1369 - GLT
Issue -
State: open - Opened by deepgandu 2 months ago
#1368 - The byzantine copy of Tensor O
Issue -
State: closed - Opened by phantaurus 2 months ago
- 4 comments
#1367 - Issue Installing cuDNN Python Module via pip install cudnn
Issue -
State: open - Opened by ziyuhuang123 2 months ago
#1366 - Sliding Window (Local Attention) possibly incorrect on newest branch
Issue -
State: open - Opened by kilianhaefeli 2 months ago
- 1 comment
#1365 - Change {q,k,v}_descale to be per-batch-element
Pull Request -
State: closed - Opened by ericauld 2 months ago
#1364 - Is there any way to compile the codes with nvcc debug flag(-G)?
Issue -
State: open - Opened by Dev-Jahn 2 months ago
- 6 comments
#1363 - flash_bwd_kernel.h: add maybe_unused annotation to suppress compile warnings
Pull Request -
State: closed - Opened by acgessler 2 months ago
#1362 - Triton Issues for Rotary flash_attn.layers.rotary.apply_rotary_emb_qkv_
Issue -
State: open - Opened by albertotono 2 months ago
#1361 - Fix FA3 Varlen Performance regression
Pull Request -
State: closed - Opened by kadeng 2 months ago
#1360 - Need `tests/__init__.py` for `hopper/test_flash_attn.py`
Issue -
State: open - Opened by hancheolcho 3 months ago
- 2 comments
#1359 - Output Discrepancy Between FlashAttention and PyTorch Attention
Issue -
State: closed - Opened by pengzhangzhi 3 months ago
- 2 comments
#1358 - Add support for qk dim different from v dim in PR #1166
Issue -
State: closed - Opened by YTianZHU 3 months ago
#1357 - How to get attention score? "return_attn_probs=True" is not work.
Issue -
State: closed - Opened by UnableToUseGit 3 months ago
- 3 comments
#1356 - How to assign ROCm architecture during pip installing
Issue -
State: open - Opened by deeptimhe 3 months ago
#1355 - Does flash-attn support FP8 inference on L40-48G?
Issue -
State: open - Opened by LinJianping 3 months ago
#1354 - Flashdecoding with appendKV might incorrect
Issue -
State: open - Opened by DD-DuDa 3 months ago
#1353 - Added a Benchmark for Rotary and Improved Rotary Performance
Pull Request -
State: closed - Opened by alexkranias-amd 3 months ago
- 1 comment
#1352 - FP8 test failure on the latest 'decode' branch
Issue -
State: closed - Opened by cscyuge 3 months ago
- 1 comment
#1351 - Unable to cast Python instance of type <class 'torch._subclasses.fake_tensor.FakeTensor'> to C++ type
Issue -
State: open - Opened by zwhe99 3 months ago
#1350 - How could I use a query to calculate the attention with multiple k-v
Issue -
State: open - Opened by DongyuXu77 3 months ago
- 1 comment
#1349 - Question of the equation in Flash Attention 2 Paper
Issue -
State: open - Opened by jeffrey-sunh1 3 months ago
- 5 comments
#1348 - Issue with installing flash attention ` import flash_attn_2_cuda as flash_attn_cuda`
Issue -
State: open - Opened by hahmad2008 3 months ago
- 6 comments
#1347 - breaking change for head size non divisble by 8
Issue -
State: closed - Opened by felix-red-panda 3 months ago
- 1 comment
#1346 - RuntimeError: Error compiling objects for extension
Issue -
State: closed - Opened by beyondguo 3 months ago
- 5 comments
#1345 - [Q] why flash attention MFU is over 100% in A800
Issue -
State: closed - Opened by wonderisland 3 months ago
#1344 - [Bug] Potential hazard in epilogue when kUseVarSeqLen=true
Issue -
State: closed - Opened by QiZhangNV 3 months ago
- 2 comments
#1343 - FA3 Failed to initialize the TMA descriptor
Issue -
State: open - Opened by li-yi-dong 3 months ago
#1342 - Assistance on implementing Flash Attention 2 for Turing
Issue -
State: open - Opened by samuelzxu 3 months ago
#1341 - [Bug]: Perf slump after updating flash-attn 2.7.0 (with torch.compile using)
Issue -
State: open - Opened by Mnb66 3 months ago
- 4 comments
#1340 - Building a wheel for torch 2.5.0-2.5.1 with Python 3.10 and CUDA 12.4 on Windows has failed.
Issue -
State: open - Opened by lldacing 3 months ago
- 2 comments