Ecosyste.ms: Issues

An open API service providing issue and pull request metadata for open source projects.

GitHub / HazyResearch/flash-attention issues and pull requests

#239 - Triton implementation error

Issue - State: closed - Opened by antony-frolov over 1 year ago - 1 comment

#238 - Implement flash-attn replace for BLOOM attention in huggingface

Issue - State: open - Opened by dat-browny over 1 year ago - 1 comment

#236 - Missing libtorch_cuda_cu.so in torch nightly

Issue - State: open - Opened by nkflash over 1 year ago

#235 - test_flash_attn fails on T4

Issue - State: closed - Opened by stephen-youn over 1 year ago - 4 comments

#234 - Flash-attention under Triton 2.0

Issue - State: open - Opened by junjie18 over 1 year ago - 2 comments

#233 - cu118 cannot pip install flash_attn

Issue - State: open - Opened by ranck626 over 1 year ago

#232 - fix for "dot() got an unexpected keyword argument 'trans_b'" error

Pull Request - State: open - Opened by winglian over 1 year ago - 3 comments
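
As context for this fix: Triton 2.0 removed the trans_b= keyword from tl.dot, so the operand is now transposed explicitly with tl.trans. A minimal sketch of the change (the kernel name qk_kernel and its flat indexing are illustrative, not taken from the PR):

```python
import triton
import triton.language as tl

@triton.jit
def qk_kernel(q_ptr, k_ptr, s_ptr, BLOCK: tl.constexpr):
    offs = tl.arange(0, BLOCK)
    idx = offs[:, None] * BLOCK + offs[None, :]
    q = tl.load(q_ptr + idx)
    k = tl.load(k_ptr + idx)
    # Triton < 2.0 spelled this as tl.dot(q, k, trans_b=True); the keyword
    # was removed, so the transpose is now explicit.
    s = tl.dot(q, tl.trans(k))
    tl.store(s_ptr + idx, s)
```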

#231 - Error installing flash-attn

Issue - State: open - Opened by ryurobin1990 over 1 year ago - 3 comments

#230 - Please provide Dockerfile with working installation

Issue - State: open - Opened by ivsanro1 over 1 year ago - 1 comment

#229 - Allow adding an optional local version to the package version

Pull Request - State: closed - Opened by maxhgerlach over 1 year ago

#227 - Multi-query with flash-attention

Issue - State: closed - Opened by dongluw over 1 year ago - 4 comments

#226 - Unable to import flash_attn_cuda

Issue - State: open - Opened by HuayuWong over 1 year ago - 8 comments

#225 - Building wheel for flash-attn (pyproject.toml) did not run successfully

Issue - State: open - Opened by MilesQLi over 1 year ago - 6 comments

#224 - Building wheel for flash-attn (pyproject.toml) did not run successfully

Issue - State: closed - Opened by jesswhitts over 1 year ago - 3 comments

#222 - Memory issue when using fused_dense kernel with deepspeed

Issue - State: open - Opened by yongyanrao over 1 year ago - 6 comments

#221 - (Question) Why does modifying the BLOCK size cause errors?

Issue - State: open - Opened by wuliJerry over 1 year ago - 2 comments

#220 - (Question) FMHA in FasterTransformers vs. FlashAttention

Issue - State: closed - Opened by cadedaniel over 1 year ago - 2 comments

#219 - CUDA 12.1 is not supported.

Issue - State: closed - Opened by larawehbe over 1 year ago - 7 comments

#218 - Support for Prefix LM attention

Issue - State: closed - Opened by jzhang38 over 1 year ago - 2 comments

#217 - IndexError about flash_attn_unpadded_qkvpacked_func

Issue - State: closed - Opened by bofei5675 over 1 year ago - 3 comments
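
A common source of index errors with the unpadded functions is a malformed cu_seqlens argument. A hedged sketch of the flash-attn 1.x convention, assuming the usual int32 cumulative-offsets layout (the example mask is made up):

```python
import torch
import torch.nn.functional as F

# Two sequences of true length 3 and 2 inside a padded batch of width 4.
mask = torch.tensor([[1, 1, 1, 0],
                     [1, 1, 0, 0]], dtype=torch.bool)
seqlens = mask.sum(dim=1, dtype=torch.int32)                   # tensor([3, 2])
# cu_seqlens holds the token offset where each sequence starts, plus the end.
cu_seqlens = F.pad(torch.cumsum(seqlens, 0, dtype=torch.int32), (1, 0))
max_seqlen = int(seqlens.max())
print(cu_seqlens, max_seqlen)   # tensor([0, 3, 5], dtype=torch.int32) 3
```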

#216 - Questions about prenorm config of block module

Issue - State: closed - Opened by ftgreat over 1 year ago - 5 comments

#215 - The installation process always freezes after execution.

Issue - State: closed - Opened by Minxiangliu over 1 year ago - 1 comment

#214 - ALiBi support in flash-attention

Issue - State: open - Opened by 2003pro over 1 year ago - 2 comments

#213 - cuda_bf16.h: No such file or directory

Issue - State: open - Opened by mendelsontau over 1 year ago - 8 comments

#212 - error installing on Databricks

Issue - State: open - Opened by opyate over 1 year ago - 3 comments

#211 - Flash-attention does not produce exact attention score?

Issue - State: closed - Opened by ron-vnai over 1 year ago - 2 comments

#210 - What's the difference between GPT3 and GPT2 in the training example?

Issue - State: closed - Opened by jzhang38 over 1 year ago - 1 comment

#209 - flash-attn (1.0.4) not supporting PEP 517 builds

Issue - State: closed - Opened by ntoxeg over 1 year ago - 1 comment

#205 - Issue installing on ROCm system

Issue - State: closed - Opened by KristianMischke over 1 year ago - 1 comment

#204 - Problem installing package with pip

Issue - State: open - Opened by praneetreddy017 over 1 year ago - 1 comment

#203 - Issue with installing using pip FileNotFoundError

Issue - State: open - Opened by cohen3 over 1 year ago - 11 comments

#202 - [BugFix] avoid bug on ImportError

Pull Request - State: closed - Opened by fedebotu over 1 year ago - 1 comment

#201 - performance of ColumnParallelLinear and RowParallelLinear

Issue - State: closed - Opened by yingtongxiong over 1 year ago - 1 comment

#200 - Minor code edits

Pull Request - State: open - Opened by vmarkovtsev over 1 year ago

#198 - How to install dependencies for FlashBlocksparse?

Issue - State: open - Opened by chinoll over 1 year ago - 1 comment

#196 - BlockSparse Requirements

Issue - State: open - Opened by wdeng almost 2 years ago

#195 - [Error] When inferring llama with multiple GPUs (GPUs > 1)

Issue - State: open - Opened by marscrazy almost 2 years ago - 2 comments

#194 - RuntimeError when running python setup.py install

Issue - State: closed - Opened by YoucanBaby almost 2 years ago - 3 comments

#193 - Use pyproject.toml to specify build dependencies

Pull Request - State: closed - Opened by anthonyhu almost 2 years ago - 2 comments

#192 - Usage for query padding mask

Issue - State: open - Opened by ComDec almost 2 years ago

#191 - Getting error compiling objects for extension

Issue - State: open - Opened by RishabhKumar777 almost 2 years ago - 4 comments

#190 - Support for NVIDIA GeForce RTX 3090 with Compute Capability 8.6

Issue - State: open - Opened by ericzhou571 almost 2 years ago - 19 comments
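
A quick way to check what the installed GPU reports, assuming the FlashAttention 1.x support matrix of Turing (sm75) and Ampere (sm80/sm86) parts:

```python
import torch

major, minor = torch.cuda.get_device_capability(0)
print(f"GPU 0 reports compute capability sm{major}{minor}")
# An RTX 3090 reports sm86 and is covered; e.g. a V100 (sm70) is not.
if (major, minor) >= (8, 0) or (major, minor) == (7, 5):
    print("supported architecture")
else:
    print("likely unsupported by flash-attn 1.x")
```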

#189 - RuntimeError: Error compiling objects for extension

Issue - State: open - Opened by stonecropa almost 2 years ago

#188 - Error in setup.py when installing the package from a fresh conda environment

Issue - State: open - Opened by anthonyhu almost 2 years ago - 5 comments
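
Many of these install failures trace back to the build environment rather than setup.py itself. A hedged sanity check, assuming the usual requirement that PyTorch is installed before flash-attn and that nvcc matches the CUDA version PyTorch was built against:

```python
import torch
from torch.utils.cpp_extension import CUDA_HOME

print("torch:", torch.__version__, "| built for CUDA:", torch.version.cuda)
print("CUDA_HOME:", CUDA_HOME)
# flash-attn compiles a CUDA extension during pip install, so PyTorch must be
# importable in the build environment and the nvcc under CUDA_HOME should
# match torch.version.cuda; a None CUDA_HOME is a common cause of failures.
```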

#187 - Problems with q_dtype

Issue - State: open - Opened by StrangeTcy almost 2 years ago - 9 comments

#186 - CUDA versions and T4 vs A100

Issue - State: closed - Opened by perone almost 2 years ago - 3 comments

#185 - Make a CPU version so computers without an Nvidia card can run it

Issue - State: closed - Opened by by321 almost 2 years ago - 1 comment

#184 - Does it support the RTX 4090 GPU?

Issue - State: closed - Opened by moseshu almost 2 years ago - 1 comment

#183 - install error

Issue - State: open - Opened by landerson85 almost 2 years ago - 2 comments

#182 - The error happened in 'python setup.py install'.

Issue - State: open - Opened by zyh190507 almost 2 years ago - 2 comments

#181 - [Feature Request] Will the codebase support relative position embeddings?

Issue - State: open - Opened by kaixinbear almost 2 years ago - 5 comments

#180 - questions about dropout_add_layer_norm

Issue - State: open - Opened by yingtongxiong almost 2 years ago - 3 comments

#179 - When will there be ALiBi support?

Issue - State: open - Opened by ScottishFold007 almost 2 years ago - 6 comments
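
For reference, the bias being requested here is compact to express in plain PyTorch. A sketch of ALiBi's per-head linear bias, following the 2^(-8i/n) slopes from Press et al. (alibi_bias is an illustrative helper, not part of this repo):

```python
import torch

def alibi_bias(n_heads, seq_len):
    # Geometric head slopes 2^(-8i/n) for i = 1..n.
    slopes = torch.tensor([2.0 ** (-8.0 * (i + 1) / n_heads)
                           for i in range(n_heads)])
    pos = torch.arange(seq_len)
    rel = (pos[None, :] - pos[:, None]).float()     # j - i
    return slopes[:, None, None] * rel              # (n_heads, seq, seq)
```

The bias is added to the attention logits before softmax; under a causal mask only the j <= i half is used, where the bias is non-positive.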

#178 - Questions about FusedDenseFunc

Issue - State: closed - Opened by ftgreat almost 2 years ago - 1 comment

#177 - Why can rotary embeddings not be used with cu_seqlen?

Issue - State: closed - Opened by RuABraun almost 2 years ago - 7 comments

#176 - Why doesn't the Triton version of flash_attn support dropout?

Issue - State: open - Opened by flymark2010 almost 2 years ago - 5 comments

#174 - [WIP] Reorganise code

Pull Request - State: closed - Opened by ZhiyuanChen almost 2 years ago - 3 comments

#173 - Reorganise code in PyTorch formats

Issue - State: closed - Opened by ZhiyuanChen almost 2 years ago

#172 - Errors when compiling from source

Issue - State: open - Opened by wuliJerry almost 2 years ago - 6 comments

#171 - Updating Triton version

Issue - State: open - Opened by vchiley almost 2 years ago - 4 comments

#170 - Missing module in `setup.py`

Pull Request - State: closed - Opened by CrustaceanJ almost 2 years ago - 2 comments

#169 - Illegal Memory Access when using Block-Sparse Flash Attention, head_dim=128

Issue - State: open - Opened by LLLLxmmm almost 2 years ago - 1 comment

#167 - how to enable additive bias for attention logits using flash attention?

Issue - State: closed - Opened by hiyijian almost 2 years ago - 7 comments

#166 - Enable CUDA graph capture

Pull Request - State: closed - Opened by ksivaman almost 2 years ago - 1 comment
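
For readers unfamiliar with the feature this PR enables: CUDA graphs record a fixed-shape sequence of kernels once and replay it without per-kernel launch overhead. A minimal PyTorch sketch (a plain Linear stands in for an attention layer):

```python
import torch

lin = torch.nn.Linear(512, 512, device="cuda")
static_x = torch.randn(8, 512, device="cuda")

# A few warm-up iterations are recommended before capture.
for _ in range(3):
    lin(static_x)
torch.cuda.synchronize()

g = torch.cuda.CUDAGraph()
with torch.cuda.graph(g):
    static_y = lin(static_x)

static_x.copy_(torch.randn(8, 512, device="cuda"))
g.replay()   # recomputes static_y in place, with no relaunch cost
```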

#165 - CUDA graph capture not supported

Issue - State: closed - Opened by ksivaman almost 2 years ago

#164 - make mlp hidden_features defaults to 4*in_features

Pull Request - State: closed - Opened by ZhiyuanChen almost 2 years ago
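
The default this PR adopts is the standard transformer MLP width. A hedged sketch of the behavior (class and attribute names are illustrative, not the repo's exact code):

```python
import torch.nn as nn

class Mlp(nn.Module):
    # hidden_features defaults to 4 * in_features, the usual transformer ratio.
    def __init__(self, in_features, hidden_features=None, out_features=None):
        super().__init__()
        hidden_features = hidden_features or 4 * in_features
        out_features = out_features or in_features
        self.fc1 = nn.Linear(in_features, hidden_features)
        self.act = nn.GELU()
        self.fc2 = nn.Linear(hidden_features, out_features)

    def forward(self, x):
        return self.fc2(self.act(self.fc1(x)))
```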

#163 - v1.0.0 installation failed

Issue - State: closed - Opened by drcege almost 2 years ago - 5 comments

#162 - Sequence Parallelism with Flash Attention

Issue - State: closed - Opened by conceptofmind almost 2 years ago - 2 comments

#161 - training with reset-position-ids and reset-attention-mask

Issue - State: closed - Opened by toothacher17 almost 2 years ago - 1 comment

#160 - from flash_attn.layers.rotary import RotaryEmbedding

Issue - State: open - Opened by Carol-gutianle almost 2 years ago - 9 comments
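
For readers hitting this ImportError, the underlying math is small. A plain-PyTorch sketch of interleaved rotary embeddings (rotary is a hypothetical helper; flash_attn.layers.rotary implements a fused variant of the same idea):

```python
import torch

def rotary(x, base=10000.0):
    # x: (seq, dim) with even dim; rotate consecutive pairs by
    # position-dependent angles theta_i = pos / base^(2i/dim).
    seq, dim = x.shape
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    angles = torch.outer(torch.arange(seq).float(), inv_freq)  # (seq, dim/2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out
```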

#159 - Regarding attention weights

Issue - State: closed - Opened by netw0rkf10w almost 2 years ago - 1 comment

#158 - Q: Support for L4 GPUs?

Issue - State: closed - Opened by eeishaan almost 2 years ago - 4 comments

#157 - fatal error: cusolverDn.h: No such file or directory

Issue - State: closed - Opened by Godofnothing almost 2 years ago - 5 comments

#156 - Q: Release schedule?

Issue - State: closed - Opened by ksivaman almost 2 years ago - 3 comments

#155 - It seems that CUDA 12.0 and above are not supported now?

Issue - State: closed - Opened by ScottishFold007 almost 2 years ago - 8 comments

#154 - Add PaddlePaddle to usage

Pull Request - State: closed - Opened by kuizhiqing almost 2 years ago - 1 comment

#153 - Training example using "ParallelFusedMLP" and "ParallelMHA" with 8cards.

Issue - State: closed - Opened by nbcc almost 2 years ago - 2 comments

#152 - float32

Issue - State: closed - Opened by mars1248 almost 2 years ago - 1 comment

#151 - Looking for Guidance to use flash-attention

Issue - State: closed - Opened by liuxing007 almost 2 years ago - 2 comments

#150 - pip install takes 10+ minutes on Zen 2 EPYC

Issue - State: closed - Opened by diegomontoya almost 2 years ago - 2 comments

#149 - is S = QK^T stored in shared memory?

Issue - State: closed - Opened by YichengDWu almost 2 years ago - 4 comments

#148 - V100

Issue - State: open - Opened by gaohao-dev almost 2 years ago - 2 comments

#147 - Add option for deterministic execution

Pull Request - State: closed - Opened by ksivaman almost 2 years ago - 1 comment

#146 - v0.2.8: a little confused in training/src/utils/ddp_zero1.py

Issue - State: closed - Opened by nbcc almost 2 years ago - 1 comment

#145 - Flash attention gives different results than reference attention

Issue - State: closed - Opened by dvruette almost 2 years ago - 7 comments
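
Deviations at fp16 scale are expected because the fused kernels reorder floating-point reductions. A hedged demonstration of the same magnitude of deviation without flash-attn at all, assuming a CUDA device:

```python
import torch

torch.manual_seed(0)
dev = "cuda"   # assumes a CUDA device; fp16 matmul is what the kernel uses
q, k, v = (torch.randn(2, 8, 128, 64, device=dev) for _ in range(3))

def attn(q, k, v):
    s = q @ k.transpose(-1, -2) / q.shape[-1] ** 0.5
    return torch.softmax(s, dim=-1) @ v

ref = attn(q, k, v)                                  # float32 reference
out = attn(q.half(), k.half(), v.half()).float()     # half precision
print((ref - out).abs().max())   # ~1e-3, the scale of difference reported
```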

#144 - supporting fused rms norm for hidden size > 6144

Issue - State: closed - Opened by lxuechen almost 2 years ago - 2 comments
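
Where the fused kernel's hidden-size limit applies, an unfused fallback is straightforward. A sketch of reference RMSNorm (rms_norm is an illustrative helper, not the repo's API):

```python
import torch

def rms_norm(x, weight, eps=1e-6):
    # Divide by the root-mean-square over the last dimension, then scale.
    inv_rms = x.pow(2).mean(dim=-1, keepdim=True).add(eps).rsqrt()
    return x * inv_rms * weight
```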

#143 - Running on T4 (AWS EC2)

Issue - State: closed - Opened by baskrahmer almost 2 years ago - 4 comments

#142 - FlashAttention Triton broken

Issue - State: closed - Opened by aska-0096 almost 2 years ago - 7 comments

#141 - lse gradient?

Issue - State: closed - Opened by EricSteinberger almost 2 years ago - 1 comment

#140 - Remove unused kwargs like device in FlashAttention

Pull Request - State: closed - Opened by VikParuchuri almost 2 years ago - 1 comment