lucidrains/performer-pytorch issues and pull requests

#97 - Find FastAttention is slower, and also with more GPU memory usage

Issue - State: open - Opened by phyllispeng123 3 months ago

#96 - Separate Transformer Encoder & Decoder modules with linear attention?

Issue - State: open - Opened by harshakmohan 5 months ago

#95 - Modify the transformer tutorial based on performer

Issue - State: open - Opened by HelloWorldLTY over 1 year ago

#94 - Cross-attention with arbitrary causal mask

Issue - State: open - Opened by BarKetPlace almost 2 years ago

#93 - Pretrained example

Issue - State: open - Opened by jubueche about 2 years ago

#92 - Performer Pytorch Slower than Expected and Please Help with Understanding Parameter Count

Issue - State: open - Opened by weihaosong about 2 years ago - 1 comment

#91 - Using replicating nn.MultiHeadAttention with multiple performer SelfAttention modules

Issue - State: open - Opened by JGittles about 2 years ago

#90 - I want to use Performer on MAE

Issue - State: open - Opened by Zhaoyi-Yan over 2 years ago

#89 - Question about masking

Issue - State: open - Opened by Microbiods over 2 years ago - 2 comments

#88 - Question: Is Performer order equivariant? (can it transform an unordered set of tensors)

Issue - State: open - Opened by nmakes over 2 years ago

#87 - Using Performer with GNNs

Issue - State: open - Opened by jah377 almost 3 years ago

#86 - Huge model state dict size?

Issue - State: open - Opened by Liyue1d almost 3 years ago

#85 - Attention map

Issue - State: closed - Opened by merouone almost 3 years ago - 2 comments

#84 - Performer Plain

Issue - State: open - Opened by Rachel66666 almost 3 years ago

#83 - How to test the performer architecture for training new models?

Issue - State: open - Opened by ayan-iiitd about 3 years ago - 1 comment

#82 - Output inconsistent for autoregressive performer

Issue - State: open - Opened by GanjinZero about 3 years ago - 2 comments

#81 - Rotary Position Embedding

Issue - State: open - Opened by ahmdtaha about 3 years ago

#80 - torch_tensorrt compilation fails

Issue - State: open - Opened by FredHaa about 3 years ago

#79 - way to make two elements invisible?

Issue - State: open - Opened by 1140310118 about 3 years ago

#78 - Add repetition penalty for text generation

Pull Request - State: closed - Opened by AlexandreDey over 3 years ago

#77 - Residual Connection

Issue - State: closed - Opened by jiyounglee-0523 over 3 years ago - 3 comments

#76 - torch.max(data_dash) bug

Issue - State: closed - Opened by martinpflaum over 3 years ago - 2 comments

#75 - Fix torch.qr deprecation warning

Pull Request - State: closed - Opened by Erotemic over 3 years ago - 1 comment

#74 - Some little changes

Pull Request - State: open - Opened by vasiliyeskin over 3 years ago

#73 - hyperbolic cosine based estimator

Issue - State: open - Opened by gaganbahga over 3 years ago

#72 - Relative Positional Encoding for Linear Attention Models.

Issue - State: closed - Opened by Vbansal21 over 3 years ago - 3 comments

#71 - Names `to_k`, `to_q`, `to_v`, `to_out` cause issues

Issue - State: open - Opened by JamesDeAntonis over 3 years ago

#70 - Recover attention scores

Issue - State: open - Opened by carlomarxdk over 3 years ago - 3 comments

#69 - FastAttention doesn't give results in agreement with standard attention?

Issue - State: open - Opened by simonaxelrod over 3 years ago - 7 comments

#68 - Input and Context size in CrossAttention

Issue - State: closed - Opened by caffeinetoomuch almost 4 years ago - 2 comments

#67 - Performer Benchmark

Issue - State: open - Opened by CavallucciMartina almost 4 years ago

#66 - Causal performer slower than causal regular attention

Issue - State: open - Opened by JamesDeAntonis almost 4 years ago - 3 comments

#65 - `to_out` bias

Issue - State: closed - Opened by JamesDeAntonis almost 4 years ago - 3 comments

#64 - Causal linear attention benchmark

Issue - State: closed - Opened by caffeinetoomuch almost 4 years ago - 13 comments

#63 - why is bias true in `to_<q,k,v>`?

Issue - State: closed - Opened by JamesDeAntonis almost 4 years ago - 4 comments

#62 - Decoder Mask

Issue - State: open - Opened by Muennighoff almost 4 years ago

#61 - Getting error with the check_redraw_projections when using DataParallel

Issue - State: closed - Opened by Warvito almost 4 years ago - 4 comments

#60 - context-specific embeddings from language model?

Issue - State: open - Opened by rainwala almost 4 years ago

#59 - Allow for performers to be used on cpu-only torch

Pull Request - State: closed - Opened by i404788 almost 4 years ago - 2 comments

#58 - Deterministic layers

Issue - State: open - Opened by anklebreaker almost 4 years ago - 1 comment

#57 - Saving checkpoints during training and loading

Issue - State: closed - Opened by ylhsieh almost 4 years ago - 3 comments

#56 - Extra FF when using cross attention

Issue - State: closed - Opened by gulnazaki about 4 years ago - 8 comments

#55 - FixNorm alongside ScaleNorm

Issue - State: open - Opened by gulnazaki about 4 years ago - 3 comments

#54 - Added fixed and axial positional embedding option

Pull Request - State: closed - Opened by gulnazaki about 4 years ago - 1 comment

#53 - Decoder randomly outputs NaN tensor.

Issue - State: closed - Opened by y-rokutan about 4 years ago - 5 comments

#52 - Performance gain replacing original attention to fast attention in this repo?

Issue - State: open - Opened by phypan11 about 4 years ago - 2 comments

#51 - Applying decoder input mask?

Issue - State: closed - Opened by maxmax1992 about 4 years ago - 2 comments

#50 - Bug fix in original google-research implementation

Issue - State: closed - Opened by gulnazaki about 4 years ago - 3 comments

#49 - Plain Performer, if you are working with say images or other modalities

Issue - State: open - Opened by haoshuai714 about 4 years ago - 1 comment

#48 - Replacing Attention module of Vision Transformer with SelfAttention Module of Performer?

Issue - State: open - Opened by PascalHbr about 4 years ago - 6 comments

#47 - [Feature] Adding fixed positional embeddings as an option

Issue - State: closed - Opened by gulnazaki about 4 years ago - 3 comments

#46 - SelfAttention layer seems to have large error relative to nn.MultiheadAttention?

Issue - State: open - Opened by jueseph about 4 years ago - 8 comments

#45 - Question: torch.max term used in `softmax_kernel`

Issue - State: closed - Opened by ClawangTU about 4 years ago - 4 comments

#44 - No fp16 support from fast-transformers (CausalDotProduct)

Issue - State: open - Opened by gulnazaki about 4 years ago - 75 comments

#43 - Performer Encoder Decoder architecture

Pull Request - State: closed - Opened by gulnazaki about 4 years ago

#42 - Triangular matrices ?

Issue - State: closed - Opened by jeremycochoy about 4 years ago - 10 comments

#41 - fix normalization for fast cuda version of causal

Pull Request - State: closed - Opened by lucidrains about 4 years ago

#40 - wrong implementation for autoregressive self-attention

Issue - State: closed - Opened by Sleepychord about 4 years ago - 10 comments

#39 - Use performer for finetuning task

Issue - State: open - Opened by usmanhidral about 4 years ago

#38 - [Feature] EncoderDecoder framework, similar to ReformerEncDec

Issue - State: closed - Opened by gulnazaki about 4 years ago - 22 comments

#37 - RuntimeError: CUDA error: no kernel image is available for execution on the device

Issue - State: open - Opened by james20141606 about 4 years ago - 1 comment

#36 - A small question regarding `softmax_kernel`

Issue - State: closed - Opened by boredtylin about 4 years ago - 1 comment

#35 - Input ordering is not explicitly stated

Issue - State: closed - Opened by haakom about 4 years ago - 2 comments

#34 - Difficult installing on Windows machine

Issue - State: open - Opened by rasin-tsukuba about 4 years ago

#33 - Current version seems to make saving and loading through model state dictionaries difficult

Issue - State: open - Opened by ThomasBJones2 about 4 years ago - 1 comment

#32 - Any performance comparison on standard benchmarks?

Issue - State: open - Opened by lihuiknight about 4 years ago

#31 - Causal for images

Issue - State: closed - Opened by Etzelkut about 4 years ago - 2 comments

#30 - Add feature_redraw_interval option

Pull Request - State: closed - Opened by norabelrose about 4 years ago - 8 comments

#29 - Floating point exception @ loss.backward()

Issue - State: open - Opened by AhmedCheikhRouhou about 4 years ago

#28 - There are no tests in this project, use_rezero=True is non-functional

Issue - State: closed - Opened by fcampagne about 4 years ago - 10 comments

#27 - Bug in FastAttention.forward()

Issue - State: closed - Opened by shayeboshi about 4 years ago - 5 comments

#26 - is dependency on pytorch-fast-transformers necessary?

Issue - State: closed - Opened by fcampagne about 4 years ago - 2 comments

#25 - add missing device assignment

Pull Request - State: closed - Opened by theblackcat102 about 4 years ago - 1 comment

#24 - A Concrete Example of Use Performer-Pytorch into other Model checkpoint?

Issue - State: open - Opened by ghost about 4 years ago - 4 comments

#23 - Performer Decoder

Pull Request - State: closed - Opened by qazwsxal about 4 years ago - 3 comments

#22 - Allow for no local attention heads

Pull Request - State: closed - Opened by qazwsxal over 4 years ago

#21 - Causal AutoRegressive Doubt

Issue - State: closed - Opened by HaldiramSharma over 4 years ago - 3 comments

#20 - Is it slower than original bert when training?

Issue - State: closed - Opened by yygle over 4 years ago - 3 comments

#19 - Relative position encoding

Issue - State: closed - Opened by sooheon over 4 years ago - 14 comments

#18 - definition of layer_drop()

Issue - State: closed - Opened by shi27feng over 4 years ago - 2 comments

#17 - Adding zeroes in softmax_kernel

Issue - State: closed - Opened by marhlder over 4 years ago - 2 comments

#16 - Load weights of transformer into PerformerLM

Issue - State: open - Opened by Mazgis47 over 4 years ago - 6 comments

#15 - unable to import cuda code for auto-regressive Performer

Issue - State: open - Opened by batrlatom over 4 years ago - 8 comments

#14 - Regarding DDP and reversible networks

Issue - State: closed - Opened by Parskatt over 4 years ago - 11 comments

#13 - Inverse of renormalization matrix being used?

Issue - State: closed - Opened by sidnarayanan over 4 years ago - 1 comment

#12 - use performer for image detection

Issue - State: closed - Opened by madurner over 4 years ago - 7 comments

#11 - pip install error

Issue - State: open - Opened by catechumen27 over 4 years ago - 5 comments

#10 - Results are not deterministic in eval mode

Issue - State: closed - Opened by arti32lehtonen over 4 years ago - 4 comments

#9 - Suggestion: Renormalization step for linear attention

Issue - State: closed - Opened by Parskatt over 4 years ago - 2 comments

#8 - Issue with biased estimates from QR decomposition

Issue - State: closed - Opened by Parskatt over 4 years ago - 9 comments

#7 - Causal model running on GPU

Issue - State: closed - Opened by Warvito over 4 years ago - 7 comments

#6 - Redrawing normalized samples using QR slows down training

Issue - State: closed - Opened by Parskatt over 4 years ago - 4 comments

#5 - Small issue in random matrix generation

Issue - State: closed - Opened by Parskatt over 4 years ago - 1 comment
