Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / lucidrains/performer-pytorch issues and pull requests
#97 - Find FastAttention is slower, and also with more GPU memory usage
Issue -
State: open - Opened by phyllispeng123 3 months ago
#96 - Separate Transformer Encoder & Decoder modules with linear attention?
Issue -
State: open - Opened by harshakmohan 5 months ago
#95 - Modify the transformer tutorial based on performer
Issue -
State: open - Opened by HelloWorldLTY over 1 year ago
#94 - Cross-attention with arbitrary causal mask
Issue -
State: open - Opened by BarKetPlace almost 2 years ago
#93 - Pretrained example
Issue -
State: open - Opened by jubueche about 2 years ago
#92 - Performer Pytorch Slower than Expected and Please Help with Understanding Parameter Count
Issue -
State: open - Opened by weihaosong about 2 years ago
- 1 comment
#91 - Using replicating nn.MultiHeadAttention with multiple performer SelfAttention modules
Issue -
State: open - Opened by JGittles about 2 years ago
#90 - I want to use Peroformer on MAE
Issue -
State: open - Opened by Zhaoyi-Yan over 2 years ago
#89 - Question about masking
Issue -
State: open - Opened by Microbiods over 2 years ago
- 2 comments
#88 - Question: Is Performer order equivariant? (can it transform an unordered set of tensors)
Issue -
State: open - Opened by nmakes over 2 years ago
#87 - Using Performer with GNNs
Issue -
State: open - Opened by jah377 almost 3 years ago
#86 - Huge model state dict size?
Issue -
State: open - Opened by Liyue1d almost 3 years ago
#85 - Attention map
Issue -
State: closed - Opened by merouone almost 3 years ago
- 2 comments
#84 - Performer Plain
Issue -
State: open - Opened by Rachel66666 almost 3 years ago
#83 - How to test the performer architecture for training new models?
Issue -
State: open - Opened by ayan-iiitd about 3 years ago
- 1 comment
#82 - Output inconsistent for autoregressive performer
Issue -
State: open - Opened by GanjinZero about 3 years ago
- 2 comments
#81 - Rotary Position Embedding
Issue -
State: open - Opened by ahmdtaha about 3 years ago
#80 - torch_tensorrt compilation fails
Issue -
State: open - Opened by FredHaa about 3 years ago
#79 - way to make two elements invisible?
Issue -
State: open - Opened by 1140310118 about 3 years ago
#78 - Add repetition penalty for text generation
Pull Request -
State: closed - Opened by AlexandreDey over 3 years ago
#77 - Residual Connection
Issue -
State: closed - Opened by jiyounglee-0523 over 3 years ago
- 3 comments
#76 - torch.max(data_dash) bug
Issue -
State: closed - Opened by martinpflaum over 3 years ago
- 2 comments
#75 - Fix torch.qr deprecation warning
Pull Request -
State: closed - Opened by Erotemic over 3 years ago
- 1 comment
#74 - Some little changes
Pull Request -
State: open - Opened by vasiliyeskin over 3 years ago
#73 - hyperbolic cosine based estimator
Issue -
State: open - Opened by gaganbahga over 3 years ago
#72 - Relative Positional Encoding for Linear Attention Models.
Issue -
State: closed - Opened by Vbansal21 over 3 years ago
- 3 comments
#71 - Names `to_k`, `to_q`, `to_v`, `to_out` cause issues
Issue -
State: open - Opened by JamesDeAntonis over 3 years ago
#70 - Recover attention scores
Issue -
State: open - Opened by carlomarxdk over 3 years ago
- 3 comments
#69 - FastAttention doesn't give results in agreement with standard attention?
Issue -
State: open - Opened by simonaxelrod over 3 years ago
- 7 comments
#68 - Input and Context size in CrossAttention
Issue -
State: closed - Opened by caffeinetoomuch almost 4 years ago
- 2 comments
#67 - Performer Benchmark
Issue -
State: open - Opened by CavallucciMartina almost 4 years ago
#66 - Causal performer slower than causal regular attention
Issue -
State: open - Opened by JamesDeAntonis almost 4 years ago
- 3 comments
#65 - `to_out` bias
Issue -
State: closed - Opened by JamesDeAntonis almost 4 years ago
- 3 comments
#64 - Causal linear attention benchmark
Issue -
State: closed - Opened by caffeinetoomuch almost 4 years ago
- 13 comments
#63 - why is bias true in `to_<q,k,v>`?
Issue -
State: closed - Opened by JamesDeAntonis almost 4 years ago
- 4 comments
#62 - Decoder Mask
Issue -
State: open - Opened by Muennighoff almost 4 years ago
#61 - Getting error with the check_redraw_projections when using DataParallel
Issue -
State: closed - Opened by Warvito almost 4 years ago
- 4 comments
#60 - context-specific embeddings from language model?
Issue -
State: open - Opened by rainwala almost 4 years ago
#59 - Allow for performers to be used on cpu-only torch
Pull Request -
State: closed - Opened by i404788 almost 4 years ago
- 2 comments
#58 - Deterministic layers
Issue -
State: open - Opened by anklebreaker almost 4 years ago
- 1 comment
#57 - Saving checkpoints during training and loading
Issue -
State: closed - Opened by ylhsieh almost 4 years ago
- 3 comments
#56 - Extra FF when using cross attention
Issue -
State: closed - Opened by gulnazaki about 4 years ago
- 8 comments
#55 - FixNorm alongside ScaleNorm
Issue -
State: open - Opened by gulnazaki about 4 years ago
- 3 comments
#54 - Added fixed and axial positional embedding option
Pull Request -
State: closed - Opened by gulnazaki about 4 years ago
- 1 comment
#53 - Decoder randomly outputs NaN tensor.
Issue -
State: closed - Opened by y-rokutan about 4 years ago
- 5 comments
#52 - Performance gain replacing original attention to fast attention in this repo?
Issue -
State: open - Opened by phypan11 about 4 years ago
- 2 comments
#51 - Applying decoder input mask?
Issue -
State: closed - Opened by maxmax1992 about 4 years ago
- 2 comments
#50 - Bug fix in original google-research implementation
Issue -
State: closed - Opened by gulnazaki about 4 years ago
- 3 comments
#49 - Plain Performer, if you are working with say images or other modalities
Issue -
State: open - Opened by haoshuai714 about 4 years ago
- 1 comment
#48 - Replacing Attention module of Vision Transformer with SelfAttention Module of Performer?
Issue -
State: open - Opened by PascalHbr about 4 years ago
- 6 comments
#47 - [Feature] Adding fixed positional embeddings as an option
Issue -
State: closed - Opened by gulnazaki about 4 years ago
- 3 comments
#46 - SelfAttention layer seems to have large error relative to nn.MultiheadAttention?
Issue -
State: open - Opened by jueseph about 4 years ago
- 8 comments
#45 - Question: torch.max term used in `softmax_kernel`
Issue -
State: closed - Opened by ClawangTU about 4 years ago
- 4 comments
#44 - No fp16 support from fast-transformers (CausalDotProduct)
Issue -
State: open - Opened by gulnazaki about 4 years ago
- 75 comments
#43 - Performer Encoder Decoder architecture
Pull Request -
State: closed - Opened by gulnazaki about 4 years ago
#42 - Triangular matrices ?
Issue -
State: closed - Opened by jeremycochoy about 4 years ago
- 10 comments
#41 - fix normalization for fast cuda version of causal
Pull Request -
State: closed - Opened by lucidrains about 4 years ago
#40 - wrong implementation for autoregressive self-attention
Issue -
State: closed - Opened by Sleepychord about 4 years ago
- 10 comments
#39 - Use performer for finetunig task
Issue -
State: open - Opened by usmanhidral about 4 years ago
#38 - [Feature] EncoderDecoder framework, similar to ReformerEncDec
Issue -
State: closed - Opened by gulnazaki about 4 years ago
- 22 comments
#37 - RuntimeError: CUDA error: no kernel image is available for execution on the device
Issue -
State: open - Opened by james20141606 about 4 years ago
- 1 comment
#36 - A small question regarding `softmax_kernel`
Issue -
State: closed - Opened by boredtylin about 4 years ago
- 1 comment
#35 - Input ordering is not explicitly stated
Issue -
State: closed - Opened by haakom about 4 years ago
- 2 comments
#34 - Difficult installing on Windows machine
Issue -
State: open - Opened by rasin-tsukuba about 4 years ago
#33 - Current version seems to make saving and loading through model state dictionaries difficult
Issue -
State: open - Opened by ThomasBJones2 about 4 years ago
- 1 comment
#32 - Any performance comparison on standard benchmarks?
Issue -
State: open - Opened by lihuiknight about 4 years ago
#31 - Causal for images
Issue -
State: closed - Opened by Etzelkut about 4 years ago
- 2 comments
#30 - Add feature_redraw_interval option
Pull Request -
State: closed - Opened by norabelrose about 4 years ago
- 8 comments
#29 - Floating point exception @ loss.backward()
Issue -
State: open - Opened by AhmedCheikhRouhou about 4 years ago
#28 - There are no tests in this project, use_rezero=True is non-functional
Issue -
State: closed - Opened by fcampagne about 4 years ago
- 10 comments
#27 - Bug in FastAttention.forward()
Issue -
State: closed - Opened by shayeboshi about 4 years ago
- 5 comments
#26 - is dependency on pytorch-fast-transformers necessary?
Issue -
State: closed - Opened by fcampagne about 4 years ago
- 2 comments
#25 - add missing device assignment
Pull Request -
State: closed - Opened by theblackcat102 about 4 years ago
- 1 comment
#24 - A Concrete Example of Use Performer-Pytorch into other Model checkpoint?
Issue -
State: open - Opened by ghost about 4 years ago
- 4 comments
#23 - Performer Decoder
Pull Request -
State: closed - Opened by qazwsxal about 4 years ago
- 3 comments
#22 - Allow for no local attention heads
Pull Request -
State: closed - Opened by qazwsxal over 4 years ago
#21 - Causal AutoRegressive Doubt
Issue -
State: closed - Opened by HaldiramSharma over 4 years ago
- 3 comments
#20 - Is it slower than original bert when training?
Issue -
State: closed - Opened by yygle over 4 years ago
- 3 comments
#19 - Relative position encoding
Issue -
State: closed - Opened by sooheon over 4 years ago
- 14 comments
#18 - definition of layer_drop()
Issue -
State: closed - Opened by shi27feng over 4 years ago
- 2 comments
#17 - Adding zeroes in softmax_kernel
Issue -
State: closed - Opened by marhlder over 4 years ago
- 2 comments
#16 - Load weights of transformer into PerformerLM
Issue -
State: open - Opened by Mazgis47 over 4 years ago
- 6 comments
#15 - unable to import cuda code for auto-regressive Performer
Issue -
State: open - Opened by batrlatom over 4 years ago
- 8 comments
#14 - Regarding DDP and reversible networks
Issue -
State: closed - Opened by Parskatt over 4 years ago
- 11 comments
#13 - Inverse of renormalization matrix being used?
Issue -
State: closed - Opened by sidnarayanan over 4 years ago
- 1 comment
#12 - use performer for image detection
Issue -
State: closed - Opened by madurner over 4 years ago
- 7 comments
#11 - pip install error
Issue -
State: open - Opened by catechumen27 over 4 years ago
- 5 comments
#10 - Results are not deterministic in eval mode
Issue -
State: closed - Opened by arti32lehtonen over 4 years ago
- 4 comments
#9 - Suggestion: Renormalization step for linear attention
Issue -
State: closed - Opened by Parskatt over 4 years ago
- 2 comments
#8 - Issue with biased estimates from QR decomposition
Issue -
State: closed - Opened by Parskatt over 4 years ago
- 9 comments
#7 - Causal model running on GPU
Issue -
State: closed - Opened by Warvito over 4 years ago
- 7 comments
#6 - Redrawing normalized samples using QR slows down training
Issue -
State: closed - Opened by Parskatt over 4 years ago
- 4 comments
#5 - Small issue in random matrix generation
Issue -
State: closed - Opened by Parskatt over 4 years ago
- 1 comment
#4 - Question: Scaling down number of random features depending on number of heads?
Issue -
State: closed - Opened by Parskatt over 4 years ago
- 4 comments
#3 - Feature Request: Enable generalized attention.
Issue -
State: closed - Opened by Parskatt over 4 years ago
- 2 comments
#2 - Show what is the performance on enwiki8 is across your projects
Issue -
State: closed - Opened by bratao over 4 years ago
- 10 comments
#1 - Collaborate on Implementation?
Issue -
State: closed - Opened by calclavia over 4 years ago
- 9 comments