GitHub / lucidrains/routing-transformer issues and pull requests
#33 - How to reconstruct the full attention matrix?
Issue - State: open - Opened by FarzanT over 2 years ago - 2 comments
#32 - ONNX export hangs
Issue - State: closed - Opened by genolve about 3 years ago - 1 comment
#31 - Compound words
Issue - State: closed - Opened by wingedsheep over 3 years ago
#30 - Enquiry about BPC calculation
Issue - State: open - Opened by ShiweiLiuFdu almost 4 years ago
#29 - Could you explain KmeansAttention in detail? What is its principle and motivation? (translated from Chinese)
Issue - State: open - Opened by guotong1988 almost 4 years ago
#28 - Could you please explain more about KmeansAttention? Thank you very much!
Issue - State: open - Opened by guotong1988 almost 4 years ago
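Issues #28 and #29 both ask how KmeansAttention works. As background (hedged, based on the Routing Transformer paper by Roy et al. rather than this repository's code): queries and keys are clustered with k-means, and each token attends only to tokens in its own cluster, reducing attention cost from O(n² d) to roughly O(n^1.5 d) when the cluster count scales with √n. A toy NumPy sketch, with illustrative names and identity q/k/v projections (not the repository's implementation):

```python
import numpy as np

def kmeans_routing_attention(x, num_clusters=2, iters=5, seed=0):
    """Toy sketch of k-means routed attention: tokens are clustered,
    and softmax attention is computed only within each cluster."""
    n, d = x.shape
    q = k = v = x  # toy simplification: identity projections
    rng = np.random.default_rng(seed)
    # initialise centroids from random tokens, then a few Lloyd iterations
    centroids = q[rng.choice(n, num_clusters, replace=False)]
    for _ in range(iters):
        dists = ((q[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        assign = dists.argmin(axis=1)
        for c in range(num_clusters):
            members = q[assign == c]
            if len(members):
                centroids[c] = members.mean(axis=0)
    # final cluster assignment after the last centroid update
    dists = ((q[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    assign = dists.argmin(axis=1)
    # attend only within each cluster
    out = np.zeros_like(x)
    for c in range(num_clusters):
        idx = np.where(assign == c)[0]
        if len(idx) == 0:
            continue
        scores = q[idx] @ k[idx].T / np.sqrt(d)
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)
        out[idx] = w @ v[idx]
    return out, assign

x = np.random.default_rng(1).normal(size=(16, 8))
out, assign = kmeans_routing_attention(x)
```

With `num_clusters=1` the sketch reduces to ordinary full softmax attention, which is a handy sanity check on the routing step.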
#27 - input_mask behavior
Issue - State: open - Opened by AliOskooeiTR about 4 years ago
#26 - TPU support
Issue - State: open - Opened by abcp4 over 4 years ago
#25 - receives_context causes tensor mismatch error
Issue - State: closed - Opened by WeForgot over 4 years ago - 1 comment
#24 - Results on WikiText-103 or enwik8
Issue - State: open - Opened by yelongshen over 4 years ago
#23 - README typo
Issue - State: closed - Opened by rainmaker712 over 4 years ago - 1 comment
#22 - Music Routing Transformer Colab
Issue - State: open - Opened by asigalov61 over 4 years ago - 1 comment
#21 - Building and training a RoutingTransformerEncDec from pre-trained RoutingTransformerLMs
Issue - State: closed - Opened by AliOskooeiTR over 4 years ago - 7 comments
#20 - LM slower than the encoder-decoder with the same depth, max_seq_len, and window size
Issue - State: open - Opened by AliOskooeiTR over 4 years ago - 3 comments
#19 - Issue about input shape
Issue - State: closed - Opened by guohanyang1994 over 4 years ago - 1 comment
#18 - Usage for image generation
Issue - State: closed - Opened by Hosein47 over 4 years ago - 11 comments
#17 - Sequence length limited
Issue - State: closed - Opened by guohanyang1994 over 4 years ago - 14 comments
#16 - Error in the enwik8_simple training example
Issue - State: closed - Opened by guokr233 over 4 years ago - 1 comment
#15 - Tensor dimension mismatch when running the RoutingTransformerLM example
Issue - State: closed - Opened by guokr233 over 4 years ago - 1 comment
#14 - Add ReZero and ScaleNorm support
Pull Request - State: closed - Opened by tomweingarten almost 5 years ago - 7 comments
#13 - MoE doesn't work with reversible layers
Issue - State: closed - Opened by tomweingarten almost 5 years ago - 2 comments
#12 - Batch size 1
Issue - State: closed - Opened by matthew-jurewicz almost 5 years ago - 2 comments
#11 - Long dependencies
Issue - State: open - Opened by matthew-jurewicz almost 5 years ago - 2 comments
#10 - Normalize queries and keys before dot product
Pull Request - State: closed - Opened by lucidrains about 5 years ago
#9 - Why doesn't AutoregressiveWrapper sum the encoder aux loss?
Issue - State: closed - Opened by tomweingarten about 5 years ago - 8 comments
#8 - What does autoregressive mean?
Issue - State: closed - Opened by matthew-jurewicz about 5 years ago - 8 comments
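Issue #8 asks what "autoregressive" means. As a general note (not specific to this repository): an autoregressive model predicts each token conditioned only on the tokens before it, and at generation time each sampled token is appended to the context and fed back in. A toy illustration with a stand-in "model" (the `next_token` function is purely hypothetical):

```python
# Toy illustration of autoregressive generation, not the repository's
# AutoregressiveWrapper. The "model" here is a trivial stand-in that
# returns the previous token plus one, modulo the vocabulary size.
def next_token(context):
    return (context[-1] + 1) % 10

def generate(prompt, steps):
    seq = list(prompt)
    for _ in range(steps):
        seq.append(next_token(seq))  # feed the model's own output back in
    return seq

print(generate([3], 4))  # → [3, 4, 5, 6, 7]
```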
#7 - One-hot encoded input?
Issue - State: closed - Opened by matthew-jurewicz about 5 years ago - 4 comments
#6 - Missing key(s) in state_dict
Issue - State: closed - Opened by epetros about 5 years ago - 4 comments
#5 - AutoregressiveWrapper expects different input lengths based on type
Issue - State: closed - Opened by tomweingarten about 5 years ago - 3 comments
#4 - Encoder-decoder fails at KMeans attention
Issue - State: closed - Opened by tomweingarten about 5 years ago - 16 comments
#3 - use_evonorm no longer supported in PKM
Issue - State: closed - Opened by tomweingarten about 5 years ago - 1 comment
#2 - Fix top_p to define threshold similarly to top_k and not garble output.
Pull Request - State: closed - Opened by tomweingarten about 5 years ago - 1 comment
#1 - top_p returns wrong values and re-orders the data
Issue - State: closed - Opened by tomweingarten about 5 years ago - 7 comments
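Issues #1 and #2 concern top_p (nucleus) filtering re-ordering the data and garbling output. As a hedged aside (a generic sketch, not the repository's fixed code): a correct nucleus filter keeps the smallest set of highest-probability tokens whose cumulative mass exceeds p, zeroes the rest in their original vocabulary positions, and renormalizes, so token order is never disturbed:

```python
import numpy as np

def top_p_filter(probs, p=0.9):
    """Nucleus (top-p) filtering sketch: zero out the low-probability
    tail while keeping every token in its original position."""
    order = np.argsort(probs)[::-1]       # indices, most probable first
    cdf = np.cumsum(probs[order])
    # keep tokens up to and including the first one that pushes the
    # cumulative mass past p (mirrors how top_k keeps the top k)
    cutoff = np.searchsorted(cdf, p) + 1
    keep = order[:cutoff]
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    return filtered / filtered.sum()

probs = np.array([0.05, 0.5, 0.1, 0.3, 0.05])
print(top_p_filter(probs, p=0.75))  # → approximately [0, 0.625, 0, 0.375, 0]
```

The key design point, and the substance of the fix in #2, is that the sort is used only to find the cutoff; the surviving probabilities are written back at their original indices rather than returned in sorted order.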