Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / liyuanlucasliu/transformer-clinic issues and pull requests
#27 - Position of residual connection in PreLN architecture is wrong
Issue -
State: closed - Opened by bilzard over 1 year ago
- 1 comment
#26 - How to get the beta_{i,j} for each residual branch?
Issue -
State: open - Opened by SefaZeng about 2 years ago
#25 - Deepnet
Pull Request -
State: closed - Opened by LiyuanLucasLiu over 2 years ago
#24 - Admin for 100L-100L model?
Issue -
State: closed - Opened by Vincent131499 over 2 years ago
- 1 comment
#23 - Ensemble models
Issue -
State: closed - Opened by Vincent131499 almost 3 years ago
#22 - How to add Radam to fairseq ?
Issue -
State: closed - Opened by KelleyYin over 3 years ago
- 1 comment
#21 - argdict
Issue -
State: closed - Opened by riosempre over 3 years ago
- 1 comment
#20 - Reimplement Admin in new fairseq but get bad valid loss
Issue -
State: closed - Opened by moonscar over 3 years ago
#19 - Question about the adaptive optimizer
Issue -
State: closed - Opened by chenwydj almost 4 years ago
- 1 comment
#18 - Difference of implementation from the original paper
Issue -
State: closed - Opened by wade3han almost 4 years ago
- 1 comment
#17 - `RuntimeError: expected scalar type Float but found Half` during the eval step
Issue -
State: closed - Opened by ruiningh almost 4 years ago
- 5 comments
#16 - Scripts for Post-LN in Figure 10?
Issue -
State: closed - Opened by zhuchen03 almost 4 years ago
- 1 comment
#15 - Is wmt14en-fr.sh missing in pre-process dir?
Issue -
State: closed - Opened by lvzaihefang almost 4 years ago
- 1 comment
#14 - wmt_en_de admin: Function 'SoftmaxBackward' returned nan values in its 0th output.
Issue -
State: closed - Opened by sshleifer almost 4 years ago
- 8 comments
#13 - tmp_weight is not defined
Issue -
State: closed - Opened by sshleifer almost 4 years ago
- 4 comments
#12 - IWSLT'14 Results
Issue -
State: closed - Opened by villmow almost 4 years ago
- 1 comment
#10 - Update README.md
Pull Request -
State: closed - Opened by LiyuanLucasLiu about 4 years ago
#9 - Post-LN with 12-12 is trained ok, but 12-3 diverge
Issue -
State: closed - Opened by ZhenYangIACAS about 4 years ago
- 9 comments
#8 - How to make sure that only performing one step forward pass in profiling phase?
Issue -
State: closed - Opened by ZhenYangIACAS about 4 years ago
- 1 comment
#7 - is "tmp_weight" in transformer_layer.py useless?
Issue -
State: closed - Opened by zherowolf about 4 years ago
- 3 comments
#6 - Details of total batch size
Issue -
State: closed - Opened by luofuli about 4 years ago
- 1 comment
#5 - Do the embedding layer's layernorm parameters need to be reparameterized accordingly?
Issue -
State: closed - Opened by gotobelieve about 4 years ago
- 1 comment
#4 - Can I use a pre-trained model to initialize the model?
Issue -
State: closed - Opened by luofuli about 4 years ago
- 1 comment
#3 - Is the "attention_ratio_change" and "fc_ratio_change" trainable or not?
Issue -
State: closed - Opened by gotobelieve over 4 years ago
- 2 comments
#2 - remove debug parameter
Pull Request -
State: closed - Opened by LiyuanLucasLiu over 4 years ago
#1 - whta's the meaning of 'adaptive-scale' argument?
Issue -
State: closed - Opened by gotobelieve over 4 years ago
- 1 comment