liyuanlucasliu/transformer-clinic issues and pull requests

#27 - Position of residual connection in PreLN architecture is wrong

Issue - State: closed - Opened by bilzard almost 2 years ago - 1 comment

#26 - How to get the beta_{i,j} for each residual branch?

Issue - State: open - Opened by SefaZeng over 2 years ago

#25 - Deepnet

Pull Request - State: closed - Opened by LiyuanLucasLiu almost 3 years ago

#24 - Admin for 100L-100L model?

Issue - State: closed - Opened by Vincent131499 almost 3 years ago - 1 comment

#23 - Ensemble models

Issue - State: closed - Opened by Vincent131499 about 3 years ago

#22 - How to add RAdam to fairseq?

Issue - State: closed - Opened by KelleyYin almost 4 years ago - 1 comment

#21 - argdict

Issue - State: closed - Opened by riosempre almost 4 years ago - 1 comment

#20 - Reimplement Admin in new fairseq but get bad valid loss

Issue - State: closed - Opened by moonscar almost 4 years ago

#19 - Question about the adaptive optimizer

Issue - State: closed - Opened by chenwydj about 4 years ago - 1 comment

#18 - Difference of implementation from the original paper

Issue - State: closed - Opened by wade3han about 4 years ago - 1 comment

#17 - `RuntimeError: expected scalar type Float but found Half` during the eval step

Issue - State: closed - Opened by ruiningh about 4 years ago - 5 comments

#16 - Scripts for Post-LN in Figure 10?

Issue - State: closed - Opened by zhuchen03 about 4 years ago - 1 comment

#15 - Is wmt14en-fr.sh missing in pre-process dir?

Issue - State: closed - Opened by lvzaihefang about 4 years ago - 1 comment

#14 - wmt_en_de admin: Function 'SoftmaxBackward' returned nan values in its 0th output.

Issue - State: closed - Opened by sshleifer about 4 years ago - 8 comments

#13 - tmp_weight is not defined

Issue - State: closed - Opened by sshleifer about 4 years ago - 4 comments

#12 - IWSLT'14 Results

Issue - State: closed - Opened by villmow over 4 years ago - 1 comment

#10 - Update README.md

Pull Request - State: closed - Opened by LiyuanLucasLiu over 4 years ago

#9 - Post-LN with 12-12 is trained ok, but 12-3 diverge

Issue - State: closed - Opened by ZhenYangIACAS over 4 years ago - 9 comments

#8 - How to make sure that only performing one step forward pass in profiling phase?

Issue - State: closed - Opened by ZhenYangIACAS over 4 years ago - 1 comment

#7 - is "tmp_weight" in transformer_layer.py useless?

Issue - State: closed - Opened by zherowolf over 4 years ago - 3 comments

#6 - Details of total batch size

Issue - State: closed - Opened by luofuli over 4 years ago - 1 comment

#5 - Do the embedding layer's layernorm parameters need to be reparameterized accordingly?

Issue - State: closed - Opened by gotobelieve over 4 years ago - 1 comment

#4 - Can I use a pre-trained model to initialize the model?

Issue - State: closed - Opened by luofuli over 4 years ago - 1 comment

#3 - Is the "attention_ratio_change" and "fc_ratio_change" trainable or not?

Issue - State: closed - Opened by gotobelieve over 4 years ago - 2 comments

#2 - remove debug parameter

Pull Request - State: closed - Opened by LiyuanLucasLiu over 4 years ago

#1 - what's the meaning of 'adaptive-scale' argument?

Issue - State: closed - Opened by gotobelieve over 4 years ago - 1 comment
