Ecosyste.ms: Issues

An open API service for providing issue and pull request metadata for open source projects.

GitHub / liyuanlucasliu/transformer-clinic issues and pull requests

#27 - Position of residual connection in PreLN architecture is wrong

Issue - State: closed - Opened by bilzard over 1 year ago - 1 comment

#26 - How to get the beta_{i,j} for each residual branch?

Issue - State: open - Opened by SefaZeng about 2 years ago

#25 - Deepnet

Pull Request - State: closed - Opened by LiyuanLucasLiu over 2 years ago

#24 - Admin for 100L-100L model?

Issue - State: closed - Opened by Vincent131499 over 2 years ago - 1 comment

#23 - Ensemble models

Issue - State: closed - Opened by Vincent131499 almost 3 years ago

#22 - How to add Radam to fairseq ?

Issue - State: closed - Opened by KelleyYin over 3 years ago - 1 comment

#21 - argdict

Issue - State: closed - Opened by riosempre over 3 years ago - 1 comment

#20 - Reimplement Admin in new fairseq but get bad valid loss

Issue - State: closed - Opened by moonscar over 3 years ago

#19 - Question about the adaptive optimizer

Issue - State: closed - Opened by chenwydj almost 4 years ago - 1 comment

#18 - Difference of implementation from the original paper

Issue - State: closed - Opened by wade3han almost 4 years ago - 1 comment

#17 - `RuntimeError: expected scalar type Float but found Half` during the eval step

Issue - State: closed - Opened by ruiningh almost 4 years ago - 5 comments

#16 - Scripts for Post-LN in Figure 10?

Issue - State: closed - Opened by zhuchen03 almost 4 years ago - 1 comment

#15 - Is wmt14en-fr.sh missing in pre-process dir?

Issue - State: closed - Opened by lvzaihefang almost 4 years ago - 1 comment

#14 - wmt_en_de admin: Function 'SoftmaxBackward' returned nan values in its 0th output.

Issue - State: closed - Opened by sshleifer almost 4 years ago - 8 comments

#13 - tmp_weight is not defined

Issue - State: closed - Opened by sshleifer almost 4 years ago - 4 comments

#12 - IWSLT'14 Results

Issue - State: closed - Opened by villmow almost 4 years ago - 1 comment

#10 - Update README.md

Pull Request - State: closed - Opened by LiyuanLucasLiu about 4 years ago

#9 - Post-LN with 12-12 is trained ok, but 12-3 diverge

Issue - State: closed - Opened by ZhenYangIACAS about 4 years ago - 9 comments

#7 - is "tmp_weight" in transformer_layer.py useless?

Issue - State: closed - Opened by zherowolf about 4 years ago - 3 comments

#6 - Details of total batch size

Issue - State: closed - Opened by luofuli about 4 years ago - 1 comment

#4 - Can I use a pre-trained model to initialize the model?

Issue - State: closed - Opened by luofuli about 4 years ago - 1 comment

#3 - Is the "attention_ratio_change" and "fc_ratio_change" trainable or not?

Issue - State: closed - Opened by gotobelieve over 4 years ago - 2 comments

#2 - remove debug parameter

Pull Request - State: closed - Opened by LiyuanLucasLiu over 4 years ago

#1 - whta's the meaning of 'adaptive-scale' argument?

Issue - State: closed - Opened by gotobelieve over 4 years ago - 1 comment