lucidrains/self-rewarding-lm-pytorch issues and pull requests

#32 - usage demo is not working

Issue - State: open - Opened by 652994331 3 months ago

#31 - What's the reference model for DPO?

Issue - State: closed - Opened by Draconda 10 months ago - 1 comment

#30 - OSError: [Errno 22] Invalid argument: 'preference_seq.memmap.npy'

Issue - State: open - Opened by Oloup 10 months ago

#29 - Fixed deep copy, shallow copy error and label mask error.

Pull Request - State: closed - Opened by Control-derek 11 months ago - 1 comment

#28 - Solves the problem that some variables are not declared

Pull Request - State: closed - Opened by Control-derek 11 months ago - 1 comment

#27 - Solves the problem that some variables are not declared

Pull Request - State: closed - Opened by Control-derek 11 months ago - 1 comment

#26 - add self.

Pull Request - State: closed - Opened by Control-derek 11 months ago - 1 comment

#25 - ModuleNotFoundError: No module named 'x_transformers'

Issue - State: open - Opened by mayankpathaklumiq 12 months ago - 1 comment

#24 - UnboundLocalError: local variable 'self_reward_model' referenced before assignment

Issue - State: closed - Opened by UbeCc 12 months ago - 3 comments

#23 - What changes should I make to apply the method on Llama2?

Issue - State: open - Opened by Labmem009 12 months ago

#21 - I encountered the following error when trying to run usage

Issue - State: open - Opened by Yanfors 12 months ago - 1 comment

#19 - Fix TypeError for is_valid_reward in SelfRewardDPOConfig

Pull Request - State: closed - Opened by ViswanathaReddyGajjala 12 months ago - 1 comment

#18 - TypeError: tuple indices must be integers or slices, not tuple

Issue - State: open - Opened by fakerybakery about 1 year ago - 1 comment

#17 - Update self_rewarding_lm_pytorch.py

Pull Request - State: closed - Opened by unaidedelf8777 about 1 year ago - 1 comment

#15 - RuntimeError: Placeholder storage has not been allocated on MPS device!

Issue - State: closed - Opened by fakerybakery about 1 year ago - 2 comments

#14 - Multiple GPUs

Issue - State: closed - Opened by fakerybakery about 1 year ago

#13 - Update self_rewarding_lm_pytorch.py

Pull Request - State: closed - Opened by Dyke-F about 1 year ago - 1 comment

#12 - Update spin.py

Pull Request - State: closed - Opened by Dyke-F about 1 year ago - 2 comments

#11 - Why use a custom sample function instead of original HuggingFace generate() function?

Issue - State: closed - Opened by scarydemon2 about 1 year ago - 1 comment

#10 - How to use HF Transformers model

Issue - State: open - Opened by fakerybakery about 1 year ago - 3 comments

#9 - Default `iteration` about SPIN. (Reward model~Policy model)

Issue - State: closed - Opened by KyujinHan about 1 year ago - 1 comment

#8 - run spin demo

Issue - State: closed - Opened by westlongtime about 1 year ago - 3 comments

#7 - The reward prompt is weak.

Issue - State: closed - Opened by Minami-su about 1 year ago - 6 comments

#5 - Update README.md

Pull Request - State: closed - Opened by eltociear about 1 year ago - 1 comment

#4 - Is this work in progress?

Issue - State: closed - Opened by jbdatascience about 1 year ago - 4 comments

#3 - Help with Setting up and running ?

Issue - State: closed - Opened by badboysm890 about 1 year ago - 1 comment

#1 - code and dataset？

Issue - State: closed - Opened by wanghao-007 about 1 year ago
