Ecosyste.ms: Issues
An open API service providing issue and pull request metadata for open source projects.
GitHub / RLHFlow/Online-RLHF — issues and pull requests
#32 - Unexpected keyword argument 'beta' in DPOTrainer initialization
Issue - State: open - Opened by YijuGuo about 2 months ago
#31 - Inquiry about the version of alignment-handbook used in the installation guide
Issue - State: open - Opened by YijuGuo about 2 months ago - 1 comment
#30 - Unable to install axolotl due to missing bitsandbytes==0.45.0 dependency
Issue - State: open - Opened by YijuGuo about 2 months ago - 2 comments
#29 - Questions About Data chosen Strategies
Issue - State: open - Opened by nantenT 2 months ago - 2 comments
#28 - About the results of vanilla DPO
Issue - State: open - Opened by lucasliunju 3 months ago - 1 comment
#27 - Reward-KL Comparison
Issue - State: open - Opened by vincezh2000 3 months ago - 1 comment
#26 - Update README.md
Pull Request - State: closed - Opened by ElegantLin 3 months ago
#25 - add v2 models
Pull Request - State: closed - Opened by xypan0 3 months ago
#24 - SFT training objective
Issue - State: open - Opened by ljb121002 4 months ago - 3 comments
#23 - Negative reward when serving ArmoRM-Llama3-8B-v0.1
Issue - State: open - Opened by maoliyuan 5 months ago - 4 comments
#22 - Question about CUDA/NVCC setups
Issue - State: open - Opened by rqzhangberkeley 6 months ago - 1 comment
#21 - Question about the iteration dataset (information leakage)?
Issue - State: closed - Opened by hhhhzzzzz 6 months ago - 8 comments
#20 - Questions about Nectar Datasets
Issue - State: open - Opened by XinZhao0211 6 months ago - 4 comments
#19 - pip's dependency conflict: accelerate
Issue - State: closed - Opened by liwd190019 6 months ago - 2 comments
#18 - Reference policy ablations
Issue - State: closed - Opened by yesiam-png 7 months ago - 9 comments
#17 - Phi3 has a nearly constant DPO loss of 0.69xx
Issue - State: open - Opened by Arnav0400 7 months ago - 6 comments
#16 - large max_steps?
Issue - State: closed - Opened by hunterlang 7 months ago - 1 comment
#15 - One question about the loss function given a gold reward model
Issue - State: closed - Opened by srzer 8 months ago - 2 comments
#14 - numpy version and transformers version
Issue - State: closed - Opened by WayXG 8 months ago - 1 comment
#13 - More RLHF algorithms in the implementation
Issue - State: closed - Opened by WayXG 8 months ago - 1 comment
#12 - question about dpo dataset
Issue - State: closed - Opened by LiuChen19960902 8 months ago - 1 comment
#11 - Distributed training in stage 3.3 keeps hanging
Issue - State: closed - Opened by srzer 8 months ago - 2 comments
#10 - corrected max_model_len to be max_input_length
Pull Request - State: closed - Opened by eddyliu5 8 months ago - 2 comments
#9 - update the figure in readme
Issue - State: closed - Opened by WayXG 8 months ago - 1 comment
#8 - questions about dpo
Issue - State: closed - Opened by hong-xl 8 months ago - 5 comments
#7 - Iterative pipeline question
Issue - State: closed - Opened by matouk98 8 months ago - 4 comments
#6 - Model evaluation issue
Issue - State: closed - Opened by matouk98 8 months ago - 5 comments
#5 - Questions about training data during iterative DPO
Issue - State: closed - Opened by hong-xl 8 months ago - 3 comments
#4 - Fail to load weight from pair-preference-model-LLaMA3-8B
Issue - State: open - Opened by matouk98 8 months ago - 2 comments
#3 - Cannot Reproduce the DPO Checkpoint
Issue - State: closed - Opened by gesy17 9 months ago - 1 comment
#2 - How train sft on rtx4090?
Issue - State: closed - Opened by utrobinmv 9 months ago - 1 comment
#1 - Fix readme typo
Pull Request - State: closed - Opened by erjanmx 9 months ago - 1 comment