Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / RLHFlow/Online-RLHF issues and pull requests
#26 - Update README.md
Pull Request - State: closed - Opened by ElegantLin 8 days ago
#25 - add v2 models
Pull Request - State: closed - Opened by xypan0 8 days ago
#24 - SFT training objective
Issue - State: open - Opened by ljb121002 25 days ago - 3 comments
#23 - Negative reward when serving ArmoRM-Llama3-8B-v0.1
Issue - State: open - Opened by maoliyuan 3 months ago - 4 comments
#22 - Question about CUDA/NVCC setups
Issue - State: open - Opened by rqzhangberkeley 3 months ago - 1 comment
#21 - Question about the iteration dataset (information leakage)?
Issue - State: closed - Opened by hhhhzzzzz 3 months ago - 8 comments
#20 - Questions about Nectar Datasets
Issue - State: open - Opened by XinZhao0211 3 months ago - 4 comments
#19 - pip's dependency conflict: accelerate
Issue - State: closed - Opened by liwd190019 3 months ago - 2 comments
#18 - Reference policy ablations
Issue - State: closed - Opened by yesiam-png 4 months ago - 9 comments
#17 - Phi3 has a nearly constant DPO loss of 0.69xx
Issue - State: open - Opened by Arnav0400 4 months ago - 6 comments
#16 - large max_steps?
Issue - State: closed - Opened by hunterlang 4 months ago - 1 comment
#15 - One question about the loss function given a gold reward model
Issue - State: closed - Opened by srzer 5 months ago - 2 comments
#14 - numpy version and transformers version
Issue - State: closed - Opened by WayXG 5 months ago - 1 comment
#13 - More RLHF algorithms in the implementation
Issue - State: closed - Opened by WayXG 5 months ago - 1 comment
#12 - question about dpo dataset
Issue - State: closed - Opened by LiuChen19960902 5 months ago - 1 comment
#11 - Distributed training in stage 3.3 keeps hanging
Issue - State: closed - Opened by srzer 5 months ago - 2 comments
#10 - corrected max_model_len to be max_input_length
Pull Request - State: closed - Opened by eddyliu5 5 months ago - 2 comments
#9 - update the figure in readme
Issue - State: closed - Opened by WayXG 5 months ago - 1 comment
#8 - questions about dpo
Issue - State: closed - Opened by hong-xl 5 months ago - 5 comments
#7 - Iterative pipeline question
Issue - State: closed - Opened by matouk98 5 months ago - 4 comments
#6 - Model evaluation issue
Issue - State: closed - Opened by matouk98 5 months ago - 5 comments
#5 - Questions about training data during iterative DPO
Issue - State: closed - Opened by hong-xl 6 months ago - 3 comments
#4 - Fail to load weight from pair-preference-model-LLaMA3-8B
Issue - State: open - Opened by matouk98 6 months ago - 2 comments
#3 - Cannot Reproduce the DPO Checkpoint
Issue - State: closed - Opened by gesy17 6 months ago - 1 comment
#2 - How train sft on rtx4090?
Issue - State: closed - Opened by utrobinmv 6 months ago - 1 comment
#1 - Fix readme typo
Pull Request - State: closed - Opened by erjanmx 6 months ago - 1 comment