Ecosyste.ms: Issues

An open API service for providing issue and pull request metadata for open source projects.

GitHub / RLHFlow/Online-RLHF issues and pull requests

#26 - Update README.md

Pull Request - State: closed - Opened by ElegantLin 8 days ago

#25 - add v2 models

Pull Request - State: closed - Opened by xypan0 8 days ago

#24 - SFT training objective

Issue - State: open - Opened by ljb121002 25 days ago - 3 comments

#23 - Negative reward when serving ArmoRM-Llama3-8B-v0.1

Issue - State: open - Opened by maoliyuan 3 months ago - 4 comments

#22 - Question about CUDA/NVCC setups

Issue - State: open - Opened by rqzhangberkeley 3 months ago - 1 comment

#21 - Question about the iteration dataset (information leakage)?

Issue - State: closed - Opened by hhhhzzzzz 3 months ago - 8 comments

#20 - Questions about Nectar Datasets

Issue - State: open - Opened by XinZhao0211 3 months ago - 4 comments

#19 - pip's dependency conflict: accelerate

Issue - State: closed - Opened by liwd190019 3 months ago - 2 comments

#18 - Reference policy ablations

Issue - State: closed - Opened by yesiam-png 4 months ago - 9 comments

#17 - Phi3 has a nearly constant DPO loss of 0.69xx

Issue - State: open - Opened by Arnav0400 4 months ago - 6 comments

#16 - large max_steps?

Issue - State: closed - Opened by hunterlang 4 months ago - 1 comment

#15 - One question about the loss function given a gold reward model

Issue - State: closed - Opened by srzer 5 months ago - 2 comments

#14 - numpy version and transformers version

Issue - State: closed - Opened by WayXG 5 months ago - 1 comment

#13 - More RLHF algorithms in the implementation

Issue - State: closed - Opened by WayXG 5 months ago - 1 comment

#12 - question about dpo dataset

Issue - State: closed - Opened by LiuChen19960902 5 months ago - 1 comment

#11 - Distributed training in stage 3.3 keeps hanging

Issue - State: closed - Opened by srzer 5 months ago - 2 comments

#10 - corrected max_model_len to be max_input_length

Pull Request - State: closed - Opened by eddyliu5 5 months ago - 2 comments

#9 - update the figure in readme

Issue - State: closed - Opened by WayXG 5 months ago - 1 comment

#8 - questions about dpo

Issue - State: closed - Opened by hong-xl 5 months ago - 5 comments

#7 - Iterative pipeline question

Issue - State: closed - Opened by matouk98 5 months ago - 4 comments

#6 - Model evaluation issue

Issue - State: closed - Opened by matouk98 5 months ago - 5 comments

#5 - Questions about training data during iterative DPO

Issue - State: closed - Opened by hong-xl 6 months ago - 3 comments

#4 - Fail to load weight from pair-preference-model-LLaMA3-8B

Issue - State: open - Opened by matouk98 6 months ago - 2 comments

#3 - Cannot Reproduce the DPO Checkpoint

Issue - State: closed - Opened by gesy17 6 months ago - 1 comment

#2 - How train sft on rtx4090?

Issue - State: closed - Opened by utrobinmv 6 months ago - 1 comment

#1 - Fix readme typo

Pull Request - State: closed - Opened by erjanmx 6 months ago - 1 comment