Ecosyste.ms: Issues
An open API service providing issue and pull request metadata for open source projects.
GitHub / rlhflow/rlhf-reward-modeling issues and pull requests
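The listing below is served by the ecosyste.ms Issues API. A minimal sketch of how a client might build a query for this repository and filter the decoded response — the endpoint path, the `page` parameter, and the response fields (`number`, `title`, `state`) are assumptions based on typical REST conventions for this service, not confirmed documentation:

```python
# Hypothetical base URL and path layout (assumption, not confirmed docs).
BASE = "https://issues.ecosyste.ms/api/v1"
HOST = "GitHub"
REPO = "rlhflow/rlhf-reward-modeling"

def issues_url(host: str, repo: str, page: int = 1) -> str:
    """Build the (assumed) paginated issues endpoint URL for a repository."""
    return f"{BASE}/hosts/{host}/repositories/{repo}/issues?page={page}"

def open_issues(records: list[dict]) -> list[dict]:
    """Keep only records whose state is 'open'."""
    return [r for r in records if r.get("state") == "open"]

# Sample records mirroring two entries from the listing below; a real
# client would fetch issues_url(...) with e.g. urllib.request and decode JSON.
sample = [
    {"number": 47, "title": "Update gemma_two_head.py", "state": "closed"},
    {"number": 46, "title": "Missing code for ODIN", "state": "open"},
]

print(issues_url(HOST, REPO))
print([r["number"] for r in open_issues(sample)])  # only #46 is open
```

The filtering step is plain client-side logic; whether the service also supports a server-side `state` query parameter is not confirmed here.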
#47 - Update gemma_two_head.py
Pull Request - State: closed - Opened by Lichang-Chen 2 months ago
#46 - Missing code for ODIN
Issue - State: open - Opened by maoliyuan 2 months ago - 1 comment
#45 - Update README.md
Pull Request - State: closed - Opened by Chenluye99 3 months ago
#44 - Update deepseek Top-1 acc on MATH
Pull Request - State: closed - Opened by hanningzhang 3 months ago
#43 - Update README.md of Deepseek Pass 1 acc
Pull Request - State: closed - Opened by hanningzhang 3 months ago
#42 - Rlhflow math
Pull Request - State: closed - Opened by WeiXiongUST 3 months ago
#41 - add experiment setup and results for the math prm
Pull Request - State: closed - Opened by hanningzhang 3 months ago
#40 - Rlhflow math: evaluation code and evaluation description in readme
Pull Request - State: closed - Opened by hanningzhang 3 months ago
#39 - Pixi package management; notebooks folders; quarto paper setup.
Pull Request - State: closed - Opened by professorwug 3 months ago
#38 - ODIN
Pull Request - State: closed - Opened by Lichang-Chen 3 months ago - 1 comment
#37 - Question regarding ARMO stage2-train code
Issue - State: open - Opened by RayWang-iat 4 months ago
#36 - stage1-train:RuntimeError: torch.cat(): expected a non-empty list of Tensors
Issue - State: closed - Opened by RayWang-iat 4 months ago
#35 - Armo-rm env set-up and data processing
Issue - State: open - Opened by MaxwellJryao 5 months ago - 1 comment
#34 - Add RRM augmentation
Pull Request - State: closed - Opened by TerenceLiu4444 5 months ago
#33 - Clarification on Reward Usage in DPO Training
Issue - State: open - Opened by vincezh2000 5 months ago - 1 comment
#32 - ArmoRM-Llama3-8B-v0.1's tokenizer is different from Meta-Llama-3-8B-Instruct's
Issue - State: closed - Opened by efsotr 5 months ago - 7 comments
#31 - Semi-Supervised Reward Modeling (SSRM)
Pull Request - State: closed - Opened by yifei-he 5 months ago
#30 - reproduce ArmoRM
Issue - State: closed - Opened by richhh520 5 months ago - 3 comments
#29 - preference dataset 404 not found
Issue - State: closed - Opened by wty500 6 months ago - 2 comments
#28 - Code to reproduce ArmoRM
Issue - State: closed - Opened by halfrot 6 months ago - 5 comments
#27 - Can I inquire about some training details about armo-rm?
Issue - State: closed - Opened by xiaotian917 6 months ago - 7 comments
#26 - Regarding the Gemma2 Reward Model Structure
Issue - State: open - Opened by Loong435 6 months ago - 2 comments
#25 - How to batch inference?
Issue - State: closed - Opened by AIR-hl 7 months ago
#24 - "Token pattern not found in the list" error
Issue - State: open - Opened by nshen7 7 months ago - 3 comments
#23 - How to finetune ARMO model with custom dataset?
Issue - State: closed - Opened by Helen-Cheung 7 months ago - 4 comments
#22 - Bradley-Terry model removes lm head while saving
Issue - State: open - Opened by Arnav0400 7 months ago - 1 comment
#21 - Training and evaluating for pair_pm model.
Issue - State: open - Opened by t-sifanwu 7 months ago - 5 comments
#20 - How do you implement SLic on pair_pm model?
Issue - State: open - Opened by t-sifanwu 8 months ago - 1 comment
#19 - preference_700K dataset's details?
Issue - State: closed - Opened by yechenzhi 8 months ago - 4 comments
#18 - environment set up issue
Issue - State: open - Opened by WayXG 8 months ago - 1 comment
#17 - tutorial to reproduce ArmoRM
Issue - State: closed - Opened by pluiez 8 months ago - 1 comment
#16 - question of chat templates
Issue - State: open - Opened by trueRosun 8 months ago - 6 comments
#15 - Code for Armo on Reward Bench
Issue - State: closed - Opened by philschmid 8 months ago - 4 comments
#14 - How to calculate the avg score of reward bench?
Issue - State: closed - Opened by eyuansu62 8 months ago - 2 comments
#13 - Low Safety Score for RM-Gemma-2B Model
Issue - State: closed - Opened by loss4Wang 9 months ago - 2 comments
#12 - can we say PM is better than BT?
Issue - State: closed - Opened by yechenzhi 9 months ago - 2 comments
#11 - quesion about the output
Issue - State: closed - Opened by yechenzhi 9 months ago - 1 comment
#10 - How to construct new pairs for adding to the dataset
Issue - State: closed - Opened by wlhgtc 9 months ago - 1 comment
#9 - Does pair-pm supports multi-turn conversation?
Issue - State: closed - Opened by heyzude 9 months ago - 2 comments
#8 - Cannot understant the code at README.md of pair-pm
Issue - State: closed - Opened by heyzude 9 months ago - 4 comments
#7 - Pairwise preference model dev
Pull Request - State: closed - Opened by WeiXiongUST 9 months ago
#6 - KeyError: 'input_ids_j' in training
Issue - State: closed - Opened by iseesaw 9 months ago - 2 comments
#5 - re-organize code
Pull Request - State: closed - Opened by WeiXiongUST 10 months ago
#4 - Update eval_bench_mark.py
Pull Request - State: closed - Opened by ZizhengYang 10 months ago - 2 comments
#3 - Update eval_bench_mark.py allow use bf16 or f32
Pull Request - State: closed - Opened by ZizhengYang 10 months ago
#2 - Cannot run the training script
Issue - State: closed - Opened by peter-peng-w 10 months ago - 1 comment
#1 - how to serve this model?
Issue - State: closed - Opened by jxgu1016 11 months ago - 1 comment