Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
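Since the page describes an open API, the listing below can in principle be fetched programmatically. A minimal sketch follows; the base URL and endpoint path are assumptions about the service's REST shape, not confirmed from this page, and the filtering helper operates on locally supplied sample records rather than a live response:

```python
# Sketch of querying an issues-metadata API like Ecosyste.ms.
# NOTE: BASE and the endpoint path are assumptions, not a documented contract.
BASE = "https://issues.ecosyste.ms/api/v1"  # assumed base URL

def issues_url(host: str, owner: str, repo: str) -> str:
    """Build the (assumed) issues endpoint for one repository.

    The owner/repo separator is URL-encoded as %2F, a common pattern
    for APIs that key repositories by their full name.
    """
    return f"{BASE}/hosts/{host}/repositories/{owner}%2F{repo}/issues"

def open_issues(issues: list[dict]) -> list[dict]:
    """Filter records shaped like the entries on this page by state."""
    return [i for i in issues if i.get("state") == "open"]

# Hypothetical sample records mirroring two entries from the listing.
sample = [
    {"number": 57, "state": "open", "pull_request": False},
    {"number": 47, "state": "closed", "pull_request": True},
]
print(issues_url("GitHub", "OpenLMLab", "MOSS-RLHF"))
print([i["number"] for i in open_issues(sample)])
```

A real client would pair `issues_url` with an HTTP GET and paginate; the sketch keeps the URL construction separate from transport so it can be tested without network access.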
GitHub / OpenLMLab/MOSS-RLHF issues and pull requests
#57 - Issue when merging llama with diff to generate English policy model
Issue -
State: open - Opened by foxlf823 about 2 months ago
#56 - On the linear relationship between the square root of KL divergences and rewards
Issue -
State: closed - Opened by shirosheep000 3 months ago
#55 - RM data construction
Issue -
State: open - Opened by tcxia 6 months ago
- 1 comment
#54 - Has anyone compared this training framework to TRL?
Issue -
State: open - Opened by StarrySeas1 6 months ago
- 1 comment
#53 - Asking for clarification on some unclear points in the second paper
Issue -
State: open - Opened by Obr00007576 6 months ago
#52 - The paper mentions that in the PPO pipeline the other models can be frozen while the reward model is trained first until the value loss reaches 0; how exactly is this training carried out?
Issue -
State: open - Opened by HCHCXY 7 months ago
- 1 comment
#51 - Generation of the meta dataset in Part 2
Issue -
State: open - Opened by yata0 7 months ago
- 1 comment
#50 - Question about the size of the training set
Issue -
State: open - Opened by Macvh 7 months ago
- 1 comment
#49 - PPOSFTDataset bug report and related questions
Issue -
State: open - Opened by DZ9 8 months ago
- 1 comment
#48 - Question about the LM loss computation in the RM
Issue -
State: open - Opened by DZ9 8 months ago
- 1 comment
#47 - Adding citation of Part 2
Pull Request -
State: closed - Opened by fakerbaby 8 months ago
#46 - bash train_ppo_en.sh error
Issue -
State: closed - Opened by robotzheng 8 months ago
- 4 comments
#45 - Question about the RM contrastive-learning training method in the paper
Issue -
State: open - Opened by yhhh777 8 months ago
- 4 comments
#44 - Issues with using the released hh dataset.
Issue -
State: open - Opened by jltchiu 8 months ago
- 2 comments
#43 - On the RM training strategy and loss function
Issue -
State: open - Opened by tonylin52 8 months ago
- 12 comments
#42 - Clarification on MetaRM-optimization Implementation
Issue -
State: open - Opened by Benjamin-eecs 8 months ago
- 2 comments
#41 - release the code for training the reward model
Pull Request -
State: closed - Opened by refrain-wbh 8 months ago
#40 - [Question] Adaptive Margin
Issue -
State: closed - Opened by eyuansu62 8 months ago
- 3 comments
#39 - Is Mistral-7b currently supported as the base model?
Issue -
State: open - Opened by YijuGuo 8 months ago
- 1 comment
#38 - Is it feasible to retrain the RM with our own base model and our own SFT weights?
Issue -
State: open - Opened by camposs1979 10 months ago
- 1 comment
#37 - Why are you not releasing your reward model for English?
Issue -
State: open - Opened by AmanSinghal927 10 months ago
- 1 comment
#36 - Inference with SFT and Policy EN models
Issue -
State: open - Opened by henrypapadatos 11 months ago
- 1 comment
#35 - Question about the KL divergence in the code
Issue -
State: open - Opened by rigorosyangffff 11 months ago
- 1 comment
#34 - Weight-merging issue
Issue -
State: open - Opened by red-tie 12 months ago
- 6 comments
#33 - On merging the reward model weights
Issue -
State: open - Opened by HuipengXu 12 months ago
- 1 comment
#32 - Resource usage issue
Issue -
State: open - Opened by Ming-Di about 1 year ago
- 3 comments
#31 - Is there a planned timeline for Part 2, covering the reward model?
Issue -
State: open - Opened by SpongebBob about 1 year ago
- 13 comments
#30 - Any benchmark vs SFT?
Issue -
State: open - Opened by guotong1988 about 1 year ago
- 2 comments
#29 - Issue with DeepSpeed parameter_offload
Issue -
State: closed - Opened by LiangZhuuu about 1 year ago
- 1 comment
#28 - PPO GPU memory usage issue
Issue -
State: closed - Opened by LiangZhuuu about 1 year ago
#27 - PPO data en
Issue -
State: open - Opened by borisshapa about 1 year ago
- 1 comment
#26 - Question about reward score computation in the PPO stage
Issue -
State: open - Opened by mengyanggithub about 1 year ago
- 5 comments
#25 - typo
Issue -
State: closed - Opened by chosenone75 about 1 year ago
- 1 comment
#24 - Question about merging the Chinese reward-model parameters
Issue -
State: open - Opened by hannlp about 1 year ago
- 4 comments
#23 - About environment setup
Issue -
State: closed - Opened by zjutkarma about 1 year ago
- 2 comments
#22 - Which aspects of capability is the reward model trained for?
Issue -
State: open - Opened by yuanhuachao about 1 year ago
- 1 comment
#21 - Some confusion about the reward model's scoring
Issue -
State: open - Opened by hannlp about 1 year ago
- 12 comments
#20 - English PPO data
Issue -
State: open - Opened by QYHcrossover about 1 year ago
- 1 comment
#19 - Training on 8 Nvidia RTX A6000
Issue -
State: open - Opened by Top34051 about 1 year ago
- 1 comment
#18 - Value model vs. reward model
Issue -
State: open - Opened by KUANWB about 1 year ago
- 2 comments
#17 - PPO training stability issue
Issue -
State: open - Opened by hust-kevin about 1 year ago
- 5 comments
#16 - Script for training the reward model
Issue -
State: open - Opened by wangzhao88 about 1 year ago
- 3 comments
#15 - reward_model accuracy
Issue -
State: open - Opened by mingrenbuke about 1 year ago
- 1 comment
#14 - Training script of reward model
Issue -
State: closed - Opened by zwhe99 about 1 year ago
- 2 comments
#13 - Technical report PART 2
Issue -
State: open - Opened by snowkcon about 1 year ago
- 3 comments
#12 - High memory usage issue
Issue -
State: closed - Opened by QYHcrossover about 1 year ago
- 2 comments
#11 - Reward Model
Issue -
State: open - Opened by Cyber-Axe about 1 year ago
- 2 comments
#10 - About the reward model
Issue -
State: closed - Opened by skepsun about 1 year ago
- 5 comments
#9 - support lora training
Issue -
State: closed - Opened by akk-123 about 1 year ago
- 1 comment
#8 - Can I run this pipeline on A100-40GB?
Issue -
State: closed - Opened by zwhe99 about 1 year ago
- 4 comments
#7 - Data format used for PPO training
Issue -
State: closed - Opened by Arain-sh about 1 year ago
- 2 comments
#6 - The release of reward model training code?
Issue -
State: closed - Opened by hejujie about 1 year ago
- 3 comments
#5 - Make changes to the "main" branch to ensure that it is compatible with the special prompt pattern in our English version.
Pull Request -
State: closed - Opened by fakerbaby about 1 year ago
#4 - make main adapt to EN version prompt
Pull Request -
State: closed - Opened by fakerbaby about 1 year ago
#3 - Release of the Chinese Dataset.
Issue -
State: closed - Opened by George-Chia about 1 year ago
- 1 comment
#2 - Performance of PPO-max vs. vanilla PPO
Issue -
State: closed - Opened by hywchina about 1 year ago
- 8 comments
#1 - docs: add setup in README.md
Pull Request -
State: closed - Opened by CiaranZhou about 1 year ago