Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
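Since the page describes an open API, the listing below can in principle be fetched programmatically. A minimal sketch follows; the base URL and endpoint path are assumptions about the service's REST shape, not confirmed from this page, and the filtering helper operates on locally supplied sample records rather than a live response:

```python
# Sketch of querying an issues-metadata API like Ecosyste.ms.
# NOTE: BASE and the endpoint path are assumptions, not a documented contract.
BASE = "https://issues.ecosyste.ms/api/v1"  # assumed base URL

def issues_url(host: str, owner: str, repo: str) -> str:
    """Build the (assumed) issues endpoint for one repository.

    The owner/repo separator is URL-encoded as %2F, a common pattern
    for APIs that key repositories by their full name.
    """
    return f"{BASE}/hosts/{host}/repositories/{owner}%2F{repo}/issues"

def open_issues(issues: list[dict]) -> list[dict]:
    """Filter records shaped like the entries on this page by state."""
    return [i for i in issues if i.get("state") == "open"]

# Hypothetical sample records mirroring two entries from the listing.
sample = [
    {"number": 57, "state": "open", "pull_request": False},
    {"number": 47, "state": "closed", "pull_request": True},
]
print(issues_url("GitHub", "OpenLMLab", "MOSS-RLHF"))
print([i["number"] for i in open_issues(sample)])
```

A real client would pair `issues_url` with an HTTP GET and paginate; the sketch keeps the URL construction separate from transport so it can be tested without network access.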
GitHub / OpenLMLab/MOSS-RLHF issues and pull requests
#57 - Issue when merging llama with diff to generate English policy model
Issue -
State: open - Opened by foxlf823 about 2 months ago
#56 - On the linear relationship between the square root of KL divergences and rewards
Issue -
State: closed - Opened by shirosheep000 3 months ago
#55 - RM data construction
Issue -
State: open - Opened by tcxia 6 months ago
- 1 comment
#54 - Has anyone compared this training framework to TRL?
Issue -
State: open - Opened by StarrySeas1 6 months ago
- 1 comment
#53 - Asking for clarification on some unclear points in the second paper
Issue -
State: open - Opened by Obr00007576 6 months ago
#52 - The paper mentions that in the PPO pipeline the other models can be frozen while the reward model is trained first until the value loss reaches 0; how exactly is this training carried out?
Issue -
State: open - Opened by HCHCXY 7 months ago
- 1 comment
#51 - Generation of the meta dataset in Part 2
Issue -
State: open - Opened by yata0 7 months ago
- 1 comment
#50 - Question about the size of the training set
Issue -
State: open - Opened by Macvh 7 months ago
- 1 comment
#49 - PPOSFTDataset bug report and related questions
Issue -
State: open - Opened by DZ9 8 months ago
- 1 comment
#48 - Question about the LM loss computation in the RM
Issue -
State: open - Opened by DZ9 8 months ago
- 1 comment
#47 - Adding citation of Part 2
Pull Request -
State: closed - Opened by fakerbaby 8 months ago
#46 - bash train_ppo_en.sh error
Issue -
State: closed - Opened by robotzheng 8 months ago
- 4 comments
#45 - Question about the RM contrastive-learning training method in the paper
Issue -
State: open - Opened by yhhh777 8 months ago
- 4 comments
#44 - Issues with using the released hh dataset.
Issue -
State: open - Opened by jltchiu 8 months ago
- 2 comments
#43 - On the RM training strategy and loss function
Issue -
State: open - Opened by tonylin52 8 months ago
- 12 comments
#42 - Clarification on MetaRM-optimization Implementation
Issue -
State: open - Opened by Benjamin-eecs 8 months ago
- 2 comments
#41 - release the code for training the reward model
Pull Request -
State: closed - Opened by refrain-wbh 8 months ago
#40 - [Question] Adaptive Margin
Issue -
State: closed - Opened by eyuansu62 8 months ago
- 3 comments
#39 - Is Mistral-7b currently supported as the base model?
Issue -
State: open - Opened by YijuGuo 8 months ago
- 1 comment
#38 - Is it feasible to retrain the RM with our own base model and our own SFT weights?
Issue -
State: open - Opened by camposs1979 10 months ago
- 1 comment
#37 - Why are you not releasing your reward model for English?
Issue -
State: open - Opened by AmanSinghal927 10 months ago
- 1 comment
#36 - Inference with SFT and Policy EN models
Issue -
State: open - Opened by henrypapadatos 11 months ago
- 1 comment
#35 - Question about the KL divergence in the code
Issue -
State: open - Opened by rigorosyangffff 11 months ago
- 1 comment
#34 - Weight-merging issue
Issue -
State: open - Opened by red-tie 12 months ago
- 6 comments
#33 - On merging the reward model weights
Issue -
State: open - Opened by HuipengXu 12 months ago
- 1 comment
#32 - Resource usage issue
Issue -
State: open - Opened by Ming-Di about 1 year ago
- 3 comments
#31 - Is there a planned timeline for Part 2, covering the reward model?
Issue -
State: open - Opened by SpongebBob about 1 year ago
- 13 comments
#30 - Any benchmark vs SFT?
Issue -
State: open - Opened by guotong1988 about 1 year ago
- 2 comments
#29 - Issue with DeepSpeed parameter_offload
Issue -
State: closed - Opened by LiangZhuuu about 1 year ago
- 1 comment
#28 - PPO GPU memory usage issue
Issue -
State: closed - Opened by LiangZhuuu about 1 year ago
#27 - PPO data en
Issue -
State: open - Opened by borisshapa about 1 year ago
- 1 comment
#26 - Question about reward score computation in the PPO stage
Issue -
State: open - Opened by mengyanggithub about 1 year ago
- 5 comments
#25 - typo
Issue -
State: closed - Opened by chosenone75 about 1 year ago
- 1 comment
#24 - Question about merging the Chinese reward-model parameters
Issue -
State: open - Opened by hannlp about 1 year ago
- 4 comments
#23 - About environment setup
Issue -
State: closed - Opened by zjutkarma about 1 year ago
- 2 comments
#22 - Which aspects of capability is the reward model trained for?
Issue -
State: open - Opened by yuanhuachao about 1 year ago
- 1 comment
#21 - Some confusion about the reward model's scoring
Issue -
State: open - Opened by hannlp about 1 year ago
- 12 comments
#20 - English PPO data
Issue -
State: open - Opened by QYHcrossover about 1 year ago
- 1 comment
#19 - Training on 8 Nvidia RTX A6000
Issue -
State: open - Opened by Top34051 about 1 year ago
- 1 comment
#18 - Value model vs. reward model
Issue -
State: open - Opened by KUANWB about 1 year ago
- 2 comments
#17 - PPO training stability issue
Issue -
State: open - Opened by hust-kevin about 1 year ago
- 5 comments
#16 - Script for training the reward model
Issue -
State: open - Opened by wangzhao88 about 1 year ago
- 3 comments
#15 - reward_model accuracy
Issue -
State: open - Opened by mingrenbuke about 1 year ago
- 1 comment
#14 - Training script of reward model
Issue -
State: closed - Opened by zwhe99 about 1 year ago
- 2 comments
#13 - Technical report PART 2
Issue -
State: open - Opened by snowkcon about 1 year ago
- 3 comments
#12 - High memory usage issue
Issue -
State: closed - Opened by QYHcrossover about 1 year ago
- 2 comments
#11 - Reward Model
Issue -
State: open - Opened by Cyber-Axe about 1 year ago
- 2 comments
#10 - About the reward model
Issue -
State: closed - Opened by skepsun about 1 year ago
- 5 comments
#9 - support lora training
Issue -
State: closed - Opened by akk-123 about 1 year ago
- 1 comment
#8 - Can I run this pipeline on A100-40GB?
Issue -
State: closed - Opened by zwhe99 about 1 year ago
- 4 comments
#7 - Data format used for PPO training
Issue -
State: closed - Opened by Arain-sh about 1 year ago
- 2 comments
#6 - The release of reward model training code?
Issue -
State: closed - Opened by hejujie about 1 year ago
- 3 comments
#5 - Make changes to the "main" branch to ensure that it is compatible with the special prompt pattern in our English version.
Pull Request -
State: closed - Opened by fakerbaby about 1 year ago
#4 - make main adapt to EN version prompt
Pull Request -
State: closed - Opened by fakerbaby about 1 year ago
#3 - Release of the Chinese Dataset.
Issue -
State: closed - Opened by George-Chia about 1 year ago
- 1 comment
#2 - Performance of PPO-max vs. vanilla PPO
Issue -
State: closed - Opened by hywchina about 1 year ago
- 8 comments
#1 - docs: add setup in README.md
Pull Request -
State: closed - Opened by CiaranZhou about 1 year ago