OpenRLHF/OpenRLHF issues and pull requests

#542 - Questions About Reward Model

Issue - State: open - Opened by hashword0428 5 days ago - 1 comment

#541 - grad_accum with broadcast and zero3

Issue - State: open - Opened by thehir0 6 days ago

#540 - Use worker_cls when vLLM version > 0.6.4.post1

Pull Request - State: closed - Opened by HollowMan6 6 days ago - 1 comment

#539 - Packing in PPO training

Issue - State: open - Opened by Hanqer 6 days ago

#537 - Empty all the init files + clean up unused imports

Pull Request - State: closed - Opened by HollowMan6 7 days ago - 1 comment

#536 - [Relanding] Add support when RAY_EXPERIMENTAL_NOSET_*_VISIBLE_DEVICES is set

Pull Request - State: closed - Opened by HollowMan6 7 days ago - 7 comments

#534 - Loosen dependency version requirements

Pull Request - State: closed - Opened by fzyzcjy 7 days ago - 3 comments

#533 - _broadcast_to_vllm becomes a bottleneck in large-scale training

Issue - State: open - Opened by Wraythh 7 days ago - 3 comments

#524 - Add support when RAY_EXPERIMENTAL_NOSET_*_VISIBLE_DEVICES is set

Pull Request - State: closed - Opened by HollowMan6 16 days ago - 7 comments

#523 - Dataloader Errors using HF Models

Issue - State: open - Opened by wilkincr 16 days ago - 1 comment

#522 - why prm loss use cross entropy not reward?

Issue - State: open - Opened by yiyepiaoling0715 16 days ago - 1 comment

#521 - Fix serve_rm hanging on long input

Pull Request - State: closed - Opened by cemiu 16 days ago

#520 - Gemma2 Ray vllm

Issue - State: open - Opened by thehir0 17 days ago - 3 comments

#519 - OOM with Llama3-8b on 8*H100

Issue - State: open - Opened by sharptcode 19 days ago - 1 comment

#518 - Raise warning on faulty input template

Pull Request - State: closed - Opened by cemiu 20 days ago - 1 comment

#517 - Reward becomes 0

Issue - State: open - Opened by anoxia-1 20 days ago - 1 comment

#516 - Resources cannot be scheduled

Issue - State: closed - Opened by BeerTai 21 days ago

#515 - Support RLOO

Pull Request - State: open - Opened by zhuzilin 21 days ago - 4 comments

#514 - [DRAFT] Add group_norm in train_ppo.py and train_ppo_ray.py

Pull Request - State: open - Opened by xiaoxigua999 23 days ago - 3 comments

#513 - Add reinforce algorithm in train_ppo.py and train_ppo_ray.py

Pull Request - State: closed - Opened by xiaoxigua999 23 days ago - 1 comment

#512 - fix interactive_chat

Pull Request - State: closed - Opened by cemiu 24 days ago

#511 - Allow arbitrary number of vllm engines

Pull Request - State: closed - Opened by zhuzilin 24 days ago - 2 comments

#510 - Question about PRM loss

Issue - State: open - Opened by EthanChen1234 25 days ago - 3 comments

#509 - PRM, loss nan

Issue - State: closed - Opened by EthanChen1234 25 days ago - 2 comments

#508 - DPO GPU memory usage keeps growing during training until out of memory

Issue - State: open - Opened by Cerberous 25 days ago - 5 comments

#507 - Send all prompts to vllm to enhance performance

Pull Request - State: closed - Opened by zhuzilin 25 days ago

#506 - How to load and save best model at the end of training?

Issue - State: closed - Opened by TangJiakai 25 days ago - 1 comment

#505 - fix interactive chat

Pull Request - State: closed - Opened by LYMDLUT 26 days ago - 5 comments

#504 - Gather feature!

Issue - State: closed - Opened by TangJiakai 26 days ago - 4 comments

#503 - Is preference training supported for multimodal models?

Issue - State: open - Opened by bonre 26 days ago - 1 comment

#502 - Support PRM with soft labels and change PRM dataset format

Pull Request - State: closed - Opened by zhuzilin 26 days ago

#501 - assert state_dict_keys.issubset( [rank0]: AssertionError: mismatch keys

Issue - State: open - Opened by anoxia-1 27 days ago - 1 comment

#500 - How to train a 70B PRM with multiple nodes and multiple GPUs?

Issue - State: open - Opened by banksy23 27 days ago - 1 comment

#499 - [RFC] Modularizing Sample Generation with Rating in PPO for Flexible RLHF Pipelines

Issue - State: closed - Opened by zhuzilin 30 days ago - 6 comments
Labels: enhancement

#498 - Support for PPO for PRM?

Issue - State: open - Opened by ljb121002 about 1 month ago - 1 comment
Labels: enhancement

#497 - Model loading is significantly slower with multiple GPUs

Issue - State: closed - Opened by fingertap about 1 month ago - 1 comment

#496 - Error when computing logp of Qwen2-7B outputs with Qwen2-1.5B

Issue - State: open - Opened by ZexuSun about 1 month ago - 1 comment

#495 - Question about _ds_init_train_model

Issue - State: closed - Opened by BeerTai about 1 month ago - 2 comments

#494 - Question about adding a value head to the RM

Issue - State: closed - Opened by Gikiman about 1 month ago - 2 comments

#493 - Replace deprecated/removed transformers.deepspeed module

Pull Request - State: closed - Opened by HollowMan6 about 1 month ago

#492 - Enabling vllm engine hangs in a multi-node, multi-GPU setup

Issue - State: closed - Opened by CPFLAME about 1 month ago - 2 comments

#491 - gradient accum

Issue - State: closed - Opened by longshuicui about 1 month ago - 2 comments

#490 - Is using a PRM instead of an ORM during PPO supported?

Issue - State: open - Opened by banksy23 about 1 month ago - 1 comment

#489 - This discussion starts from this PR: https://github.com/OpenRLHF/OpenRLHF/pull/477

Issue - State: closed - Opened by ZetangForward about 1 month ago

#488 - Can openrlhf support using soft label during prm training process?

Issue - State: closed - Opened by banksy23 about 1 month ago - 2 comments
Labels: enhancement

#487 - [RFC] Support SGLang generation in RLHF

Issue - State: open - Opened by hijkzzz about 1 month ago - 1 comment
Labels: enhancement

#486 - The `_get_reward_model` function has issues when loading an MoE Reward Model (e.g., ArmoRM).

Issue - State: closed - Opened by Vance0124 about 1 month ago - 1 comment

#485 - PRM training supporting Qwen Model Series

Issue - State: closed - Opened by xiechengmude about 1 month ago - 2 comments

#484 - fix packing_samples in NaiveExperienceMaker

Pull Request - State: closed - Opened by zmzhang2000 about 1 month ago

#483 - support grpo training v2

Pull Request - State: closed - Opened by LSX-Sneakerprogrammer about 1 month ago - 2 comments

#482 - Revert "Merge Ring Attention into SFT Trainer"

Pull Request - State: closed - Opened by zhuzilin about 1 month ago

#480 - fix packing_samples in NaiveExperienceMaker

Pull Request - State: closed - Opened by zmzhang2000 about 1 month ago - 2 comments

#479 - Question about iterative_dpo

Issue - State: closed - Opened by BeerTai about 1 month ago - 1 comment

#478 - [WIP] Add REINFORCE Leave one out (RLOO) to train_ppo_ray

Pull Request - State: closed - Opened by zhuzilin about 1 month ago - 1 comment

#477 - Merge Ring Attention into SFT Trainer

Pull Request - State: closed - Opened by ZetangForward about 1 month ago - 5 comments

#476 - Support non negative kl divergence approximation

Pull Request - State: closed - Opened by zhuzilin about 1 month ago

#475 - Add temperature config for train_ppo_ray

Pull Request - State: closed - Opened by zhuzilin about 1 month ago

#474 - Upload experience_maker perf status

Pull Request - State: closed - Opened by zhuzilin about 1 month ago

#473 - remove unnecessary softmax in prm loss

Pull Request - State: closed - Opened by catqaq about 1 month ago - 1 comment

#472 - How do you connect different models using Ray.

Issue - State: open - Opened by zpcalan about 1 month ago - 3 comments

#471 - Is PPO training on Ascend supported?

Issue - State: closed - Opened by wphtrying about 1 month ago - 1 comment

#470 - PPO training stuck for Llama-3.1

Issue - State: open - Opened by zhenghaoxu-gatech about 1 month ago - 1 comment

#469 - Generation temperature for train_ppo_ray

Issue - State: closed - Opened by zhenghaoxu-gatech about 1 month ago - 1 comment

#468 - Context Parallel Failed for Modified SFT Trainer

Issue - State: closed - Opened by ZetangForward about 1 month ago - 5 comments

#467 - Unnecessary logprob computation in actor.forward

Issue - State: open - Opened by zkshan2002 about 1 month ago - 1 comment

#466 - Separate the rollout generation and advantage calculation

Pull Request - State: closed - Opened by zhuzilin about 2 months ago - 2 comments

#465 - DPO loss mask computation

Issue - State: closed - Opened by zkshan2002 about 2 months ago - 2 comments

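#464 - I want to change the framework to support multi-step PRM training instead of single-step ORM training. Is this feasible, and which parts should be changed?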

Issue - State: closed - Opened by Gikiman about 2 months ago - 3 comments

#463 - Move the n_samples_per_prompt into replay buffer

Pull Request - State: closed - Opened by zhuzilin about 2 months ago

#462 - Change pg_options param into backend_options in _new_process_group_helper for PyTorch version greater than 2.6

Pull Request - State: closed - Opened by HollowMan6 about 2 months ago - 4 comments

#461 - Support remote_rm_fn when using packing_samples in ppo

Pull Request - State: closed - Opened by zhuzilin about 2 months ago

#460 - Error in save_model after training with adam_offload

Issue - State: open - Opened by pythonla about 2 months ago - 3 comments

#459 - Inconsistency between micro_train_batch_size and train_batch_size

Issue - State: closed - Opened by zkshan2002 about 2 months ago - 1 comment

#458 - [BUG] fix _max_steps not initialized bug

Pull Request - State: closed - Opened by BeingGod about 2 months ago - 1 comment

#457 - fixed the missing value_head_prefix

Pull Request - State: closed - Opened by ChenmienTan about 2 months ago - 1 comment

#456 - Reproducing knowledge distillation results

Issue - State: open - Opened by jinchenyu about 2 months ago - 6 comments

#455 - Add grpo trainer

Pull Request - State: closed - Opened by LSX-Sneakerprogrammer about 2 months ago - 14 comments

#454 - Can't save model

Issue - State: closed - Opened by LZY-the-boys about 2 months ago - 3 comments

#453 - bug with max_steps

Issue - State: closed - Opened by LZY-the-boys about 2 months ago

#452 - questions on the training configuration

Issue - State: closed - Opened by WayXG 2 months ago - 1 comment

#451 - add tensorboard for local use

Pull Request - State: closed - Opened by catqaq 2 months ago - 2 comments

#450 - Feature: Concurrent support of remote RM

Issue - State: closed - Opened by catqaq 2 months ago

#449 - Support packing_samples for ppo with ray

Pull Request - State: closed - Opened by zhuzilin 2 months ago - 2 comments

#448 - fix bug in CriticModel

Pull Request - State: closed - Opened by zhuzilin 2 months ago

#447 - Fix output of packing data of RewardModel and CriticModel

Pull Request - State: closed - Opened by zhuzilin 2 months ago

#446 - add --use_linger_kernel

Pull Request - State: closed - Opened by xiaoxigua999 2 months ago

#445 - Fix lm_head.weight in save_model

Pull Request - State: closed - Opened by zmzhang2000 2 months ago

#444 - Add context parallel to reward model

Pull Request - State: closed - Opened by zhuzilin 2 months ago

#443 - support custom cls_class

Pull Request - State: closed - Opened by xiaoxigua999 2 months ago

#442 - Add PRM training with hard estimation

Pull Request - State: closed - Opened by zhuzilin 2 months ago - 2 comments

#441 - Why ZeRO-3 is only supported when vLLM enabled

Issue - State: closed - Opened by liuxsh9 2 months ago - 2 comments

#440 - I noticed a few calls to the get_tokenizer function in the code, but the return values were not being captured. What is the purpose of this function?

Issue - State: closed - Opened by pagepal666 3 months ago - 1 comment

#439 - Add context parallel to DPO

Pull Request - State: closed - Opened by zhuzilin 3 months ago

#438 - only import bitsandbytes when necessary

Pull Request - State: closed - Opened by zhuzilin 3 months ago

#437 - why advantage calculate ops [::-1]

Issue - State: closed - Opened by DavideHe 3 months ago

#436 - Lora merge error after dpo training with lora.

Issue - State: open - Opened by KaedinLian 3 months ago - 3 comments

#435 - Request for sequence parallel (sequence_parallel) support

Issue - State: closed - Opened by kangyishuai 3 months ago - 3 comments

#434 - train_knowledge_distillation.sh script fails to run

Issue - State: closed - Opened by Rookie-Kai 3 months ago - 2 comments
