OpenRLHF/OpenRLHF issues and pull requests

#524 - Add support when RAY_EXPERIMENTAL_NOSET_*_VISIBLE_DEVICES is set

Pull Request - State: open - Opened by HollowMan6 10 days ago - 1 comment

#523 - Dataloader Errors using HF Models

Issue - State: open - Opened by wilkincr 10 days ago - 1 comment

#522 - why prm loss use cross entropy not reward?

Issue - State: open - Opened by yiyepiaoling0715 10 days ago - 1 comment

#521 - Fix serve_rm hanging on long input

Pull Request - State: closed - Opened by cemiu 10 days ago

#520 - Gemma2 Ray vllm

Issue - State: open - Opened by thehir0 11 days ago - 3 comments

#519 - OOM with Llama3-8b on 8*H100

Issue - State: open - Opened by sharptcode 13 days ago - 1 comment

#518 - Raise warning on faulty input template

Pull Request - State: closed - Opened by cemiu 14 days ago - 1 comment

#517 - reward becomes 0

Issue - State: open - Opened by anoxia-1 15 days ago - 1 comment

#516 - Resources cannot be scheduled

Issue - State: closed - Opened by BeerTai 16 days ago

#515 - Support RLOO

Pull Request - State: open - Opened by zhuzilin 16 days ago - 3 comments

#514 - [DRAFT] Add group_norm in train_ppo.py and train_ppo_ray.py

Pull Request - State: open - Opened by xiaoxigua999 17 days ago - 3 comments

#513 - Add reinforce algorithm in train_ppo.py and train_ppo_ray.py

Pull Request - State: closed - Opened by xiaoxigua999 17 days ago - 1 comment

#512 - fix interactive_chat

Pull Request - State: closed - Opened by cemiu 18 days ago

#511 - Allow arbitrary number of vllm engines

Pull Request - State: closed - Opened by zhuzilin 19 days ago - 2 comments

#510 - Question about PRM loss

Issue - State: open - Opened by EthanChen1234 19 days ago - 3 comments

#509 - PRM, loss nan

Issue - State: closed - Opened by EthanChen1234 20 days ago - 2 comments

#508 - DPO GPU memory usage keeps growing during training until out of memory

Issue - State: open - Opened by Cerberous 20 days ago - 4 comments

#507 - Send all prompts to vllm to enhance performance

Pull Request - State: closed - Opened by zhuzilin 20 days ago

#506 - How to load and save best model at the end of training?

Issue - State: closed - Opened by TangJiakai 20 days ago - 1 comment

#505 - fix interactive chat

Pull Request - State: closed - Opened by LYMDLUT 20 days ago - 5 comments

#504 - Gather feature!

Issue - State: closed - Opened by TangJiakai 21 days ago - 4 comments

#503 - Is preference training supported for multimodal models?

Issue - State: open - Opened by bonre 21 days ago - 1 comment

#502 - Support PRM with soft labels and change PRM dataset format

Pull Request - State: closed - Opened by zhuzilin 21 days ago

#501 - assert state_dict_keys.issubset( [rank0]: AssertionError: mismatch keys

Issue - State: open - Opened by anoxia-1 21 days ago - 1 comment

#500 - How to train a 70B PRM with multiple nodes and multiple GPUs?

Issue - State: open - Opened by banksy23 21 days ago - 1 comment

#499 - [RFC] Modularizing Sample Generation with Rating in PPO for Flexible RLHF Pipelines

Issue - State: closed - Opened by zhuzilin 24 days ago - 6 comments
Labels: enhancement

#498 - Support for PPO for PRM?

Issue - State: open - Opened by ljb121002 25 days ago - 1 comment
Labels: enhancement

#497 - Model loading becomes significantly slower with multiple GPUs

Issue - State: closed - Opened by fingertap 26 days ago - 1 comment

#496 - Error when computing logp of Qwen2-7B outputs with Qwen2-1.5B

Issue - State: open - Opened by ZexuSun 27 days ago - 1 comment

#495 - Question about _ds_init_train_model

Issue - State: closed - Opened by BeerTai 27 days ago - 2 comments

#494 - Question about adding a value head to the RM

Issue - State: closed - Opened by Gikiman 27 days ago - 2 comments

#493 - Replace deprecated/removed transformers.deepspeed module

Pull Request - State: closed - Opened by HollowMan6 28 days ago

#492 - Enabling the vllm engine hangs in a multi-node, multi-GPU setup

Issue - State: closed - Opened by CPFLAME 28 days ago - 2 comments

#491 - gradient accum

Issue - State: closed - Opened by longshuicui 29 days ago - 2 comments

#490 - Is using a PRM instead of an ORM during PPO supported?

Issue - State: open - Opened by banksy23 30 days ago - 1 comment

#489 - This discussion starts from this PR: https://github.com/OpenRLHF/OpenRLHF/pull/477

Issue - State: closed - Opened by ZetangForward 30 days ago

#488 - Can openrlhf support using soft label during prm training process?

Issue - State: closed - Opened by banksy23 30 days ago - 2 comments
Labels: enhancement

#487 - [RFC] Support SGLang generation in RLHF

Issue - State: open - Opened by hijkzzz 30 days ago - 1 comment
Labels: enhancement

#486 - The `_get_reward_model` function has issues when loading an MoE Reward Model (e.g., ArmoRM).

Issue - State: closed - Opened by Vance0124 30 days ago - 1 comment

#485 - PRM training supporting Qwen Model Series

Issue - State: closed - Opened by xiechengmude about 1 month ago - 2 comments

#484 - fix packing_samples in NaiveExperienceMaker

Pull Request - State: closed - Opened by zmzhang2000 about 1 month ago

#483 - support grpo training v2

Pull Request - State: closed - Opened by LSX-Sneakerprogrammer about 1 month ago - 2 comments

#482 - Revert "Merge Ring Attention into SFT Trainer"

Pull Request - State: closed - Opened by zhuzilin about 1 month ago

#480 - fix packing_samples in NaiveExperienceMaker

Pull Request - State: closed - Opened by zmzhang2000 about 1 month ago - 2 comments

#479 - Question about iterative_dpo

Issue - State: closed - Opened by BeerTai about 1 month ago - 1 comment

#478 - [WIP] Add REINFORCE Leave one out (RLOO) to train_ppo_ray

Pull Request - State: closed - Opened by zhuzilin about 1 month ago - 1 comment

#477 - Merge Ring Attention into SFT Trainer

Pull Request - State: closed - Opened by ZetangForward about 1 month ago - 5 comments

#476 - Support non negative kl divergence approximation

Pull Request - State: closed - Opened by zhuzilin about 1 month ago

#475 - Add temperature config for train_ppo_ray

Pull Request - State: closed - Opened by zhuzilin about 1 month ago

#474 - Upload experience_maker perf status

Pull Request - State: closed - Opened by zhuzilin about 1 month ago

#473 - remove unnecessary softmax in prm loss

Pull Request - State: closed - Opened by catqaq about 1 month ago - 1 comment

#472 - How do you connect different models using Ray.

Issue - State: open - Opened by zpcalan about 1 month ago - 3 comments

#471 - Is PPO training on Ascend supported?

Issue - State: closed - Opened by wphtrying about 1 month ago - 1 comment

#470 - PPO training stuck for Llama-3.1

Issue - State: open - Opened by zhenghaoxu-gatech about 1 month ago - 1 comment

#469 - Generation temperature for train_ppo_ray

Issue - State: closed - Opened by zhenghaoxu-gatech about 1 month ago - 1 comment

#468 - Context Parallel Failed for Modified SFT Trainer

Issue - State: closed - Opened by ZetangForward about 1 month ago - 5 comments

#467 - Unnecessary logprob computation in actor.forward

Issue - State: open - Opened by zkshan2002 about 1 month ago - 1 comment

#466 - Separate the rollout generation and advantage calculation

Pull Request - State: closed - Opened by zhuzilin about 1 month ago - 2 comments

#465 - DPO loss mask computation

Issue - State: closed - Opened by zkshan2002 about 1 month ago - 2 comments

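#464 - I would like to change the framework to support multi-step training with a PRM instead of single-step training with an ORM. Is this feasible, and which parts should be changed?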

Issue - State: closed - Opened by Gikiman about 1 month ago - 3 comments

#463 - Move the n_samples_per_prompt into replay buffer

Pull Request - State: closed - Opened by zhuzilin about 1 month ago

#462 - Change pg_options param into backend_options in _new_process_group_helper for PyTorch version greater than 2.6

Pull Request - State: closed - Opened by HollowMan6 about 1 month ago - 4 comments

#461 - Support remote_rm_fn when using packing_samples in ppo

Pull Request - State: closed - Opened by zhuzilin about 1 month ago

#460 - Error in save_model after training with adam_offload

Issue - State: open - Opened by pythonla about 1 month ago - 3 comments

#459 - Inconsistency between micro_train_batch_size and train_batch_size

Issue - State: closed - Opened by zkshan2002 about 2 months ago - 1 comment

#458 - [BUG] fix _max_steps not initialized bug

Pull Request - State: closed - Opened by BeingGod about 2 months ago - 1 comment

#457 - fixed the missing value_head_prefix

Pull Request - State: closed - Opened by ChenmienTan about 2 months ago - 1 comment

#456 - Reproducing knowledge distillation results

Issue - State: open - Opened by jinchenyu about 2 months ago - 6 comments

#455 - Add grpo trainer

Pull Request - State: closed - Opened by LSX-Sneakerprogrammer about 2 months ago - 14 comments

#454 - Can't save model

Issue - State: closed - Opened by LZY-the-boys about 2 months ago - 3 comments

#453 - bug with max_steps

Issue - State: closed - Opened by LZY-the-boys about 2 months ago

#452 - questions on the training configuration

Issue - State: closed - Opened by WayXG about 2 months ago - 1 comment

#451 - add tensorboard for local use

Pull Request - State: closed - Opened by catqaq about 2 months ago - 2 comments

#450 - Feature: Concurrent support of remote RM

Issue - State: closed - Opened by catqaq about 2 months ago

#449 - Support packing_samples for ppo with ray

Pull Request - State: closed - Opened by zhuzilin 2 months ago - 2 comments

#448 - fix bug in CriticModel

Pull Request - State: closed - Opened by zhuzilin 2 months ago

#447 - Fix output of packing data of RewardModel and CriticModel

Pull Request - State: closed - Opened by zhuzilin 2 months ago

#446 - add --use_linger_kernel

Pull Request - State: closed - Opened by xiaoxigua999 2 months ago

#445 - Fix lm_head.weight in save_model

Pull Request - State: closed - Opened by zmzhang2000 2 months ago

#444 - Add context parallel to reward model

Pull Request - State: closed - Opened by zhuzilin 2 months ago

#443 - support custom cls_class

Pull Request - State: closed - Opened by xiaoxigua999 2 months ago

#442 - Add PRM training with hard estimation

Pull Request - State: closed - Opened by zhuzilin 2 months ago - 2 comments

#441 - Why ZeRO-3 is only supported when vLLM enabled

Issue - State: closed - Opened by liuxsh9 2 months ago - 2 comments

#440 - I noticed a few calls to the get_tokenizer function in the code, but the return values were not being captured. What is the purpose of this function?

Issue - State: closed - Opened by pagepal666 2 months ago - 1 comment

#439 - Add context parallel to DPO

Pull Request - State: closed - Opened by zhuzilin 2 months ago

#438 - only import bitsandbytes when necessary

Pull Request - State: closed - Opened by zhuzilin 2 months ago

#437 - why advantage calculate ops [::-1]

Issue - State: closed - Opened by DavideHe 2 months ago

#436 - Lora merge error after dpo training with lora.

Issue - State: open - Opened by KaedinLian 3 months ago - 3 comments

#435 - Request for sequence parallelism (sequence_parallel) support

Issue - State: closed - Opened by kangyishuai 3 months ago - 3 comments

#434 - train_knowledge_distillation.sh script fails to run

Issue - State: closed - Opened by Rookie-Kai 3 months ago - 2 comments

#433 - Support for Token-Level Rewards?

Issue - State: closed - Opened by pagepal666 3 months ago - 5 comments

#432 - Is n_samples_per_prompt actually used?

Issue - State: closed - Opened by Rosenberg37 3 months ago - 1 comment

#431 - The need of micro_rollout_batch_size

Issue - State: closed - Opened by Unfinito 3 months ago - 4 comments

#430 - Will OpenRLHF handle gradient_accumulation_steps with loss?

Issue - State: closed - Opened by mzhaoshuai 3 months ago - 4 comments

#429 - The behavior of log

Issue - State: closed - Opened by visionxyz 3 months ago - 6 comments

#428 - How to load a open-sourced model without 'value head'

Issue - State: closed - Opened by kleinzcy 3 months ago - 3 comments

#427 - batch_inference NCCL time out error

Issue - State: closed - Opened by BeyonderXX 3 months ago - 1 comment

#426 - Evaluate the PPO Process: Compatibility issues between DeepSpeed checkpoints and Transformers models

Issue - State: open - Opened by Ricardokevins 3 months ago - 1 comment
