Ecosyste.ms: Issues

An open API service for providing issue and pull request metadata for open source projects.

GitHub / OpenRLHF/OpenRLHF issues and pull requests

#509 - PRM, loss nan

Issue - State: open - Opened by EthanChen1234 4 days ago - 1 comment

#508 - DPO GPU memory usage keeps growing during training until it runs out of memory

Issue - State: open - Opened by Cerberous 4 days ago - 1 comment

#507 - Send all prompts to vllm to enhance performance

Pull Request - State: closed - Opened by zhuzilin 4 days ago

#505 - fix interactive chat

Pull Request - State: closed - Opened by LYMDLUT 5 days ago - 5 comments

#504 - Gather feature!

Issue - State: closed - Opened by TangJiakai 5 days ago - 4 comments

#503 - Is preference training supported for multimodal models?

Issue - State: open - Opened by bonre 5 days ago - 1 comment

#502 - Support PRM with soft labels and change PRM dataset format

Pull Request - State: closed - Opened by zhuzilin 5 days ago

#500 - How to train a 70B PRM with multiple nodes and multiple GPUs?

Issue - State: open - Opened by banksy23 6 days ago - 1 comment

#499 - [RFC] Modularizing Sample Generation with Rating in PPO for Flexible RLHF Pipelines

Issue - State: open - Opened by zhuzilin 9 days ago - 5 comments
Labels: enhancement

#498 - Support for PPO for PRM?

Issue - State: open - Opened by ljb121002 10 days ago - 1 comment
Labels: enhancement

#497 - Model loading is significantly slower with multiple GPUs

Issue - State: closed - Opened by fingertap 10 days ago - 1 comment

#496 - Error when computing logp of Qwen2-7B outputs with Qwen2-1.5B

Issue - State: open - Opened by ZexuSun 11 days ago - 1 comment

#495 - Question about _ds_init_train_model

Issue - State: closed - Opened by BeerTai 11 days ago - 2 comments

#494 - Question about adding a value head to the RM

Issue - State: closed - Opened by Gikiman 12 days ago - 2 comments

#493 - Replace deprecated/removed transformers.deepspeed module

Pull Request - State: closed - Opened by HollowMan6 12 days ago

#492 - Enabling the vllm engine hangs in multi-node multi-GPU setups

Issue - State: closed - Opened by CPFLAME 12 days ago - 2 comments

#491 - gradient accum

Issue - State: closed - Opened by longshuicui 13 days ago - 2 comments

#490 - Is using a PRM instead of an ORM during PPO supported?

Issue - State: open - Opened by banksy23 14 days ago - 1 comment

#488 - Can openrlhf support using soft label during prm training process?

Issue - State: open - Opened by banksy23 14 days ago - 2 comments
Labels: enhancement

#487 - [RFC] Support SGLang generation in RLHF

Issue - State: open - Opened by hijkzzz 14 days ago - 1 comment
Labels: enhancement

#485 - PRM training supporting Qwen Model Series

Issue - State: closed - Opened by xiechengmude 15 days ago - 2 comments

#484 - fix packing_samples in NaiveExperienceMaker

Pull Request - State: closed - Opened by zmzhang2000 15 days ago

#483 - support grpo training v2

Pull Request - State: open - Opened by LSX-Sneakerprogrammer 16 days ago - 2 comments

#482 - Revert "Merge Ring Attention into SFT Trainer"

Pull Request - State: closed - Opened by zhuzilin 16 days ago

#480 - fix packing_samples in NaiveExperienceMaker

Pull Request - State: closed - Opened by zmzhang2000 16 days ago - 2 comments

#479 - Question about iterative_dpo

Issue - State: closed - Opened by BeerTai 17 days ago - 1 comment

#478 - [WIP] Add REINFORCE Leave one out (RLOO) to train_ppo_ray

Pull Request - State: closed - Opened by zhuzilin 17 days ago - 1 comment

#477 - Merge Ring Attention into SFT Trainer

Pull Request - State: closed - Opened by ZetangForward 18 days ago - 5 comments

#476 - Support non negative kl divergence approximation

Pull Request - State: closed - Opened by zhuzilin 18 days ago

#475 - Add temperature config for train_ppo_ray

Pull Request - State: closed - Opened by zhuzilin 19 days ago

#474 - Upload experience_maker perf status

Pull Request - State: closed - Opened by zhuzilin 19 days ago

#473 - remove unnecessary softmax in prm loss

Pull Request - State: closed - Opened by catqaq 20 days ago - 1 comment

#472 - How do you connect different models using Ray.

Issue - State: open - Opened by zpcalan 20 days ago - 3 comments

#471 - Is PPO training on Ascend supported?

Issue - State: closed - Opened by wphtrying 20 days ago - 1 comment

#470 - PPO training stuck for Llama-3.1

Issue - State: open - Opened by zhenghaoxu-gatech 21 days ago - 1 comment

#469 - Generation temperature for train_ppo_ray

Issue - State: closed - Opened by zhenghaoxu-gatech 21 days ago - 1 comment

#468 - Context Parallel Failed for Modified SFT Trainer

Issue - State: closed - Opened by ZetangForward 22 days ago - 5 comments

#467 - Unnecessary logprob computation in actor.forward

Issue - State: open - Opened by zkshan2002 23 days ago - 1 comment

#466 - Separate the rollout generation and advantage calculation

Pull Request - State: closed - Opened by zhuzilin 24 days ago - 2 comments

#465 - DPO loss mask computation

Issue - State: closed - Opened by zkshan2002 25 days ago - 2 comments

#463 - Move the n_samples_per_prompt into replay buffer

Pull Request - State: closed - Opened by zhuzilin 27 days ago

#461 - Support remote_rm_fn when using packing_samples in ppo

Pull Request - State: closed - Opened by zhuzilin 28 days ago

#460 - Error in save_model after training with adam_offload

Issue - State: open - Opened by pythonla 28 days ago - 3 comments

#459 - Inconsistency between micro_train_batch_size and train_batch_size

Issue - State: closed - Opened by zkshan2002 about 1 month ago - 1 comment

#458 - [BUG] fix _max_steps not initialized bug

Pull Request - State: closed - Opened by BeingGod about 1 month ago - 1 comment

#457 - fixed the missing value_head_prefix

Pull Request - State: closed - Opened by ChenmienTan about 1 month ago - 1 comment

#456 - Reproducing knowledge distillation results

Issue - State: open - Opened by jinchenyu about 1 month ago - 5 comments

#455 - Add grpo trainer

Pull Request - State: closed - Opened by LSX-Sneakerprogrammer about 1 month ago - 14 comments

#454 - Can't save model

Issue - State: closed - Opened by LZY-the-boys about 1 month ago - 3 comments

#453 - bug with max_steps

Issue - State: closed - Opened by LZY-the-boys about 1 month ago

#452 - questions on the training configuration

Issue - State: closed - Opened by WayXG about 1 month ago - 1 comment

#451 - add tensorboard for local use

Pull Request - State: closed - Opened by catqaq about 1 month ago - 2 comments

#450 - Feature: Concurrent support of remote RM

Issue - State: closed - Opened by catqaq about 1 month ago

#449 - Support packing_samples for ppo with ray

Pull Request - State: closed - Opened by zhuzilin about 2 months ago - 2 comments

#448 - fix bug in CriticModel

Pull Request - State: closed - Opened by zhuzilin about 2 months ago

#447 - Fix output of packing data of RewardModel and CriticModel

Pull Request - State: closed - Opened by zhuzilin about 2 months ago

#446 - add --use_linger_kernel

Pull Request - State: closed - Opened by xiaoxigua999 about 2 months ago

#445 - Fix lm_head.weight in save_model

Pull Request - State: closed - Opened by zmzhang2000 about 2 months ago

#444 - Add context parallel to reward model

Pull Request - State: closed - Opened by zhuzilin about 2 months ago

#443 - support custom cls_class

Pull Request - State: closed - Opened by xiaoxigua999 about 2 months ago

#442 - Add PRM training with hard estimation

Pull Request - State: closed - Opened by zhuzilin about 2 months ago - 2 comments

#441 - Why ZeRO-3 is only supported when vLLM enabled

Issue - State: closed - Opened by liuxsh9 about 2 months ago - 2 comments

#439 - Add context parallel to DPO

Pull Request - State: closed - Opened by zhuzilin about 2 months ago

#438 - only import bitsandbytes when necessary

Pull Request - State: closed - Opened by zhuzilin about 2 months ago

#437 - why advantage calculate ops [::-1]

Issue - State: closed - Opened by DavideHe about 2 months ago

#436 - Lora merge error after dpo training with lora.

Issue - State: open - Opened by KaedinLian 2 months ago - 3 comments

#435 - Request to support sequence parallelism (sequence_parallel)

Issue - State: closed - Opened by kangyishuai 2 months ago - 3 comments

#434 - The train_knowledge_distillation.sh script fails to run

Issue - State: closed - Opened by Rookie-Kai 2 months ago - 2 comments

#433 - Support for Token-Level Rewards?

Issue - State: closed - Opened by pagepal666 2 months ago - 5 comments

#432 - Is n_samples_per_prompt actually used?

Issue - State: closed - Opened by Rosenberg37 2 months ago - 1 comment

#431 - The need of micro_rollout_batch_size

Issue - State: closed - Opened by Unfinito 2 months ago - 4 comments

#430 - Will OpenRLHF handle gradient_accumulation_steps with loss?

Issue - State: closed - Opened by mzhaoshuai 2 months ago - 4 comments

#429 - The behavior of log

Issue - State: closed - Opened by visionxyz 2 months ago - 6 comments

#428 - How to load an open-source model without 'value head'

Issue - State: closed - Opened by kleinzcy 3 months ago - 3 comments

#427 - batch_inference NCCL time out error

Issue - State: closed - Opened by BeyonderXX 3 months ago - 1 comment

#425 - Add feature of load_from_disk to utils.py

Pull Request - State: closed - Opened by tongyx361 3 months ago

#424 - CUDA out of memory error when training qwen2-7b-instruct on four 4090 GPUs

Issue - State: closed - Opened by SuiJiGuoChengSuiJiGuo 3 months ago - 2 comments

#423 - Enabling overlap_comm in PPO training significantly affects training performance

Issue - State: open - Opened by andylrx 3 months ago - 1 comment

#422 - add 'num_return_sequences' feature in actor

Pull Request - State: closed - Opened by 0xWelt 3 months ago - 4 comments

#421 - Why is dist.barrier() set in the loop here?

Issue - State: closed - Opened by lyz22233 3 months ago - 2 comments

#420 - PPO takes very long

Issue - State: closed - Opened by mandyyyyii 3 months ago - 10 comments

#419 - Is there a distillation loss curve available for reference?

Issue - State: closed - Opened by Schnabel-8 3 months ago - 7 comments

#418 - flash_attn issue

Issue - State: closed - Opened by tbsxxxH 3 months ago - 3 comments

#417 - Add makedirs before writing in batch_inference

Pull Request - State: closed - Opened by tongyx361 3 months ago

#416 - ppo error

Issue - State: closed - Opened by ldh127 3 months ago - 1 comment

#415 - AssertionError: Session name does not match persisted value

Issue - State: closed - Opened by tbsxxxH 3 months ago - 1 comment

#414 - update link to code in readme

Pull Request - State: closed - Opened by coding-famer 3 months ago

#413 - Version conflict issue

Issue - State: closed - Opened by tbsxxxH 3 months ago - 1 comment

#411 - torch.distributed.broadcast timeout

Issue - State: closed - Opened by lyz22233 3 months ago - 2 comments

#410 - Speed Up Data Processing by Using Multi-Processing in Dataset.map

Pull Request - State: closed - Opened by Ricardokevins 3 months ago - 1 comment

#409 - Data Preprocess Speed Up

Issue - State: closed - Opened by Ricardokevins 3 months ago - 1 comment