Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / OpenRLHF/OpenRLHF issues and pull requests
#542 - Questions About Reward Model
Issue -
State: open - Opened by hashword0428 5 days ago
- 1 comment
#541 - grad_accum with broadcast and zero3
Issue -
State: open - Opened by thehir0 6 days ago
#540 - Use worker_cls when vLLM version > 0.6.4.post1
Pull Request -
State: closed - Opened by HollowMan6 6 days ago
- 1 comment
#539 - Packing in PPO training
Issue -
State: open - Opened by Hanqer 6 days ago
#537 - Empty all the __init__ files + clean up unused imports
Pull Request -
State: closed - Opened by HollowMan6 7 days ago
- 1 comment
#536 - [Relanding] Add support when RAY_EXPERIMENTAL_NOSET_*_VISIBLE_DEVICES is set
Pull Request -
State: closed - Opened by HollowMan6 7 days ago
- 7 comments
#534 - Loosen dependency version requirements
Pull Request -
State: closed - Opened by fzyzcjy 7 days ago
- 3 comments
#533 - _broadcast_to_vllm becomes a bottleneck in large-scale training
Issue -
State: open - Opened by Wraythh 7 days ago
- 3 comments
#524 - Add support when RAY_EXPERIMENTAL_NOSET_*_VISIBLE_DEVICES is set
Pull Request -
State: closed - Opened by HollowMan6 16 days ago
- 7 comments
#523 - Dataloader Errors using HF Models
Issue -
State: open - Opened by wilkincr 16 days ago
- 1 comment
#522 - why prm loss use cross entropy not reward?
Issue -
State: open - Opened by yiyepiaoling0715 16 days ago
- 1 comment
#521 - Fix serve_rm hanging on long input
Pull Request -
State: closed - Opened by cemiu 16 days ago
#520 - Gemma2 Ray vllm
Issue -
State: open - Opened by thehir0 17 days ago
- 3 comments
#519 - OOM with Llama3-8b on 8*H100
Issue -
State: open - Opened by sharptcode 19 days ago
- 1 comment
#518 - Raise warning on faulty input template
Pull Request -
State: closed - Opened by cemiu 20 days ago
- 1 comment
#517 - Reward becomes 0
Issue -
State: open - Opened by anoxia-1 20 days ago
- 1 comment
#516 - Resources cannot be scheduled
Issue -
State: closed - Opened by BeerTai 21 days ago
#515 - Support RLOO
Pull Request -
State: open - Opened by zhuzilin 21 days ago
- 4 comments
#514 - [DRAFT] Add group_norm in train_ppo.py and train_ppo_ray.py
Pull Request -
State: open - Opened by xiaoxigua999 23 days ago
- 3 comments
#513 - Add reinforce algorithm in train_ppo.py and train_ppo_ray.py
Pull Request -
State: closed - Opened by xiaoxigua999 23 days ago
- 1 comment
#512 - fix interactive_chat
Pull Request -
State: closed - Opened by cemiu 24 days ago
#511 - Allow arbitrary number of vllm engines
Pull Request -
State: closed - Opened by zhuzilin 24 days ago
- 2 comments
#510 - Question about PRM loss
Issue -
State: open - Opened by EthanChen1234 25 days ago
- 3 comments
#509 - PRM, loss nan
Issue -
State: closed - Opened by EthanChen1234 25 days ago
- 2 comments
#508 - GPU memory usage keeps growing during DPO training until it runs out of memory
Issue -
State: open - Opened by Cerberous 25 days ago
- 5 comments
#507 - Send all prompts to vllm to enhance performance
Pull Request -
State: closed - Opened by zhuzilin 25 days ago
#506 - How to load and save best model at the end of training?
Issue -
State: closed - Opened by TangJiakai 25 days ago
- 1 comment
#505 - fix interactive chat
Pull Request -
State: closed - Opened by LYMDLUT 26 days ago
- 5 comments
#504 - Gather feature!
Issue -
State: closed - Opened by TangJiakai 26 days ago
- 4 comments
#503 - Is preference training for multimodal models supported?
Issue -
State: open - Opened by bonre 26 days ago
- 1 comment
#502 - Support PRM with soft labels and change PRM dataset format
Pull Request -
State: closed - Opened by zhuzilin 26 days ago
#501 - assert state_dict_keys.issubset( [rank0]: AssertionError: mismatch keys
Issue -
State: open - Opened by anoxia-1 27 days ago
- 1 comment
#500 - How to train a 70B PRM with multiple nodes and multiple GPUs?
Issue -
State: open - Opened by banksy23 27 days ago
- 1 comment
#499 - [RFC] Modularizing Sample Generation with Rating in PPO for Flexible RLHF Pipelines
Issue -
State: closed - Opened by zhuzilin 30 days ago
- 6 comments
Labels: enhancement
#498 - Support for PPO for PRM?
Issue -
State: open - Opened by ljb121002 about 1 month ago
- 1 comment
Labels: enhancement
#497 - Model loading is significantly slower with multiple GPUs
Issue -
State: closed - Opened by fingertap about 1 month ago
- 1 comment
#496 - Error when computing logp of Qwen2-7B outputs with Qwen2-1.5B
Issue -
State: open - Opened by ZexuSun about 1 month ago
- 1 comment
#495 - Question about _ds_init_train_model
Issue -
State: closed - Opened by BeerTai about 1 month ago
- 2 comments
#494 - Question about adding a value head to the RM
Issue -
State: closed - Opened by Gikiman about 1 month ago
- 2 comments
#493 - Replace deprecated/removed transformers.deepspeed module
Pull Request -
State: closed - Opened by HollowMan6 about 1 month ago
#492 - Enabling the vllm engine hangs in a multi-node, multi-GPU setup
Issue -
State: closed - Opened by CPFLAME about 1 month ago
- 2 comments
#491 - gradient accum
Issue -
State: closed - Opened by longshuicui about 1 month ago
- 2 comments
#490 - Is using a PRM instead of an ORM during PPO supported?
Issue -
State: open - Opened by banksy23 about 1 month ago
- 1 comment
#489 - This discussion starts from this PR: https://github.com/OpenRLHF/OpenRLHF/pull/477
Issue -
State: closed - Opened by ZetangForward about 1 month ago
#488 - Can openrlhf support using soft label during prm training process?
Issue -
State: closed - Opened by banksy23 about 1 month ago
- 2 comments
Labels: enhancement
#487 - [RFC] Support SGLang generation in RLHF
Issue -
State: open - Opened by hijkzzz about 1 month ago
- 1 comment
Labels: enhancement
#486 - The `_get_reward_model` function has issues when loading an MoE Reward Model (e.g., ArmoRM).
Issue -
State: closed - Opened by Vance0124 about 1 month ago
- 1 comment
#485 - PRM training supporting Qwen Model Series
Issue -
State: closed - Opened by xiechengmude about 1 month ago
- 2 comments
#484 - fix packing_samples in NaiveExperienceMaker
Pull Request -
State: closed - Opened by zmzhang2000 about 1 month ago
#483 - support grpo training v2
Pull Request -
State: closed - Opened by LSX-Sneakerprogrammer about 1 month ago
- 2 comments
#482 - Revert "Merge Ring Attention into SFT Trainer"
Pull Request -
State: closed - Opened by zhuzilin about 1 month ago
#480 - fix packing_samples in NaiveExperienceMaker
Pull Request -
State: closed - Opened by zmzhang2000 about 1 month ago
- 2 comments
#479 - Question about iterative_dpo
Issue -
State: closed - Opened by BeerTai about 1 month ago
- 1 comment
#478 - [WIP] Add REINFORCE Leave one out (RLOO) to train_ppo_ray
Pull Request -
State: closed - Opened by zhuzilin about 1 month ago
- 1 comment
#477 - Merge Ring Attention into SFT Trainer
Pull Request -
State: closed - Opened by ZetangForward about 1 month ago
- 5 comments
#476 - Support non negative kl divergence approximation
Pull Request -
State: closed - Opened by zhuzilin about 1 month ago
#475 - Add temperature config for train_ppo_ray
Pull Request -
State: closed - Opened by zhuzilin about 1 month ago
#474 - Upload experience_maker perf status
Pull Request -
State: closed - Opened by zhuzilin about 1 month ago
#473 - remove unnecessary softmax in prm loss
Pull Request -
State: closed - Opened by catqaq about 1 month ago
- 1 comment
#472 - How do you connect different models using Ray.
Issue -
State: open - Opened by zpcalan about 1 month ago
- 3 comments
#471 - Is PPO training on Ascend supported?
Issue -
State: closed - Opened by wphtrying about 1 month ago
- 1 comment
#470 - PPO training stuck for Llama-3.1
Issue -
State: open - Opened by zhenghaoxu-gatech about 1 month ago
- 1 comment
#469 - Generation temperature for train_ppo_ray
Issue -
State: closed - Opened by zhenghaoxu-gatech about 1 month ago
- 1 comment
#468 - Context Parallel Failed for Modified SFT Trainer
Issue -
State: closed - Opened by ZetangForward about 1 month ago
- 5 comments
#467 - Unnecessary logprob computation in actor.forward
Issue -
State: open - Opened by zkshan2002 about 1 month ago
- 1 comment
#466 - Separate the rollout generation and advantage calculation
Pull Request -
State: closed - Opened by zhuzilin about 2 months ago
- 2 comments
#465 - DPO loss mask computation
Issue -
State: closed - Opened by zkshan2002 about 2 months ago
- 2 comments
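#464 - I want to change the framework to support multi-step PRM training instead of single-step ORM training. Is this feasible, and which parts should be changed?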
Issue -
State: closed - Opened by Gikiman about 2 months ago
- 3 comments
#463 - Move the n_samples_per_prompt into replay buffer
Pull Request -
State: closed - Opened by zhuzilin about 2 months ago
#462 - Change pg_options param into backend_options in _new_process_group_helper for PyTorch version greater than 2.6
Pull Request -
State: closed - Opened by HollowMan6 about 2 months ago
- 4 comments
#461 - Support remote_rm_fn when using packing_samples in ppo
Pull Request -
State: closed - Opened by zhuzilin about 2 months ago
#460 - Error in save_model after training finishes when adam_offload is used
Issue -
State: open - Opened by pythonla about 2 months ago
- 3 comments
#459 - Inconsistency between micro_train_batch_size and train_batch_size
Issue -
State: closed - Opened by zkshan2002 about 2 months ago
- 1 comment
#458 - [BUG] fix _max_steps not initialized bug
Pull Request -
State: closed - Opened by BeingGod about 2 months ago
- 1 comment
#457 - fixed the missing value_head_prefix
Pull Request -
State: closed - Opened by ChenmienTan about 2 months ago
- 1 comment
#456 - Reproducing knowledge distillation results
Issue -
State: open - Opened by jinchenyu about 2 months ago
- 6 comments
#455 - Add grpo trainer
Pull Request -
State: closed - Opened by LSX-Sneakerprogrammer about 2 months ago
- 14 comments
#454 - Can't save model
Issue -
State: closed - Opened by LZY-the-boys about 2 months ago
- 3 comments
#453 - bug with max_steps
Issue -
State: closed - Opened by LZY-the-boys about 2 months ago
#452 - questions on the training configuration
Issue -
State: closed - Opened by WayXG 2 months ago
- 1 comment
#451 - add tensorboard for local use
Pull Request -
State: closed - Opened by catqaq 2 months ago
- 2 comments
#450 - Feature: Concurrent support of remote RM
Issue -
State: closed - Opened by catqaq 2 months ago
#449 - Support packing_samples for ppo with ray
Pull Request -
State: closed - Opened by zhuzilin 2 months ago
- 2 comments
#448 - fix bug in CriticModel
Pull Request -
State: closed - Opened by zhuzilin 2 months ago
#447 - Fix output of packing data of RewardModel and CriticModel
Pull Request -
State: closed - Opened by zhuzilin 2 months ago
#446 - add --use_linger_kernel
Pull Request -
State: closed - Opened by xiaoxigua999 2 months ago
#445 - Fix lm_head.weight in save_model
Pull Request -
State: closed - Opened by zmzhang2000 2 months ago
#444 - Add context parallel to reward model
Pull Request -
State: closed - Opened by zhuzilin 2 months ago
#443 - support custom cls_class
Pull Request -
State: closed - Opened by xiaoxigua999 2 months ago
#442 - Add PRM training with hard estimation
Pull Request -
State: closed - Opened by zhuzilin 2 months ago
- 2 comments
#441 - Why ZeRO-3 is only supported when vLLM enabled
Issue -
State: closed - Opened by liuxsh9 2 months ago
- 2 comments
#440 - I noticed a few calls to the get_tokenizer function in the code, but the return values were not being captured. What is the purpose of this function?
Issue -
State: closed - Opened by pagepal666 3 months ago
- 1 comment
#439 - Add context parallel to DPO
Pull Request -
State: closed - Opened by zhuzilin 3 months ago
#438 - only import bitsandbytes when necessary
Pull Request -
State: closed - Opened by zhuzilin 3 months ago
#437 - why advantage calculate ops [::-1]
Issue -
State: closed - Opened by DavideHe 3 months ago
#436 - Lora merge error after dpo training with lora.
Issue -
State: open - Opened by KaedinLian 3 months ago
- 3 comments
#435 - Request to support sequence parallelism (sequence_parallel)
Issue -
State: closed - Opened by kangyishuai 3 months ago
- 3 comments
#434 - The train_knowledge_distillation.sh script fails to run
Issue -
State: closed - Opened by Rookie-Kai 3 months ago
- 2 comments