Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / OpenRLHF/OpenRLHF issues and pull requests
#509 - PRM, loss nan
Issue -
State: open - Opened by EthanChen1234 4 days ago
- 1 comment
#508 - DPO training uses more and more GPU memory until it runs out
Issue -
State: open - Opened by Cerberous 4 days ago
- 1 comment
#507 - Send all prompts to vllm to enhance performance
Pull Request -
State: closed - Opened by zhuzilin 4 days ago
#506 - How to load and save best model at the end of training?
Issue -
State: open - Opened by TangJiakai 4 days ago
#505 - fix interactive chat
Pull Request -
State: closed - Opened by LYMDLUT 5 days ago
- 5 comments
#504 - Gather feature!
Issue -
State: closed - Opened by TangJiakai 5 days ago
- 4 comments
#503 - Is preference training supported for multimodal models?
Issue -
State: open - Opened by bonre 5 days ago
- 1 comment
#502 - Support PRM with soft labels and change PRM dataset format
Pull Request -
State: closed - Opened by zhuzilin 5 days ago
#501 - assert state_dict_keys.issubset( [rank0]: AssertionError: mismatch keys
Issue -
State: open - Opened by anoxia-1 6 days ago
- 1 comment
#500 - How to train a 70B PRM with multiple nodes and multiple GPUs?
Issue -
State: open - Opened by banksy23 6 days ago
- 1 comment
#499 - [RFC] Modularizing Sample Generation with Rating in PPO for Flexible RLHF Pipelines
Issue -
State: open - Opened by zhuzilin 9 days ago
- 5 comments
Labels: enhancement
#498 - Support for PPO for PRM?
Issue -
State: open - Opened by ljb121002 10 days ago
- 1 comment
Labels: enhancement
#497 - Model loading becomes significantly slower with multiple GPUs
Issue -
State: closed - Opened by fingertap 10 days ago
- 1 comment
#496 - Error when computing logp for Qwen2-7B outputs with Qwen2-1.5B
Issue -
State: open - Opened by ZexuSun 11 days ago
- 1 comment
#495 - Question about _ds_init_train_model
Issue -
State: closed - Opened by BeerTai 11 days ago
- 2 comments
#494 - Question about adding a value head to the RM
Issue -
State: closed - Opened by Gikiman 12 days ago
- 2 comments
#493 - Replace deprecated/removed transformers.deepspeed module
Pull Request -
State: closed - Opened by HollowMan6 12 days ago
#492 - Enabling the vLLM engine hangs in multi-node multi-GPU setups
Issue -
State: closed - Opened by CPFLAME 12 days ago
- 2 comments
#491 - gradient accum
Issue -
State: closed - Opened by longshuicui 13 days ago
- 2 comments
#490 - Is using a PRM instead of an ORM supported during PPO?
Issue -
State: open - Opened by banksy23 14 days ago
- 1 comment
#489 - This discussion starts from this PR: https://github.com/OpenRLHF/OpenRLHF/pull/477
Issue -
State: closed - Opened by ZetangForward 14 days ago
#488 - Can openrlhf support using soft label during prm training process?
Issue -
State: open - Opened by banksy23 14 days ago
- 2 comments
Labels: enhancement
#487 - [RFC] Support SGLang generation in RLHF
Issue -
State: open - Opened by hijkzzz 14 days ago
- 1 comment
Labels: enhancement
#486 - The `_get_reward_model` function has issues when loading an MoE Reward Model (e.g., ArmoRM).
Issue -
State: closed - Opened by Vance0124 14 days ago
- 1 comment
#485 - PRM training supporting the Qwen model series
Issue -
State: closed - Opened by xiechengmude 15 days ago
- 2 comments
#484 - fix packing_samples in NaiveExperienceMaker
Pull Request -
State: closed - Opened by zmzhang2000 15 days ago
#483 - support grpo training v2
Pull Request -
State: open - Opened by LSX-Sneakerprogrammer 16 days ago
- 2 comments
#482 - Revert "Merge Ring Attention into SFT Trainer"
Pull Request -
State: closed - Opened by zhuzilin 16 days ago
#480 - fix packing_samples in NaiveExperienceMaker
Pull Request -
State: closed - Opened by zmzhang2000 16 days ago
- 2 comments
#479 - Question about iterative_dpo
Issue -
State: closed - Opened by BeerTai 17 days ago
- 1 comment
#478 - [WIP] Add REINFORCE Leave one out (RLOO) to train_ppo_ray
Pull Request -
State: closed - Opened by zhuzilin 17 days ago
- 1 comment
#477 - Merge Ring Attention into SFT Trainer
Pull Request -
State: closed - Opened by ZetangForward 18 days ago
- 5 comments
#476 - Support non-negative KL divergence approximation
Pull Request -
State: closed - Opened by zhuzilin 18 days ago
#475 - Add temperature config for train_ppo_ray
Pull Request -
State: closed - Opened by zhuzilin 19 days ago
#474 - Upload experience_maker perf status
Pull Request -
State: closed - Opened by zhuzilin 19 days ago
#473 - remove unnecessary softmax in prm loss
Pull Request -
State: closed - Opened by catqaq 20 days ago
- 1 comment
#472 - How do you connect different models using Ray.
Issue -
State: open - Opened by zpcalan 20 days ago
- 3 comments
#471 - Is PPO training supported on Ascend?
Issue -
State: closed - Opened by wphtrying 20 days ago
- 1 comment
#470 - PPO training stuck for Llama-3.1
Issue -
State: open - Opened by zhenghaoxu-gatech 21 days ago
- 1 comment
#469 - Generation temperature for train_ppo_ray
Issue -
State: closed - Opened by zhenghaoxu-gatech 21 days ago
- 1 comment
#468 - Context Parallel Failed for Modified SFT Trainer
Issue -
State: closed - Opened by ZetangForward 22 days ago
- 5 comments
#467 - Unnecessary logprob computation in actor.forward
Issue -
State: open - Opened by zkshan2002 23 days ago
- 1 comment
#466 - Separate the rollout generation and advantage calculation
Pull Request -
State: closed - Opened by zhuzilin 24 days ago
- 2 comments
#465 - DPO loss mask computation
Issue -
State: closed - Opened by zkshan2002 25 days ago
- 2 comments
#464 - I want to change the framework to support multi-step training with a PRM instead of single-step training with an ORM. Is this feasible, and which parts should be changed?
Issue -
State: closed - Opened by Gikiman 26 days ago
- 3 comments
#463 - Move the n_samples_per_prompt into replay buffer
Pull Request -
State: closed - Opened by zhuzilin 27 days ago
#462 - Change pg_options param into backend_options in _new_process_group_helper for PyTorch version greater than 2.6
Pull Request -
State: closed - Opened by HollowMan6 28 days ago
- 4 comments
#461 - Support remote_rm_fn when using packing_samples in ppo
Pull Request -
State: closed - Opened by zhuzilin 28 days ago
#460 - Error in save_model after training when adam_offload is enabled
Issue -
State: open - Opened by pythonla 28 days ago
- 3 comments
#459 - Inconsistency between micro_train_batch_size and train_batch_size
Issue -
State: closed - Opened by zkshan2002 about 1 month ago
- 1 comment
#458 - [BUG] fix _max_steps not initialized bug
Pull Request -
State: closed - Opened by BeingGod about 1 month ago
- 1 comment
#457 - fixed the missing value_head_prefix
Pull Request -
State: closed - Opened by ChenmienTan about 1 month ago
- 1 comment
#456 - Reproducing knowledge distillation results
Issue -
State: open - Opened by jinchenyu about 1 month ago
- 5 comments
#455 - Add grpo trainer
Pull Request -
State: closed - Opened by LSX-Sneakerprogrammer about 1 month ago
- 14 comments
#454 - Can't save model
Issue -
State: closed - Opened by LZY-the-boys about 1 month ago
- 3 comments
#453 - bug with max_steps
Issue -
State: closed - Opened by LZY-the-boys about 1 month ago
#452 - questions on the training configuration
Issue -
State: closed - Opened by WayXG about 1 month ago
- 1 comment
#451 - add tensorboard for local use
Pull Request -
State: closed - Opened by catqaq about 1 month ago
- 2 comments
#450 - Feature: Concurrent support of remote RM
Issue -
State: closed - Opened by catqaq about 1 month ago
#449 - Support packing_samples for ppo with ray
Pull Request -
State: closed - Opened by zhuzilin about 2 months ago
- 2 comments
#448 - fix bug in CriticModel
Pull Request -
State: closed - Opened by zhuzilin about 2 months ago
#447 - Fix output of packing data of RewardModel and CriticModel
Pull Request -
State: closed - Opened by zhuzilin about 2 months ago
#446 - add --use_linger_kernel
Pull Request -
State: closed - Opened by xiaoxigua999 about 2 months ago
#445 - Fix lm_head.weight in save_model
Pull Request -
State: closed - Opened by zmzhang2000 about 2 months ago
#444 - Add context parallel to reward model
Pull Request -
State: closed - Opened by zhuzilin about 2 months ago
#443 - support custom cls_class
Pull Request -
State: closed - Opened by xiaoxigua999 about 2 months ago
#442 - Add PRM training with hard estimation
Pull Request -
State: closed - Opened by zhuzilin about 2 months ago
- 2 comments
#441 - Why ZeRO-3 is only supported when vLLM enabled
Issue -
State: closed - Opened by liuxsh9 about 2 months ago
- 2 comments
#440 - I noticed a few calls to the get_tokenizer function in the code, but the return values were not being captured. What is the purpose of this function?
Issue -
State: closed - Opened by pagepal666 about 2 months ago
- 1 comment
#439 - Add context parallel to DPO
Pull Request -
State: closed - Opened by zhuzilin about 2 months ago
#438 - only import bitsandbytes when necessary
Pull Request -
State: closed - Opened by zhuzilin about 2 months ago
#437 - why advantage calculate ops [::-1]
Issue -
State: closed - Opened by DavideHe about 2 months ago
#436 - Lora merge error after dpo training with lora.
Issue -
State: open - Opened by KaedinLian 2 months ago
- 3 comments
#435 - Request: support sequence parallelism (sequence_parallel)
Issue -
State: closed - Opened by kangyishuai 2 months ago
- 3 comments
#434 - The train_knowledge_distillation.sh script fails to run
Issue -
State: closed - Opened by Rookie-Kai 2 months ago
- 2 comments
#433 - Support for Token-Level Rewards?
Issue -
State: closed - Opened by pagepal666 2 months ago
- 5 comments
#432 - Is n_samples_per_prompt actually used?
Issue -
State: closed - Opened by Rosenberg37 2 months ago
- 1 comment
#431 - The need of micro_rollout_batch_size
Issue -
State: closed - Opened by Unfinito 2 months ago
- 4 comments
#430 - Will OpenRLHF handle gradient_accumulation_steps with loss?
Issue -
State: closed - Opened by mzhaoshuai 2 months ago
- 4 comments
#429 - The behavior of log
Issue -
State: closed - Opened by visionxyz 2 months ago
- 6 comments
#428 - How to load a open-sourced model without 'value head'
Issue -
State: closed - Opened by kleinzcy 3 months ago
- 3 comments
#427 - batch_inference NCCL time out error
Issue -
State: closed - Opened by BeyonderXX 3 months ago
- 1 comment
#426 - Evaluate the PPO Process: Compatibility issues between DeepSpeed checkpoints and Transformers models
Issue -
State: open - Opened by Ricardokevins 3 months ago
- 1 comment
#425 - Add feature of load_from_disk to utils.py
Pull Request -
State: closed - Opened by tongyx361 3 months ago
#424 - CUDA out of memory when training qwen2-7b-instruct on 4x 4090
Issue -
State: closed - Opened by SuiJiGuoChengSuiJiGuo 3 months ago
- 2 comments
#423 - Enabling overlap_comm in PPO training severely hurts training performance
Issue -
State: open - Opened by andylrx 3 months ago
- 1 comment
#422 - add 'num_return_sequences' feature in actor
Pull Request -
State: closed - Opened by 0xWelt 3 months ago
- 4 comments
#421 - Why is dist.barrier() used in this loop?
Issue -
State: closed - Opened by lyz22233 3 months ago
- 2 comments
#420 - PPO takes very long
Issue -
State: closed - Opened by mandyyyyii 3 months ago
- 10 comments
#419 - Is there a distillation loss curve available for reference?
Issue -
State: closed - Opened by Schnabel-8 3 months ago
- 7 comments
#418 - flash_attn issue
Issue -
State: closed - Opened by tbsxxxH 3 months ago
- 3 comments
#417 - Add makedirs before writing in batch_inference
Pull Request -
State: closed - Opened by tongyx361 3 months ago
#416 - PPO error
Issue -
State: closed - Opened by ldh127 3 months ago
- 1 comment
#415 - AssertionError: Session name does not match persisted value
Issue -
State: closed - Opened by tbsxxxH 3 months ago
- 1 comment
#414 - update link to code in readme
Pull Request -
State: closed - Opened by coding-famer 3 months ago
#413 - Version conflict issue
Issue -
State: closed - Opened by tbsxxxH 3 months ago
- 1 comment
#412 - Speed Up Data Processing by Using Multi-Processing in Dataset.map
Pull Request -
State: closed - Opened by Ricardokevins 3 months ago
#411 - torch.distributed.broadcast timeout
Issue -
State: closed - Opened by lyz22233 3 months ago
- 2 comments
#410 - Speed Up Data Processing by Using Multi-Processing in Dataset.map
Pull Request -
State: closed - Opened by Ricardokevins 3 months ago
- 1 comment
#409 - Data Preprocess Speed Up
Issue -
State: closed - Opened by Ricardokevins 3 months ago
- 1 comment