Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / OpenRLHF/OpenRLHF issues and pull requests
#433 - Support for Token-Level Rewards?
Issue -
State: closed - Opened by pagepal666 3 months ago
- 5 comments
#432 - Is n_samples_per_prompt actually used?
Issue -
State: closed - Opened by Rosenberg37 3 months ago
- 1 comment
#431 - The need for micro_rollout_batch_size
Issue -
State: closed - Opened by Unfinito 3 months ago
- 4 comments
#430 - Will OpenRLHF handle gradient_accumulation_steps with loss?
Issue -
State: closed - Opened by mzhaoshuai 3 months ago
- 4 comments
#429 - The behavior of log
Issue -
State: closed - Opened by visionxyz 3 months ago
- 6 comments
#428 - How to load an open-sourced model without a 'value head'
Issue -
State: closed - Opened by kleinzcy 3 months ago
- 3 comments
#427 - batch_inference NCCL timeout error
Issue -
State: closed - Opened by BeyonderXX 3 months ago
- 1 comment
#426 - Evaluate the PPO Process: Compatibility issues between DeepSpeed checkpoints and Transformers models
Issue -
State: open - Opened by Ricardokevins 3 months ago
- 1 comment
#425 - Add feature of load_from_disk to utils.py
Pull Request -
State: closed - Opened by tongyx361 3 months ago
#424 - CUDA out of memory error when training qwen2-7b-instruct on 4x 4090 GPUs
Issue -
State: closed - Opened by SuiJiGuoChengSuiJiGuo 3 months ago
- 2 comments
#423 - Enabling overlap_comm in PPO training has a large impact on training performance
Issue -
State: open - Opened by andylrx 3 months ago
- 1 comment
#422 - add 'num_return_sequences' feature in actor
Pull Request -
State: closed - Opened by 0xWelt 4 months ago
- 4 comments
#421 - Why is dist.barrier() set at this loop?
Issue -
State: closed - Opened by lyz22233 4 months ago
- 2 comments
#420 - PPO takes very long
Issue -
State: closed - Opened by mandyyyyii 4 months ago
- 10 comments
#419 - Is there a distillation loss curve available for reference?
Issue -
State: closed - Opened by Schnabel-8 4 months ago
- 8 comments
#418 - flash_attn problem
Issue -
State: closed - Opened by tbsxxxH 4 months ago
- 3 comments
#417 - Add makedirs before writing in batch_inference
Pull Request -
State: closed - Opened by tongyx361 4 months ago
#416 - PPO error
Issue -
State: closed - Opened by ldh127 4 months ago
- 1 comment
#415 - AssertionError: Session name does not match persisted value
Issue -
State: closed - Opened by tbsxxxH 4 months ago
- 1 comment
#414 - update link to code in readme
Pull Request -
State: closed - Opened by coding-famer 4 months ago
#413 - Version conflict problem
Issue -
State: closed - Opened by tbsxxxH 4 months ago
- 1 comment
#412 - Speed Up Data Processing by Using Multi-Processing in Dataset.map
Pull Request -
State: closed - Opened by Ricardokevins 4 months ago
#411 - torch.distributed.broadcast timeout
Issue -
State: closed - Opened by lyz22233 4 months ago
- 2 comments
#410 - Speed Up Data Processing by Using Multi-Processing in Dataset.map
Pull Request -
State: closed - Opened by Ricardokevins 4 months ago
- 1 comment
#409 - Data Preprocess Speed Up
Issue -
State: closed - Opened by Ricardokevins 4 months ago
- 1 comment
#408 - PPO error
Issue -
State: closed - Opened by ldh127 4 months ago
- 7 comments
#407 - Relationship between model parameter count and GPU memory usage
Issue -
State: closed - Opened by tbsxxxH 4 months ago
- 1 comment
#406 - PPO Ray training problem on 8x 4090 GPUs
Issue -
State: closed - Opened by hehebamei 4 months ago
- 1 comment
#405 - PPO error
Issue -
State: closed - Opened by ldh127 4 months ago
- 2 comments
#404 - Cannot distill Qwen2-1.5b from Qwen2-70b: error that tensor sizes do not match
Issue -
State: closed - Opened by xiechengmude 4 months ago
- 1 comment
#403 - Qlora load model error
Issue -
State: closed - Opened by ldh127 4 months ago
- 1 comment
#402 - ValueError: invalid literal for int() with base 10: 'MIG-cfb2a8ae-864b-50df-94a5-98983023f29d'
Issue -
State: closed - Opened by liwd190019 4 months ago
- 3 comments
#401 - an unexpected error while SFT
Issue -
State: closed - Opened by liwd190019 4 months ago
- 2 comments
#400 - No module named 'vllm'
Issue -
State: closed - Opened by tbsxxxH 4 months ago
- 3 comments
#399 - When training PPO with Ray, do all the models need to be uploaded to the cluster? Why does Ray disconnect on every run because the data to transfer is too large?
Issue -
State: closed - Opened by hehebamei 4 months ago
- 2 comments
#398 - PPO OOM 8*A100 40G
Issue -
State: closed - Opened by tbsxxxH 4 months ago
- 11 comments
#397 - train_batch_size
Issue -
State: closed - Opened by mandyyyyii 4 months ago
- 1 comment
#396 - rename wandb args in scripts
Pull Request -
State: closed - Opened by coding-famer 4 months ago
#395 - Strange GPU memory usage
Issue -
State: closed - Opened by lyz22233 4 months ago
- 4 comments
#394 - Support Checkpoint
Pull Request -
State: closed - Opened by xiaoxigua999 4 months ago
#393 - SFT dataset tokenization scheme bug when using llama3
Issue -
State: closed - Opened by ZhaofengWu 4 months ago
- 8 comments
#392 - train_ppo_ray OOM
Issue -
State: closed - Opened by syx11237744 4 months ago
- 4 comments
#391 - support remote rm
Pull Request -
State: closed - Opened by xiaoxigua999 4 months ago
#390 - Why multiply by rstd instead of dividing by rstd?
Issue -
State: closed - Opened by gohsyi 4 months ago
- 1 comment
#389 - Performance of Iterative DPO?
Issue -
State: closed - Opened by yesiam-png 4 months ago
- 1 comment
#388 - Update version.txt
Pull Request -
State: closed - Opened by xiaoxigua999 4 months ago
#387 - Zero stage 3 error
Issue -
State: closed - Opened by syx11237744 4 months ago
- 1 comment
#386 - Feature: add DPO-P
Issue -
State: closed - Opened by catqaq 4 months ago
Labels: enhancement
#385 - Online DPO support
Issue -
State: closed - Opened by Ashura5 4 months ago
- 5 comments
#380 - Fix loading dataset from local text files
Pull Request -
State: closed - Opened by tongyx361 4 months ago
#371 - Support RLOO
Issue -
State: closed - Opened by gohsyi 4 months ago
- 1 comment
#368 - Support training from breakpoint
Issue -
State: closed - Opened by luo-li-ba-suo 4 months ago
- 4 comments
#366 - Model inference after DPO produces only garbled symbols
Issue -
State: open - Opened by 2024WY 5 months ago
- 3 comments
#364 - DPO Finetuning constantly gives preference loss as 0.6931
Issue -
State: closed - Opened by mandyyyyii 5 months ago
- 9 comments
#361 - support remote rm api for ppo and ppo ray
Pull Request -
State: closed - Opened by catqaq 5 months ago
- 8 comments
Labels: enhancement, P0
#360 - A worker died or was killed while executing a task by an unexpected system error.
Issue -
State: open - Opened by lusongshuo-mt 5 months ago
- 4 comments
#354 - Roughly what class of GPU, and how many, are needed to train the RM model?
Issue -
State: closed - Opened by hehebamei 5 months ago
- 9 comments
#353 - Will asynchronous generation training be supported?
Issue -
State: open - Opened by syx11237744 5 months ago
- 1 comment
Labels: enhancement
#340 - [pre-commit.ci] pre-commit suggestions
Pull Request -
State: closed - Opened by pre-commit-ci[bot] 5 months ago
#332 - How much memory(RAM) is required to train a 70B Llama2 model with two 80G A800 nodes?
Issue -
State: closed - Opened by luo-li-ba-suo 5 months ago
- 8 comments
#331 - PPO hangs at bundle_reservation_check_func after the model finishes loading
Issue -
State: open - Opened by lixsh6 5 months ago
- 4 comments
#329 - qwen2 72B PPO OOM
Issue -
State: closed - Opened by lixsh6 5 months ago
- 5 comments
#311 - Can support for SimPO be added?
Issue -
State: open - Opened by victorShawFan 6 months ago
- 3 comments
#308 - Dummy token for prompts in HH datasets
Issue -
State: closed - Opened by louieworth 6 months ago
- 2 comments
#307 - Will 2x GPU setups be supported?
Issue -
State: closed - Opened by llmlocal 6 months ago
- 1 comment
#305 - Strange Kill of Critic Model
Issue -
State: closed - Opened by Ricardokevins 6 months ago
- 6 comments
#304 - Suggestion on the configurations
Issue -
State: closed - Opened by Ricardokevins 6 months ago
- 1 comment
#300 - [Question] EOS in reward model dataset
Issue -
State: open - Opened by qwenzo 6 months ago
- 4 comments
#297 - Avoid monkey patching vLLM
Issue -
State: closed - Opened by Atry 6 months ago
- 1 comment
#295 - QLORA model loading error
Issue -
State: closed - Opened by karthik-nexusflow 7 months ago
- 5 comments
#293 - Timeout error after using ZeRO stage 3 for PPO
Issue -
State: open - Opened by victorShawFan 7 months ago
- 5 comments
#288 - RM training loss becomes NaN when finishing the first training step.
Issue -
State: open - Opened by lixsh6 7 months ago
- 2 comments
Labels: bug
#285 - Custom ExperienceMaker
Issue -
State: closed - Opened by mgerstgrasser 7 months ago
- 4 comments
#283 - fix vLLM v0.4.1
Pull Request -
State: closed - Opened by hijkzzz 7 months ago
- 3 comments
#281 - Revert "vllm 0.4.1 compatibility (#278)"
Pull Request -
State: closed - Opened by hijkzzz 7 months ago
#270 - Issue with models not using `position_ids`
Issue -
State: closed - Opened by kfertakis 8 months ago
- 2 comments
#269 - The configuration for Llama-7b on 4 RTX4090
Issue -
State: closed - Opened by LinkyLiu 8 months ago
- 5 comments
#267 - add test pipeline: use small LLM and small data
Issue -
State: closed - Opened by catqaq 8 months ago
Labels: documentation, enhancement
#266 - Documentation for using Kuberay
Issue -
State: closed - Opened by karthik-nexusflow 8 months ago
- 4 comments
#262 - How long does a single LLM's tuning take?
Issue -
State: closed - Opened by alphahumancoder 8 months ago
- 3 comments
#256 - Is saving checkpoints not yet supported for the PPO Ray trainer?
Issue -
State: closed - Opened by mickel-liu 8 months ago
- 6 comments
#253 - Support ORPO
Issue -
State: closed - Opened by paulcx 9 months ago
- 1 comment
#251 - [For your information] Ways to build the environment and run OpenRLHF code on a Slurm cluster
Issue -
State: closed - Opened by glorgao 9 months ago
- 3 comments
Labels: documentation, envs
#246 - Unexpectedly long actor_time when running train_ppo_ray
Issue -
State: closed - Opened by LSC527 9 months ago
- 9 comments
Labels: enhancement, help wanted
#245 - enable_ema causes a runtime error when running train_ppo_llama.sh
Issue -
State: open - Opened by dshnightmare 9 months ago
- 7 comments
#242 - The tokenizer of reward model and policy model.
Issue -
State: closed - Opened by eyuansu62 9 months ago
- 4 comments
#241 - Fix yi-34b tokenizer, use_fast=False
Pull Request -
State: closed - Opened by hijkzzz 9 months ago
#239 - Why is generation slower when using flash-attn?
Issue -
State: closed - Opened by dshnightmare 9 months ago
- 2 comments
#238 - Forced EOS token in vllm generation?
Issue -
State: open - Opened by mgerstgrasser 9 months ago
- 8 comments
#236 - adding length penalty to reward
Issue -
State: open - Opened by karthik-nexusflow 9 months ago
- 2 comments
#232 - Is left-padding in PPO strictly necessary?
Issue -
State: open - Opened by mgerstgrasser 9 months ago
- 8 comments
#230 - Actor-Critic-Model
Issue -
State: open - Opened by mgerstgrasser 9 months ago
- 6 comments
#221 - Citation or comparison to trlX and NeMo-align.
Issue -
State: closed - Opened by LouisCastricato 9 months ago
- 3 comments
#220 - Support top models stage2
Issue -
State: closed - Opened by catqaq 9 months ago
Labels: enhancement
#219 - use_right_pad
Pull Request -
State: closed - Opened by hijkzzz 9 months ago
- 1 comment
#211 - vllm +zero2 hangs
Issue -
State: closed - Opened by karthik19967829 10 months ago
- 32 comments
#209 - Loading a reward model causes ValueError: weight is on the meta device, we need a `value` to put in on 0
Issue -
State: closed - Opened by NZ99 10 months ago
- 19 comments
#205 - Improve ease of use
Issue -
State: closed - Opened by hijkzzz 10 months ago
- 1 comment
#188 - About using vLLM for generation
Issue -
State: closed - Opened by LSC527 11 months ago
- 5 comments
Labels: enhancement, help wanted
#183 - support mixtral 8*7b balancing loss
Pull Request -
State: closed - Opened by hijkzzz 11 months ago