CarperAI/trlx issues and pull requests

#603 - Does the framework support PPO training for Qwen2?

Issue - State: open - Opened by oldwangggggg about 1 month ago
Labels: feature request

#602 - reward_fn in accelerate_ppo_trainer.py

Issue - State: open - Opened by Jerrrrykun about 2 months ago

#601 - OOM error with PEFT LoRA on Llama2-7B

Issue - State: open - Opened by arpaiva 4 months ago - 1 comment
Labels: bug

#600 - Load the checkpoint fails

Issue - State: open - Opened by AfraAmini 5 months ago
Labels: bug

#599 - cannot import name 'flatten_dataclass' from 'trlx.data.ilql_types'

Issue - State: open - Opened by AfraAmini 6 months ago
Labels: bug

#598 - maybe bug in prepare & load's order

Issue - State: open - Opened by daiwk 6 months ago - 1 comment
Labels: bug

#597 - Error when running Ray Tune to launch hyperparameter sweep

Issue - State: open - Opened by Jing-L97 6 months ago - 1 comment
Labels: bug

#596 - Crash when using save_state with deepspeed: `model.state_dict` functions incompatible with new deepspeed.

Issue - State: open - Opened by JohannesAck 7 months ago
Labels: bug

#595 - Data Loader Bug when running t5_summarization_daily_cnn.py

Issue - State: open - Opened by yunanyan 8 months ago
Labels: bug

#594 - Why train dataloader is not prepared by Accelerator

Issue - State: open - Opened by Jiaxin-Wen 9 months ago
Labels: bug

#593 - TRLX Environment customization

Issue - State: open - Opened by heraldiclily 9 months ago

#591 - Issue of tensors share memory

Issue - State: open - Opened by heraldiclily 10 months ago - 2 comments
Labels: bug

#590 - [New Feature Request] Add KTO

Issue - State: open - Opened by 1485840691-eng about 1 year ago
Labels: feature request

#589 - RLHF text summarization diverges

Issue - State: open - Opened by AlisonWen about 1 year ago
Labels: bug

#588 - Integration of Self-Play Fine-Tuning (SPIN) Method for Enhancing Large Language Models

Issue - State: open - Opened by SeungyounShin about 1 year ago
Labels: feature request

#587 - Runtime error when running examples (ilql_sentiments_t5.py)

Issue - State: open - Opened by youxiho1 about 1 year ago - 2 comments
Labels: bug

#586 - Add citation info from the EMNLP paper

Pull Request - State: closed - Opened by StellaAthena about 1 year ago

#585 - MPT is not working

Issue - State: open - Opened by ouhenio about 1 year ago
Labels: bug

#584 - Llama 13B model trained with the trlx PPO trainer and saved in Hugging Face format has strange keys; inference shows no result and raises no error

Issue - State: open - Opened by ldh127 about 1 year ago - 1 comment
Labels: bug

#583 - Faster & memory-efficient logprobs calculation

Pull Request - State: open - Opened by li-plus about 1 year ago - 1 comment

#582 - Attention mask when calculating log ratio for PPO

Issue - State: open - Opened by kmy17518 about 1 year ago

#581 - Multi-GPU training errors with peft

Issue - State: open - Opened by AliengirlLiv about 1 year ago - 1 comment
Labels: bug

#580 - Issue since most recent transformers update

Issue - State: open - Opened by siddharthverma314 about 1 year ago - 1 comment
Labels: bug

#579 - update(requirements.txt): to the latest `transformers` & `deepspeed`

Pull Request - State: open - Opened by maxreciprocate about 1 year ago - 1 comment

#578 - fix(modeling_base): partial loading of a sharded checkpoint

Pull Request - State: closed - Opened by maxreciprocate over 1 year ago

#577 - resume_from_checkpoint doesn't work

Issue - State: closed - Opened by andrewsiah over 1 year ago - 1 comment
Labels: bug

#576 - fix model state_dict retrieving in zero3

Pull Request - State: closed - Opened by Jingru over 1 year ago

#575 - support parallel reward function

Pull Request - State: open - Opened by Jingru over 1 year ago - 16 comments

#574 - Support parallel reward_fn in PPO training

Issue - State: closed - Opened by Jingru over 1 year ago
Labels: feature request

#573 - support customized run_name in tracker

Pull Request - State: closed - Opened by Jingru over 1 year ago - 1 comment

#572 - Support customized run name

Pull Request - State: closed - Opened by Jingru over 1 year ago

#571 - multigpu support for summarization ppo example

Issue - State: open - Opened by sayan1101 over 1 year ago - 3 comments
Labels: bug

#570 - fix(examples/t5_summarize_cnn): move labels into `reward_fn` kwargs

Pull Request - State: closed - Opened by maxreciprocate over 1 year ago

#569 - TypeError: reward_fn() got an unexpected keyword argument 'tokenizer'

Issue - State: closed - Opened by sayan1101 over 1 year ago - 1 comment
Labels: bug

#568 - support extra model and tokenizer configs during loading by from_pretrained in accelerate trainer

Pull Request - State: closed - Opened by Jingru over 1 year ago - 1 comment

#567 - Problem with LLama training with LoRA

Issue - State: open - Opened by freQuensy23-coder over 1 year ago - 3 comments
Labels: bug

#566 - fix(modeling_base): re-order `model.forward_kwargs` initialization

Pull Request - State: closed - Opened by maxreciprocate over 1 year ago - 1 comment

#565 - Question about saving peft checkpoint

Issue - State: open - Opened by nhanph over 1 year ago - 2 comments
Labels: bug

#564 - `position_ids` error in accelerate PPO trainer

Issue - State: closed - Opened by pbarragan over 1 year ago - 3 comments
Labels: bug

#563 - [Fix] Add default config LLaMa 2 converter Nemo

Pull Request - State: closed - Opened by PhungVanDuy over 1 year ago

#562 - Add default config LLaMa 2 converter Nemo

Pull Request - State: closed - Opened by PhungVanDuy over 1 year ago

#561 - How to generate reward-labeled dataset

Issue - State: open - Opened by mikkelmedm over 1 year ago
Labels: feature request

#560 - feats: Add text environment examples

Pull Request - State: open - Opened by PhungVanDuy over 1 year ago

#559 - How to train LLaMA2 on the summarize_rlhf example?

Issue - State: open - Opened by missflash over 1 year ago

#557 - docs: update documentation

Pull Request - State: closed - Opened by maxreciprocate over 1 year ago - 1 comment

#556 - feat: Add support for DPO

Pull Request - State: open - Opened by sandeepchittilla over 1 year ago - 12 comments

#555 - Inference pipeline

Pull Request - State: open - Opened by Dahoas over 1 year ago - 1 comment

#554 - feat: add rejection finetuning trainer

Pull Request - State: closed - Opened by maxreciprocate over 1 year ago - 1 comment

#553 - Increasing max new tokens for generation arguments lead to errors

Issue - State: open - Opened by wise-east over 1 year ago - 3 comments
Labels: bug

#552 - fix(examples/hh): old gpt-j checkpoint loading

Pull Request - State: closed - Opened by maxreciprocate over 1 year ago

#551 - revert(ppo_trainer): keep `save_pretrained` only over the base model

Pull Request - State: closed - Opened by maxreciprocate over 1 year ago

#550 - Add trlX cite

Pull Request - State: closed - Opened by Dahoas over 1 year ago

#549 - Unable to load and run inference on finetuned Alpaca model

Issue - State: closed - Opened by doyled-it over 1 year ago - 7 comments
Labels: bug

#548 - Memory occupy with multi GPUs Training

Issue - State: open - Opened by yuanyaaa over 1 year ago - 1 comment

#547 - chore(requirements.txt): update everything to the latest

Pull Request - State: closed - Opened by maxreciprocate over 1 year ago - 1 comment

#546 - mosaicml/mpt support

Pull Request - State: closed - Opened by 50m-regent over 1 year ago

#545 - Unable to load the trained model to do the inference

Issue - State: closed - Opened by CSerxy over 1 year ago - 9 comments

#544 - RuntimeError: module must have its parameters and buffers on device

Issue - State: closed - Opened by Adaickalavan over 1 year ago - 4 comments

#543 - Freeze "output" embedding when using tied embeddings.

Pull Request - State: closed - Opened by cat-state over 1 year ago - 1 comment

#542 - Llama NeMo support

Pull Request - State: closed - Opened by cat-state over 1 year ago - 2 comments

#541 - Fix reward model state dict loading

Pull Request - State: closed - Opened by maxjeblick over 1 year ago

#540 - ILQL training batch2 tensor dimensions error

Issue - State: open - Opened by GenVr over 1 year ago - 2 comments

#539 - Fix LLaMA example (LLaMA 2)

Pull Request - State: closed - Opened by PhungVanDuy over 1 year ago - 1 comment

#538 - Add DS-Chat comparison

Pull Request - State: closed - Opened by cat-state over 1 year ago - 2 comments

#536 - Caught signal 7 (Bus error: nonexistent physical address)

Issue - State: closed - Opened by Adaickalavan over 1 year ago - 5 comments

#535 - Model does not load in the expected dtype

Issue - State: closed - Opened by AugustasMacijauskas over 1 year ago - 5 comments
Labels: bug

#533 - Add support for LLaMA2

Issue - State: closed - Opened by cvetanovskaa over 1 year ago - 1 comment
Labels: feature request

#532 - Add support for Falcon 7B/40B

Issue - State: open - Opened by cvetanovskaa over 1 year ago - 1 comment
Labels: feature request

#530 - Value branch

Pull Request - State: closed - Opened by Dahoas over 1 year ago - 7 comments

#528 - Implement BoN for training and eval

Pull Request - State: open - Opened by Dahoas over 1 year ago - 5 comments

#526 - Fix logging

Pull Request - State: closed - Opened by Dahoas over 1 year ago

#522 - Fix ordering of ppo epoch iteration

Pull Request - State: closed - Opened by RobertKirk over 1 year ago - 5 comments

#521 - Reward model negative numbers meaning

Issue - State: closed - Opened by GenVr over 1 year ago - 2 comments

#517 - Sanity check: SFT Model should be frozen (PPO)

Issue - State: closed - Opened by Apsod over 1 year ago - 2 comments
Labels: bug

#513 - 8-bit inference (#512)

Pull Request - State: open - Opened by glerzing over 1 year ago - 13 comments

#504 - Direct Policy Optimization

Issue - State: open - Opened by Reichenbachian over 1 year ago - 4 comments
Labels: feature request

#501 - strange design

Issue - State: closed - Opened by efengx over 1 year ago - 1 comment
Labels: bug

#498 - feat: support add tokens to tokenizer.

Pull Request - State: open - Opened by congchan over 1 year ago

#497 - Add llama opendelta, float layer freezing, and optional ref model + zero3

Pull Request - State: closed - Opened by Dahoas over 1 year ago - 2 comments

#489 - fix(modeling_ppo): load reference head under zero3

Pull Request - State: closed - Opened by maxreciprocate over 1 year ago - 2 comments

#485 - Training stuck generating rollouts

Issue - State: closed - Opened by javirandor over 1 year ago - 6 comments
Labels: bug

#483 - RuntimeError using Accelerate + Zero-3 to launch `ppo_sentiments_llama.py` (uninitialized LayerNorm weight in Hydra head?)

Issue - State: closed - Opened by mbalesni over 1 year ago - 7 comments
Labels: bug

#482 - fix(modeling): deepspeed checkpoint loading

Pull Request - State: closed - Opened by maxreciprocate over 1 year ago - 3 comments

#481 - RuntimeError: Error(s) in loading state_dict for GPTRewardModel

Issue - State: closed - Opened by maxjeblick over 1 year ago - 9 comments
Labels: bug

#480 - How to use checkpoint?

Issue - State: closed - Opened by mshtelma over 1 year ago - 3 comments
Labels: bug

#479 - how to use hydra train ppo model?

Issue - State: closed - Opened by akk-123 over 1 year ago - 1 comment
Labels: documentation

#476 - LLaMA sentiment example doesn't work

Issue - State: closed - Opened by mbalesni over 1 year ago - 3 comments
Labels: bug

#474 - About gpt_reward_test

Issue - State: closed - Opened by ItGirls over 1 year ago - 4 comments
Labels: bug

#466 - tokenizer of the summarization rlhf example

Issue - State: closed - Opened by DanqingZ over 1 year ago - 1 comment
Labels: bug

#461 - RunTimeError using Accelerate + Zero Stage 3 to launch ppo_sentiments.py

Issue - State: closed - Opened by alex-athanassakos almost 2 years ago - 4 comments
Labels: bug

#437 - When set the tracker to tensorboard, the following error happened.

Issue - State: closed - Opened by cdxzyc almost 2 years ago - 3 comments
Labels: bug

#433 - Avoid a few off-by-one issues with PAD and EOS tokens in the generated sequences

Pull Request - State: closed - Opened by mikljohansson almost 2 years ago

#410 - !deepspeed examples/summarize_rlhf/sft/train_gptj_summarize.py is failing

Issue - State: open - Opened by MyBruso almost 2 years ago - 8 comments

#408 - glm-10b, got size mismatch error when training ppo using zero3

Issue - State: closed - Opened by YaguangGong almost 2 years ago - 2 comments
Labels: bug

#385 - ppo trained model and checkpoints are not accessible

Issue - State: closed - Opened by arpitg1991 almost 2 years ago - 2 comments
Labels: bug

#375 - [feat] Add LLaMa Model support for PPO

Pull Request - State: closed - Opened by PhungVanDuy almost 2 years ago - 6 comments

#372 - Cuda OOM with PPO on GPT2-medium

Issue - State: closed - Opened by OleksandrKorovii almost 2 years ago - 4 comments
Labels: bug

#367 - Questions about model size and num_processes in summarize-rlhf

Issue - State: closed - Opened by agave233 almost 2 years ago - 4 comments

#301 - Pass extra information for the reward function with every sample.

Issue - State: closed - Opened by JulesGM almost 2 years ago - 1 comment
Labels: feature request

#283 - NeMo QOL improvements

Issue - State: closed - Opened by cat-state almost 2 years ago - 1 comment
Labels: feature request
