allenai/RL4LMs issues and pull requests

#73 - Which version of pip should we use when we try to set up the environment?

Issue - State: open - Opened by Wwwendyy 13 days ago - 1 comment

#72 - An issue happened when I set up

Issue - State: open - Opened by Wwwendyy 15 days ago

#71 - Question about the classifier used for IntentAccuracyDailyDialog.

Issue - State: open - Opened by zhangjf-nlp 5 months ago

#70 - Migrate to current version of gymnasium, SB3, and other libraries.

Pull Request - State: open - Opened by Kripner 7 months ago

#69 - Upgrade to torch 2.0

Issue - State: open - Opened by agastyaseth 8 months ago - 1 comment

#68 - how to stop env parallel multi-process to debug env.step()?

Issue - State: open - Opened by invoker-LL 9 months ago

#67 - Trying to use rl4lm with more recent libraries

Pull Request - State: closed - Opened by JosefSlavicek 10 months ago

#66 - Is PPO really better than SFT (in general)? under the condition of same amount of data

Issue - State: open - Opened by allanj 10 months ago - 1 comment

#65 - Do you have any plans to apply the recently published Reinforced Self-Training (ReST)?

Issue - State: open - Opened by missflash about 1 year ago

#64 - Pip install error with gym and torch

Issue - State: open - Opened by BaleChen about 1 year ago - 3 comments

#63 - NLPO Code Error and Query About gymnasium vs gym Usage

Issue - State: open - Opened by jinyilun718 about 1 year ago

#62 - Reproducing existing results on NarrativeQA

Issue - State: open - Opened by yxk23 about 1 year ago

#61 - Memory issue in metric evals?

Issue - State: open - Opened by AnujMahajanOxf about 1 year ago

#60 - is multi-dimensional reward supported?

Issue - State: open - Opened by zabir-nabil over 1 year ago

#59 - CPU Support Minor Bug

Issue - State: open - Opened by tedmoskovitz over 1 year ago

#58 - Fix IndexError when loading checkpoints

Pull Request - State: open - Opened by Runingtime over 1 year ago

#57 - model.generate.scores returning two scores

Issue - State: open - Opened by debjitpaul over 1 year ago

#56 - 'GPT2Model' object has no attribute 'first_device'

Issue - State: open - Opened by Stephanehk over 1 year ago

#55 - Using GPT-2

Issue - State: open - Opened by oroojlooy over 1 year ago

#54 - How can I inference data with the model after PPO training?

Issue - State: open - Opened by RyanYip-Kat over 1 year ago

#53 - Bug while loading t5 base model

Issue - State: open - Opened by Sahajtomar over 1 year ago - 1 comment

#52 - Error with Accelerate integration + NLPO

Issue - State: open - Opened by avacaondata over 1 year ago - 1 comment

#51 - [Question] End-to-end example

Issue - State: open - Opened by farrokhsiar over 1 year ago

#50 - Fix nlpo configs

Pull Request - State: closed - Opened by rajcscw over 1 year ago

#49 - In the paper, what is the detail setting of supervised learning? Is SL has additional supervised data?

Issue - State: open - Opened by guotong1988 over 1 year ago

#48 - Resuming from checkpoint is potentially problematic for IMDB since the splits are resampled

Issue - State: closed - Opened by zhixuan-lin over 1 year ago - 1 comment

#47 - `train` and `val` splits are not disjoint for IMDB

Issue - State: closed - Opened by zhixuan-lin over 1 year ago - 3 comments

#46 - A question bother me a long time: What is the difference between RL-for-text-generation and delete-0-reward-model-predictions?

Issue - State: open - Opened by guotong1988 over 1 year ago

#45 - A question bother me a long time: What is the difference between RL-for-text-generation and delete-0-reward-model-predictions?

Issue - State: closed - Opened by guotong1988 over 1 year ago - 1 comment

#44 - Bloom Supporting

Issue - State: open - Opened by c-box over 1 year ago - 3 comments

#43 - Error when trying to load a checkpoint from Transformers after RL training

Issue - State: closed - Opened by avacaondata over 1 year ago - 5 comments

#42 - Metric version incompatible

Issue - State: open - Opened by c-box over 1 year ago

#41 - _pickle.UnpicklingError: pickle data was truncated

Issue - State: open - Opened by Oxtay over 1 year ago

#40 - Pip install fix

Pull Request - State: open - Opened by kolbytn over 1 year ago

#39 - Make sure transformer return past_key_values

Pull Request - State: open - Opened by DvHuang over 1 year ago

#38 - Value is not broadcastable with batch_shape+event_shape

Issue - State: open - Opened by vcvcvnvcvcvn over 1 year ago

#37 - Persistent Variance in IMDB

Issue - State: open - Opened by mnoukhov over 1 year ago - 1 comment

#36 - fix: OnPolicyAlgorithm does not have the parameter: create_eval_env

Pull Request - State: open - Opened by hscspring over 1 year ago - 1 comment

#35 - Gradient Accumulation feature proposal

Pull Request - State: closed - Opened by eublefar over 1 year ago

#34 - Problem with BLEURT reward function

Issue - State: open - Opened by eublefar over 1 year ago

#33 - Is it possible to release the code based on Jax

Issue - State: open - Opened by sglucas over 1 year ago

#32 - Evaluating a specific checkpoint

Issue - State: open - Opened by lovodkin93 over 1 year ago - 5 comments

#31 - UnderStand Mask model to _get_action_masks in LogitsProcessor

Issue - State: closed - Opened by xesdiny over 1 year ago

#30 - 'BartForConditionalGeneration' has no attribute 'encoder'

Issue - State: closed - Opened by keeganstoner over 1 year ago - 4 comments

#29 - Mix-Precision training

Issue - State: open - Opened by lovodkin93 over 1 year ago - 2 comments

#28 - Reproducing IMDB results

Issue - State: open - Opened by mnoukhov over 1 year ago - 4 comments

#27 - Is the construction of _value_model necessary?

Issue - State: closed - Opened by xesdiny over 1 year ago - 2 comments

#26 - passing extra variable to the forward function

Issue - State: open - Opened by lovodkin93 over 1 year ago - 1 comment

#25 - Problems with models that don't have the parallelize() function

Issue - State: open - Opened by lovodkin93 over 1 year ago - 1 comment

#24 - Changed from logging with the root logger

Pull Request - State: open - Opened by JulesGM almost 2 years ago

#23 - Off-policy RL algorithms support

Issue - State: open - Opened by Div99 almost 2 years ago - 5 comments
Labels: enhancement, help wanted

#22 - Just a warning that the package doesn't work with Transformers 4.25.1

Issue - State: open - Opened by JulesGM almost 2 years ago - 1 comment

#21 - Larger models like GPT-J and GPT-NeoX-20B

Issue - State: open - Opened by loganlebanoff almost 2 years ago - 3 comments

#20 - make logo work better on dark theme

Pull Request - State: closed - Opened by jmhessel almost 2 years ago

#19 - templates and example inputs for mechanical turk

Pull Request - State: closed - Opened by jmhessel almost 2 years ago

#18 - Implementing self-play

Issue - State: open - Opened by eublefar almost 2 years ago - 3 comments

#17 - Added ability to set the log level in a backwards compatible way

Pull Request - State: closed - Opened by JulesGM almost 2 years ago - 2 comments

#16 - 100% likely that two function parameters have been merged by accident

Issue - State: open - Opened by JulesGM almost 2 years ago - 1 comment
Labels: good first issue, code enhancement

#15 - correcting double tokenizing

Pull Request - State: closed - Opened by JulesGM almost 2 years ago - 1 comment

#14 - Fix seq2seq

Pull Request - State: closed - Opened by rajcscw almost 2 years ago

#13 - Self-designed model

Issue - State: closed - Opened by AIUSRTMP almost 2 years ago - 2 comments

#12 - OOM on summarization example

Issue - State: open - Opened by gabrielhuang almost 2 years ago - 15 comments

#11 - large difference between val and test on CommonGEN

Issue - State: closed - Opened by wenting-zhao almost 2 years ago - 2 comments

#10 - BART supervised

Issue - State: open - Opened by talent404 almost 2 years ago - 2 comments

#9 - Error encountered in running the scripts at the read me

Issue - State: closed - Opened by promiseve almost 2 years ago - 4 comments

#8 - Some questions about n_steps, n_envs and padding_side.

Issue - State: closed - Opened by drxmy almost 2 years ago - 3 comments

#7 - Top-K and Top-p sampling

Issue - State: open - Opened by boblee22 almost 2 years ago - 1 comment

#6 - fix bug for nlpo policy for mt5 model

Pull Request - State: closed - Opened by tatiana-iazykova almost 2 years ago - 1 comment

#5 - Numbeams

Issue - State: open - Opened by tatiana-iazykova almost 2 years ago - 5 comments
Labels: bug, beam_search

#4 - Any plans for Deepspeed/Accelerate integration?

Issue - State: open - Opened by Breakend almost 2 years ago - 9 comments

#3 - CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle)`

Issue - State: closed - Opened by tatiana-iazykova almost 2 years ago - 4 comments

#2 - Correct the path to t5_ppo.yml in Readme

Pull Request - State: closed - Opened by akifumi-wachi-4 almost 2 years ago - 1 comment

#1 - Readme update

Pull Request - State: closed - Opened by jmhessel almost 2 years ago
