PKU-Alignment/safe-rlhf issues and pull requests

#182 - [Question] A ValueError occurs during reward.sh execution

Issue - State: closed - Opened by leezy18 16 days ago
Labels: question

#181 - Failing to train cost model (ValueError: The safer answer is not safer than the unsafer answer.)

Issue - State: closed - Opened by cemiu 2 months ago - 5 comments
Labels: question

#169 - [Question] Signals SIGKILL occurs during execution

Issue - State: closed - Opened by NNStrings 8 months ago
Labels: question

#133 - [Question] reward model

Issue - State: closed - Opened by kylin-zhou about 1 year ago - 7 comments
Labels: question, need information

#110 - feat(logger): save script and hyperparameters to output directory

Pull Request - State: closed - Opened by XuehaiPan over 1 year ago
Labels: enhancement, new feature

#109 - [Question] About the reward model and the reward critic model

Issue - State: closed - Opened by zhaobinNF over 1 year ago - 4 comments
Labels: question

#108 - [Question] Using opt1.3b as the reward model, the loss decreases but oscillates heavily

Issue - State: closed - Opened by zhaobinNF over 1 year ago - 5 comments
Labels: question

#107 - feat(serve): set `dtype` while loading models

Pull Request - State: closed - Opened by XuehaiPan over 1 year ago
Labels: enhancement, cuda, new feature

#106 - fix(trainers/rl_trainer): always pass `max_length` argument when loading models

Pull Request - State: closed - Opened by rockmagma02 over 1 year ago
Labels: bug

#105 - fix(trainers/rl_trainer): fix assertion for micro training batch size

Pull Request - State: closed - Opened by XuehaiPan over 1 year ago
Labels: bug, enhancement

#104 - feat(values): Score Model Normalization

Pull Request - State: closed - Opened by rockmagma02 over 1 year ago
Labels: enhancement, new feature

#103 - feat(datasets): eliminate duplicate prompts for RLHF training

Pull Request - State: closed - Opened by XuehaiPan over 1 year ago
Labels: enhancement

#102 - fix(scripts): fix error messages for unknown arguments

Pull Request - State: closed - Opened by XuehaiPan over 1 year ago

#101 - feat(dataset): add HhRLHFPreference Dataset

Pull Request - State: closed - Opened by rockmagma02 over 1 year ago

#100 - feat(datasets): support preference model and rlhf training for dialogue

Pull Request - State: closed - Opened by rockmagma02 over 1 year ago
Labels: enhancement

#99 - feat(serve): support streaming output for CLI

Pull Request - State: closed - Opened by rockmagma02 over 1 year ago
Labels: enhancement, new feature

#98 - [Question] score_model training support for baichuan model

Issue - State: closed - Opened by skepsun over 1 year ago - 2 comments
Labels: question

#96 - docs(README): add notes for Chinese support

Pull Request - State: closed - Opened by XuehaiPan over 1 year ago
Labels: documentation, enhancement

#95 - docs(README): 🎉 release checkpoints for `beaver-7b-v1.0` and its friends

Pull Request - State: closed - Opened by calico-1226 over 1 year ago
Labels: documentation

#94 - feat(scripts): randomize torch distributed master port

Pull Request - State: closed - Opened by XuehaiPan over 1 year ago
Labels: enhancement, dependency, new feature

#93 - chore(score_model): set architectures for `ScoreModel`s in `model.config`

Pull Request - State: closed - Opened by rockmagma02 over 1 year ago

#92 - [Question] Is generate being too slow during rollout related to zero3?

Issue - State: closed - Opened by zhaobinNF over 1 year ago - 4 comments
Labels: question

#91 - [Feature Request] To deal with hh-rlhf dialogue data

Issue - State: closed - Opened by jc-ryan over 1 year ago - 3 comments
Labels: enhancement

#90 - feat(datasets): add more raw dataset support

Pull Request - State: closed - Opened by XuehaiPan over 1 year ago
Labels: enhancement, new feature

#89 - feat(rl_trainer): add generation config for RL rollout

Pull Request - State: closed - Opened by rockmagma02 over 1 year ago
Labels: enhancement, new feature

#88 - fix(rl_trainer): fix advantage calculation (GAE) when response lengths are different

Pull Request - State: closed - Opened by XuehaiPan over 1 year ago
Labels: bug

#87 - feat(rl): log sequence-wise KL-divergence to reference model during training

Pull Request - State: closed - Opened by rockmagma02 over 1 year ago - 1 comment
Labels: enhancement, new feature

#86 - [Feature Request] log sequence-wise KL-divergence to reference model during training

Issue - State: closed - Opened by rockmagma02 over 1 year ago - 1 comment
Labels: enhancement, new feature

#85 - [Question] Will there be a Chinese version of the dataset?

Issue - State: closed - Opened by ghost over 1 year ago - 4 comments
Labels: question

#84 - feat(values): enhance logging for training value models

Pull Request - State: closed - Opened by calico-1226 over 1 year ago
Labels: enhancement

#83 - feat(serve): better markdown format code block rendering

Pull Request - State: closed - Opened by XuehaiPan over 1 year ago
Labels: enhancement

#82 - [Question] How to debug beaver with PyCharm, e.g. sft.sh

Issue - State: closed - Opened by diehualong over 1 year ago - 3 comments
Labels: question

#81 - chore(logger): log global step during training

Pull Request - State: closed - Opened by XuehaiPan over 1 year ago
Labels: enhancement

#80 - feat(datasets): support dataset proportion > 1

Pull Request - State: closed - Opened by rockmagma02 over 1 year ago
Labels: enhancement, new feature

#79 - feat(datasets): lazy tokenization support for `TokenizedDataset`s

Pull Request - State: closed - Opened by XuehaiPan over 1 year ago
Labels: enhancement, new feature

#78 - feat(logger): enable manual logging level setting

Pull Request - State: closed - Opened by calico-1226 over 1 year ago
Labels: enhancement

#77 - [Question] Can the trained cost model be used directly as a classifier for whether a Q+A pair is safe?

Issue - State: closed - Opened by lierer007 over 1 year ago - 5 comments
Labels: question

#76 - fix(datasets): raise errors when got duplicate dataset names

Pull Request - State: closed - Opened by XuehaiPan over 1 year ago
Labels: bug, enhancement

#75 - feat(serve): add new special command `/reset`

Pull Request - State: closed - Opened by XuehaiPan over 1 year ago
Labels: enhancement, evaluation

#74 - chore(datasets): better error message when raw dataset class not found

Pull Request - State: closed - Opened by XuehaiPan over 1 year ago
Labels: enhancement

#73 - fix(models): handle model embeddings resizing on model parallel

Pull Request - State: closed - Opened by XuehaiPan over 1 year ago
Labels: bug, enhancement, new feature

#72 - fix(serve): handle `UnicodeDecodeError` for CJK inputs on deletion

Pull Request - State: closed - Opened by XuehaiPan over 1 year ago
Labels: bug, enhancement

#71 - [Question] After PPO training, the outputs become longer and more repetitive.

Issue - State: closed - Opened by SpongebBob over 1 year ago - 5 comments
Labels: question

#69 - [Question] About the saved model size doubling after PPO

Issue - State: closed - Opened by Tinker250 over 1 year ago - 6 comments
Labels: question

#68 - fix(datasets): check tensor size before comparing their contents

Pull Request - State: closed - Opened by XuehaiPan over 1 year ago
Labels: bug

#67 - feat(datasets): add duplication check for preference datasets

Pull Request - State: closed - Opened by XuehaiPan over 1 year ago
Labels: bug, enhancement, new feature

#66 - fix(configs/deepspeed_config): fix argument passing for evaluation batch size

Pull Request - State: closed - Opened by XuehaiPan over 1 year ago
Labels: bug

#65 - fix(configs/deepspeed_config): only support stage 0 and 3 in deepspeed config for evaluation

Pull Request - State: closed - Opened by calico-1226 over 1 year ago
Labels: bug

#64 - [Question] OSError: [Errno 12] Cannot allocate memory

Issue - State: closed - Opened by glsoon over 1 year ago - 4 comments
Labels: question

#63 - [Question] Question about the loss computation in the SFT stage

Issue - State: closed - Opened by EthenZhang over 1 year ago - 1 comment
Labels: question

#62 - feat(datasets): add `PKU-SafeRLHF` / `BeaverTails` datasets and their friends

Pull Request - State: closed - Opened by XuehaiPan over 1 year ago
Labels: documentation, enhancement, new feature, evaluation

#61 - fix(evaluate/gpt4): fix GPT-4 evaluation script

Pull Request - State: closed - Opened by rockmagma02 over 1 year ago
Labels: bug

#60 - docs(README): add Beaver (1 round) preference distribution results

Pull Request - State: closed - Opened by XuehaiPan over 1 year ago
Labels: documentation, evaluation

#59 - [Question] Were the comparison plots in the README produced with the currently released 10K data and the default configuration in scripts?

Issue - State: closed - Opened by LiuShixing over 1 year ago - 2 comments
Labels: question

#58 - [Question] Question about left padding

Issue - State: closed - Opened by DwarfWarriors over 1 year ago - 2 comments
Labels: question

#56 - [Question] The model produces no output after PPO training

Issue - State: closed - Opened by liumingzhu6060 over 1 year ago - 5 comments
Labels: question, need information

#55 - [Question] Why must the reward critic tokenizer be the same as the actor tokenizer?

Issue - State: closed - Opened by liumingzhu6060 over 1 year ago - 1 comment
Labels: question

#54 - feat(datasets): accept local repo paths while loading datasets

Pull Request - State: closed - Opened by XuehaiPan over 1 year ago
Labels: enhancement, new feature

#53 - [Question] Data format does not line up

Issue - State: closed - Opened by AlexXx-Wu over 1 year ago - 4 comments
Labels: bug, question, need information

#52 - [Question] How to plot the graph after running GPT eval and obtaining a JSON file?

Issue - State: closed - Opened by yifan123 over 1 year ago - 2 comments
Labels: question, evaluation

#51 - [BUG] RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:5 and cpu!

Issue - State: closed - Opened by Yanfei-Qin over 1 year ago - 4 comments
Labels: bug, need information

#50 - feat(values): add a new sequence-wise loss for reward/cost models

Pull Request - State: closed - Opened by calico-1226 over 1 year ago
Labels: enhancement, new feature

#49 - feat(evaluate/arena): allow using different tokenizers in arena evaluation

Pull Request - State: closed - Opened by XuehaiPan over 1 year ago
Labels: enhancement, evaluation

#48 - feat(algorithms/dpo): add implementation for the DPO algorithm

Pull Request - State: closed - Opened by XuehaiPan over 1 year ago - 3 comments
Labels: enhancement, new feature

#47 - feat(trainers/rl_trainer): ensure RL dataset is exhausted when also using PTX dataset

Pull Request - State: closed - Opened by XuehaiPan over 1 year ago
Labels: enhancement

#46 - chore(scripts): update default pre-train model path

Pull Request - State: closed - Opened by XuehaiPan over 1 year ago

#45 - fix(models): temporarily disable LLaMA fast tokenizer

Pull Request - State: closed - Opened by XuehaiPan over 1 year ago
Labels: bug, upstream

#44 - style(algorithms): merge and move `torch.no_grad()` context manager to method level

Pull Request - State: closed - Opened by XuehaiPan over 1 year ago

#43 - [Question] Results of arena evaluation

Issue - State: closed - Opened by nonstopfor over 1 year ago - 8 comments
Labels: question, need information, evaluation

#42 - [Question] Using the dataset translated into Chinese as input raises "AssertionError: The better and worse answer are the same!"

Issue - State: closed - Opened by liumingzhu6060 over 1 year ago - 5 comments
Labels: question

#41 - refactor(utils): refactor pytree registration for `ModelOutput` subclasses

Pull Request - State: closed - Opened by XuehaiPan over 1 year ago
Labels: enhancement

#40 - [Question] generate in the rollout function takes too long

Issue - State: closed - Opened by Mandy0016 over 1 year ago - 10 comments
Labels: question

#39 - [Question] Using the PKU-SafeRLHF-1M dataset

Issue - State: closed - Opened by zhaobinNF over 1 year ago - 4 comments
Labels: question

#38 - [BUG][Upstream] `deepspeed` failed to compile `FusedAdam` CUDA operator

Issue - State: closed - Opened by Harry-mic over 1 year ago - 6 comments
Labels: bug, dependency, installation, upstream, cuda

#37 - [Question] Question about the actor loss in RLHF training

Issue - State: closed - Opened by xyjsjruiliu over 1 year ago - 1 comment
Labels: question, need information

#35 - chore(.github): update issue template

Pull Request - State: closed - Opened by XuehaiPan over 1 year ago
Labels: documentation, enhancement

#34 - [BUG] Poor internet connection: failed to download datasets from Hugging Face

Issue - State: closed - Opened by Harry-mic over 1 year ago - 1 comment
Labels: bug, invalid

#33 - [Question] Question about dataset splitting for different training stage

Issue - State: closed - Opened by liumingzhu6060 over 1 year ago - 3 comments
Labels: question

#32 - docs(README): add `alpaca-farm` to the comparison table

Pull Request - State: closed - Opened by XuehaiPan over 1 year ago
Labels: documentation

#31 - refactor(trainers/supervised_trainer): split the eval dataset with `eval_split_ratio` argument

Pull Request - State: closed - Opened by calico-1226 over 1 year ago
Labels: enhancement

#30 - [BUG] Poor internet connection: failed to download datasets from Hugging Face

Issue - State: closed - Opened by Harry-mic over 1 year ago - 2 comments
Labels: bug, invalid

#29 - [Question] Question about the PTX Step in RLHF training

Issue - State: closed - Opened by zhaobinNF over 1 year ago - 4 comments
Labels: question

#27 - [BUG] unlimited recursion when calling tokenizer.unk_token_id

Issue - State: closed - Opened by feiliya333 over 1 year ago - 2 comments
Labels: bug, upstream

#26 - fix(algorithms): handle potential index error for empty generation

Pull Request - State: closed - Opened by XuehaiPan over 1 year ago
Labels: bug

#24 - [Question] What are the recommended hyper-parameters?

Issue - State: closed - Opened by nonstopfor over 1 year ago - 4 comments
Labels: question

#23 - fix(algorithms): skip special tokens when re-tokenizing with the reward/cost tokenizer

Pull Request - State: closed - Opened by calico-1226 over 1 year ago
Labels: bug, enhancement

#21 - [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 4294967296, reducing to 2147483648

Issue - State: closed - Opened by zhaobinNF over 1 year ago - 8 comments
Labels: question, need information

#20 - [Feature Request] LoRA support for memory efficient fine-tuning

Issue - State: open - Opened by 70557dzqc over 1 year ago - 2 comments
Labels: enhancement, in progress, new feature

#18 - fix(models/pretrained): set special token ids in `model.config`

Pull Request - State: closed - Opened by calico-1226 over 1 year ago
Labels: bug, enhancement

#17 - [Question] Metric/task used to evaluate Beaver

Issue - State: closed - Opened by feiliya333 over 1 year ago - 2 comments
Labels: question, evaluation

#15 - [Feature Request] Releasing the Reward Model

Issue - State: closed - Opened by d223302 over 1 year ago - 6 comments
Labels: enhancement, question

#14 - [Feature Request] Will RM training and RL training for chatglm be supported in the future?

Issue - State: closed - Opened by iamsile over 1 year ago - 2 comments
Labels: enhancement, invalid, dependency

#13 - feat(algorithms): allow reward/cost models use different tokenizers than actor tokenizer

Pull Request - State: closed - Opened by XuehaiPan over 1 year ago
Labels: enhancement

#12 - [Feature Request] loading dataset from local files

Issue - State: closed - Opened by haorannlp over 1 year ago - 5 comments
Labels: enhancement, in progress, new feature

#11 - [Feature Request] Support Actor and Reward/Cost Models using different tokenizers

Issue - State: closed - Opened by calico-1226 over 1 year ago - 1 comment
Labels: enhancement

#9 - [BUG] Error when running the PPO stage: CUDA error: device-side assert triggered

Issue - State: closed - Opened by HaixHan over 1 year ago - 23 comments
Labels: bug, invalid, need information, cuda

#8 - How to set up the data in sft process, should I just make a dir Alpaca and put the data downloaded in it?

Issue - State: closed - Opened by zhaobinNF over 1 year ago
Labels: question

#7 - fix(serve): fix argument passing for chatbot generation

Pull Request - State: closed - Opened by XuehaiPan over 1 year ago
Labels: bug

#6 - deps(tokenizers): pin `tokenizers` minimum version for fast tokenizer support for LLaMA models

Pull Request - State: closed - Opened by XuehaiPan over 1 year ago
Labels: dependency, installation

#5 - [Question] Trlx doesn't support the Reward model training ?

Issue - State: closed - Opened by wqw547243068 over 1 year ago - 2 comments
Labels: question
